[WIP] Fix [RegExp|String].prototype methods and Regex matcher#6615
[WIP] Fix [RegExp|String].prototype methods and Regex matcher#6615MadProbe wants to merge 14 commits intochakra-core:masterfrom
Conversation
|
I'm not keen on killing all the fast paths - I suggested you'd need to check them but hopefully we can update conditions so they're only used when they won't cause a problem rather than just deleting them. |
…into regex-fixes
lib/Runtime/Library/RegexHelper.cpp
Outdated
| if (string->GetLength() > (0 > index || (uint64_t)index + (uint64_t)1 > UINT32_MAX ? UINT32_MAX : (uint32_t)(index + 1)) && | ||
| NumberUtilities::IsSurrogateLowerPart(string->GetString()[index]) && | ||
| NumberUtilities::IsSurrogateUpperPart(string->GetString()[index + 1])) | ||
| NumberUtilities::IsSurrogateUpperPart(string->GetString()[index + 1]) && isUnicode) |
There was a problem hiding this comment.
nit: ideally should do the isUnicode check first - it's much quicker, than the IsSurrogateLowerPart and IsSuurrogateUpperPart checks; if we do it first can skip the slower bits when it's false.
| Field(codepoint_t*) m_codePointString; // Code points of the string, may be nullptr if not initialized yet by GetCodePoints call | ||
| Field(charcount_t) m_codePointsLength; // Length of the codepoints, may be k_InvalidCharCount if not initialized yet by GetCodePoints call |
There was a problem hiding this comment.
This will add two pointers to every single JS string - which isn't ideal - could we instead subclass JavascriptString with a new type like JavascriptStringWithCodePoints or something?
(Note I think this change is causing the compile failures as there is a static assert based on the size of a different subclass of JavascriptString)
There was a problem hiding this comment.
So I will need to create a new instance of JavascriptStringWithCodePoints on each SimpleMatch call if pattern shows us that it's unicode one.
Honestly I cannot imagine how i will cache the codepoint strings.
There was a problem hiding this comment.
You're right this is not an ideal solution - ugh - I didn't think this through did I. I think cache'ing is a no go.
Either a) we massively increase the size of all our strings OR b) we come up with a way to convert the type whenever we need to cache - which would not be simple as there could be many existing pointers to the string.
This PR will fix spec deviations in:
Also: