[PowerRename] Fix Unicode characters and non-breaking spaces not being correctly normalized before matching #43972
+109
−15
Summary of the Pull Request
Fixes PowerRename failing to normalise different Unicode forms before matching. As a result, filenames containing characters that are visually identical to the search term fail to match because their underlying binary representations differ.
This affects renaming files created on macOS, which names files in NFD (decomposed form) rather than the NFC (precomposed form) that is typical on Windows.
Additionally, this fixes matching against filenames containing non-breaking space (NBSP) characters, which automated systems and web downloaders can create. Previously, an NBSP in a filename would fail to match a normal space in the search term.
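The mismatch described above can be reproduced without any PowerRename code. The following is a minimal sketch, not the project's matching logic: `ContainsRaw`, `kNfc`, `kNfd`, and `DemonstrateMismatch` are hypothetical names, and the search is a plain comparison of UTF-16 code units.

```cpp
#include <cassert>
#include <string>

// й as one precomposed code point (NFC): U+0439.
const std::u16string kNfc = u"\u0439";
// и followed by a combining breve (NFD): U+0438 U+0306.
const std::u16string kNfd = u"\u0438\u0306";

// Naive substring search over UTF-16 code units with no normalisation.
// A model of raw comparison only, not PowerRename's actual matching code.
bool ContainsRaw(const std::u16string& haystack, const std::u16string& needle)
{
    return haystack.find(needle) != std::u16string::npos;
}

// Illustrative checks; the file names are examples only.
void DemonstrateMismatch()
{
    const std::u16string nfcName = u"Test" + kNfc + u" NFC.txt";
    const std::u16string nfdName = u"Test" + kNfd + u" NFD.txt";
    assert(ContainsRaw(nfcName, kNfc));   // same form on both sides: match
    assert(!ContainsRaw(nfdName, kNfc));  // decomposed name, precomposed term: no match

    // U+00A0 (NBSP) between the words instead of U+0020.
    const std::u16string nbspName = u"My\u00A0" u"File.txt";
    assert(!ContainsRaw(nbspName, u"My File"));  // a plain space does not equal NBSP
}
```

Both strings render the same on screen, but one is a single code unit and the other is two, so any comparison that skips normalisation treats them as different text.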
PR Checklist
Detailed Description of the Pull Request / Additional comments
The underlying issue is a binary mismatch between:
Precomposed (NFC) and decomposed (NFD) characters: U+0439 (й) versus U+0438 U+0306 (и + ̆).
Standard spaces (U+0020) versus non-breaking spaces (U+00A0).
Updates to PowerRenameRegex.cpp
I added a SanitizeAndNormalize function which replaces all non-breaking spaces with standard spaces and normalises the string to Normalization Form C using Win32's NormalizeString.
PutSearchTerm and PutReplaceTerm now normalise input immediately before performing any other processing.
Replace now normalises the source filename before processing.
I updated the RegEx path to ensure it runs against the normalised sourceToUse string instead of the raw source string; otherwise regex matches would fail.
Validation Steps Performed
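For reference when checking the behaviour, the normalisation helper described under "Updates to PowerRenameRegex.cpp" can be approximated in isolation. This is a rough sketch under stated assumptions, not the code in the PR: `SanitizeAndNormalizeSketch` and `CheckNbspHandling` are hypothetical names, the failure handling (return the input unnormalised) is an assumption, and on non-Windows builds only the NBSP replacement runs so that the example still compiles.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>  // NormalizeString, NormalizationC (link with Normaliz.lib)
#endif

// Hypothetical stand-in for the PR's SanitizeAndNormalize; the real signature
// and error handling in PowerRenameRegex.cpp may differ.
std::wstring SanitizeAndNormalizeSketch(std::wstring text)
{
    // Step 1: replace every non-breaking space (U+00A0) with a plain space.
    std::replace(text.begin(), text.end(), L'\u00A0', L' ');

#ifdef _WIN32
    if (text.empty())
    {
        return text;
    }
    const int length = static_cast<int>(text.size());
    // Step 2: ask for an estimated buffer size, then normalise to NFC.
    const int estimated = NormalizeString(NormalizationC, text.c_str(), length, nullptr, 0);
    if (estimated <= 0)
    {
        return text;  // assumption: fall back to the unnormalised text on failure
    }
    std::wstring normalized(static_cast<size_t>(estimated), L'\0');
    const int written = NormalizeString(NormalizationC, text.c_str(), length, &normalized[0], estimated);
    if (written <= 0)
    {
        return text;
    }
    normalized.resize(static_cast<size_t>(written));  // trim to the actual length
    return normalized;
#else
    // NormalizeString is Win32-only, so this branch performs step 1 only.
    return text;
#endif
}

// Portable check of the NBSP step; the file name is an example only.
void CheckNbspHandling()
{
    assert(SanitizeAndNormalizeSketch(L"My\u00A0" L"File.txt") == L"My File.txt");
}
```

Applying the same helper to the search term, the replace term, and the source filename means all three are compared in one form, which is also why the regex must run against the normalised copy of the filename rather than the raw one.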
Manually tested the use case detailed in #43971 with the following filenames:
Testй NFC.txt
Testй NFD.txt
Result:

There are two new unit tests which exercise both the non-breaking space and Unicode form normalisation issues. These run on both the Boost- and non-Boost test paths, adding four tests to the total. All new tests fail as expected on the prior code and all PowerRename tests pass successfully with the changes in this PR: