[PowerRename] Fix Unicode characters and non-breaking spaces not being correctly normalized before matching #43972

daverayment · 2025-11-29T19:48:52Z

Summary of the Pull Request

Fixes PowerRename failing to normalise different Unicode forms before matching. Previously, filenames containing characters that are visually identical to those in the search term failed to match because their underlying binary representations differ.

This affects renaming files created on macOS, which names files in NFD (decomposed form) rather than the NFC (precomposed form) that Windows input typically produces.

Additionally, this fixes matching against filenames containing non-breaking space (NBSP) characters, which can be created by automated systems and web downloaders. Previously, an NBSP character would fail to match a normal space.

PR Checklist

Detailed Description of the Pull Request / Additional comments

The underlying issue is a binary mismatch between:

Precomposed characters (NFC) typed by Windows users, e.g. U+0439 - й.
Decomposed characters (NFD) found in filenames from other platforms (or copied from text), e.g. U+0438 U+0306 - и + ̆ .
Standard spaces (U+0020) versus non-breaking spaces (U+00A0).
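
To make the mismatch concrete, here is a small portable sketch (not code from this PR) comparing the code units of the forms listed above. The helper names `SameCodeUnits` and `DemonstrateMismatch` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <string>

// Illustration only: a plain code-unit comparison with no normalisation,
// which is effectively what matching did before this change.
inline bool SameCodeUnits(const std::u16string& a, const std::u16string& b)
{
    return a == b;
}

inline void DemonstrateMismatch()
{
    const std::u16string nfc = u"\u0439";       // й as one precomposed code unit
    const std::u16string nfd = u"\u0438\u0306"; // и followed by a combining breve

    // The two strings render identically but differ in length and content.
    assert(nfc.size() == 1);
    assert(nfd.size() == 2);
    assert(!SameCodeUnits(nfc, nfd));

    const std::u16string withSpace = u"a b";      // U+0020
    const std::u16string withNbsp = u"a\u00A0b";  // U+00A0

    // Same length, but the space characters are different code units.
    assert(withSpace.size() == withNbsp.size());
    assert(!SameCodeUnits(withSpace, withNbsp));
}
```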

Updates to PowerRenameRegEx.cpp

I added a SanitizeAndNormalize function which replaces all non-breaking spaces with standard spaces and normalises the string to Normalization Form C using Win32's NormalizeString.
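
For readers unfamiliar with `NormalizeString`, the following is a minimal sketch of how such a function could be written. It is not the code in this PR: the name `SanitizeAndNormalizeSketch`, the error handling, and the non-Windows fallback (which only replaces NBSP) are assumptions made so the example compiles anywhere.

```cpp
#include <algorithm>
#include <string>
#ifdef _WIN32
#include <windows.h> // NormalizeString; link against Normaliz.lib
#endif

// Hypothetical sketch: replace NBSP with a regular space, then normalise to NFC.
// On failure it falls back to the sanitised but unnormalised string.
inline std::wstring SanitizeAndNormalizeSketch(const std::wstring& input)
{
    if (input.empty())
    {
        return input;
    }

    std::wstring sanitized = input;
    std::replace(sanitized.begin(), sanitized.end(), L'\u00A0', L' ');

#ifdef _WIN32
    const int sourceLength = static_cast<int>(sanitized.size());

    // With a null destination and zero length, NormalizeString returns an
    // estimate of the buffer size required for the normalised string.
    const int estimated = NormalizeString(NormalizationC, sanitized.c_str(), sourceLength, nullptr, 0);
    if (estimated <= 0)
    {
        return sanitized;
    }

    std::wstring normalized(static_cast<size_t>(estimated), L'\0');
    const int written = NormalizeString(NormalizationC, sanitized.c_str(), sourceLength, &normalized[0], estimated);
    if (written <= 0)
    {
        return sanitized;
    }

    // The estimate may exceed the real length, so trim to what was written.
    normalized.resize(static_cast<size_t>(written));
    return normalized;
#else
    // Assumption for portability: no NFC step outside Windows in this sketch.
    return sanitized;
#endif
}
```

Passing an explicit source length rather than -1 means the returned count does not include a terminating null, which keeps the `resize` call simple.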

PutSearchTerm and PutReplaceTerm now normalise input immediately before performing any other processing.

Replace now normalises the source filename before processing.

I updated the RegEx path to ensure it runs against the normalised sourceToUse string instead of the raw source string; otherwise regex matches would fail.
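
The effect of running the regex against the normalised string can be sketched with the standard library alone. This is an illustration, not the PR's implementation: `ReplaceNbsp` is a hypothetical stand-in for the full normalisation step (it handles NBSP only), and `RegexReplaceNormalized` is a hypothetical helper that borrows the `sourceToUse` name from the description above.

```cpp
#include <algorithm>
#include <regex>
#include <string>

// Hypothetical stand-in for the normalisation step: NBSP replacement only.
inline std::wstring ReplaceNbsp(std::wstring text)
{
    std::replace(text.begin(), text.end(), L'\u00A0', L' ');
    return text;
}

// Normalise the search term, replace term, and source before matching,
// then run the regex against the normalised source rather than the raw one.
inline std::wstring RegexReplaceNormalized(const std::wstring& source,
                                           const std::wstring& searchTerm,
                                           const std::wstring& replaceTerm)
{
    const std::wstring sourceToUse = ReplaceNbsp(source);
    const std::wregex pattern(ReplaceNbsp(searchTerm));
    return std::regex_replace(sourceToUse, pattern, ReplaceNbsp(replaceTerm));
}
```

If the regex were run against the raw source instead, a search term typed with a normal space would find nothing in a filename containing an NBSP, and the name would come back unchanged.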

Validation Steps Performed

Manually tested the use case detailed in #43971 with the following filenames:

Testй NFC.txt
Testй NFD.txt

Result:

There are two new unit tests which exercise both the non-breaking space and Unicode form normalisation issues. These run on both the Boost- and non-Boost test paths, adding four tests to the total. All new tests fail as expected on the prior code and all PowerRename tests pass successfully with the changes in this PR:


src/modules/powerrename/lib/PowerRenameRegEx.cpp


+/// <summary>
+/// Sanitizes the input string by replacing non-breaking spaces with regular spaces and
+/// normalizes it to Unicode NFC (precomposed) form.


src/modules/powerrename/lib/PowerRenameRegEx.cpp

+    // Replace non-breaking spaces (0xA0) with regular spaces (0x20).
+    std::replace(sanitized.begin(), sanitized.end(), L'\u00A0', L' ');
+
+    // Normalize to NFC (Precomposed).


github-actions · 2025-11-29T19:53:02Z

@check-spelling-bot Report

🔴 Please review

See the 📂 files view, the 📜action log, or 📝 job summary for details.

Unrecognized words (3)

icf
IRDP
precomposed

These words are not needed and should be removed

irdp

To accept these unrecognized words as correct and remove the previously acknowledged and now absent words, you could run the following commands

... in a clone of the git@github.com:daverayment/PowerToys.git repository
on the powerrename-normalizestrings branch (ℹ️ how do I use this?):

curl -s -S -L 'https://raw.githubusercontent.com/check-spelling/check-spelling/c635c2f3f714eec2fcf27b643a1919b9a811ef2e/apply.pl' |
perl - 'https://github.com/microsoft/PowerToys/actions/runs/19788524302/attempts/1' &&
git commit -m 'Update check-spelling metadata'

Errors ❌ (1)

See the 📂 files view, the 📜action log, or 📝 job summary for details.

❌ Errors	Count
❌ check-file-path	1

See ❌ Event descriptions for more information.

If the flagged items are 🤯 false positives

If items relate to a ...

binary file (or some other file you wouldn't want to check at all).

Please add a file path to the excludes.txt file matching the containing file.

File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).
well-formed pattern.

If you can write a pattern that would match it,
try adding it to the patterns.txt file.

Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

Note that patterns can't match multiline strings.

daverayment added 2 commits November 29, 2025 19:00

Fix Unicode characters and non-breaking spaces not being correctly no…

17555ff

…rmalized before matching.

Minor comment clarification.

7c6c525

daverayment added the Product-PowerRename Refers to the PowerRename PowerToy label Nov 29, 2025

github-advanced-security bot found potential problems Nov 29, 2025
