Fix text mapping for special characters #11390

Open
etvorun wants to merge 2 commits into dotnet:main from etvorun:fix_text_layout
Conversation

etvorun commented Jan 26, 2026

Fixes #11386

Problem

PR #6857 introduced a script comparison check to prevent combining marks from different scripts from staying with their base character during font fallback. While this fixed a legitimate issue, it inadvertently broke emoji sequences because:

  1. Emoji keycap sequences like "1️⃣" consist of:

    • A digit (e.g., '1') - Script: Common/Digit
    • Variation Selector 16 (U+FE0F) - Script: Inherited
    • Combining Enclosing Keycap (U+20E3) - Script: Common/Symbol
  2. The script check saw these as different scripts and broke the combining relationship, causing the text itemizer to split the sequence incorrectly, which led to a crash in Line Services.
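
To make the failure mode concrete, here is a minimal sketch in standard C++. It is not the WPF code: the `Script` enum, `ScriptOf`, and `MarksDetachedByStrictCheck` are hypothetical names, and the lookup covers only the three code points of the keycap sequence, using the script labels listed above. It illustrates how a strict "same script as the base" test detaches both trailing characters from the digit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical script labels; the real classification tables are far larger.
enum class Script { Digit, Inherited, Symbol, Other };

// Hypothetical lookup covering only the code points of a keycap sequence.
inline Script ScriptOf(char32_t ch) {
    if (ch >= U'0' && ch <= U'9') return Script::Digit;
    if (ch == 0xFE0F) return Script::Inherited;  // Variation Selector 16
    if (ch == 0x20E3) return Script::Symbol;     // Combining Enclosing Keycap
    return Script::Other;
}

// Counts the characters after the base (text[0]) whose script differs from
// the base's script, i.e. the ones a strict script check would split off.
inline std::size_t MarksDetachedByStrictCheck(const std::u32string& text) {
    if (text.empty()) return 0;
    const Script baseScript = ScriptOf(text[0]);
    std::size_t detached = 0;
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (ScriptOf(text[i]) != baseScript) ++detached;
    }
    return detached;
}
```

For the sequence `U"1\uFE0F\u20E3"` this sketch reports both trailing characters as detached, which is the split described in item 2.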

Solution

Introduced the concept of script-agnostic combining marks - characters that are designed to modify any base character regardless of script. These include:

  • Zero Width Joiner (ZWJ) - U+200D
  • Variation Selectors - VS1-VS16 (U+FE00-U+FE0F) and IVS (U+E0100-U+E01EF)
  • Combining Diacritical Marks Extended - U+1AB0-U+1AFF
  • Combining Diacritical Marks Supplement - U+1DC0-U+1DFF
  • Combining Diacritical Marks for Symbols - U+20D0-U+20FF (includes U+20E3 keycap)
  • Combining Half Marks - U+FE20-U+FE2F
  • Emoji Modifiers (Skin tones) - U+1F3FB-U+1F3FF
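
The ranges above can be written as a single predicate. The following is a sketch in standard C++ that borrows the name of the `IsScriptAgnosticCombining()` method this PR adds to Classification.cs; it is an illustration built only from the listed ranges, not the managed implementation, and the `char32_t` signature is an assumption.

```cpp
// Returns true when the code point falls in one of the ranges listed above.
// Sketch only: the signature and layout are assumptions, not the PR's code.
constexpr bool IsScriptAgnosticCombining(char32_t ch) {
    return ch == 0x200D                          // Zero Width Joiner
        || (ch >= 0xFE00 && ch <= 0xFE0F)        // Variation Selectors VS1-VS16
        || (ch >= 0xE0100 && ch <= 0xE01EF)      // IVS
        || (ch >= 0x1AB0 && ch <= 0x1AFF)        // Combining Diacritical Marks Extended
        || (ch >= 0x1DC0 && ch <= 0x1DFF)        // Combining Diacritical Marks Supplement
        || (ch >= 0x20D0 && ch <= 0x20FF)        // Combining Diacritical Marks for Symbols
        || (ch >= 0xFE20 && ch <= 0xFE2F)        // Combining Half Marks
        || (ch >= 0x1F3FB && ch <= 0x1F3FF);     // Emoji Modifiers (skin tones)
}
```

Under this sketch U+20E3 (keycap) and U+FE0F (VS16) are script-agnostic, while an ordinary combining mark such as U+0650 is not.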

The fix ensures these script-agnostic marks always stay with their base character, while the original PR #6857 script check still applies to regular combining marks.
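
The resulting rule can be summarized in a few lines. This is a hedged sketch in standard C++, assuming the caller has already identified the character as a mark following a base and has computed two facts about it; `StaysWithBase` and both parameter names are hypothetical and do not appear in the PR.

```cpp
// Decides whether a mark remains attached to its base character.
// isScriptAgnostic: the mark is in one of the script-agnostic ranges.
// sameScriptAsBase: the mark's script matches the base's script (the
//                   comparison introduced by PR #6857).
constexpr bool StaysWithBase(bool isScriptAgnostic, bool sameScriptAsBase) {
    // Script-agnostic marks bypass the script comparison entirely;
    // every other mark is still subject to it.
    return isScriptAgnostic || sameScriptAsBase;
}
```

Only the combination "not script-agnostic and different script" separates the mark from its base, which is the case PR #6857 was written for.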

Changes

Native Code (DirectWriteForwarder)

| File | Change |
| --- | --- |
| IClassification.h | Added isExtended out parameter to GetCharAttribute and IsSameScript method |
| TextAnalyzer.cpp | Updated to use isExtended parameter and skip script check for script-agnostic marks |

Managed Code (PresentationCore)

| File | Change |
| --- | --- |
| Classification.cs | Added IsScriptAgnosticCombining() method, updated GetCharAttribute() with isExtended parameter, added IsSameScript() to ClassificationUtility |
| PhysicalFontFamily.cs | Updated font mapping to use IsScriptAgnosticCombining for proper emoji sequence handling |

Testing

Manual Testing

  • Verified "1️⃣" and other keycap sequences render correctly
  • Verified skin tone emoji modifiers work (e.g., "👋🏽")
  • Verified ZWJ sequences work (e.g., family emoji)
  • Verified the original issue from PR #6857 (Check script of combining marks during font fallback) is still fixed (combining marks from different scripts still get proper font fallback)

Test Cases

"1️⃣" - Keycap sequence (digit + VS16 + combining enclosing keycap)
"A\u0650test" - Latin with Arabic combining mark (should trigger font fallback - PR #6857 fix)
"👋🏽" - Emoji with skin tone modifier
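
For reference, the three test strings can be spelled out as UTF-32 code point sequences. The sketch below is standard C++; the function names are hypothetical and serve only to label each string.

```cpp
#include <string>

// Keycap sequence: DIGIT ONE, VARIATION SELECTOR-16, COMBINING ENCLOSING KEYCAP.
inline std::u32string KeycapOne() { return U"1\uFE0F\u20E3"; }

// Latin base followed by ARABIC KASRA (U+0650), then plain Latin text.
inline std::u32string LatinWithArabicMark() { return U"A\u0650test"; }

// WAVING HAND SIGN (U+1F44B) followed by the skin tone modifier U+1F3FD.
inline std::u32string WavingHandWithSkinTone() { return U"\U0001F44B\U0001F3FD"; }
```

Writing the strings this way makes it explicit that the first and third cases end in characters from the script-agnostic ranges, while U+0650 in the second case does not.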

Risk

Low - The change is additive and only affects the specific case of script-agnostic combining marks. The existing script check from PR #6857 remains in place for all other combining marks.

Related Issues/PRs


@etvorun etvorun requested review from a team and Copilot January 26, 2026 23:25
@dotnet-policy-service dotnet-policy-service bot added the PR metadata: Label to tag PRs, to facilitate with triage label Jan 26, 2026
Contributor

Copilot AI left a comment

Pull request overview

Fixes a regression from the combining-mark script check introduced in #6857 that incorrectly split certain emoji/keycap sequences (e.g. 1️⃣), leading to crashes during line breaking/font fallback.

Changes:

  • Adds a “script-agnostic combining” classification and an IsSameScript helper for script comparisons.
  • Updates font mapping logic to keep script-agnostic marks with their base character while preserving the cross-script combining fallback behavior from #6857.
  • Updates DirectWriteForwarder itemization/character attribute analysis to incorporate the new script comparison behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/Microsoft.DotNet.Wpf/src/PresentationCore/MS/internal/FontFace/PhysicalFontFamily.cs | Adjusts font mapping logic to keep script-agnostic combining marks with the base character (emoji/keycap sequences). |
| src/Microsoft.DotNet.Wpf/src/PresentationCore/MS/internal/Classification.cs | Adds script-agnostic combining detection and an IsSameScript helper; updates classification API surface. |
| src/Microsoft.DotNet.Wpf/src/DirectWriteForwarder/CPP/DWriteWrapper/TextAnalyzer.cpp | Adds combining-mark script comparison behavior during analysis and plumbs new classification info. |
| src/Microsoft.DotNet.Wpf/src/DirectWriteForwarder/CPP/DWriteWrapper/IClassification.h | Extends the classification interface with a new out parameter and IsSameScript. |

Development

Successfully merging this pull request may close these issues.

  • App crashes while line breaking text for rendering
  • Characters with different font after fallback shaped together

2 participants