Bug 1436077
| Summary: | Some emoji which should render as one character with the “Noto Color Emoji” font render as several characters | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Mike FABIAN <mfabian> |
| Component: | pango | Assignee: | Akira TAGOH <tagoh> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 25 | CC: | fonts-bugs, i18n-bugs, mfabian, pwu, tagoh, tfujiwar |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | pango-1.40.7-1.fc26 pango-1.40.7-1.fc25 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-07-23 03:57:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Mike FABIAN
2017-03-27 05:43:48 UTC
Created attachment 1266552 [details]
pango-view-fedora-25.png
Broken display of the test-text.txt by
pango-view --font='Noto Color Emoji 48' ~/test-text.txt
on Fedora 25.
Created attachment 1266553 [details]
hb-view-fedora-25-correct.png
Correct display of the same test text using:
hb-view --font-size=24 --text-file=/home/mfabian/test-text.txt --output-file=/tmp/hb.png --output-format=png /usr/share/fonts/google-noto-emoji/NotoColorEmoji.ttf
Created attachment 1266555 [details]
pango-view-fedora-24.png
Correct display of the same test-text.txt on Fedora 24 using
pango-view --font='Noto Color Emoji 48' ~/test-text.txt
hb-view --font-size=24 --text-file=/home/mfabian/test-text.txt --output-file=/tmp/hb.png --output-format=png /usr/share/fonts/google-noto-emoji/NotoColorEmoji.ttf
also works correctly on Fedora 24.
So it seems the problem is not in harfbuzz.
OK, the problem is caused by the width changes in glibc:
Fedora 24:
(gdb) p g_unichar_iswide(0x1f469)
$4 = 0
(gdb) p g_unichar_iswide(0x200d)
$5 = 0
(gdb) p g_unichar_iswide(0x2708)
$6 = 0
(gdb)
Fedora 25:
(gdb) p g_unichar_iswide(0x1f469)
$39 = 1
(gdb) p g_unichar_iswide(0x200d)
$40 = 0
(gdb) p g_unichar_iswide(0x2708)
$41 = 0
(gdb)
👩 U+1F469 WOMAN
U+200D ZERO WIDTH JOINER
✈ U+2708 AIRPLANE
In Fedora 24, all 3 of these characters are “narrow”.
In Fedora 25, 👩 is “wide” (because of the Unicode 9.0.0 update of glibc).
Pango breaks the run when the character width changes.
Here is the function in pango-context.c which causes the break of the run:
/* g_unichar_iswide() uses EastAsianWidth, which is broken.
 * We should switch to using VerticalTextLayout:
 * http://www.unicode.org/reports/tr50/#Data50
 *
 * In the mean time, fixup Hangul jamo to be all wide so we
 * don't break run in the middle. The EastAsianWidth has
 * 'W' for L-jamo, and 'N' for T and V jamo!
 *
 * https://bugzilla.gnome.org/show_bug.cgi?id=705727
 */
static gboolean
width_iter_iswide (gunichar ch)
{
  if ((0x1100u <= ch && ch <= 0x11FFu) ||
      (0xA960u <= ch && ch <= 0xA97Cu) ||
      (0xD7B0u <= ch && ch <= 0xD7FBu))
    return TRUE;
  return g_unichar_iswide (ch);
}
Maybe I should rewrite the function width_iter_next(PangoWidthIter* iter)
not to break emoji-zwj sequences:
https://git.gnome.org/browse/pango/tree/pango/pango-context.c#n866
Maybe it would be good to file a bug in the upstream bugzilla too.
Created attachment 1267272 [details]
0001-Bug-780669-Do-not-start-a-new-run-at-a-zero-width-jo.patch
Patch to fix the problem
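The run break described in the analysis above can be reproduced outside Pango. The following is a minimal sketch in plain C, not Pango's actual code: it works on an array of code points instead of UTF-8 text, and `iswide_fedora24`, `iswide_fedora25`, and `count_width_runs` are hypothetical names. The two stub predicates only model the gdb results quoted in this report; they are not real Unicode width data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for g_unichar_iswide(); they model only the
 * gdb results quoted in this report, not real Unicode width data. */
int iswide_fedora24(uint32_t ch)
{
    (void)ch;
    return 0;               /* U+1F469, U+200D and U+2708 are all narrow */
}

int iswide_fedora25(uint32_t ch)
{
    return ch == 0x1F469;   /* only WOMAN is reported as wide */
}

/* Count the runs produced when a new run starts at every width change,
 * which is the behaviour described for the unpatched width iterator. */
size_t count_width_runs(const uint32_t *cps, size_t n,
                        int (*iswide)(uint32_t))
{
    size_t runs = 0;
    size_t end = 0;

    assert(cps != NULL || n == 0);
    while (end < n) {
        int wide = iswide(cps[end]);   /* width of the run's first character */
        while (end < n && iswide(cps[end]) == wide)
            end++;
        runs++;
    }
    return runs;
}
```

For the sequence U+1F469 U+200D U+2708, `count_width_runs` gives one run with `iswide_fedora24` and two runs with `iswide_fedora25`; the break falls between U+1F469 and U+200D.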
Created attachment 1267444 [details]
Updated patch
I updated my patch a bit: pango should not break a run at
skin tone modifiers either, otherwise pango would break between
U+270C VICTORY HAND (which is single width)
and U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6,
which is double width.
Looks good.
E.g. A + 0x200d + B
I think it works only when the width of A and width of B are same.

(In reply to fujiwara from comment #11)
> Looks good.
> E.g. A + 0x200d + B
> I think it works only when the width of A and width of B are same.
Sorry, correction.
width of A >= width of B

I think it works also when the width of A and B are the same and
also when width of A < width of B.
Because the patch resets the iterator remembering the current width
to the width of the next character after the zwj or skin tone modifier:
iter->wide = width_iter_iswide (ch)
So Pango would only break if yet another width change is encountered
and that new width change is *not* at a zwj or skin tone modifier.
Maybe it is even necessary to add 0x1f3f4 to the exceptions where
a break at a width change is prevented:
diff --git a/pango/pango-context.c b/pango/pango-context.c
index f0cea73..2739b2d 100644
--- a/pango/pango-context.c
+++ b/pango/pango-context.c
@@ -876,7 +876,18 @@ width_iter_next(PangoWidthIter* iter)
   while (iter->end < iter->text_end)
     {
       gunichar ch = g_utf8_get_char (iter->end);
-      if (width_iter_iswide (ch) != iter->wide)
+      if (ch == 0x200d || ch == 0x1f3f4 || (ch >= 0x1f3fb && ch <= 0x1f3ff))
+        {
+          /* do not break at a zero-width-joiner or skin tone modifiers*/
+          iter->end = g_utf8_next_char (iter->end);
+          if (iter-> end < iter->text_end)
+            {
+              ch = g_utf8_get_char (iter->end);
+              iter->wide = width_iter_iswide (ch);
+            }
+          continue;
+        }
+      else if (width_iter_iswide (ch) != iter->wide)
         break;
       iter->end = g_utf8_next_char (iter->end);
     }
Because flag sequences like this one:
1F3F4 E0067 E0062 E0073 E0063 E0074 E007F; Emoji_Tag_Sequence; Scotland
start with U+1F3F4 WAVING BLACK FLAG, which is a wide character
and the rest of the sequence is narrow characters.
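The effect of these exceptions on run breaking can be checked with a small model. This is a sketch under stated assumptions, not the patch itself: it iterates over an array of code points rather than UTF-8 text, the names `stub_iswide`, `is_no_break_char`, and `count_runs_patched` are hypothetical, and `stub_iswide` only encodes the widths stated in this report (U+1F469, U+1F3F4, and U+1F3FB to U+1F3FF wide, everything else narrow).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical width table: only the widths stated in this report. */
int stub_iswide(uint32_t ch)
{
    return ch == 0x1F469 || ch == 0x1F3F4 ||
           (ch >= 0x1F3FB && ch <= 0x1F3FF);
}

/* Characters at which the diff above refuses to break a run. */
int is_no_break_char(uint32_t ch)
{
    return ch == 0x200D || ch == 0x1F3F4 ||
           (ch >= 0x1F3FB && ch <= 0x1F3FF);
}

/* Count runs with the same loop structure as the patched iterator:
 * step over a no-break character and reset the remembered width to
 * that of the character following it. */
size_t count_runs_patched(const uint32_t *cps, size_t n)
{
    size_t runs = 0;
    size_t end = 0;

    assert(cps != NULL || n == 0);
    while (end < n) {
        int wide = stub_iswide(cps[end]);
        while (end < n) {
            uint32_t ch = cps[end];
            if (is_no_break_char(ch)) {
                end++;
                if (end < n)
                    wide = stub_iswide(cps[end]);
                continue;
            }
            if (stub_iswide(ch) != wide)
                break;      /* ordinary width change: start a new run */
            end++;
        }
        runs++;
    }
    return runs;
}
```

In this model, U+1F469 U+200D U+2708, U+270C U+1F3FF, and the Scotland flag sequence each stay in a single run, while U+1F469 directly followed by U+2708 still splits into two runs.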
(In reply to Mike FABIAN from comment #13)
> I think it works also when the width of A and B are the same and
> also when width of A < width of B.
>
> Because the patch resets the iterator remembering the current width
> to the width of the next character after the zwj or skin tone modifier:
>
> iter->wide = width_iter_iswide (ch)
>
> So Pango would only break if yet another width change is encountered
> and that new width change is *not* at a zwj or skin tone modifier.
OK, I think iter->wide is used as the return value, meaning whatever value is finally assigned to iter->wide, and wide is a member of PangoWidthIter rather than a local variable. But you believe it is only used locally, so it's OK.
Probably I think width_iter_next() also needs to have the exception of 0xfe00 - 0xfe0f.
And I'm thinking of another patch:
--- pango-1.40.4/pango/pango-context.c.orig 2017-03-30 19:03:28.081488378 +0900
+++ pango-1.40.4/pango/pango-context.c 2017-03-31 18:58:21.526336181 +0900
@@ -1395,11 +1407,13 @@ itemize_state_process_run (ItemizeState
        * characters if they don't, HarfBuzz will compatibility-decompose them
        * to ASCII space...
        * See bugs #355987 and #701652.
+       * U+0023 FE0F 20E3 has G_UNICODE_NON_SPACING_MARK
        */
       type = g_unichar_type (wc);
       if (G_UNLIKELY (type == G_UNICODE_CONTROL ||
                       type == G_UNICODE_FORMAT ||
                       type == G_UNICODE_SURROGATE ||
+                      type == G_UNICODE_NON_SPACING_MARK ||
                       (type == G_UNICODE_SPACE_SEPARATOR &&
                        wc != 0x1680u /* OGHAM SPACE MARK */)))
         {
           shape_engine = NULL;

(In reply to fujiwara from comment #15)
>
> Probably I think width_iter_next() also needs to have the exception of
> 0xfe00 - 0xfe0f.
Yes.
Also, my patch currently disallows a break before *and* after
the special characters. But for 0xfe00 - 0xfe0f and 0x1f3fb - 0x1f3ff a break
after the special character might be OK. And for the flag sequence starter
0x1f3f4, a break before that character might be OK.
I’ll improve that patch a bit.

Thinking about this again, maybe 0xfe0e and 0xfe0f are enough instead of 0xfe00 - 0xfe0f.
I'd like to ask the pango people about the following suggestion.
--- pango-1.40.4/pango/pango-context.c.orig 2017-03-30 19:03:28.081488378 +0900
+++ pango-1.40.4/pango/pango-context.c 2017-04-04 13:35:18.446935865 +0900
@@ -1404,6 +1404,15 @@ itemize_state_process_run (ItemizeState
         {
           shape_engine = NULL;
           font = NULL;
+        }
+      /* If an emoji font does not include emoji presentation, let
+       * harfbuzz handle the characters.
+       * http://www.unicode.org/emoji/charts/emoji-variants.html
+       */
+      else if (G_UNLIKELY (wc == 0xfe0fu || wc == 0xfe0eu))
+        {
+          shape_engine = NULL;
+          font = NULL;
         }
       else
         {

(In reply to fujiwara from comment #17)
> Thinking about this again, maybe 0xfe0e and 0xfe0f are enough instead of 0xfe00 -
> 0xfe0f.
>
> I'd like to ask the pango people about the following suggestion.
> --- pango-1.40.4/pango/pango-context.c.orig 2017-03-30 19:03:28.081488378
> +0900
> +++ pango-1.40.4/pango/pango-context.c 2017-04-04 13:35:18.446935865 +0900
> @@ -1404,6 +1404,15 @@ itemize_state_process_run (ItemizeState
>          {
>            shape_engine = NULL;
>            font = NULL;
> +        }
> +      /* If an emoji font does not include emoji presentation, let
> +       * harfbuzz handle the characters.
> +       * http://www.unicode.org/emoji/charts/emoji-variants.html
> +       */
> +      else if (G_UNLIKELY (wc == 0xfe0fu || wc == 0xfe0eu))
> +        {
> +          shape_engine = NULL;
> +          font = NULL;
>          }
>        else
>          {
Yes, this works, it makes the fully-qualified emoji sequences work for me!
For example, this fully-qualified sequence for the male golfer
🏌️♂️ U+1F3CC U+FE0F U+200D U+2642 U+FE0F
works for me (= renders as a single glyph) only when using the above patch;
without that patch it is rendered as several glyphs.
Without that patch, only the non-fully-qualified sequence
🏌♂ U+1F3CC U+200D U+2642
works.
As using the fully-qualified sequences is recommended, I think that patch is needed.

(In reply to Mike FABIAN from comment #16)
> (In reply to fujiwara from comment #15)
> >
> > Probably I think width_iter_next() also needs to have the exception of
> > 0xfe00 - 0xfe0f.
>
> Yes.
>
> Also, my patch currently disallows a break before *and* after
> the special characters. But for 0xfe00 - 0xfe0f and 0x1f3fb - 0x1f3ff a break
> after the special character might be OK. And for the flag sequence starter
> 0x1f3f4, a break before that character might be OK.
>
> I’ll improve that patch a bit.
Peng Wu already did that, his patch is here:
https://bugzilla.gnome.org/show_bug.cgi?id=780669#c14

pango-1.40.7-1.fc26 has been submitted as an update to Fedora 26.
https://bodhi.fedoraproject.org/updates/FEDORA-2017-79637b77e0

pango-1.40.7-1.fc25 has been submitted as an update to Fedora 25.
https://bodhi.fedoraproject.org/updates/FEDORA-2017-f55b21a811

pango-1.40.7-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-f55b21a811

I think the upstream bug 780669 is not fixed yet?

pango-1.40.7-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-79637b77e0

I tried pango-1.40.7-1.fc26, it seems the upstream bug 780669 is not fixed yet.

pango-1.40.7-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

pango-1.40.7-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.
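
The two golfer sequences compared earlier in this report differ only in the U+FE0F variation selectors. As a small illustration for building test input (a sketch in plain C; `strip_variation_selectors` is a hypothetical helper, not a Pango or GLib function), removing U+FE0E and U+FE0F from the fully-qualified sequence yields the shorter form that was reported to render correctly even without the variation selector patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Remove U+FE0E and U+FE0F from the array in place and return the
 * new length. Hypothetical helper for generating test sequences. */
size_t strip_variation_selectors(uint32_t *cps, size_t n)
{
    size_t out = 0;

    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cps[i] == 0xFE0E || cps[i] == 0xFE0F)
            continue;       /* drop the variation selector */
        cps[out++] = cps[i];
    }
    return out;
}
```

Applied to U+1F3CC U+FE0F U+200D U+2642 U+FE0F, this leaves U+1F3CC U+200D U+2642. It is only meant for comparing the two forms when testing; as noted above, the fully-qualified form is the recommended one.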