Created attachment 1266551 [details]
test-text.txt
Test text attached.
Displaying the test text with
    pango-view --font='Noto Color Emoji 48' ~/test-text.txt
on Fedora 25 shows the emoji not as a single character but as two. On Fedora 24 (and openSUSE Leap 42.2 and Ubuntu 16.04) this works.
The version of “Noto Color Emoji” used in all these tests is the latest one from https://www.google.com/get/noto/ which has this file size:
    -rw-r-----. 1 mfabian mfabian 5987004 Oct 20 11:46 NotoColorEmoji.ttf
The problem is the same when using the “Emoji One” font from:
https://github.com/Ranks/emojione/blob/master/assets/fonts/emojione-android.ttf
Created attachment 1266552 [details] pango-view-fedora-25.png Broken display of the test-text.txt by pango-view --font='Noto Color Emoji 48' ~/test-text.txt on Fedora 25.
Created attachment 1266553 [details] hb-view-fedora-25-correct.png Correct display of the same test text using: hb-view --font-size=24 --text-file=/home/mfabian/test-text.txt --output-file=/tmp/hb.png --output-format=png /usr/share/fonts/google-noto-emoji/NotoColorEmoji.ttf
Created attachment 1266555 [details] pango-view-fedora-24.png Correct display of the same test-text.txt on Fedora 24 using pango-view --font='Noto Color Emoji 48' ~/test-text.txt
    hb-view --font-size=24 --text-file=/home/mfabian/test-text.txt --output-file=/tmp/hb.png --output-format=png /usr/share/fonts/google-noto-emoji/NotoColorEmoji.ttf
also works correctly on Fedora 24. So it seems the problem is not in harfbuzz.
OK, the problem is caused by the width changes in glibc:
Fedora 24:
    (gdb) p g_unichar_iswide(0x1f469)
    $4 = 0
    (gdb) p g_unichar_iswide(0x200d)
    $5 = 0
    (gdb) p g_unichar_iswide(0x2708)
    $6 = 0
    (gdb)
Fedora 25:
    (gdb) p g_unichar_iswide(0x1f469)
    $39 = 1
    (gdb) p g_unichar_iswide(0x200d)
    $40 = 0
    (gdb) p g_unichar_iswide(0x2708)
    $41 = 0
    (gdb)
The characters involved are:
    👩 U+1F469 WOMAN
    U+200D ZERO WIDTH JOINER
    ✈ U+2708 AIRPLANE
In Fedora 24, all these 3 characters are “narrow”. In Fedora 25, 👩 is “wide” (because of the Unicode 9.0.0 update of glibc). Pango breaks the run when the character width changes.
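The effect of the width change on run breaking can be modelled in a few lines of plain C. This is only a sketch: `toy_iswide()` and `count_width_runs()` are hypothetical names, not Pango or GLib API, and the width table covers only the three code points from the gdb session above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for g_unichar_iswide(). With `unicode9` set it
 * models the Fedora 25 result (U+1F469 is wide); without it every code
 * point is narrow, as observed on Fedora 24. */
static int toy_iswide(uint32_t ch, int unicode9)
{
    return unicode9 && ch == 0x1F469;
}

/* Count the runs produced when a run ends at every width change. */
static size_t count_width_runs(const uint32_t *text, size_t len, int unicode9)
{
    assert(text != NULL || len == 0);
    if (len == 0)
        return 0;

    size_t runs = 1;
    int wide = toy_iswide(text[0], unicode9);
    for (size_t i = 1; i < len; i++) {
        int w = toy_iswide(text[i], unicode9);
        if (w != wide) {
            /* Width changed: the current run ends here. */
            runs++;
            wide = w;
        }
    }
    return runs;
}
```

For the sequence U+1F469 U+200D U+2708 this model yields one run with the Fedora 24 widths and two runs with the Fedora 25 widths, which matches the two separate glyphs seen in pango-view.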
Here is the function in pango-context.c which causes the break of the run:
/* g_unichar_iswide() uses EastAsianWidth, which is broken.
 * We should switch to using VerticalTextLayout:
 * http://www.unicode.org/reports/tr50/#Data50
 *
 * In the mean time, fixup Hangul jamo to be all wide so we
 * don't break run in the middle. The EastAsianWidth has
 * 'W' for L-jamo, and 'N' for T and V jamo!
 *
 * https://bugzilla.gnome.org/show_bug.cgi?id=705727
 */
static gboolean
width_iter_iswide (gunichar ch)
{
  if ((0x1100u <= ch && ch <= 0x11FFu) ||
      (0xA960u <= ch && ch <= 0xA97Cu) ||
      (0xD7B0u <= ch && ch <= 0xD7FBu))
    return TRUE;
  return g_unichar_iswide (ch);
}
Maybe I should rewrite the function width_iter_next(PangoWidthIter* iter) not to break emoji-zwj sequences: https://git.gnome.org/browse/pango/tree/pango/pango-context.c#n866
It may be good to file a bug in the upstream Bugzilla too.
Created attachment 1267272 [details] 0001-Bug-780669-Do-not-start-a-new-run-at-a-zero-width-jo.patch Patch to fix the problem
Created attachment 1267444 [details]
Updated patch
I updated my patch a bit: pango should not break a run at skin tone modifiers either. Otherwise pango would break between U+270C VICTORY HAND (which is single width) and U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6 (which is double width).
Looks good.
E.g. A + 0x200d + B
I think it works only when the width of A and the width of B are the same.
(In reply to fujiwara from comment #11)
> Looks good.
> E.g. A + 0x200d + B
> I think it works only when the width of A and width of B are same.
Sorry, correction: width of A >= width of B.
I think it works also when the width of A and B are the same and also when width of A < width of B. Because the patch resets the iterator remembering the current width to the width of the next character after the zwj or skin tone modifier: iter->wide = width_iter_iswide (ch) So Pango would only break if yet another width change is encountered and that new width change is *not* at a zwj or skin tone modifier.
Maybe it is even necessary to add 0x1f3f4 to the exceptions where a break at a width change is prevented:
diff --git a/pango/pango-context.c b/pango/pango-context.c
index f0cea73..2739b2d 100644
--- a/pango/pango-context.c
+++ b/pango/pango-context.c
@@ -876,7 +876,18 @@ width_iter_next(PangoWidthIter* iter)
   while (iter->end < iter->text_end)
     {
       gunichar ch = g_utf8_get_char (iter->end);
-      if (width_iter_iswide (ch) != iter->wide)
+      if (ch == 0x200d || ch == 0x1f3f4 || (ch >= 0x1f3fb && ch <= 0x1f3ff))
+        {
+          /* do not break at a zero-width-joiner or skin tone modifiers*/
+          iter->end = g_utf8_next_char (iter->end);
+          if (iter-> end < iter->text_end)
+            {
+              ch = g_utf8_get_char (iter->end);
+              iter->wide = width_iter_iswide (ch);
+            }
+          continue;
+        }
+      else if (width_iter_iswide (ch) != iter->wide)
         break;
       iter->end = g_utf8_next_char (iter->end);
     }
Because flag sequences like this one:
    1F3F4 E0067 E0062 E0073 E0063 E0074 E007F; Emoji_Tag_Sequence; Scotland
start with U+1F3F4 WAVING BLACK FLAG, which is a wide character, and the rest of the sequence is narrow characters.
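The loop in that diff can be tried out without Pango or GLib using a small standalone model. This is a sketch under stated assumptions: it works on an array of code points instead of UTF-8, and `toy_iswide()`, `is_no_break_char()` and `next_run_end()` are hypothetical names. In particular `toy_iswide()` is a crude approximation, not the real EastAsianWidth data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Crude, hypothetical stand-in for width_iter_iswide(): one block of
 * emoji code points counts as wide, everything else as narrow. */
static int toy_iswide(uint32_t ch)
{
    return ch >= 0x1F300 && ch <= 0x1F6FF;
}

/* Code points at which the diff above refuses to end a run:
 * ZWJ, the flag sequence starter, and the skin tone modifiers. */
static int is_no_break_char(uint32_t ch)
{
    return ch == 0x200D || ch == 0x1F3F4 || (ch >= 0x1F3FB && ch <= 0x1F3FF);
}

/* Return the index one past the end of the run starting at `start`. */
static size_t next_run_end(const uint32_t *text, size_t len, size_t start)
{
    assert(text != NULL && start < len);
    int wide = toy_iswide(text[start]);
    size_t end = start;

    while (end < len) {
        uint32_t ch = text[end];
        if (is_no_break_char(ch)) {
            /* Step over the special character and adopt the width of
             * whatever follows it, so no break happens on either side. */
            end++;
            if (end < len)
                wide = toy_iswide(text[end]);
            continue;
        }
        if (toy_iswide(ch) != wide)
            break;
        end++;
    }
    return end;
}
```

In this model the narrow-then-wide sequence U+2708 U+200D U+1F469, the pair U+270C U+1F3FF and the Scotland tag sequence each stay in a single run, while an ordinary narrow character after a wide emoji still starts a new run.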
(In reply to Mike FABIAN from comment #13)
> I think it works also when the width of A and B are the same and
> also when width of A < width of B.
>
> Because the patch resets the iterator remembering the current width
> to the width of the next character after the zwj or skin tone modifier:
>
> iter->wide = width_iter_iswide (ch)
>
> So Pango would only break if yet another width change is encountered
> and that new width change is *not* at a zwj or skin tone modifier.
OK. I had assumed that iter->wide is used like a return value, i.e. that the value finally assigned to iter->wide matters, because wide is a member of PangoWidthIter rather than a local variable. But you believe it is only used locally, so it's OK.
I think width_iter_next() probably also needs an exception for 0xfe00 - 0xfe0f.
And I'm thinking about another patch:
--- pango-1.40.4/pango/pango-context.c.orig 2017-03-30 19:03:28.081488378 +0900
+++ pango-1.40.4/pango/pango-context.c 2017-03-31 18:58:21.526336181 +0900
@@ -1395,11 +1407,13 @@ itemize_state_process_run (ItemizeState
        * characters if they don't, HarfBuzz will compatibility-decompose them
        * to ASCII space...
        * See bugs #355987 and #701652.
+       * U+0023 FE0F 20E3 has G_UNICODE_NON_SPACING_MARK
        */
       type = g_unichar_type (wc);
       if (G_UNLIKELY (type == G_UNICODE_CONTROL ||
                       type == G_UNICODE_FORMAT ||
                       type == G_UNICODE_SURROGATE ||
+                      type == G_UNICODE_NON_SPACING_MARK ||
                       (type == G_UNICODE_SPACE_SEPARATOR && wc != 0x1680u /* OGHAM SPACE MARK */)))
         {
           shape_engine = NULL;
(In reply to fujiwara from comment #15)
>
> Probably I think width_iter_next() also needs to have the exception of
> 0xfe00 - 0xfe0f.
Yes.
Also, my patch currently disallows a break before *and* after the special characters. But for 0xfe00 - 0xfe0f and 0x1f3fb - 0x1f3ff a break after the special character might be OK. And for the flag sequence starter 0x1f3f4, a break before that character might be OK.
I’ll improve that patch a bit.
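One way to express the before/after distinction from this comment is a predicate on the pair of code points around a potential break. This is a sketch of the idea only, not the upstream patch; `in_range()` and `width_break_allowed()` are hypothetical names.

```c
#include <stdint.h>

static int in_range(uint32_t ch, uint32_t lo, uint32_t hi)
{
    return ch >= lo && ch <= hi;
}

/* May a run end between `prev` and `next` when their widths differ?
 * Returns 1 if a break is allowed, 0 if it must be suppressed. */
static int width_break_allowed(uint32_t prev, uint32_t next)
{
    /* ZWJ glues both neighbours: no break before or after it. */
    if (prev == 0x200D || next == 0x200D)
        return 0;
    /* Variation selectors and skin tone modifiers attach to the
     * preceding character: no break before them, a break after is fine. */
    if (in_range(next, 0xFE00, 0xFE0F) || in_range(next, 0x1F3FB, 0x1F3FF))
        return 0;
    /* The flag sequence starter attaches to what follows: no break
     * after it, a break before is fine. */
    if (prev == 0x1F3F4)
        return 0;
    return 1;
}
```

With this rule, U+270C followed by U+1F3FF stays together, while ordinary text after the modifier or before U+1F3F4 may still start a new run.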
Thinking about this again, maybe 0xfe0e and 0xfe0f are enough instead of 0xfe00 - 0xfe0f.
I'd like to ask the pango people about the following suggestion:
--- pango-1.40.4/pango/pango-context.c.orig 2017-03-30 19:03:28.081488378 +0900
+++ pango-1.40.4/pango/pango-context.c 2017-04-04 13:35:18.446935865 +0900
@@ -1404,6 +1404,15 @@ itemize_state_process_run (ItemizeState
         {
           shape_engine = NULL;
           font = NULL;
+        }
+      /* If an emoji font does not include emoji presentation, let
+       * harfbuzz handle the characters.
+       * http://www.unicode.org/emoji/charts/emoji-variants.html
+       */
+      else if (G_UNLIKELY (wc == 0xfe0fu || wc == 0xfe0eu))
+        {
+          shape_engine = NULL;
+          font = NULL;
         }
       else
         {
(In reply to fujiwara from comment #17)
> Thinking this again, maybe 0xfe0e and 0xfe0f are enough instead of 0xfe00 -
> 0xfe0f.
>
> I'd liked to ask pango people about the following suggestion.
> --- pango-1.40.4/pango/pango-context.c.orig 2017-03-30 19:03:28.081488378 +0900
> +++ pango-1.40.4/pango/pango-context.c 2017-04-04 13:35:18.446935865 +0900
> @@ -1404,6 +1404,15 @@ itemize_state_process_run (ItemizeState
>          {
>            shape_engine = NULL;
>            font = NULL;
> +        }
> +      /* If an emoji font does not include emoji presentation, let
> +       * harfbuzz handle the characters.
> +       * http://www.unicode.org/emoji/charts/emoji-variants.html
> +       */
> +      else if (G_UNLIKELY (wc == 0xfe0fu || wc == 0xfe0eu))
> +        {
> +          shape_engine = NULL;
> +          font = NULL;
>          }
>        else
>          {
Yes, this works; it makes the fully-qualified emoji sequences work for me!
For example, this fully-qualified sequence for the male golfer
    🏌️‍♂️ U+1F3CC U+FE0F U+200D U+2642 U+FE0F
renders as a single glyph only with the above patch; without that patch it is rendered as several glyphs.
Without that patch, only the non-fully-qualified sequence
    🏌‍♂ U+1F3CC U+200D U+2642
works.
As using the fully-qualified sequences is recommended, I think that patch is needed.
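To reproduce this comparison, both sequences can be written out as UTF-8 from their code points. The following is a minimal sketch of a standard UTF-8 encoder; `utf8_encode()` and `utf8_encode_seq()` are hypothetical helper names and not part of Pango, GLib or HarfBuzz.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8 into `out` (room for 4 bytes).
 * Returns the number of bytes written, or 0 for an invalid code point. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encode a whole sequence; `out` must have room for 4 * len bytes.
 * Returns the total number of bytes written. */
static size_t utf8_encode_seq(const uint32_t *cps, size_t len,
                              unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += utf8_encode(cps[i], out + n);
    return n;
}
```

The resulting bytes can be written to a text file with `fwrite()` and displayed with the pango-view command from the first comment, once for the fully-qualified sequence and once for the non-fully-qualified one.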
(In reply to Mike FABIAN from comment #16)
> (In reply to fujiwara from comment #15)
> >
> > Probably I think width_iter_next() also needs to have the exception of
> > 0xfe00 - 0xfe0f.
>
> Yes.
>
> Also, my patch currently disallows break before *and* after
> the special characters. But for 0xfe00 -0xfe0f and 0x1f3fb - 0x1f3ff a break
> after the special character might be OK. And for the flag sequence starter
> 0x1f3f4, a break before that character might be OK.
>
> I’ll improve that patch a bit.
Peng Wu already did that; his patch is here:
https://bugzilla.gnome.org/show_bug.cgi?id=780669#c14
pango-1.40.7-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-79637b77e0
pango-1.40.7-1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-f55b21a811
pango-1.40.7-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-f55b21a811
I think the upstream bug 780669 is not fixed yet?
pango-1.40.7-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-79637b77e0
I tried pango-1.40.7-1.fc26, it seems the upstream bug 780669 is not fixed yet.
pango-1.40.7-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
pango-1.40.7-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.