Bug 1436077

Summary:

Some emoji which should render as one character with the “Noto Color Emoji” font render as several characters

Product:

[Fedora] Fedora

Reporter:

Mike FABIAN <mfabian>

Component:

pango

Assignee:

Akira TAGOH <tagoh>

Status:

CLOSED ERRATA

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Priority:

unspecified

CC:

fonts-bugs, i18n-bugs, mfabian, pwu, tagoh, tfujiwar

Fixed In Version:

pango-1.40.7-1.fc26 pango-1.40.7-1.fc25

Last Closed:

2017-07-23 03:57:29 UTC

Type:

Bug

Attachments:

Description	Flags
test-text.txt	none
pango-view-fedora-25.png	none
hb-view-fedora-25-correct.png	none
pango-view-fedora-24.png	none
0001-Bug-780669-Do-not-start-a-new-run-at-a-zero-width-jo.patch	none
Updated patch	none

Description Mike FABIAN 2017-03-27 05:43:48 UTC

Created attachment 1266551 [details]
test-text.txt

Test text attached. Showing the test text like this

pango-view --font='Noto Color Emoji 48' ~/test-text.txt

on Fedora 25 shows the emoji not as a single character but as two.

On Fedora 24 (and openSUSE Leap 42.2 and Ubuntu 16.04) this works.

The version of “Noto Color Emoji” used in all these tests is
the latest one from https://www.google.com/get/noto/

which has this file size:

-rw-r-----. 1 mfabian mfabian 5987004 10月 20 11:46 NotoColorEmoji.ttf

The problem is the same when using the “Emoji One” font from:
https://github.com/Ranks/emojione/blob/master/assets/fonts/emojione-android.ttf

Comment 1 Mike FABIAN 2017-03-27 05:45:30 UTC

Created attachment 1266552 [details]
pango-view-fedora-25.png

Broken display of the test-text.txt by 

pango-view --font='Noto Color Emoji 48' ~/test-text.txt

on Fedora 25.

Comment 2 Mike FABIAN 2017-03-27 05:55:29 UTC

Created attachment 1266553 [details]
hb-view-fedora-25-correct.png

Correct display of the same test text using:

hb-view --font-size=24 --text-file=/home/mfabian/test-text.txt --output-file=/tmp/hb.png --output-format=png /usr/share/fonts/google-noto-emoji/NotoColorEmoji.ttf

Comment 3 Mike FABIAN 2017-03-27 06:12:05 UTC

Created attachment 1266555 [details]
pango-view-fedora-24.png

Correct display of the same test-text.txt on Fedora 24 using

pango-view --font='Noto Color Emoji 48' ~/test-text.txt

Comment 4 Mike FABIAN 2017-03-27 06:15:56 UTC

hb-view --font-size=24 --text-file=/home/mfabian/test-text.txt --output-file=/tmp/hb.png --output-format=png /usr/share/fonts/google-noto-emoji/NotoColorEmoji.ttf

also works correctly on Fedora 24.

So it seems the problem is not in harfbuzz.

Comment 5 Mike FABIAN 2017-03-28 07:53:13 UTC

OK, the problem is caused by the width changes in glibc:

Fedora 24:

(gdb) p g_unichar_iswide(0x1f469)
$4 = 0
(gdb) p g_unichar_iswide(0x200d)
$5 = 0
(gdb) p g_unichar_iswide(0x2708)
$6 = 0
(gdb) 

Fedora 25:

(gdb) p g_unichar_iswide(0x1f469)
$39 = 1
(gdb) p g_unichar_iswide(0x200d)
$40 = 0
(gdb) p g_unichar_iswide(0x2708)
$41 = 0
(gdb) 

👩 U+1F469 WOMAN
U+200D ZERO WIDTH JOINER
✈ U+2708 AIRPLANE

In Fedora 24, all these 3 characters are “narrow”.
In Fedora 25, 👩 is “wide” (because of the Unicode 9.0.0 update of glibc).
Pango breaks the run when the character width changes.
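
The effect can be reproduced in a small model. The sketch below is not Pango code: it works on an array of code points, and toy_iswide() is a hypothetical stand-in for g_unichar_iswide() that only knows the three values from the gdb sessions above. The unicode9 flag is likewise an assumption, used here to switch between the Fedora 24 and Fedora 25 results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for g_unichar_iswide(), limited to the gdb results
 * quoted above: with unicode9 set, U+1F469 is wide (Fedora 25); otherwise
 * every character is reported narrow (Fedora 24). */
static int
toy_iswide (uint32_t ch, int unicode9)
{
  return unicode9 && ch == 0x1F469u;
}

/* Count the runs obtained by starting a new run at every width change. */
size_t
count_width_runs (const uint32_t *text, size_t len, int unicode9)
{
  size_t runs, i;
  int wide;

  assert (text != NULL || len == 0);
  if (len == 0)
    return 0;

  runs = 1;
  wide = toy_iswide (text[0], unicode9);
  for (i = 1; i < len; i++)
    {
      int w = toy_iswide (text[i], unicode9);
      if (w != wide)
        {
          runs++;      /* width changed: a new run starts here */
          wide = w;
        }
    }
  return runs;
}
```

For the sequence U+1F469 U+200D U+2708 this model yields one run with the Fedora 24 widths and two runs with the Fedora 25 widths, which matches the split seen in pango-view.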

Comment 6 Mike FABIAN 2017-03-28 08:06:55 UTC

Here is the function in pango-context.c which causes the break of the run:

/* g_unichar_iswide() uses EastAsianWidth, which is broken.
 * We should switch to using VerticalTextLayout:
 * http://www.unicode.org/reports/tr50/#Data50
 *
 * In the mean time, fixup Hangul jamo to be all wide so we
 * don't break run in the middle.  The EastAsianWidth has
 * 'W' for L-jamo, and 'N' for T and V jamo!
 *
 * https://bugzilla.gnome.org/show_bug.cgi?id=705727
 */
static gboolean
width_iter_iswide (gunichar ch)
{
  if ((0x1100u <= ch && ch <= 0x11FFu) ||
      (0xA960u <= ch && ch <= 0xA97Cu) ||
      (0xD7B0u <= ch && ch <= 0xD7FBu))
    return TRUE;

  return g_unichar_iswide (ch);
}

Comment 7 Mike FABIAN 2017-03-28 16:21:10 UTC

Maybe I should rewrite the function width_iter_next(PangoWidthIter* iter)
not to break emoji-zwj sequences:

https://git.gnome.org/browse/pango/tree/pango/pango-context.c#n866

Comment 8 Akira TAGOH 2017-03-29 02:17:33 UTC

It may be good to file a bug in the upstream Bugzilla too.

Comment 9 Mike FABIAN 2017-03-29 10:28:06 UTC

Created attachment 1267272 [details]
0001-Bug-780669-Do-not-start-a-new-run-at-a-zero-width-jo.patch

Patch to fix the problem

Comment 10 Mike FABIAN 2017-03-30 08:44:57 UTC

Created attachment 1267444 [details]
Updated patch

I updated my patch a bit: pango should not break a run at
skin tone modifiers either. Otherwise pango would break between
U+270C VICTORY HAND (which is single width)
and U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6,
which is double width.

Comment 11 fujiwara 2017-03-30 10:26:53 UTC

Looks good.
E.g. A + 0x200d + B
I think it works only when the width of A and width of B are same.

Comment 12 fujiwara 2017-03-30 10:40:05 UTC

(In reply to fujiwara from comment #11)
> Looks good.
> E.g. A + 0x200d + B
> I think it works only when the width of A and width of B are same.

Sorry, correction.
width of A >= width of B

Comment 13 Mike FABIAN 2017-03-30 10:58:08 UTC

I think it works also when the width of A and B are the same and
also when width of A < width of B. 

Because the patch resets the iterator remembering the current width
to the width of the next character after the zwj or skin tone modifier:

    iter->wide = width_iter_iswide (ch)

So Pango would only break if yet another width change is encountered
and that new width change is *not* at a zwj or skin tone modifier.
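
The argument can be checked with a small model of the patched loop. This is only a sketch under stated assumptions: it iterates over an array of code points instead of UTF-8, sketch_iswide() is a hypothetical width test (U+1F300..U+1FAFF wide, everything else narrow) standing in for width_iter_iswide(), and sketch_run_end() is an illustrative name, not a Pango function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical width test: U+1F300..U+1FAFF wide, everything else narrow.
 * The real code would call width_iter_iswide(). */
static int
sketch_iswide (uint32_t ch)
{
  return ch >= 0x1F300u && ch <= 0x1FAFFu;
}

/* Zero width joiner or skin tone modifier: never break here. */
static int
sketch_is_glue (uint32_t ch)
{
  return ch == 0x200Du || (ch >= 0x1F3FBu && ch <= 0x1F3FFu);
}

/* Return the index one past the end of the run that starts at `start`. */
size_t
sketch_run_end (const uint32_t *text, size_t len, size_t start)
{
  size_t end = start;
  int wide;

  assert (text != NULL && start < len);
  wide = sketch_iswide (text[start]);

  while (end < len)
    {
      uint32_t ch = text[end];

      if (sketch_is_glue (ch))
        {
          /* Skip the special character and re-read the remembered width
           * from the character that follows it, if there is one. */
          end++;
          if (end < len)
            wide = sketch_iswide (text[end]);
          continue;
        }
      if (sketch_iswide (ch) != wide)
        break;
      end++;
    }
  return end;
}
```

With this model, wide + ZWJ + narrow (U+1F469 U+200D U+2708) and narrow + ZWJ + wide (U+2708 U+200D U+1F469) both stay in a single run, while U+1F469 followed by a plain "A" still breaks after the first character.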

Comment 14 Mike FABIAN 2017-03-30 12:03:59 UTC

Maybe it is even necessary to add 0x1f3f4 to the exceptions where
a break at a width change is prevented:

diff --git a/pango/pango-context.c b/pango/pango-context.c
index f0cea73..2739b2d 100644
--- a/pango/pango-context.c
+++ b/pango/pango-context.c
@@ -876,7 +876,18 @@ width_iter_next(PangoWidthIter* iter)
   while (iter->end < iter->text_end)
     {
       gunichar ch = g_utf8_get_char (iter->end);
-      if (width_iter_iswide (ch) != iter->wide)
+      if (ch == 0x200d || ch == 0x1f3f4 || (ch >= 0x1f3fb && ch <= 0x1f3ff))
+        {
+          /* do not break at a zero-width joiner, flag sequence starter, or skin tone modifier */
+          iter->end = g_utf8_next_char (iter->end);
+          if (iter->end < iter->text_end)
+            {
+              ch = g_utf8_get_char (iter->end);
+              iter->wide = width_iter_iswide (ch);
+            }
+          continue;
+        }
+      else if (width_iter_iswide (ch) != iter->wide)
         break;
       iter->end = g_utf8_next_char (iter->end);
     }

Because flag sequences like this one:

1F3F4 E0067 E0062 E0073 E0063 E0074 E007F; Emoji_Tag_Sequence; Scotland

start with U+1F3F4 WAVING BLACK FLAG, which is a wide character
and the rest of the sequence is narrow characters.
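
For reference, the shape of such a tag sequence can be sketched as a small recognizer. This is an illustration only and not part of any patch here; is_flag_tag_sequence() is a hypothetical helper that follows the layout in the data line above: U+1F3F4, then one or more tag characters U+E0020..U+E007E, then U+E007F CANCEL TAG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if seq has the form
 *   U+1F3F4, one or more of U+E0020..U+E007E, U+E007F
 * and 0 otherwise. Hypothetical helper, for illustration only. */
int
is_flag_tag_sequence (const uint32_t *seq, size_t len)
{
  size_t i;

  assert (seq != NULL || len == 0);

  /* base + at least one tag character + terminator */
  if (len < 3 || seq[0] != 0x1F3F4u || seq[len - 1] != 0xE007Fu)
    return 0;

  for (i = 1; i + 1 < len; i++)
    if (seq[i] < 0xE0020u || seq[i] > 0xE007Eu)
      return 0;

  return 1;
}
```

The Scotland sequence above is accepted; the same sequence without the trailing U+E007F is rejected.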

Comment 15 fujiwara 2017-03-31 10:05:56 UTC

(In reply to Mike FABIAN from comment #13)
> I think it works also when the width of A and B are the same and
> also when width of A < width of B. 
> 
> Because the patch resets the iterator remembering the current width
> to the width of the next character after the zwj or skin tone modifier:
> 
>     iter->wide = width_iter_iswide (ch)
> 
> So Pango would only break if yet another width change is encountered
> and that new width change is *not* at a zwj or skin tone modifier.

OK. My concern was that iter->wide acts as a return value, so the value finally assigned to it matters: wide is a member of PangoWidthIter, not a local variable.
But if you believe it is only used locally, that's fine.

Probably I think width_iter_next() also needs to have the exception of 0xfe00 - 0xfe0f.

And I'm thinking of another patch:

--- pango-1.40.4/pango/pango-context.c.orig	2017-03-30 19:03:28.081488378 +0900
+++ pango-1.40.4/pango/pango-context.c	2017-03-31 18:58:21.526336181 +0900
@@ -1395,11 +1407,13 @@ itemize_state_process_run (ItemizeState
        * characters if they don't, HarfBuzz will compatibility-decompose them
        * to ASCII space...
        * See bugs #355987 and #701652.
+       * U+0023 FE0F 20E3 has G_UNICODE_NON_SPACING_MARK
        */
       type = g_unichar_type (wc);
       if (G_UNLIKELY (type == G_UNICODE_CONTROL ||
                       type == G_UNICODE_FORMAT ||
                       type == G_UNICODE_SURROGATE ||
+                      type == G_UNICODE_NON_SPACING_MARK ||
                       (type == G_UNICODE_SPACE_SEPARATOR && wc != 0x1680u /* OGHAM SPACE MARK */)))
         {
 	  shape_engine = NULL;

Comment 16 Mike FABIAN 2017-04-03 13:21:08 UTC

(In reply to fujiwara from comment #15)
> 
> Probably I think width_iter_next() also needs to have the exception of
> 0xfe00 - 0xfe0f.

Yes.

Also, my patch currently disallows break before *and* after
the  special characters. But for 0xfe00 -0xfe0f and 0x1f3fb - 0x1f3ff a break
after the special character might be OK. And for the flag sequence starter
0x1f3f4, a break before that character might be OK.

I’ll improve that patch a bit.
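
One way to express the before/after distinction is a small rule table. The sketch below is an assumption about how this could look and is not the patch that was eventually written: break_rule() and break_allowed() are hypothetical names, and the table only covers the characters discussed in this bug.

```c
#include <assert.h>
#include <stdint.h>

enum
{
  NO_BREAK_BEFORE = 1u << 0,
  NO_BREAK_AFTER  = 1u << 1
};

/* Hypothetical rule table for the special characters discussed here. */
static unsigned
break_rule (uint32_t ch)
{
  if (ch == 0x200Du)                      /* ZERO WIDTH JOINER */
    return NO_BREAK_BEFORE | NO_BREAK_AFTER;
  if (ch >= 0xFE00u && ch <= 0xFE0Fu)     /* variation selectors */
    return NO_BREAK_BEFORE;
  if (ch >= 0x1F3FBu && ch <= 0x1F3FFu)   /* skin tone modifiers */
    return NO_BREAK_BEFORE;
  if (ch == 0x1F3F4u)                     /* flag sequence starter */
    return NO_BREAK_AFTER;
  return 0;
}

/* Return 1 if a width change between prev and next may start a new run. */
int
break_allowed (uint32_t prev, uint32_t next)
{
  assert (prev <= 0x10FFFFu && next <= 0x10FFFFu);

  if (break_rule (prev) & NO_BREAK_AFTER)
    return 0;
  if (break_rule (next) & NO_BREAK_BEFORE)
    return 0;
  return 1;
}
```

The width iterator would consult break_allowed() only when it sees a width change. With these rules a break is still refused on both sides of a ZWJ and before a skin tone modifier, but it is permitted after a modifier and before U+1F3F4.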

Comment 17 fujiwara 2017-04-04 04:43:10 UTC

Thinking about this again, maybe 0xfe0e and 0xfe0f are enough instead of 0xfe00 - 0xfe0f.

I'd like to ask the pango people about the following suggestion:
--- pango-1.40.4/pango/pango-context.c.orig	2017-03-30 19:03:28.081488378 +0900
+++ pango-1.40.4/pango/pango-context.c	2017-04-04 13:35:18.446935865 +0900
@@ -1404,6 +1404,15 @@ itemize_state_process_run (ItemizeState
         {
 	  shape_engine = NULL;
 	  font = NULL;
+        }
+      /* If an emoji font does not include emoji presentation, let
+       * harfbuzz handle the characters.
+       * http://www.unicode.org/emoji/charts/emoji-variants.html
+       */
+      else if (G_UNLIKELY (wc == 0xfe0fu || wc == 0xfe0eu))
+        {
+	  shape_engine = NULL;
+	  font = NULL;
         }
       else
         {

Comment 18 Mike FABIAN 2017-04-05 10:17:05 UTC

(In reply to fujiwara from comment #17)
> Thinking about this again, maybe 0xfe0e and 0xfe0f are enough instead of 0xfe00 -
> 0xfe0f.
> 
> I'd like to ask the pango people about the following suggestion:
> --- pango-1.40.4/pango/pango-context.c.orig	2017-03-30 19:03:28.081488378
> +0900
> +++ pango-1.40.4/pango/pango-context.c	2017-04-04 13:35:18.446935865 +0900
> @@ -1404,6 +1404,15 @@ itemize_state_process_run (ItemizeState
>          {
>  	  shape_engine = NULL;
>  	  font = NULL;
> +        }
> +      /* If an emoji font does not include emoji presentation, let
> +       * harfbuzz handle the characters.
> +       * http://www.unicode.org/emoji/charts/emoji-variants.html
> +       */
> +      else if (G_UNLIKELY (wc == 0xfe0fu || wc == 0xfe0eu))
> +        {
> +	  shape_engine = NULL;
> +	  font = NULL;
>          }
>        else
>          {

Yes, this works, it makes the fully-qualified emoji sequences work for me!

For example, this fully-qualified sequence
for the male golfer

🏌️‍♂️ U+1F3CC U+FE0F U+200D U+2642 U+FE0F

works for me (= renders as a single glyph) only when using the above
patch; without that patch it is rendered as several glyphs.

Without that patch, only the non-fully-qualified sequence

🏌‍♂ U+1F3CC U+200D U+2642

works.

As using the fully-qualified sequences is recommended, I think that
patch is needed.
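
The two golfer sequences above differ only in U+FE0F VARIATION SELECTOR-16. The sketch below makes that relation explicit; strip_vs16() is a hypothetical helper for illustration and does not appear in Pango or in any of the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `in` to `out`, dropping every U+FE0F. `out` must have room for
 * `len` code points. Returns the number of code points written. */
size_t
strip_vs16 (const uint32_t *in, size_t len, uint32_t *out)
{
  size_t i, n = 0;

  assert (in != NULL && out != NULL);

  for (i = 0; i < len; i++)
    if (in[i] != 0xFE0Fu)
      out[n++] = in[i];

  return n;
}
```

Applied to U+1F3CC U+FE0F U+200D U+2642 U+FE0F, this returns the three code points U+1F3CC U+200D U+2642, i.e. the non-fully-qualified form that already rendered as a single glyph without the patch.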

Comment 19 Mike FABIAN 2017-04-05 10:19:25 UTC

(In reply to Mike FABIAN from comment #16)
> (In reply to fujiwara from comment #15)
> > 
> > Probably I think width_iter_next() also needs to have the exception of
> > 0xfe00 - 0xfe0f.
> 
> Yes.
> 
> Also, my patch currently disallows break before *and* after
> the  special characters. But for 0xfe00 -0xfe0f and 0x1f3fb - 0x1f3ff a break
> after the special character might be OK. And for the flag sequence starter
> 0x1f3f4, a break before that character might be OK.
> 
> I’ll improve that patch a bit.

Peng Wu already did that, his patch is here:

https://bugzilla.gnome.org/show_bug.cgi?id=780669#c14

Comment 20 Fedora Update System 2017-07-18 09:29:35 UTC

pango-1.40.7-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-79637b77e0

Comment 21 Fedora Update System 2017-07-18 09:29:43 UTC

pango-1.40.7-1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-f55b21a811

Comment 22 Fedora Update System 2017-07-19 04:26:10 UTC

pango-1.40.7-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-f55b21a811

Comment 23 fujiwara 2017-07-19 06:43:00 UTC

I think the upstream bug 780669 is not fixed yet?

Comment 24 Fedora Update System 2017-07-20 00:25:48 UTC

pango-1.40.7-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-79637b77e0

Comment 25 Peng Wu 2017-07-20 05:23:50 UTC

I tried pango-1.40.7-1.fc26, it seems the upstream bug 780669 is not fixed yet.

Comment 26 Fedora Update System 2017-07-23 03:57:29 UTC

pango-1.40.7-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 27 Fedora Update System 2017-07-25 21:28:14 UTC

pango-1.40.7-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.