Bug 1529984

Summary: Give Chinese users a sans, a serif, and a monospace
Product: [Fedora] Fedora
Reporter: Mingye Wang <arthur200126>
Component: google-noto-cjk-fonts
Assignee: Peng Wu <pwu>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 28
CC: alick9188, pswo10680, pwu, sztsian, tiansworld, wun00+redhatbug
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Last Closed: 2018-03-14 03:15:38 UTC
Type: Bug
Attachments:
* Source Han Serif used as sans in KDE, Fedora 27, "百度一下你就知道"
* fc-match on Fedora 27 Workstation Live DVD
* screenshot of Source Han Sans HW SC for gnome-terminal
* Gedit on Fedora 27 Workstation (Live).
* screenshot of Fedora 27 for gnome-terminal with default fonts
* Sarasa Mono SC Regular in gedit
* gedit with Sarasa Mono misalignment when font size is 16

Description Mingye Wang 2017-12-31 16:32:37 UTC
Description of problem:

Since version 21, Fedora has explicitly failed to discriminate between sans-serif, serif, and monospace for the Chinese language. Rather than providing proper defaults, Fedora opted to make "monospace" a Chinese font with proportional Latin text and "serif" a sans-serif;[1] it chose Source Han Sans for all cases.

[1]: https://lists.fedoraproject.org/archives/list/chinese@lists.fedoraproject.org/message/LC543WGADZMLLWZQLLK2D57IJK7WPHSE/

Some users have recently reported that Source Han Serif has now become the default font for, once again, everything including the UI. Perhaps "sans-serif" has serifs now.

This issue affects all Chinese variants.

Version-Release number of selected component (if applicable):

Since Fedora 21. 

How reproducible:

Universally

Steps to Reproduce:
1. LC_ALL=zh_CN.UTF-8 fc-match -a Sans
2. LC_ALL=zh_CN.UTF-8 fc-match -a Serif
3. LC_ALL=zh_CN.UTF-8 fc-match -a Monospace

Actual results:
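All three commands return the same Han font (Source Han Sans).

Expected results:
Each generic family returns a different, matching Han font: a sans for Sans, a serif for Serif, and a monospace for Monospace.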

Additional info:

Comment 1 Mingye Wang 2017-12-31 16:37:16 UTC
Moreover, in the linked email the i18n person (Peng Wu) fails to see the actual problem. Wu defends the setup by proposing the {Song,Ming}/Kai (printed/script) distinction in place of the {Song,Ming}/Hei (serif/sans) system, not knowing that:

* CJK sans-serifs were developed precisely to match and mimic Latin sans-serifs; they are called "Gothic" in Japan for a reason
* There is a separate generic family for script-like faces, called "cursive" in fontconfig and CSS speak

Wu probably didn't know a thing about generic family names, and y'all at RH probably shouldn't assume that speaking a language natively makes someone an okay typographer. Hell, we grew up in the age of SimSun bitmaps as UI fonts and WordArt.

Comment 2 Mingye Wang 2017-12-31 16:46:08 UTC
Oh and the monospace business.

Wu:
> 中文字体大部分都是 dual width 或 proportional,很少是 monospace 的。
> "Most Chinese fonts are either dual width or proportional; few are monospaced."

It's time to introduce this "wcwidth" magic, a function that tells you how many columns a character takes up. Being monospaced doesn't mean everything has the same width in this big new world of emoji and ideographs; you just have to keep every character's width a simple multiple of the base Latin width. (wcwidth and the Chinese convention use 2 columns for CJK text; the Japanese convention uses 1.5.) All so-called "dual-width" fonts, like Source Han Sans HW, are therefore monospaced.

(I'm not saying you should use Source Han Sans HW for monospace, since it's terrible for coding with its hard-to-tell 0/O group. But it is monospaced.)
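A rough Python approximation of the column counting described above, assuming the Unicode East Asian Width property is a fair stand-in for wcwidth(). This is not glibc's actual implementation, and it ignores zero-width, combining, and control characters:

```python
import unicodedata


def char_columns(ch: str) -> int:
    """Approximate wcwidth(): Wide (W) and Fullwidth (F) characters take
    2 columns, everything else takes 1. Zero-width, combining and control
    characters are deliberately not handled in this sketch."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def text_columns(text: str) -> int:
    """Total terminal columns a string would occupy under this rule."""
    return sum(char_columns(ch) for ch in text)


# A "dual-width" monospace font only needs every advance to be a whole
# multiple of the Latin cell: here, 1 or 2 columns.
for sample in ("abcd", "中文", "0O中文", "\U0001F600"):
    print(repr(sample), text_columns(sample))
```

Under this rule "abcd" and "中文" both occupy four columns, which is the 2:1 relationship a dual-width font such as Source Han Sans HW is built around.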

Comment 3 Mingye Wang 2017-12-31 17:01:18 UTC
The "new user" report regarding Source Han Serif comes from Fedora 27.

Comment 4 Mingye Wang 2017-12-31 17:06:11 UTC
Created attachment 1375011 [details]
Source Han Serif used as sans in KDE, Fedora 27, "百度一下你就知道"

The Source Han Serif report affects only instances installed from the KDE live image and the live session itself. Uninstalling SHSerif appears to help. (Screenshot and testing courtesy of Ted Cynx.)

Comment 5 Alick Zhao 2017-12-31 17:40:35 UTC
Hi Mingye, thanks for reporting the bug.

I changed the component of the bug so hopefully @pwu will get notified.

I think there might be two bugs/issues here actually:

1. As the bug summary says, there used to be only one Chinese font installed by default on Fedora. Whatever it is, it is used as the Chinese font for all of the sans/serif/monospace styles. @pwu has mentioned this before in the same thread:

https://lists.fedoraproject.org/archives/list/chinese@lists.fedoraproject.org/message/JHEGZWB32R5GSXVXRPS677QKAOZ5N7IZ/

I guess the reason might be some fontconfig packaging guideline, policy, or convention; @pwu should have a more definitive answer.

Personally, I'd like to treat this as a bug, update the existing policy, and have different Chinese fonts for sans/serif/monospace.

Besides, it seems to me that on Fedora 27, if google-noto-cjk-fonts is installed by default, then at least we have both sans and serif for Chinese. So this issue is partially (but not fully) resolved.

2. KDE picks up Source Han Serif while other DEs use Sans font on Fedora 27.

I suspect this is a different issue, and it might lie in the interplay between KDE and fontconfig. More investigation is needed. But this seems a different issue. Please open a new bug for this issue.

Last, please keep the bug report focus on software issues. Never ever do personal attacks.

Comment 6 Zamir SUN 2017-12-31 23:49:55 UTC
I think this is a pretty meaningful bug.
I can reproduce the sans/serif/monospace issue on Fedora 27: everything is SourceHanSans. I know there are engineers who code in Chinese (易语言, the "Easy Programming Language"), and for them I believe the monospace font should be something much clearer.

[zsun@x240 ~]$ LC_ALL=zh_CN.UTF-8 fc-match -a Sans | grep -i han
SourceHanSansCN-Regular.otf: "思源黑体 CN" "Regular"
SourceHanSansCN-Medium.otf: "思源黑体 CN" "Medium"
SourceHanSansCN-Normal.otf: "思源黑体 CN" "Normal"
SourceHanSansCN-Light.otf: "思源黑体 CN" "Light"
SourceHanSansCN-ExtraLight.otf: "思源黑体 CN" "ExtraLight"
SourceHanSansCN-Bold.otf: "思源黑体 CN" "Bold"
SourceHanSansCN-Heavy.otf: "思源黑体 CN" "Heavy"
SourceHanSansTW-Regular.otf: "思源黑體 TW" "Regular"
SourceHanSansTW-Medium.otf: "思源黑體 TW" "Medium"
SourceHanSansTW-Normal.otf: "思源黑體 TW" "Normal"
SourceHanSansTW-Light.otf: "思源黑體 TW" "Light"
SourceHanSansTW-ExtraLight.otf: "思源黑體 TW" "ExtraLight"
SourceHanSansTW-Bold.otf: "思源黑體 TW" "Bold"
SourceHanSansTW-Heavy.otf: "思源黑體 TW" "Heavy"
[zsun@x240 ~]$ LC_ALL=zh_CN.UTF-8 fc-match -a Serif | grep -i han
SourceHanSansCN-Regular.otf: "思源黑体 CN" "Regular"
SourceHanSansTW-Regular.otf: "思源黑體 TW" "Regular"
SourceHanSansCN-Medium.otf: "思源黑体 CN" "Medium"
SourceHanSansTW-Medium.otf: "思源黑體 TW" "Medium"
SourceHanSansCN-Normal.otf: "思源黑体 CN" "Normal"
SourceHanSansTW-Normal.otf: "思源黑體 TW" "Normal"
SourceHanSansCN-Light.otf: "思源黑体 CN" "Light"
SourceHanSansTW-Light.otf: "思源黑體 TW" "Light"
SourceHanSansCN-ExtraLight.otf: "思源黑体 CN" "ExtraLight"
SourceHanSansTW-ExtraLight.otf: "思源黑體 TW" "ExtraLight"
SourceHanSansCN-Bold.otf: "思源黑体 CN" "Bold"
SourceHanSansTW-Bold.otf: "思源黑體 TW" "Bold"
SourceHanSansCN-Heavy.otf: "思源黑体 CN" "Heavy"
SourceHanSansTW-Heavy.otf: "思源黑體 TW" "Heavy"
[zsun@x240 ~]$ LC_ALL=zh_CN.UTF-8 fc-match -a Monospace | grep -i han
SourceHanSansCN-Regular.otf: "思源黑体 CN" "Regular"
SourceHanSansCN-Medium.otf: "思源黑体 CN" "Medium"
SourceHanSansCN-Normal.otf: "思源黑体 CN" "Normal"
SourceHanSansCN-Light.otf: "思源黑体 CN" "Light"
SourceHanSansCN-ExtraLight.otf: "思源黑体 CN" "ExtraLight"
SourceHanSansCN-Bold.otf: "思源黑体 CN" "Bold"
SourceHanSansCN-Heavy.otf: "思源黑体 CN" "Heavy"
SourceHanSansTW-Regular.otf: "思源黑體 TW" "Regular"
SourceHanSansTW-Medium.otf: "思源黑體 TW" "Medium"
SourceHanSansTW-Normal.otf: "思源黑體 TW" "Normal"
SourceHanSansTW-Light.otf: "思源黑體 TW" "Light"
SourceHanSansTW-ExtraLight.otf: "思源黑體 TW" "ExtraLight"
SourceHanSansTW-Bold.otf: "思源黑體 TW" "Bold"
SourceHanSansTW-Heavy.otf: "思源黑體 TW" "Heavy"
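To compare these listings without eyeballing them, a small Python sketch can parse the default `fc-match` output format shown above (`file: "family" "style"`). The helper names are illustrative, and the sample strings are excerpts copied from the output above; to see how far down a given font really sits, it should be fed the output without the `grep` filter.

```python
import re

# Default fc-match output line, as in the listing above:
#   SourceHanSansCN-Regular.otf: "思源黑体 CN" "Regular"
LINE_RE = re.compile(r'^(?P<file>[^:]+): "(?P<family>[^"]*)" "(?P<style>[^"]*)"$')


def parse_matches(output):
    """Return (file, family, style) tuples in fc-match priority order."""
    entries = []
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            entries.append((m.group("file"), m.group("family"), m.group("style")))
    return entries


def first_rank(entries, needle):
    """0-based rank of the first entry whose file name contains `needle`, else None."""
    for rank, (path, _family, _style) in enumerate(entries):
        if needle in path:
            return rank
    return None


# Excerpts of the Sans and Serif listings above.
sans_out = '''SourceHanSansCN-Regular.otf: "思源黑体 CN" "Regular"
SourceHanSansCN-Medium.otf: "思源黑体 CN" "Medium"'''
serif_out = '''SourceHanSansCN-Regular.otf: "思源黑体 CN" "Regular"
SourceHanSansTW-Regular.otf: "思源黑體 TW" "Regular"'''

sans, serif = parse_matches(sans_out), parse_matches(serif_out)
# The same top Han family for both generics means "serif" falls back to a sans.
print("same top family:", sans[0][1] == serif[0][1])
print("first Source Han Serif rank:", first_rank(serif, "SourceHanSerif"))
```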

Comment 7 Zamir SUN 2017-12-31 23:53:54 UTC
(In reply to Alick Zhao from comment #5)
> 2. KDE picks up Source Han Serif while other DEs use Sans font on Fedora 27.
> 
> I suspect this is a different issue, and it might lie in the interplay
> between KDE and fontconfig. More investigation is needed. But this seems a
> different issue. Please open a new bug for this issue.

Hi Alick,

I believe this is specific to KDE. I cannot reproduce it on XFCE: all Chinese text on XFCE (with en_US.utf8 as the locale) uses the sans font. This does not seem related to the original report, so let's track it separately.

Comment 8 Mingye Wang 2017-12-31 23:58:24 UTC
Just a heads-up: I filed the KDE issue separately as bug 1530006. The assigned component probably needs correcting, though.

Comment 9 Mingye Wang 2018-01-01 00:07:47 UTC
From Zamir's output, it looks like Fedora is using individual OTF files for the Source Han Sans fonts, which might complicate simply using the HW variant for monospace, as you will need to reintroduce it somehow. My personal recommendation is to use the weight-grouped OTC files so you get to reuse glyphs across families: the HW version generally differs from the normal ones only in its Latin glyphs, and the CN/TW/KR/JP differences aren't that large in terms of space either.

Recent versions of fontconfig seem to handle the "SuperOTC" giants pretty well too, so you may look into that as long as you aren't stuck on something old like Debian. There's even a Noto/Source combination experiment made by Ken Lunde: since Noto CJK is just another way of spelling Source Han, why not put them in the same file? [1]

[1]: https://github.com/adobe-fonts/source-han-super-otc

Comment 10 Mingye Wang 2018-01-01 00:20:22 UTC
I am suggesting the weight-grouped TTCs largely because there isn't much to reuse when grouping across weights, at least not until variable-font interpolation magic becomes common. Among the different variants of the same weight, however, the savings become significant: all eight of NotoSans{,Mono}CJK{sc,tc,kr,jp}-Regular.otf are around 15.7 MB each, but grouped into NotoSansCJK-Regular.ttc they take up only 17.9 MB. The SuperOTC (NotoSansCJK.ttc.zip) takes up 110 MB uncompressed, roughly the sum of all the weight-grouped TTCs.
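Putting the approximate Regular-weight sizes quoted above side by side (the inputs are rounded, so the ratio is only indicative):

$$
\underbrace{8 \times 15.7\ \text{MB}}_{\text{separate OTFs}} \approx 125.6\ \text{MB}
\quad\text{vs.}\quad
17.9\ \text{MB}\ (\text{one TTC}),
\qquad
\frac{125.6}{17.9} \approx 7.0
$$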

Comment 11 Alick Zhao 2018-01-01 03:11:39 UTC
It seems the region-specific subset OTF files are what upstream recommends for Linux:

https://raw.githubusercontent.com/adobe-fonts/source-han-sans/release/SourceHanSansReadMe.pdf

Comment 12 Mingye Wang 2018-01-01 03:31:38 UTC
It's recommended because many Linux (TM) deployments run on Debian Stale (R)-level software, and they are super conservative about this thing. This document is outdated anyway: it states that only Mac folks should use ye mighty SuperOTC, but current Linux and Windows infrastructure supports that as well. And since it even recommends against using OTC in Windows except for specific design applications, it's obvious that the document isn't written with disk space usage in mind and is too conservative w/r/t the ghost of compatibility. OTC/TTC isn't even a new thing! Just test it out and decide for yourselves.

Comment 13 Mingye Wang 2018-01-01 03:38:01 UTC
Sorry for misreading the Windows flowchart. Seems like they don't even want to acknowledge that OTC/TTC is good at all on Windows either. The "subset... InDesign" part likely arises from the following assumptions:

* Only InDesign users care about characters beyond a "language charset"
* Only InDesign uses the locl feature.

The latter is blatantly false, as your everyday browser, via Pango, uses locl a lot. Space wasted by the former isn't a factor when you get the fonts in OTC/TTC bundles and save all that space.

Comment 14 Cheng-Chia Tseng 2018-01-02 05:49:06 UTC
Super OTC file is supported by any flavor of Linux that uses fontconfig and FreeType Version 2.5.0.1 or greater. 

See: https://github.com/adobe-fonts/source-han-serif/tree/release

I suggest switching to the Super OTC file to save space and take advantage of the locl OpenType feature. That is useful in LibreOffice: you can change the "Language" of the text to switch the glyphs to fit different regions (CN, TW, JP or KR).

Comment 15 Peng Wu 2018-01-02 06:02:53 UTC
We have had Sans and Serif fonts for Chinese users since Fedora 27,
and we work around the monospace issue.

See: https://fedoraproject.org/wiki/Changes/ChineseSerifFonts

Please check whether adobe-source-han-serif-*-fonts are installed on Fedora 27.

Comment 16 Mingye Wang 2018-01-03 04:13:43 UTC
Wu's Changes page does seem to handle that properly. Zamir, can you... check for Source Han Serif on your Fedora instance? Please also run fc-match without the grep part (as an attachment) so we can see how high up the unwanted fonts rank.

Regarding the "monospace" part, yes you can do a workaround by just putting a western monospace before a normal CJK font and let fontconfig's fallback chain do the job. This holds up well-ish as long as users are happy to use the blob called "monospace". Unfortunately it also requires applications to adjust character spacing for something wcwidth() compliant, as most western monospace fonts have characters wider than 1/2 em; good terminal emulators do that, but few browsers support this feature. Why not go for an OTC bundle and settle it once and for all? You will end up saving space that way.

Comment 17 Zamir SUN 2018-01-03 10:49:49 UTC
Created attachment 1376263 [details]
fc-match on Fedora 27 Workstation Live DVD

Oh sorry, I believe I posted some misleading information in comment 6. There I was testing on my own laptop, which runs the XFCE spin and was upgraded from older releases; as a result, I don't have a serif font installed. That is definitely a separate issue. I will file a bug for it once I figure out a clear and easy way to reproduce it.

I just tested on a Fedora 27 Workstation Live DVD; the output is attached. It shows that sans and serif both match the right fonts.

Comment 18 Cheng-Chia Tseng 2018-01-04 15:09:27 UTC
(In reply to Mingye Wang from comment #16)
> Regarding the "monospace" part, yes you can do a workaround by just putting
> a western monospace before a normal CJK font and let fontconfig's fallback
> chain do the job. This holds up well-ish as long as users are happy to use
> the blob called "monospace". Unfortunately it also requires applications to
> adjust character spacing for something wcwidth() compliant, as most western
> monospace fonts have characters wider than 1/2 em; good terminal emulators
> do that, but few browsers support this feature. Why not go for an OTC bundle
> and settle it once and for all? You will end up saving space that way.

I agree about the monospace problem with the binding workaround. In a Chinese monospace font, each Latin character should be exactly HALF the width of a Chinese character. This is relied on a lot in terminal and BBS system (still popular and widely used in Taiwan) contexts.

Technically, it would be better to use 思源黑體 HW (Source Han Sans HW) directly as the Chinese monospace font instead of binding to DejaVu Sans Mono, for REAL monospace usage. Alternatively, the width of DejaVu Sans Mono could be tweaked to half the width of 思源黑體's Chinese characters.

Comment 19 Peng Wu 2018-01-05 07:06:57 UTC
Created attachment 1377329 [details]
screenshot of Source Han Sans HW SC for gnome-terminal

Comment 20 Peng Wu 2018-01-05 07:08:12 UTC
This is the screenshot of Source Han Sans HW SC for gnome-terminal.

Please note that the glyph height is much greater than with the Fedora 27 default.
Because of the glyph height, it can't be used directly.

If you think the current Fedora 27 Chinese fonts have a problem,
please provide screenshots of the problem.

Actually, the monospace workaround is common for Chinese fonts.

Comment 21 Peng Wu 2018-01-05 07:19:06 UTC
I think we had better follow the upstream suggestion to use the OTF fonts.

Comment 22 Mingye Wang 2018-01-07 02:27:29 UTC
Created attachment 1378021 [details]
Gedit on Fedora 27 Workstation (Live).

Yep, the monospace workaround isn't working in gedit and other programs dumber than terminals. gnome-terminal is doing an excellent job of spacing it out, though.

Comment 23 Mingye Wang 2018-01-07 02:40:07 UTC
The upstream suggestion not only fails to mention Linux specifically (it falls under "other"), but also seriously underestimates Windows's ability to handle OTC. Windows uses OTC for Microsoft YaHei{,' UI'} and {N,}SimSun, two famous defaults for zh_CN Windows. Likewise, Fedora has seen OTC/TTC used before: remember how the AR PL cjkuni fonts were released as TTC files and you simply packaged them as TTC?

> height

gnome-terminal might be using the highest-priority font to determine line height, thus masking the problem when Source Han Sans CN is only a secondary font supplementing DejaVu Sans Mono (DJV). Aesthetically I don't consider that spacing too crazy, but you may as well try "Noto Sans Mono CJK", since Noto is supposed to be height-compatible with the other Noto fonts. (What compatibility does not following the DJV height break, may I ask?)

Comment 24 Peng Wu 2018-01-08 04:37:45 UTC
Noto Sans Mono CJK SC has the same problem as Source Han Sans HW SC in the gnome-terminal screenshot.

Because of the glyph height it can't be used directly.

Comment 25 Mingye Wang 2018-01-08 05:26:01 UTC
I still don't get your "height" argument. Are you referring to the absolute line height (glyph + 1x vertical spacing), which is significantly taller for the Adobe and Noto fonts, or to the glyph's aspect ratio, which is also significantly slimmer for proper monospacing? Assuming you mean the former, what part of Fedora does changing the font, and consequently the line height, break? Some terminal OCR program tuned to DJV Sans Mono? The window size of an 80x24 display?

* * *

Moving into not-very-official-but-this-is-what-foss-is-about territory, we could theoretically ditch the whole Noto/Source Han Sans thing altogether for a derivative sans+mono typeface called Sarasa Gothic[1]. It differs from the original Source Han Sans{,\ HW} pair in these ways:

* It uses TrueType outlines instead of CFF-based OpenType, to allow bytecode hinting.
* It uses Iosevka (SIL OFL) for monospace.[2]
* Iosevka makes 0 and O look different.
* Iosevka has a more normal line height.
* It's licensed under 3-clause BSD, which is inconvenient. This may be circumvented by using its pre-rename release.[3]
* Like the mono fonts in those pairs, all of Sarasa comes in only 2 weights. You will still need parts of Source Han Sans and a ton of fontconfig work if you want the other weights.

So yeah, that's a mess. Iosevka[2] without the CJK part might be interesting for the current fontconfig job, but its line height probably isn't gonna be the same as DJV either. There's also a "CC" variant of Iosevka if you want to see all those box-drawing characters in 2 columns like all the old CJK monospaces; it breaks normal boxes like those old things do too.

  [1]: https://github.com/be5invis/Sarasa-Gothic
  [2]: https://github.com/be5invis/Iosevka
  [3]: https://be5invis.github.io/Iosevka/inziu.html

Comment 26 Peng Wu 2018-01-09 08:03:15 UTC
Created attachment 1378913 [details]
screenshot of Fedora 27 for gnome-terminal with default fonts

Comment 27 Peng Wu 2018-01-09 08:05:03 UTC
Please note that the height in gnome-terminal looks better than in the previous screenshot.

I don't think gedit is required to align the glyphs.

Comment 28 Mingye Wang 2018-01-10 15:03:08 UTC
> better

What I have been asking you is: why does it need to be like that? Since you have given a more subjective/aesthetic judgement that I happen to agree with, well, I guess you should really try the Iosevka font I just linked to.

> not required

Although terminals are arguably more sensitive to column issues -- so much that, like I said, terminal emulators adjust spacing to fit -- many use cases of gedit, or say firefox, still require column alignment. For gedit there's reading ASCII/SHIFT-JIS "art" in code comments or just as art; for firefox there are tons of "modern" javascript virtual terminals that rely on firefox and the user to pick a font that works. (Why don't we combine these two cases while we are at it? Pick one modern code editor written in JavaScript...)

Comment 29 Alick Zhao 2018-01-15 04:12:10 UTC
About the height issue with the HW font: what are the side effects besides aesthetics? Does it make an 80x24 terminal have fewer lines?

Comment 30 Mingye Wang 2018-01-15 05:03:04 UTC
An 80x24 terminal is always going to have 24 lines by definition. But a 640x480 px terminal will have fewer lines, because each line is taller in pixels.

Comment 31 Alick Zhao 2018-01-15 18:49:40 UTC
By an "80x24" terminal, I meant opening a terminal window with its default settings, not maximized etc. The window size in pixels is irrelevant, since changing the font size changes the window size accordingly.

I want to know whether using the HW font directly will cause the default terminal window to have fewer than 24 lines, which I cannot easily test myself. If it does, then it "breaks" things IMO. Otherwise, I don't think the height issue is a real issue (although it may annoy some people).

Even if the HW font does not break things, I do NOT think it is the right (or rather, perfect) font for monospace, since it does not clearly differentiate O and 0 and is thus terrible for coding and terminal work.

I think this issue now boils down to setting a sane default monospace Chinese font. I'll try to list the candidates and their pros/cons in the following comment.

Comment 32 Alick Zhao 2018-01-15 19:50:31 UTC
Now that we have sans and serif Chinese fonts by default on Fedora 27+, I think this bug boils down to setting a sane default monospace Chinese font. I'll try to list the candidates and their pros/cons here. Please correct me if I am wrong, and provide additional information if you have any.

1. DejaVu Sans Mono + Source Han Sans CN/TW. I guess this is the current approach Peng Wu follows.

Pros: status quo. Works well in terminals.
Cons: DejaVu's character width is larger than 1/2 em. The fontconfig magic is not well supported by web browsers, gedit, BBS apps, etc.

Question: Is it difficult to tweak the width of DejaVu Sans Mono to be half the width of the Chinese characters of 思源黑體 (Source Han Sans)? Has anyone done that, or is anyone willing to?

2. Noto Sans Mono CJK SC or Source Han Sans HW SC

Pros: single font consistent with Source Han Sans/Serif parts.
Cons: no clear difference between O and 0. Not good for coding/terminal.

3. Sarasa Gothic (based on Iosevka and Source Han Sans)

Pros: designed for programming.
Cons: ?

Comments: I don't think the BSD-3 license is an issue. I'm not sure whether bundling Chinese and Japanese together is an issue.

4. Droid Sans Mono Dotted/Slashed/Py

Pros: It should work. The updated version has GB18030 coverage.
Cons: Inactive. Superseded by Noto Sans CJK. Needs third-party modifications for a dotted/slashed zero or Python underscore.

Comment 33 Alick Zhao 2018-01-15 20:28:15 UTC
Unfortunately, in my simple test, even Sarasa Gothic (esp. Sarasa Mono SC) does not keep Latin characters at half the width of Chinese characters in gedit.

Comment 34 Mingye Wang 2018-01-16 06:22:15 UTC
Pretty strange for Sarasa Mono/Iosevka to fail on that mark. Rigging up the hypervisor for a quick test…

* * *

I don't think bundling multiple languages together is an issue (look at all those Latin fonts), and I think it should be preferred if it saves space. By the way, Sarasa should include Korean like Noto does, too: look for K or CL.

(In principle I would prefer the monospace font to be bundleable with the current Sans, but yeah, HW is not good enough, and Sarasa doesn't have those cool weights. Nah.)

Comment 35 Mingye Wang 2018-01-16 06:37:39 UTC
Created attachment 1381833 [details]
Sarasa Mono SC Regular in gedit

Alick, I am pretty sure Sarasa Mono satisfies the 2-col East Asian Width requirement in gedit here.

Comment 36 Mingye Wang 2018-01-16 06:44:24 UTC
Tested the 80x24 problem too. gnome-terminal sets initial window size by column/row count as opposed to pixel size in its profiles, so it's always safe to change fonts.

Comment 37 Mingye Wang 2018-01-16 07:01:45 UTC
(In reply to Mingye Wang from comment #35)
> satisfies the 2-col East Asian Width

So Alick, could you please share the text you typed into gedit for your test? If you are testing with some legacy SHIFT-JIS/Telnet BBS/whatever art, it may well be East Asian Width Ambiguous[1] characters messing up the width here. That's also why Iosevka (the Latin-only font) provides a CC variant. (Use the Iosevka "pack" downloads to test all the variants.[2])

  [1]: https://unicode.org/reports/tr11/#Ambiguous
  [2]: https://github.com/be5invis/Iosevka/releases/download/v1.13.4/iosevka-pack-1.13.4.zip

The story basically goes like this: legacy CJK monospace fonts implemented a bunch of characters that are not CJK-specific as East Asian Wide (2 columns), departing from more typical monospace fonts. Affected characters include the box-drawing stuff I mentioned before, a lot of geometric shapes, and a lot of math symbols like the RATIO sign (U+2236)[^1].

  [^1]: This is also why GNOME's time display looks strange in Source Han Sans. See GNOME #792488.

I personally am not a fan of breaking box-drawing characters, so we probably should still default to a "normal" monospace for everyday terminal use. BBS users can use a "CC" (CJK compatibility) variant shipped together, probably in a TTC.
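The characters named above can be checked against the Unicode East Asian Width data with Python's `unicodedata`. This is a sketch, not glibc's wcwidth(): the `ambiguous_wide` switch is a hypothetical stand-in for the legacy CJK convention (and for a "CC" font variant), under which Ambiguous (A) characters take two columns instead of one.

```python
import unicodedata


def columns(ch, ambiguous_wide=False):
    """Columns for one character under a simplified wcwidth()-like rule.

    Wide (W) and Fullwidth (F) characters always take 2 columns.
    Ambiguous (A) characters take 2 only when ambiguous_wide is True,
    mimicking legacy CJK monospace fonts; everything else takes 1.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F") or (eaw == "A" and ambiguous_wide):
        return 2
    return 1


samples = {
    "\u2500": "BOX DRAWINGS LIGHT HORIZONTAL",
    "\u25a0": "BLACK SQUARE",
    "\u2236": "RATIO",
}
for ch, name in samples.items():
    print(f"U+{ord(ch):04X} {name}: class {unicodedata.east_asian_width(ch)}, "
          f"{columns(ch)} col normally, {columns(ch, ambiguous_wide=True)} cols in legacy CJK")

# A 10-cell box edge drawn for a normal terminal doubles in width
# once ambiguous characters are rendered 2 columns wide.
edge = "\u250c" + "\u2500" * 8 + "\u2510"
print(sum(columns(c) for c in edge), sum(columns(c, True) for c in edge))
```

With the switch off, the box edge takes 10 cells; with it on, the same edge takes 20, which is the kind of breakage a "CC" variant accepts in exchange for lining up legacy BBS-style art.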

PS: I wonder how glibc's wcwidth() handles those ambiguous characters. I vaguely recall seeing something about it in the locale file for zh-*...

Comment 38 Alick Zhao 2018-01-16 17:12:47 UTC
Created attachment 1382095 [details]
gedit with Sarasa Mono misalignment when font size is 16

Please see the attached screenshot. Notice the misalignment of the last two lines.

Interestingly, the misalignment does not occur when the font size is set to 12 or 24.

Comment 39 Mingye Wang 2018-01-17 07:54:42 UTC
Comment on attachment 1382095 [details]
gedit with Sarasa Mono misalignment when font size is 16

Hmm, we might be looking at a Pango/Cairo/FreeType/something bug about advancing the cursor here. Sarasa Mono internally has the correct advance widths with latin=500 UPM and CJK=1000 UPM, so I am going to blame this bug on integer rounding or some similar magic.

Alick, how wide in pixels is each character at 16 point size?

Comment 40 Mingye Wang 2018-01-17 08:12:54 UTC
> 500/1000 UPM

Apologies for the confusion here. UPM refers to "units per em", a constant that sets the size of an outline font's internal unit; Sarasa uses 1000. Should've said "units" here.

> how wide at 16pt
> 24 and 12 are ok

I might actually have a theory about how the rounding error happened. By definition 1 pt = 1/72 in, and assuming a display of 96 pixels/in (GNOME always uses integer multiples of this value), 1 em/1000 units would end up having the following values:

12pt: (12/72) * 96 * (1000/1000) = 16
16pt: (16/72) * 96 * (1000/1000) = 21.3333
24pt: (24/72) * 96 * (1000/1000) = 32

On the other hand, 0.5 em/500 units would be:

12pt: (12/72) * 96 * (500/1000) = 8
16pt: (16/72) * 96 * (500/1000) = 10.6667
24pt: (24/72) * 96 * (500/1000) = 16

Assuming a round-to-nearest "int" conversion for those sizes, at 16pt Latin and CJK would be 11 and 21 pixels wide respectively. Hence the misalignment. Alick, could you please screenshot the text below in gedit to validate my guess?

|abcdefghijklmnopqrstu|
|上面二十一字下面十一字|
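The rounding theory above can be checked with a few lines of Python. `Fraction` keeps the arithmetic exact before the final round-to-nearest, which is the conversion the comment assumes (the real rendering stack may round differently). The 500 and 1000 units are Sarasa Mono's Latin and CJK advances, with 1000 units per em.

```python
from fractions import Fraction


def advance_px(size_pt, units, dpi=96, upm=1000):
    """Pixel advance of a glyph `units` wide, rounded to the nearest pixel.

    px = (size_pt / 72) * dpi * (units / upm); the rounding step is the
    assumption under test, not a documented Pango/FreeType behaviour.
    """
    return round(Fraction(size_pt, 72) * dpi * Fraction(units, upm))


for size in (12, 16, 24):
    latin = advance_px(size, 500)   # half-width Latin advance
    cjk = advance_px(size, 1000)    # full-width CJK advance
    verdict = "aligned" if 2 * latin == cjk else "misaligned"
    print(f"{size}pt: latin={latin}px cjk={cjk}px -> {verdict}")
# 12pt: latin=8px cjk=16px -> aligned
# 16pt: latin=11px cjk=21px -> misaligned
# 24pt: latin=16px cjk=32px -> aligned

# The two test lines: 21 Latin letters vs. 11 ideographs at 16pt.
print(21 * advance_px(16, 500), 11 * advance_px(16, 1000))  # 231 231
```

Both products come to 231 px, which is why those two particular lines should line up even at 16pt while a 2:1 pair like "0O" and "中" does not.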

Comment 41 Mingye Wang 2018-01-17 08:20:34 UTC
I measured the widths of "0O" and "中文" in Alick's screenshot, and they turn out to be 22 and 42 pixels respectively. So the guess is... correct. Now it's time to either find something to blame or accept it as a fact of integer-land.

Comment 42 Alick Zhao 2018-01-18 03:24:19 UTC
(In reply to Mingye Wang from comment #40)
> Alick, could you please screenshot the text below in gedit to validate my
> guess?
> 
> |abcdefghijklmnopqrstu|
> |上面二十一字下面十一字|

Yes, these two lines align well, so your theory should be right. I won't take a screenshot, since I don't think it is necessary any more.

I cannot see a real fix for this, except switching to a HiDPI display... Until then, stick to the magic font size 12.

Comment 43 Alick Zhao 2018-01-18 03:33:23 UTC
I see that Noto Sans Mono CJK is proposed as the default monospace Chinese font for Fedora 28: https://fedoraproject.org/wiki/Changes/ChineseDefaultFontsToNoto

Any chance we opt for Sarasa Mono or other alternative instead? Or somehow push Noto Sans Mono CJK to update their '0' character etc?

Comment 44 Mingye Wang 2018-01-18 04:04:52 UTC
> except switching to a HiDPI display

Well, on GNOME you generally get integer scaling factors. At 192dpi (2x) you get 42.6667 (43) and 21.3333 (21) px, which is still broken. 3x fixes this very specific case, but I'm pretty sure we can still find ways to break it.
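Extending the round-to-nearest assumption from comment 40 to integer scale factors (96 dpi times 1x, 2x, 3x), a quick sketch reproduces the figures above for a 16pt Latin/CJK pair. The helper names are illustrative only.

```python
from fractions import Fraction


def advance_px(size_pt, units, dpi, upm=1000):
    # Assumption (as in comment 40): exact size, then round to the nearest pixel.
    return round(Fraction(size_pt, 72) * dpi * Fraction(units, upm))


def aligned(size_pt, scale, base_dpi=96):
    """True if two half-width (500-unit) advances equal one full-width (1000-unit) advance."""
    dpi = base_dpi * scale
    return 2 * advance_px(size_pt, 500, dpi) == advance_px(size_pt, 1000, dpi)


for scale in (1, 2, 3):
    dpi = 96 * scale
    print(f"{scale}x ({dpi} dpi): cjk={advance_px(16, 1000, dpi)}px "
          f"latin={advance_px(16, 500, dpi)}px aligned={aligned(16, scale)}")
# 1x (96 dpi): cjk=21px latin=11px aligned=False
# 2x (192 dpi): cjk=43px latin=21px aligned=False
# 3x (288 dpi): cjk=64px latin=32px aligned=True
```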

> Or somehow push Noto Sans Mono CJK to update their '0' character etc?

Noto's Mono is functionally the same as Source Han Sans's HW variant, and it's managed by the same folks who made Source Han Sans HW. I don't see such a push as realistic, since the Noto/SHS CJK lead wants to maintain visual compatibility with old CJK monospace fonts, which we all know are crappy for coding for precisely this reason (among others). We should still be able to make third-party modifications, as was done with Droid Sans Mono, while preserving the space advantages of OTCs, however.

Personally I have a strong preference for anything Iosevka, but since I enjoy OTCs and "cool font weights" a lot too, the modification-free way to do it while saving space would be binding two fonts together in fontconfig as "monospace", which negates my request for an actual font entry. Yes, we could take the third-party modification step even further and replace ALL of Noto Sans Mono CJK's Latin glyphs with Iosevka while also fixing the way-too-tall line height to Sarasa Mono's values, but that sounds like far too big a project to take on.

PS. Sarasa Gothic/Mono does not have many weights because it started off as a project to add good TrueType hinting instructions to Source Han Sans, and the author only had time to tweak parameters for two weights. Those bytecode instructions work well for low-dpi screens and Wine in particular; they can be enabled with hintmedium/hintfull. (Keep those customizations to your own box though.)

Comment 45 Mingye Wang 2018-01-18 04:09:11 UTC
Oh, things look pretty hopeful on the Noto (Non-CJK) side! It's a good thing that they aren't limiting the filed issue to CJK, like with https://github.com/googlei18n/noto-cjk/issues/116 and https://github.com/adobe-fonts/source-han-sans/issues/175. I wonder whether the changes will come to CJK too.

Comment 46 Cheng-Chia Tseng 2018-01-21 16:42:25 UTC
I noticed that the default fonts of Chinese are going to switch to Noto CJK fonts.

https://fedoraproject.org/wiki/Changes/ChineseDefaultFontsToNoto

Although switching to the Noto CJK fonts is fine, since they share the same glyph design as the Source Han fonts, I think it is still better to use the Source Han fonts for their localized family names, such as "思源黑体", "思源黑體", "思源宋體", or "思源宋体" (Source Han Sans and Source Han Serif in Simplified and Traditional Chinese).

Users who can read Chinese can pick fonts more easily from the font selection list. The Noto CJK fonts only display English names, which is not very friendly to ordinary users.

Comment 47 Peng Wu 2018-03-14 03:15:38 UTC
Thanks for the comments!

We have decided to change all default CJK fonts to Google Noto CJK fonts for Fedora 28.