Red Hat Bugzilla – Bug 126265
can't print out UTF-8 page
Last modified: 2007-11-30 17:10:44 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040510
Description of problem:
The PostScript file generated from a UTF-8 HTML page can't be
printed correctly. The printed output shows boxes instead of the
proper glyphs.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Open a UTF-8 HTML page, such as README-ja.html in RHEL3
2. Print it, or print to a file and view the result in ghostscript
Actual Results: some boxes are shown/printed out instead of proper glyphs
Expected Results: proper glyphs are shown/printed out
mozilla has some tables to convert the string. If depending on
ghostscript or PostScript level 3 printers is acceptable, mozilla could
use UTF-8 CMaps directly to print the UTF-8 strings instead of
generating complex PostScript code.
It seems to depend on the rendering font. If you run
LANG=ja_JP.UTF-8 mozilla (which uses the correct font) and print out this:
It shows correctly in ggv.
However, if you run LANG=zh_TW.UTF-8 mozilla (where the zh_TW font
contains only some of the glyphs in the page, and the others fall back
to a bitmap font of some sort) and print out the above URL, it only
shows the glyphs that are rendered with the zh_TW font.
I looked at the release notes for ja_JP; it seems they are not rendered
with the ja font.
The behavior is much better in the mozilla test build.
It prints out glyphs correctly.
However, when printing with a monospace font, the roman glyphs
overlap the CJK glyphs.
Created attachment 103854 [details]
glyphs overlapped in the "tree view selection"
The Japanese release notes at least preview well with fc4test1 firefox now,
but the PS output just renders the Japanese as white boxes again...
Still happening with fc4test3.
Ah, as Leon noted in comment 1, it depends on the page in question.
E.g. http://google.co.jp/, which is UTF-8, prints out OK.
Another data point: printing, say, http://google.co.kr/ in a ja_JP.UTF-8
session, or vice versa http://google.co.jp/ in a ko_KR.UTF-8 session,
gives white boxes in the PostScript output. (http://google.co.kr/ prints
fine in a ko_KR.UTF-8 session.)
Oops, our translated release notes and readmes (docbook generated)
are all erroneously tagged with lang="en" attributes (bug 158736)... :-(
Once the attributes are removed they print fine.
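
For anyone wanting to try the same workaround on a local copy of a page, here is a minimal sketch that strips lang="en" attributes from the tags of an HTML string. The helper name strip_lang_en and the regex-based approach are assumptions made for illustration, not anything taken from mozilla or the docbook toolchain. It also assumes attribute values never contain a ">" character.

```python
import re

# Hypothetical helper, not part of mozilla or docbook.
# Matches a whole tag, assuming no ">" appears inside attribute values.
TAG = re.compile(r"<[^<>]+>")

# Matches a lang attribute whose value is exactly "en" (single or double quoted).
# Values such as "en-US" or "ja" are deliberately left untouched.
LANG_EN_ATTR = re.compile(r"""\s+lang\s*=\s*(["'])en\1""", re.IGNORECASE)


def strip_lang_en(html: str) -> str:
    """Remove lang="en" attributes from tags, leaving text content alone."""
    return TAG.sub(lambda m: LANG_EN_ATTR.sub("", m.group(0)), html)
```

Reading a file such as README-ja.html as UTF-8, passing its contents through strip_lang_en, and printing the result should reproduce the "tags removed" case described above. The real fix presumably belongs in how the translated documents are generated (bug 158736).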