From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040510

Description of problem:
A PostScript file generated from UTF-8 HTML does not print correctly. The printout shows boxes instead of the proper glyphs.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Display a UTF-8 HTML page, such as README-ja.html in RHEL3.
2. Print it, or print to a file and view the result in ghostscript.

Actual Results:
Boxes are shown/printed instead of the proper glyphs.

Expected Results:
The proper glyphs are shown/printed.

Additional info:
Mozilla has some tables to convert the string. If depending on ghostscript or PostScript Level 3 printers is acceptable, Mozilla could use UTF-8 CMaps directly to print UTF-8 strings instead of complex PostScript code.
It seems to depend on the rendering font. If you run LANG=ja_JP.UTF-8 mozilla (which uses the correct font) and print http://www.geocities.jp/yasushi_suzudo/, it shows correctly in ggv. However, if you run LANG=zh_TW.UTF-8 mozilla (where the zh_TW font contains only some of the glyphs on the page, and the rest are drawn with a bitmap font or something similar) and print the same URL, only the glyphs rendered with the zh_TW font appear. Looking at the release notes for ja_JP, it seems they are not rendered with the ja font.
The behavior is much better in the mozilla test build: it prints the glyphs correctly. However, when printing with a monospace font, the roman glyphs overlap the CJK glyphs.
Created attachment 103854 [details] glyphs overlapped in the "tree view selection"
The Japanese release notes preview well with fc4test1 firefox now, at least.
But the PostScript output just renders the Japanese as white boxes again...
Still happening with fc4test3.
Ah, as Leon noted in comment 1, it depends on the page in question. E.g. http://google.co.jp/, which is UTF-8, prints out OK.
Another datapoint: printing, say, http://google.co.kr/ in a ja_JP.UTF-8 session, or vice versa http://google.co.jp/ in a ko_KR.UTF-8 session, gives white boxes in the PostScript output. (http://google.co.kr/ prints fine in ko_KR.UTF-8.)
Oops, our translated release notes and readmes (docbook generated) are all erroneously tagged with lang="en" attributes (bug 158736)... :-( Once the attributes are removed they print fine.
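For anyone who wants to check this against an already generated file, here is a minimal sketch of stripping the stray attribute. It assumes the attribute is removed by post-processing the generated HTML rather than in the DocBook toolchain; the function name strip_lang_en and the sample markup are hypothetical and not taken from the actual release notes.

```python
import re

# Matches a lang="en" or xml:lang="en" attribute (single or double quotes),
# together with the whitespace in front of it. Requiring leading whitespace
# keeps attributes such as hreflang="en" untouched.
_LANG_EN = re.compile(r'''\s+(?:xml:)?lang\s*=\s*(["'])en\1''', re.IGNORECASE)


def strip_lang_en(html: str) -> str:
    """Return html with every lang="en" / xml:lang="en" attribute removed.

    Hypothetical helper for illustration only. It is a plain regex pass,
    so it would also alter body text that literally contains lang="en".
    """
    return _LANG_EN.sub("", html)


if __name__ == "__main__":
    # Made-up sample: Japanese text wrongly tagged as English.
    sample = '<div class="article" lang="en"><p lang="en">日本語のテキスト</p></div>'
    print(strip_lang_en(sample))
```

Printing the cleaned file from a ja_JP.UTF-8 session should then behave like the untagged pages mentioned above, though that is only what the earlier comments suggest and has not been verified with this script.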