Bug 106479

Summary: mkisofs doesn't encode utf-8 filenames correctly for joliet extension
Product: [Fedora] Fedora
Component: cdrtools
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Reporter: Jaakko Heinonen <jheinonen>
Assignee: Harald Hoyer <harald>
CC: jshin, mitr
Doc Type: Bug Fix
Last Closed: 2004-01-29 13:54:03 UTC

Description Jaakko Heinonen 2003-10-07 16:21:42 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030703

Description of problem:
On a UTF-8 filesystem it is not possible to create ISO images with correctly
encoded Joliet filenames. mkisofs has an -input-charset option, but UTF-8 is not
supported.

How reproducible:
Always

Steps to Reproduce:
Create an ISO image with mkisofs -J on a filesystem with UTF-8 filenames.
(The filenames must contain non-ASCII characters.)

Actual Results:  Filenames in the Joliet extension are incorrectly encoded.
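
As an illustration of the kind of mis-encoding reported here, the following
Python sketch assumes that each byte of a UTF-8 filename is treated as one
character of a single-byte charset before being written out as 16-bit Joliet
characters. The per-byte behaviour, the choice of ISO-8859-1, and the sample
filename are assumptions made for this sketch, not details taken from mkisofs.

```python
# Illustration only: assumes the tool maps each input byte to one character
# of a single-byte charset (ISO-8859-1 is an assumption for this sketch).
name = "päivä.txt"            # hypothetical filename with non-ASCII characters
raw = name.encode("utf-8")    # the bytes as stored on a UTF-8 filesystem

# Wrong: every byte becomes its own character, so each two-byte "ä"
# turns into two unrelated characters.
misread = raw.decode("iso-8859-1")

# Right: decode the bytes as UTF-8 first, then work with the characters.
correct = raw.decode("utf-8")

print(repr(misread))
print(repr(correct))
print(len(misread), len(correct))
```

The misread name is longer than the real one because every multibyte UTF-8
sequence is split into separate characters.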

Additional info:

There is a patch available at
http://joerghaeger.de/webCDwriter/mkisofs+UTF-8.html . However, this patch causes
mkisofs to work incorrectly on non-UTF-8 systems.

Comment 1 Harald Hoyer 2003-10-24 14:28:05 UTC
Please report this upstream to the author of cdrtools. I will consider this
patch for Fedora Core 2.

Comment 2 Jungshik Shin 2003-10-30 18:05:03 UTC
See the thread at http://mail.nl.linux.org/linux-utf8/2002-10/msg00050.html

I would love to see this fixed upstream, but the response hasn't been
positive. No response wouldn't be considered positive, would it?

Anyway, the patch by Ilya Konstantinov (built upon my patch) works not only for
UTF-8 but also for other character encodings. 

We can further improve it to make it independent of the iconv(3) implementation
in use. Currently, it assumes that the codeset name for UTF-16LE is 'UTF-16LE',
but different iconv(3) implementations use different names for it. To avoid this
problem, we can convert the input charset to UTF-8 first and then use our
own (built-in) UTF-8 -> UTF-16LE conversion routine.
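
A minimal sketch of that two-step idea, in Python rather than the C used by
mkisofs: the first step goes through a charset library (Python's codecs stand
in for iconv(3) here), and the second step is a hand-written UTF-8 to UTF-16LE
routine that needs no codeset name. The function names utf8_to_utf16le and
joliet_name are hypothetical and are not taken from any of the patches.

```python
# Smallest code point that may legally use 1, 2, 3 or 4 UTF-8 bytes.
MIN_CP = (0x0, 0x80, 0x800, 0x10000)


def utf8_to_utf16le(data: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16LE bytes without any charset library."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        # Work out the sequence length from the lead byte.
        if b < 0x80:
            cp, extra = b, 0
        elif 0xC2 <= b <= 0xDF:
            cp, extra = b & 0x1F, 1
        elif 0xE0 <= b <= 0xEF:
            cp, extra = b & 0x0F, 2
        elif 0xF0 <= b <= 0xF4:
            cp, extra = b & 0x07, 3
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated UTF-8 sequence at offset {i}")
        # Fold in the continuation bytes, six payload bits each.
        for c in data[i + 1:i + 1 + extra]:
            if c & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte after offset {i}")
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong forms, encoded surrogates and out-of-range values.
        if cp < MIN_CP[extra] or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point at offset {i}")
        i += 1 + extra
        # Emit one 16-bit unit, or a surrogate pair above U+FFFF.
        if cp >= 0x10000:
            cp -= 0x10000
            units = (0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF))
        else:
            units = (cp,)
        for unit in units:
            out += unit.to_bytes(2, "little")
    return bytes(out)


def joliet_name(raw: bytes, input_charset: str) -> bytes:
    """Step 1: input charset -> UTF-8 (library). Step 2: UTF-8 -> UTF-16LE (built-in)."""
    utf8 = raw.decode(input_charset).encode("utf-8")
    return utf8_to_utf16le(utf8)
```

With this split, only the name of the input charset and "UTF-8" ever reach the
charset library, so the spelling of the 16-bit codeset name no longer matters.
For example, joliet_name(b"p\xe4iv\xe4", "iso-8859-1") and
joliet_name("päivä".encode("utf-8"), "utf-8") yield the same bytes.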

Of course, if we just want to fix this on Linux or where Bruno's libiconv is
used, we don't have to worry about the portability. 

BTW, would anyone make yet another attempt at persuading the maintainer of
mkisofs (cdrtools) to support multibyte character encodings with iconv?

Comment 3 Jaakko Heinonen 2003-11-04 07:48:46 UTC
As Jungshik Shin said, the upstream author does not seem to be interested.
I have ported the patch by Shin & Konstantinov to the latest version
of cdrtools and added automatic detection of UTF-8 encoding. The patch
was made against cdrtools 2.01a19 but also applies to version 2.0. It is
available here:

http://users.utu.fi/jahhein/mkisofs/mkisofs-iconv-2.patch

A known issue is that mkisofs won't print the character encodings available
through iconv with "-input-charset help".
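
The comment does not say how the automatic UTF-8 detection works. One
plausible approach, sketched below purely as an assumption, is to ask the
current locale for its codeset and switch the input charset when it names
UTF-8. The helper names and the "iso8859-1" fallback are hypothetical, and
locale.nl_langinfo is only available on Unix-like systems.

```python
import locale


def is_utf8_codeset(codeset: str) -> bool:
    """True if a codeset name denotes UTF-8; spellings vary ('UTF-8', 'utf8')."""
    return codeset.replace("-", "").replace("_", "").lower() == "utf8"


def default_input_charset(fallback: str = "iso8859-1") -> str:
    """Pick an input charset from the locale (assumed approach, Unix only)."""
    try:
        # Adopt the user's LC_CTYPE so nl_langinfo reports their codeset.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return fallback
    codeset = locale.nl_langinfo(locale.CODESET)
    return "utf-8" if is_utf8_codeset(codeset) else fallback
```

An explicit -input-charset option would presumably still take precedence over
a default chosen this way.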

Comment 4 Harald Hoyer 2003-11-04 10:53:28 UTC
cool! many thanx!

Comment 5 Jaakko Heinonen 2003-11-13 15:17:07 UTC
The patch was broken. A new version is available at:

http://users.utu.fi/jahhein/mkisofs/

I may make more changes later. The latest version will be found in the
directory above.

Comment 6 Jaakko Heinonen 2004-02-03 12:27:40 UTC
Version 9 of the patch, which was included in Rawhide, had a bug related
to the sorting of the Joliet items. (Details:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=230725) Version 10
fixes this bug.