Bug 87613 - mkisofs cannot produce ISO images with cyrillic characters
Summary: mkisofs cannot produce ISO images with cyrillic characters
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: cdrtools
Version: 8.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Harald Hoyer
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-03-30 17:30 UTC by Alexey Neyman
Modified: 2007-04-18 16:52 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-07-28 12:22:34 UTC
Embargoed:


Attachments
The patch is attached for convenience. (6.95 KB, patch)
2003-03-30 17:49 UTC, Alexey Neyman

Description Alexey Neyman 2003-03-30 17:30:49 UTC
Description of problem: 
mkisofs cannot handle file names in a UTF-8 local (input) encoding. A transcript
of a shell session follows: 
 
[root@vagabond mp3]# ls -la "Ночные снайперы" 
итого 12 
drwxrwxr-x    3 avn      avn          4096 Мар 30 19:30 . 
drwxrwxr-x    9 avn      avn          4096 Мар 30 21:00 .. 
drwxrwxr-x    2 avn      avn          4096 Мар 30 19:33 2002 Цунами 
 
// First, try the old way (which worked on Red Hat Linux 7.x) 
[root@vagabond mp3]# mkisofs -o q.iso -J -jcharset koi8-r "Ночные снайперы" 
[root@vagabond mp3]# mount -t iso9660 -o ro,loop=/dev/loop0 q.iso q 
[root@vagabond mp3]# ls -la q 
итого 8 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:30 . 
drwxrwxr-x    9 avn      avn          4096 Мар 30 21:00 .. 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:33 2002 ???????????? 
 
// Try the 'default' conversion, with 1:1 mapping of local file names 
[root@vagabond mp3]# mkisofs -o q.iso -J -jcharset default "Ночные снайперы" 
[root@vagabond mp3]# mount -t iso9660 -o ro,loop=/dev/loop0 q.iso q 
[root@vagabond mp3]# ls -la q 
итого 8 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:30 . 
drwxrwxr-x    9 avn      avn          4096 Мар 30 21:00 .. 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:33 2002 Ц??нами 
[root@vagabond mp3]# ls -la q 
q      q.iso 
[root@vagabond mp3]# ls -la q/2002\ Ц�?нами/ 
итого 63650 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:33 . 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:30 .. 
-r-xr-xr-x    1 root     root      4058601 Дек  9 15:10 01.??а??о??од??.mp3 
-r-xr-xr-x    1 root     root      4872313 Дек  9 15:10 02.??а??а??????о??и??е??ки.mp3 
(the rest of output skipped) 
 
// Try omitting -jcharset altogether: 
[root@vagabond mp3]# mkisofs -o q.iso -J "Ночные снайперы" 
[root@vagabond mp3]# mount -t iso9660 -o ro,loop=/dev/loop0 q.iso q 
[root@vagabond mp3]# ls -la q 
итого 8 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:30 . 
drwxrwxr-x    9 avn      avn          4096 Мар 30 21:00 .. 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:33 2002 Ц?_нами 
[root@vagabond mp3]# ls -la q/2002\ Ц�нами/ 
итого 63650 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:33 . 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:30 .. 
-r-xr-xr-x    1 root     root      4058601 Дек  9 15:10 01.?_а?_о?_од?_.mp3 
-r-xr-xr-x    1 root     root      4872313 Дек  9 15:10 02.?_а?_а?_?_?_о?_и?_е?_ки.mp3 
 
As you can see, the first approach scrambles the file names completely (the
characters shown above as question marks and underscores are in fact Cyrillic
characters). The second and third approaches preserve some characters, but
others are still scrambled. 
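
The pattern in the second and third listings can be reproduced with a small model. The Python sketch below is only a hypothesis that happens to match the transcript, not mkisofs code. It assumes the names are Цунами, Пароходы and Катастрофически stored as UTF-8, and that a character is damaged whenever one of its UTF-8 bytes falls in the 0x80-0x9F range. The function names are made up for illustration. The last function reads each UTF-8 byte as a separate KOI8-R character, which is consistent with a 6-letter name turning into 12 question marks in the first listing.

```python
# Hypothetical model of the scrambling seen in the transcript; not mkisofs code.

def is_fragile(ch: str) -> bool:
    """True if any UTF-8 byte of ch lies in 0x80-0x9F (assumed to be the lossy range)."""
    return any(0x80 <= b <= 0x9F for b in ch.encode("utf-8"))


def simulate_listing(name: str, marker: str = "??") -> str:
    """Replace every fragile character with the marker shown by ls in the transcript."""
    return "".join(marker if is_fragile(ch) else ch for ch in name)


def koi8r_misread(name: str) -> str:
    """Decode the UTF-8 bytes of name as KOI8-R, one character per byte."""
    return name.encode("utf-8").decode("koi8-r")


if __name__ == "__main__":
    print(simulate_listing("Цунами"))           # -jcharset default: Ц??нами
    print(simulate_listing("Цунами", "?_"))     # no -jcharset:      Ц?_нами
    print(len(koi8r_misread("Цунами")))         # -jcharset koi8-r:  12
```

If the model is right, which letters survive depends only on their UTF-8 byte values, so names made entirely of "safe" letters would look fine and hide the problem.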
 
This link 
 
http://mail.nl.linux.org/linux-utf8/2002-03/msg00022.html 
 
has a long discussion of what is wrong with mkisofs (that message even includes
a patch, though I haven't tried it yet). In brief, it seems that ISO 9660 with
the Joliet extension uses UCS-2 encoding, not UTF-8. mkisofs itself cannot
convert from UTF-8 to UCS-2; the patch claims to add iconv(3) support to
mkisofs.
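
To make the UTF-8 versus UCS-2 point concrete, here is a minimal Python sketch of the kind of conversion the patch is said to add through iconv(3). It is not the patch's code. The function name utf8_to_joliet is hypothetical, and the sketch assumes Joliet stores names as big-endian UCS-2, which for characters in the Basic Multilingual Plane is byte-for-byte the same as UTF-16BE.

```python
# Sketch only: UTF-8 file name -> big-endian UCS-2 bytes (assumed Joliet form).

def utf8_to_joliet(name_utf8: bytes) -> bytes:
    """Convert a UTF-8 encoded file name to big-endian UCS-2."""
    text = name_utf8.decode("utf-8")
    for ch in text:
        # UCS-2 has no surrogate pairs, so anything above U+FFFF cannot be stored.
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} is not representable in UCS-2")
    return text.encode("utf-16-be")


if __name__ == "__main__":
    raw = "Цунами".encode("utf-8")       # 12 bytes as stored on the UTF-8 file system
    print(utf8_to_joliet(raw).hex())      # 04260443043d0430043c0438
```

Each Cyrillic letter becomes one 16-bit unit (for example Ц is 0x0426), whereas copying the UTF-8 bytes 1:1 would produce two unrelated units per letter.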

Comment 1 Alexey Neyman 2003-03-30 17:49:19 UTC
Created attachment 90793
The patch is attached for convenience.

Comment 2 Alexey Neyman 2003-03-30 18:00:42 UTC
Actually, I also tried adding "iocharset=utf8,utf8" to the mount options (which
is how a CD-ROM is mounted to enable Cyrillic characters); the result is the
same: the Cyrillic characters are scrambled. 
 
I tried rebuilding the cdrtools source RPM with the patch above applied. It
builds OK, and the resulting mkisofs produces nice ISO images: 
 
[root@vagabond mp3]# ~avn/tmp/mkisofs -o q.iso -jcharset utf-8 "Ночные снайперы" 
[root@vagabond mp3]# mount -t iso9660 -o ro,loop=/dev/loop0,iocharset=utf8,utf8 q.iso q 
[root@vagabond mp3]# ls -la q 
итого 8 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:30 . 
drwxrwxr-x    9 avn      avn          4096 Мар 30 21:58 .. 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:33 2002 Цунами 
[root@vagabond mp3]# ls -la q/2002\ Цунами 
итого 63650 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:33 . 
dr-xr-xr-x    1 root     root         2048 Мар 30 19:30 .. 
-r-xr-xr-x    1 root     root      4058601 Дек  9 15:10 01.Пароходы.mp3 
-r-xr-xr-x    1 root     root      4872313 Дек  9 15:10 02.Катастрофически.mp3 
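
A result like the listing above can also be checked mechanically instead of by eye. The Python sketch below is a hypothetical helper, not part of cdrtools: it collects the relative names under the source directory and under the mounted image and reports any that differ. The function names are made up for illustration.

```python
# Hypothetical check: do the names in a mounted image match the source tree?
import os


def relative_names(root: str) -> set:
    """Return every file and directory name under root, relative to root."""
    names = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for entry in dirnames + filenames:
            names.add(os.path.relpath(os.path.join(dirpath, entry), root))
    return names


def compare_trees(source: str, mounted: str):
    """Return (names missing from the image, names found only in the image)."""
    src = relative_names(source)
    mnt = relative_names(mounted)
    return sorted(src - mnt), sorted(mnt - src)
```

With the paths from this session, compare_trees("Ночные снайперы", "q") would be expected to return two empty lists for the patched mkisofs, and non-empty lists for the scrambled images from the original description.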
 
Right now I'm updating my mkisofs RPM with this hand-made one ;) 

Comment 3 Harald Hoyer 2003-04-03 07:34:19 UTC
thx

