Description of problem:
When I encode a file to Ogg Vorbis with oggenc and set the title, artist, etc. to UTF-8 strings containing non-ASCII characters, or when I change those fields with vorbiscomment, every non-ASCII byte is recorded in the file as "#".

Version-Release number of selected component (if applicable):
vorbis-tools-1.1.1-1.2.1
libvorbis-1.1.2-1.2
(and everything else that may be relevant, at the current up-to-date versions from FC5)

How reproducible:
Always.

Steps to Reproduce:
1. oggenc -a Laïs -t Laïs -o file.ogg file.wav   (that is La, i-diaeresis, s)
2. Run vorbiscomment file.ogg and strings file.ogg.

Actual results:
vorbiscomment reports the artist and title as La##s. strings file.ogg also shows La##s.

Expected results:
In both cases I expect to see an i-diaeresis instead of ##.

Additional info:
My i18n settings are LANG=en_US.UTF-8, with no environment variables starting with LC. The contents of /etc/sysconfig/i18n are:
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"
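In UTF-8, "ï" (U+00EF) is the two bytes 0xC3 0xAF, which is why one character turns into "##". The sketch below only illustrates the per-byte substitution described above; mask_non_ascii is a hypothetical helper, not code from vorbis-tools.

```c
#include <stddef.h>

/* Hypothetical helper: copies src into dst, replacing every byte >= 0x80
   with '#', and returns how many bytes were replaced.
   dst must have room for strlen(src) + 1 bytes. */
size_t mask_non_ascii(const char *src, char *dst)
{
    size_t masked = 0;
    for (; *src; src++, dst++) {
        if ((unsigned char)*src < 0x80) {
            *dst = *src;          /* 7-bit ASCII passes through */
        } else {
            *dst = '#';           /* each UTF-8 byte of a non-ASCII char */
            masked++;
        }
    }
    *dst = '\0';
    return masked;
}
```

Feeding it "La\xc3\xafs" (Laïs in UTF-8) yields "La##s", the same string reported above.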
For files I encoded under Fedora Core 4, also with UTF-8 in titles and artist names, vorbiscomment under FC5 lists those characters as a question mark. If I look inside the file (with less), the characters are there. It seems vorbiscomment converts the UTF-8 characters to Latin-1 or a similar charset and then cannot display them.
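Unlike the "#" case, a conversion that decodes UTF-8 first and then replaces each unrepresentable character would produce one "?" per character rather than per byte. The sketch below assumes the target charset is 7-bit ASCII (the report only says "Latin-1 or a similar charset"), and utf8_to_ascii_lossy is a hypothetical helper, not the vorbis-tools conversion routine.

```c
/* Hypothetical helper: converts valid UTF-8 to 7-bit ASCII, emitting one
   '?' per non-ASCII character (not per byte).
   dst must have room for strlen(src) + 1 bytes. */
void utf8_to_ascii_lossy(const char *src, char *dst)
{
    for (; *src; src++) {
        unsigned char b = (unsigned char)*src;
        if (b < 0x80)
            *dst++ = (char)b;     /* plain ASCII: copy through */
        else if ((b & 0xC0) != 0x80)
            *dst++ = '?';         /* lead byte of a multibyte character */
        /* continuation bytes (10xxxxxx) are skipped */
    }
    *dst = '\0';
}
```

With "La\xc3\xafs" as input this gives "La?s", while the bytes in the file itself stay intact, consistent with what less shows.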
I found a workaround: setting the environment variable CHARSET=UTF-8 makes the tools Do The Right Thing.

It looks like the charset is not set properly in convert_set_charset() in share/utf8.c. When running under the debugger, I see that the call to nl_langinfo (and the if statement that governs it) is skipped, so HAVE_LANGINFO_CODESET was not defined at compile time. Here are the gdb steps:

(gdb) b convert_set_charset
Breakpoint 1 at 0x804ad57: file utf8.c, line 232.
(gdb) run
Breakpoint 1, convert_set_charset (charset=0x0) at utf8.c:232
232     {
(gdb) p charset
$1 = 0x0
(gdb) n
234       if (!charset)
(gdb)
235         charset = getenv("CHARSET");
(gdb)
242       free(current_charset);
(gdb) p charset
$2 = 0x0

Between lines 235 and 242 above are the lines

#ifdef HAVE_LANGINFO_CODESET
  if (!charset)
    charset = nl_langinfo(CODESET);
#endif

which are never executed.
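The lookup order visible in the gdb session can be sketched as follows. This is a reconstruction from the lines quoted above, not the actual share/utf8.c; pick_charset and the "US-ASCII" last resort are assumptions made for this sketch.

```c
#include <stdlib.h>        /* getenv */
#ifdef HAVE_LANGINFO_CODESET
#include <langinfo.h>      /* nl_langinfo, CODESET */
#endif

/* Hypothetical helper: explicit argument first, then $CHARSET, then the
   locale's codeset. nl_langinfo(CODESET) only reflects LANG/LC_* after
   the program has called setlocale(LC_CTYPE, "") or setlocale(LC_ALL, ""). */
const char *pick_charset(const char *charset)
{
    if (!charset)
        charset = getenv("CHARSET");
#ifdef HAVE_LANGINFO_CODESET
    /* Compiled out entirely when the macro is undefined, as seen in gdb. */
    if (!charset)
        charset = nl_langinfo(CODESET);
#endif
    if (!charset)
        charset = "US-ASCII";  /* assumed fallback for this sketch only */
    return charset;
}
```

If HAVE_LANGINFO_CODESET is undefined, only $CHARSET can supply a charset, which is why setting CHARSET=UTF-8 works around the problem.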
Duplicate of bug 98816. See that bug, where I posted the fix. Apparently config.h is not being included in some source files, so macros such as HAVE_LANGINFO_CODESET are never defined there. It has been fixed upstream, but there has not been a new release yet.

*** This bug has been marked as a duplicate of 98816 ***
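For reference, the usual autoconf idiom is to include config.h at the very top of each source file, before any #ifdef HAVE_... test. The sketch assumes the build passes -DHAVE_CONFIG_H (as automake-generated makefiles normally do). The actual fix is the one posted in bug 98816 and may differ, and langinfo_codeset_compiled_in is a hypothetical diagnostic.

```c
/* Must come before any #ifdef HAVE_... test in the file; otherwise
   macros that configure wrote to config.h are silently undefined. */
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

/* Hypothetical diagnostic: reports whether this translation unit saw
   HAVE_LANGINFO_CODESET at compile time. */
int langinfo_codeset_compiled_in(void)
{
#ifdef HAVE_LANGINFO_CODESET
    return 1;
#else
    return 0;
#endif
}
```

Compiled without config.h or -DHAVE_LANGINFO_CODESET, the function returns 0, matching the skipped nl_langinfo branch in the gdb session.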