Red Hat Bugzilla – Bug 187649
oggenc and vorbiscomment screw up UTF-8 characters
Last modified: 2013-10-20 18:41:46 EDT
Description of problem:
When I try to encode a file to ogg/vorbis using oggenc and specify title/artist
etc. with UTF-8 encoded strings (i.e. non-ASCII characters in the strings), or
when I try to change those attributes with vorbiscomment, they are recorded in
the file as "#" for each byte that is not ASCII.
Version-Release number of selected component (if applicable):
(and everything else that may be relevant, the current up-to-date versions from FC5)
Steps to Reproduce:
1. oggenc -a Laïs -t Laïs -o file.ogg file.wav (that is La-i diaeresis-s)
2. vorbiscomment file.ogg and strings file.ogg
vorbiscomment reports artist and title as La##s. Doing strings file.ogg also
shows La##s.
In both cases I expect to see an i-diaeresis instead of ##.
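To make the symptom concrete: i-diaeresis is two bytes in UTF-8 (0xC3 0xAF), which is why one character turns into two "#" signs. The sketch below only reproduces that effect; squash_to_ascii is a hypothetical helper written for illustration, not code from vorbis-tools.

```c
#include <stddef.h>

/* Hypothetical helper (not from vorbis-tools): copy `in` to `out`,
   replacing every byte outside the 7-bit ASCII range with '#'.
   `out` must have room for strlen(in) + 1 bytes. */
void squash_to_ascii(const char *in, char *out)
{
    size_t i;
    for (i = 0; in[i] != '\0'; i++) {
        unsigned char byte = (unsigned char)in[i];
        out[i] = (byte < 0x80) ? (char)byte : '#';
    }
    out[i] = '\0';
}
```

Feeding it the UTF-8 bytes of the artist name ("La\xc3\xafs") gives La##s, matching what vorbiscomment and strings show.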
My i18n settings are LANG=en_US.UTF-8 and no environment variables that start
with LC. The contents of /etc/sysconfig/i18n is
With files that I encoded under Fedora Core 4, also with UTF-8 in titles and
artist names, I see under FC5 that vorbiscomment lists those characters as a
question mark. If I look inside the file (with less), I see that the characters
are there. It seems vorbiscomment translates the UTF-8 characters to latin-1 or
something like that and then can't display them.
I found out a workaround: setting the environment variable CHARSET=UTF-8 will
cause the tools to Do The Right Thing.
It looks like charset is not set properly in convert_set_charset() in
share/utf8.c. When running under the debugger, I see that the call to
nl_langinfo (and the if statement that governs the call) is skipped, so
HAVE_LANGINFO_CODESET was not set during compilation. See the gdb steps here:
(gdb) b convert_set_charset
Breakpoint 1 at 0x804ad57: file utf8.c, line 232.
Breakpoint 1, convert_set_charset (charset=0x0) at utf8.c:232
(gdb) p charset
$1 = 0x0
234 if (!charset)
235 charset = getenv("CHARSET");
(gdb) p charset
$2 = 0x0
Between lines 235 and 242 above there are the lines
charset = nl_langinfo(CODESET);
which are not executed.
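A minimal sketch of the selection logic as I understand it from the debugger session, not a copy of share/utf8.c. The function name pick_charset and the "US-ASCII" fallback are my assumptions; the order (explicit argument, then CHARSET, then nl_langinfo) follows the gdb listing above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#ifdef HAVE_LANGINFO_CODESET
#include <langinfo.h>
#endif

/* Hypothetical stand-in for the charset selection in convert_set_charset(). */
const char *pick_charset(const char *charset)
{
    /* 1. An explicit argument wins; otherwise try the CHARSET variable. */
    if (!charset)
        charset = getenv("CHARSET");

#ifdef HAVE_LANGINFO_CODESET
    /* 2. Ask the locale. This whole step is compiled out when the macro
          is undefined, e.g. because config.h was never included.
          nl_langinfo(CODESET) only reflects LANG/LC_* after the program
          has called setlocale(LC_CTYPE, ""). */
    if (!charset)
        charset = nl_langinfo(CODESET);
#endif

    /* 3. Assumed fallback for this sketch; the real default may differ. */
    if (!charset || !*charset)
        charset = "US-ASCII";

    return charset;
}
```

Built without -DHAVE_LANGINFO_CODESET, pick_charset(NULL) never looks at the locale, so LANG=en_US.UTF-8 has no effect and only CHARSET=UTF-8 helps, which matches the workaround above.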
Duplicate of bug 98816. See that bug, where I posted the fix. Apparently
config.h is not being properly included in some source files. It's been fixed
upstream, but they haven't had a new release yet.
*** This bug has been marked as a duplicate of 98816 ***