Bug 187649 - oggenc and vorbiscomment screw up UTF-8 characters
Status: CLOSED DUPLICATE of bug 98816
Product: Fedora
Classification: Fedora
Component: vorbis-tools
Hardware: i386 Linux
Priority: medium  Severity: medium
Assigned To: Monty
Reported: 2006-04-02 06:02 EDT by Sjoerd Mullender
Modified: 2013-10-20 18:41 EDT
Doc Type: Bug Fix
Last Closed: 2006-10-29 08:11:15 EST
Description Sjoerd Mullender 2006-04-02 06:02:44 EDT
Description of problem:

When I try to encode a file to ogg/vorbis using oggenc and specify title/artist
etc. with UTF-8 encoded strings (i.e. non-ASCII characters in the strings), or
when I try to change those attributes with vorbiscomment, they are recorded in
the file as "#" for each byte that is not ASCII.

Version-Release number of selected component (if applicable):
(and everything else that may be relevant, the current up-to-date versions from FC5)

How reproducible:

Steps to Reproduce:
1. oggenc -a Laïs -t Laïs -o file.ogg file.wav (that is: L, a, i-diaeresis, s)
2. Run vorbiscomment file.ogg and strings file.ogg
Actual results:
vorbiscomment reports artist and title as La##s.  Doing strings file.ogg also
shows La##s.

Expected results:
In both cases I expect to see an i-diaeresis instead of ##.

Additional info:
My i18n settings are LANG=en_US.UTF-8, with no environment variables set that
start with LC.  The contents of /etc/sysconfig/i18n are
Comment 1 Sjoerd Mullender 2006-04-02 07:35:59 EDT
With files that I encoded under Fedora Core 4, also with UTF-8 in titles and
artist names, I see under FC5 that vorbiscomment lists those characters as a
question mark.  If I look inside the file (with less), I see that the characters
are there.  It seems vorbiscomment translates the UTF-8 characters to latin-1 or
something like that and then can't display them.
Comment 2 Sjoerd Mullender 2006-04-10 16:21:27 EDT
I found out a workaround: setting the environment variable CHARSET=UTF-8 will
cause the tools to Do The Right Thing.

It looks like charset is not set properly in convert_set_charset() in
share/utf8.c.  When running under the debugger, I see that the call to
nl_langinfo (and the if statement that governs the call) is skipped, so
HAVE_LANGINFO_CODESET was not set during compilation.  See the gdb steps here:
(gdb) b convert_set_charset
Breakpoint 1 at 0x804ad57: file utf8.c, line 232.
(gdb) run
Breakpoint 1, convert_set_charset (charset=0x0) at utf8.c:232
232     {
(gdb) p charset
$1 = 0x0
(gdb) n
234       if (!charset)
235         charset = getenv("CHARSET");
242       free(current_charset);
(gdb) p charset
$2 = 0x0

Between lines 235 and 242 above there are the lines
  if (!charset)
    charset = nl_langinfo(CODESET);
which are not executed.
Comment 3 John Thacker 2006-10-29 08:11:15 EST
Duplicate of bug 98816.  See that bug, where I posted the fix.  Apparently
config.h is not being properly included in some source files.  It's been fixed
upstream, but they haven't had a new release yet.

*** This bug has been marked as a duplicate of 98816 ***
