Bug 187649 - oggenc and vorbiscomment screw up UTF-8 characters
Status: CLOSED DUPLICATE of bug 98816
Alias: None
Product: Fedora
Classification: Fedora
Component: vorbis-tools
Version: 5
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Monty
 
Reported: 2006-04-02 10:02 UTC by Sjoerd Mullender
Modified: 2013-10-20 22:41 UTC

Last Closed: 2006-10-29 13:11:15 UTC

Description Sjoerd Mullender 2006-04-02 10:02:44 UTC
Description of problem:

When I encode a file to Ogg Vorbis using oggenc and specify title, artist,
etc. as UTF-8 strings containing non-ASCII characters, or when I change those
attributes with vorbiscomment, each non-ASCII byte is recorded in the file
as "#".

Version-Release number of selected component (if applicable):
vorbis-tools-1.1.1-1.2.1
libvorbis-1.1.2-1.2
(and everything else that may be relevant, the current up-to-date versions from FC5)

How reproducible:
Always.

Steps to Reproduce:
1. oggenc -a Laïs -t Laïs -o file.ogg file.wav (that is La-i diaeresis-s)
2. vorbiscomment file.ogg and strings file.ogg
  
Actual results:
vorbiscomment reports artist and title as La##s.  Doing strings file.ogg also
shows La##s.


Expected results:
In both cases I expect to see an i-diaeresis instead of ##.
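
For illustration, here is a minimal sketch of the substitution that would produce this output. It is not the vorbis-tools code; ascii_or_hash is a hypothetical helper that assumes the input is being treated as plain ASCII, so every byte outside the 7-bit range is replaced with "#". The UTF-8 encoding of i-diaeresis is the two bytes C3 AF, which is why two "#" characters appear.

```c
#include <stddef.h>

/* Hypothetical helper (not part of vorbis-tools): copy src to dst,
   replacing every byte outside the 7-bit ASCII range with '#'.
   dst must have room for strlen(src) + 1 bytes. */
static void ascii_or_hash(const char *src, char *dst)
{
    size_t i;

    for (i = 0; src[i] != '\0'; i++) {
        unsigned char c = (unsigned char)src[i];
        dst[i] = (c < 0x80) ? (char)c : '#';
    }
    dst[i] = '\0';
}
```

Calling ascii_or_hash on the bytes 4C 61 C3 AF 73 (the UTF-8 form of the artist name above) yields La##s, matching what vorbiscomment and strings show.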

Additional info:
My i18n settings are LANG=en_US.UTF-8, with no environment variables starting
with LC set.  The contents of /etc/sysconfig/i18n are:
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"

Comment 1 Sjoerd Mullender 2006-04-02 11:35:59 UTC
With files that I encoded under Fedora Core 4, also with UTF-8 in titles and
artist names, I see under FC5 that vorbiscomment lists those characters as a
question mark.  If I look inside the file (with less), I see that the characters
are there.  It seems vorbiscomment translates the UTF-8 characters to latin-1 or
something like that and then can't display them.

Comment 2 Sjoerd Mullender 2006-04-10 20:21:27 UTC
I found a workaround: setting the environment variable CHARSET=UTF-8 will
cause the tools to Do The Right Thing.

It looks like charset is not set properly in convert_set_charset() in
share/utf8.c.  When running under the debugger, I see that the call to
nl_langinfo (and the if statement that governs the call) is skipped, so
HAVE_LANGINFO_CODESET was apparently not defined during compilation.  See the
gdb steps here:
(gdb) b convert_set_charset
Breakpoint 1 at 0x804ad57: file utf8.c, line 232.
(gdb) run
Breakpoint 1, convert_set_charset (charset=0x0) at utf8.c:232
232     {
(gdb) p charset
$1 = 0x0
(gdb) n
234       if (!charset)
(gdb)
235         charset = getenv("CHARSET");
(gdb)
242       free(current_charset);
(gdb) p charset
$2 = 0x0
(gdb)

Between lines 235 and 242 above there are the lines
#ifdef HAVE_LANGINFO_CODESET
  if (!charset)
    charset = nl_langinfo(CODESET);
#endif
which are not executed.
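
The lookup order described above can be sketched as follows. This is a simplified illustration, not the code from share/utf8.c: resolve_charset is a hypothetical name, the #define stands in for what config.h would normally provide, and the final "US-ASCII" fallback is an assumption that is consistent with the symptoms but not confirmed from the source here.

```c
#include <langinfo.h>
#include <stdlib.h>

/* Stand-in for the definition config.h would provide (assumed). */
#define HAVE_LANGINFO_CODESET 1

/* Hypothetical helper: pick a charset name, trying in order
   1. the caller's argument,
   2. the CHARSET environment variable,
   3. the locale's codeset via nl_langinfo(CODESET),
   4. an assumed "US-ASCII" fallback.
   The caller should have called setlocale(LC_CTYPE, "") beforehand,
   otherwise step 3 reports the codeset of the default "C" locale. */
static const char *resolve_charset(const char *charset)
{
    if (!charset)
        charset = getenv("CHARSET");
#ifdef HAVE_LANGINFO_CODESET
    if (!charset)
        charset = nl_langinfo(CODESET);
#endif
    if (!charset || !*charset)
        charset = "US-ASCII";
    return charset;
}
```

With step 3 compiled out and CHARSET unset, only the fallback remains, which would explain why setting CHARSET=UTF-8 works around the problem.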

Comment 3 John Thacker 2006-10-29 13:11:15 UTC
Duplicate of bug 98816.  See that bug, where I posted the fix.  Apparently
config.h is not being properly included in some source files.  It's been fixed
upstream, but they haven't had a new release yet.
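
The failure mode can be illustrated with a minimal sketch. The function name is hypothetical; the point is only that when the header defining HAVE_LANGINFO_CODESET is never included, the #ifdef branch is dropped at compile time without any warning.

```c
/* config.h is deliberately NOT included here, so
   HAVE_LANGINFO_CODESET is never defined in this file. */

/* Hypothetical probe: returns 1 if the nl_langinfo branch would be
   compiled in, 0 if the preprocessor dropped it. */
static int langinfo_branch_compiled(void)
{
#ifdef HAVE_LANGINFO_CODESET
    return 1;
#else
    return 0;
#endif
}
```

As written, the probe returns 0. Adding the include that defines the macro flips it to 1 without any other change to the source.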

*** This bug has been marked as a duplicate of 98816 ***

