Bug 98816 - vorbiscomment can't write UTF-8 comments
Alias: None
Product: Fedora
Classification: Fedora
Component: vorbis-tools
Version: 6
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Monty
QA Contact:
Duplicates: 187649
Depends On:
Blocks: FC6Update 212871
Reported: 2003-07-09 07:06 UTC by James Ralston
Modified: 2013-10-20 22:41 UTC (History)

Clone Of:
Last Closed: 2006-10-31 03:43:04 UTC

Attachments
A sample UTF-8 encoded Ogg Vorbis file. (32.34 KB, application/octet-stream)
2003-07-09 07:09 UTC, James Ralston
Patch which fixes the charset conversion (1.27 KB, patch)
2006-10-28 16:19 UTC, John Thacker

Description James Ralston 2003-07-09 07:06:21 UTC
Description of problem:

According to Xiph.org Foundation:


...a comment vector consists of an ASCII field name, a "=" character,
and "8 bit clean UTF-8 encoded field contents to the end of the field."

But libvorbis completely bombs if the encoded field contents contain
8-bit UTF-8-encoded data.
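
For illustration, the field layout quoted above can be checked with a few
lines of C. This is a minimal sketch, not libvorbis code: the name
split_comment is hypothetical, the allowed field-name range (ASCII 0x20
through 0x7D, excluding "=") is taken from the Vorbis comment specification
rather than from this report, and rejecting an empty field name is a choice
made by this sketch.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libvorbis.
 * Returns the offset of the first byte of the field contents (just past
 * the '='), or 0 if the comment has no valid "NAME=" prefix.
 * The contents are not inspected: they are expected to be UTF-8, so
 * bytes above 0x7F are legal there. */
static size_t split_comment(const unsigned char *comment, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (comment[i] == '=')
            return i > 0 ? i + 1 : 0;   /* this sketch rejects an empty name */
        if (comment[i] < 0x20 || comment[i] > 0x7D)
            return 0;                   /* byte outside the field-name range */
    }
    return 0;                           /* no '=' found */
}
```

Only the field name is restricted to ASCII; everything after the "=" may
carry multi-byte UTF-8 sequences.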

Version-Release number of selected component (if applicable):


How reproducible:

Create an Ogg Vorbis file with text in a comment field which requires
UTF-8 encoding.  (You can't use vorbiscomment; it replaces all
characters greater than ASCII value 0x7F with "#".)  Try to play back
the Ogg Vorbis file.

Actual results:

You will get an error message similar to "Error opening [file] using
the oggvorbis module.  The file may be corrupted.".

Expected results:

libvorbis should correctly parse and decode the file.

Comment 1 James Ralston 2003-07-09 07:09:34 UTC
Created attachment 92817
A sample UTF-8 encoded Ogg Vorbis file.

Here's an example.  libvorbis will complain that this Ogg Vorbis file is
corrupted, but as far as I can tell, it is completely valid.

Comment 2 Colin Walters 2004-09-27 19:14:29 UTC
How did you create this file?

Comment 3 James Ralston 2004-10-21 01:25:19 UTC
Emacs, I think.  (It's been a while.)

But from using XMMS to test editing comments, I see now that other
data in the file changes when the comment changes.  (In particular, 4
bytes at offset 80 change.)  I'm guessing that the data in question is
a checksum of some sort that includes the comments in its calculation,
so by hand editing the comments, I corrupted the file.
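
The checksum guess fits how the Ogg container works: each Ogg page header
carries a 32-bit CRC over the whole page, so changing comment bytes in place
invalidates it. In a typical Vorbis file the first page is 58 bytes long,
which would put the second page's CRC field at offset 80. A rough sketch of
the check follows. The function names are hypothetical, and the parameters
(polynomial 0x04c11db7, initial value 0, no bit reflection, no final XOR,
CRC stored little-endian in bytes 22 to 25 and counted as zero while
computing) are my reading of the Ogg framing specification and should be
verified against it.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with the parameters assumed for Ogg pages: polynomial
 * 0x04c11db7, MSB first, no final XOR. Pass 0 as the initial crc. */
static uint32_t ogg_crc_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)buf[i] << 24;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : crc << 1;
    }
    return crc;
}

/* Hypothetical helper: returns 1 if the CRC stored in one complete Ogg
 * page matches the page contents, 0 otherwise. */
static int ogg_page_crc_ok(const unsigned char *page, size_t len)
{
    static const unsigned char zero[4] = {0, 0, 0, 0};

    if (len < 27)               /* shorter than a minimal page header */
        return 0;

    uint32_t stored = (uint32_t)page[22] | (uint32_t)page[23] << 8 |
                      (uint32_t)page[24] << 16 | (uint32_t)page[25] << 24;

    uint32_t crc = ogg_crc_update(0, page, 22);      /* bytes before the CRC field */
    crc = ogg_crc_update(crc, zero, 4);              /* CRC field counted as zeros */
    crc = ogg_crc_update(crc, page + 26, len - 26);  /* remainder of the page */
    return crc == stored;
}
```

A tool that rewrites comments has to recompute this value for every page it
touches, which a text editor will not do.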

By using Emacs to look at the file whose comments I edited with XMMS,
I see that the comments are indeed UTF-8 encoded, and libvorbis deals
with it just fine.  Therefore, the only real problem here is that
vorbiscomment isn't UTF-8 aware.

(Which is still a bug, but not a serious one.)

Comment 4 Colin Walters 2004-10-21 17:41:43 UTC
It looks to me like vorbiscomment has a --raw switch for passing UTF-8 directly,
instead of recoding for your locale.  Is your locale not UTF-8?

Comment 5 John Thacker 2006-10-28 15:18:17 UTC
This bug still occurs in FC6.  It seems that all of the vorbis-tools "automatic
recoding for your locale" is broken.  It looks like it assumes that en_US means
ISO8859-1 or something.  I'll do some more digging.

$ env | grep LANG

$ vorbiscomment 01\ -\ 天使の休息.ogg 

$ vorbiscomment -R 01\ -\ 天使の休息.ogg 

Comment 6 John Thacker 2006-10-28 15:23:36 UTC
Exporting LC_ALL=en_US.UTF-8 makes no difference.

However, this works:
$ export CHARSET=UTF-8
$ vorbiscomment 01\ -\ 天使の休息.ogg 

Definitely something broken in the automatic charset conversion.  Breaks for
ogg123 as a result too.

Comment 7 John Thacker 2006-10-28 15:50:46 UTC
Something's wrong with their configure and automagic stuff.  If I remove the
#ifdef HAVE_LANGINFO_CODSET lines from share/utf8.c, all works fine.  It looks
as though the configure script is picking it up correctly, but it's not being
compiled properly.

Comment 8 John Thacker 2006-10-28 16:00:17 UTC
Err, HAVE_LANGINFO_CODESET clearly.  But there definitely seems to be some sort
of mistake in the build script.  #define HAVE_LANGINFO_CODESET 1 is getting set
in config.h, but somehow not going through to the rest of the build.

Comment 9 John Thacker 2006-10-28 16:11:29 UTC
Ah, got it.  It's fixed upstream.
Several files, like utf8.c, need to have a

# include <config.h>

block in them, but don't.  See this link for the changeset:


and this upstream bug:


A new version of vorbis-tools hasn't been released for a long time (these
fixes are a year old).
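
The failure mode described in comments 7 through 9 can be reproduced in
miniature. The sketch below is not the actual share/utf8.c: detect_charset
and the "US-ASCII" last-resort default are hypothetical, chosen only to show
how a feature macro defined in config.h silently disables a code path when
the source file never includes config.h. Compiled without HAVE_CONFIG_H and
HAVE_LANGINFO_CODESET, the nl_langinfo() branch disappears and only the
CHARSET environment variable is consulted, mirroring the workaround in
comment 6.

```c
#ifdef HAVE_CONFIG_H
#include <config.h>   /* without this, HAVE_LANGINFO_CODESET is never seen */
#endif

#include <stdlib.h>
#ifdef HAVE_LANGINFO_CODESET
#include <langinfo.h>
#endif

/* Hypothetical helper: name of the charset to convert comments from.
 * The caller is assumed to have called setlocale(LC_ALL, "") already. */
static const char *detect_charset(void)
{
    const char *charset = NULL;

#ifdef HAVE_LANGINFO_CODESET
    charset = nl_langinfo(CODESET);   /* e.g. "UTF-8" under en_US.UTF-8 */
#endif
    if (charset == NULL || *charset == '\0')
        charset = getenv("CHARSET");  /* manual override and fallback */
    if (charset == NULL || *charset == '\0')
        charset = "US-ASCII";         /* assumed last-resort default */
    return charset;
}
```

Nothing warns about the missing include: an undefined macro in an #ifdef is
simply false, so the build succeeds and the locale is ignored at run time.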

Comment 10 John Thacker 2006-10-28 16:19:09 UTC
Created attachment 139640
Patch which fixes the charset conversion

This fixes the charset conversion and use of iconv() by properly including
config.h in some files that need it.  This is a copy of a changeset which has
been applied upstream.
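
For readers unfamiliar with iconv(), the conversion step that depends on the
detected charset looks roughly like the following. This is a minimal sketch
assuming a POSIX iconv implementation; to_utf8 is a hypothetical name and
not a vorbis-tools function.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `in` (in_len bytes, encoded as `from`) to
 * UTF-8 in `out`, NUL-terminated. Returns 0 on success, -1 if the charset
 * is unknown, the input is invalid, or `out` is too small. */
static int to_utf8(const char *from, const char *in, size_t in_len,
                   char *out, size_t out_size)
{
    if (out_size == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;          /* iconv() takes char **, not const */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_size - 1;  /* keep room for the terminator */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dst_left);  /* flush shift state */
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *dst = '\0';
    return 0;
}
```

With from set to "ISO-8859-1", the single byte 0xFC ("ü") comes out as the
two UTF-8 bytes 0xC3 0xBC. If the source charset is guessed wrong, the same
call either fails or produces the wrong characters, which is why the charset
detection matters.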

Comment 11 John Thacker 2006-10-29 13:11:29 UTC
*** Bug 187649 has been marked as a duplicate of this bug. ***

Comment 12 James Ralston 2006-10-29 19:10:32 UTC
I rebuilt vorbis-tools-1.1.1-2.src.rpm from Fedora Development with the patch in
comment 10; I can confirm that the patch fixes the problem:

$ vorbiscomment *01*
ALBUM=Operation Mindcrime II
TITLE=Freiheit Ouvertüre
LICENSE=all rights reserved
ENCODING=normalize 0.7.7 amplitude=0.30; OggEnc v1.0.2 quality=3

(Thanks for tracking this down, John--I never got back to this issue.)

Comment 14 Fedora Update System 2006-10-30 21:40:16 UTC
vorbis-tools-1.1.1-3.fc6 has been pushed for fc6, which should resolve this issue.  If these problems are still present in this version, then please make note of it in this bug report.

Comment 15 John Thacker 2006-10-31 03:43:04 UTC
Works great, thanks for moving so quickly on this after I posted the patch!
Now if you could only do the same for bug 141592, where there's also a patch. :)

Comment 16 Fedora Update System 2006-11-06 19:53:20 UTC
vorbis-tools-1.1.1-3.fc6 has been pushed for fc6, which should resolve this issue.  If these problems are still present in this version, then please make note of it in this bug report.
