Description of problem:
According to Xiph.org Foundation:
...a comment vector consists of an ASCII field name, a "=" character,
and "8 bit clean UTF-8 encoded field contents to the end of the field."
But libvorbis completely bombs if the encoded field contents contain
8-bit UTF-8-encoded data.
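For reference, the field layout quoted above can be checked mechanically. The following is a minimal sketch, not code from libvorbis: `comment_name_length` is a hypothetical helper, and the accepted byte range for field names (0x20 through 0x7D, excluding "=") is taken from the Vorbis comment specification. Everything after the first "=" is treated as opaque 8-bit data, so UTF-8 contents pass through untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libvorbis).
 * Returns the length of the ASCII field name if `vec` has the form
 * NAME=contents, or -1 if there is no '=' or the name holds a byte
 * outside 0x20..0x7D. The contents after '=' are not inspected. */
long comment_name_length(const unsigned char *vec, size_t len)
{
    assert(vec != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (vec[i] == '=')
            return (long)i;          /* name ends here; contents follow */
        if (vec[i] < 0x20 || vec[i] > 0x7d)
            return -1;               /* non-ASCII or control byte in the name */
    }
    return -1;                       /* no '=' separator found */
}
```

A comment such as TITLE= followed by UTF-8 bytes is accepted with a name length of 5, while a high-bit byte inside the field name itself is rejected.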
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Create an Ogg Vorbis file with text in a comment field which requires
UTF-8 encoding. (You can't use vorbiscomment; it replaces all
characters greater than ASCII value 0x7F with "#".) Try to play back
the Ogg Vorbis file.
Actual results:
You will get an error message similar to "Error opening [file] using
the oggvorbis module. The file may be corrupted.".
Expected results:
libvorbis should correctly parse and decode the file.
Created attachment 92817 [details]
A sample UTF-8 encoded Ogg Vorbis file.
Here's an example. libvorbis will complain that this Ogg Vorbis file is
corrupted, but as far as I can tell, it is completely valid.
How did you create this file?
Emacs, I think. (It's been a while.)
But from using XMMS to test editing comments, I see now that other
data in the file changes when the comment changes. (In particular, 4
bytes at offset 80 change.) I'm guessing that the data in question is
a checksum of some sort that includes the comments in its calculation,
so by hand editing the comments, I corrupted the file.
By using Emacs to look at the file whose comments I edited with XMMS,
I see that the comments are indeed UTF-8 encoded, and libvorbis deals
with it just fine. Therefore, the only real problem here is that
vorbiscomment isn't UTF-8 aware.
(Which is still a bug, but not a serious one.)
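The checksum guess fits the Ogg container format: every Ogg page header carries a 4-byte CRC at byte offset 22 of the page, and if the first page has the usual 58-byte size (27-byte header, one segment-table byte, 30-byte Vorbis identification packet), the CRC of the second page, which holds the comments, lands at file offset 80. The sketch below shows how that field can be verified or recomputed after a hand edit. The function names are hypothetical, and the caller is assumed to pass exactly one complete page; locating page boundaries via the segment table is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ogg page CRC: polynomial 0x04c11db7, initial value 0, no bit
 * reflection, no final XOR. Bitwise variant, chosen for clarity
 * rather than speed. */
uint32_t ogg_crc32_update(uint32_t crc, const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : (crc << 1);
    }
    return crc;
}

/* CRC over one whole page, with the CRC field (bytes 22..25) read as zero. */
uint32_t ogg_page_crc(const unsigned char *page, size_t page_len)
{
    static const unsigned char zeros[4] = {0, 0, 0, 0};
    assert(page_len >= 27);                      /* minimum page header size */
    uint32_t crc = ogg_crc32_update(0, page, 22);
    crc = ogg_crc32_update(crc, zeros, 4);
    return ogg_crc32_update(crc, page + 26, page_len - 26);
}

/* Returns 1 if the stored (little-endian) CRC matches the page contents. */
int ogg_page_crc_ok(const unsigned char *page, size_t page_len)
{
    uint32_t stored = (uint32_t)page[22]
                    | (uint32_t)page[23] << 8
                    | (uint32_t)page[24] << 16
                    | (uint32_t)page[25] << 24;
    return stored == ogg_page_crc(page, page_len);
}

/* Recomputes the CRC and writes it back into the page header. */
void ogg_page_store_crc(unsigned char *page, size_t page_len)
{
    uint32_t crc = ogg_page_crc(page, page_len);
    page[22] = (unsigned char)(crc & 0xff);
    page[23] = (unsigned char)((crc >> 8) & 0xff);
    page[24] = (unsigned char)((crc >> 16) & 0xff);
    page[25] = (unsigned char)((crc >> 24) & 0xff);
}
```

Changing even one comment byte without calling something like `ogg_page_store_crc` leaves the stored value stale, which would explain the "file may be corrupted" error on the hand-edited file.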
It looks to me like vorbiscomment has a --raw switch for passing UTF-8 directly,
instead of recoding for your locale. Is your locale not UTF-8?
This bug still occurs in FC6. It seems that the vorbis-tools "automatic
recoding for your locale" is broken altogether. It looks like it assumes that
en_US means ISO8859-1 or something. I'll do some more digging.
$ env | grep LANG
$ vorbiscomment 01\ -\ 天使の休息.ogg
$ vorbiscomment -R 01\ -\ 天使の休息.ogg
Exporting LC_ALL=en_US.UTF-8 makes no difference.
However, this works:
$ export CHARSET=UTF-8
$ vorbiscomment 01\ -\ 天使の休息.ogg
Definitely something broken in the automatic charset conversion. Breaks for
ogg123 as a result too.
Something's wrong with their configure and automagic stuff. If I remove the
#ifdef HAVE_LANGINFO_CODSET lines from share/utf8.c, all works fine. It looks
as though the configure script is picking it up correctly, but it's not being
Err, HAVE_LANGINFO_CODESET clearly. But there definitely seems to be some sort
of mistake in the build script. #define HAVE_LANGINFO_CODESET 1 is getting set
in config.h, but somehow not going through to the rest of the build.
Ah, got it. It's fixed upstream.
Several files, like utf8.c, need to have a
# include <config.h>
block in them, but don't. See this link for the changeset:
and this upstream bug:
No new version of vorbis-tools has been released for a long time (these
fixes are a year old).
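To illustrate why a missing config.h include matters, here is a minimal sketch of the failure mode. It is not the actual share/utf8.c: `detect_charset` is a hypothetical name, and the fallback order (CHARSET environment variable, then US-ASCII) is an assumption based on the behaviour seen in this report. The point is that HAVE_LANGINFO_CODESET is only visible to a source file that includes config.h; without the include, the nl_langinfo() branch is silently compiled out and the locale is never consulted.

```c
/* nl_langinfo() is POSIX rather than ISO C. */
#define _POSIX_C_SOURCE 200809L

#ifdef HAVE_CONFIG_H
#include <config.h>   /* defines HAVE_LANGINFO_CODESET if configure found it */
#endif

#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#ifdef HAVE_LANGINFO_CODESET
#include <langinfo.h>
#endif

/* Hypothetical helper: name of the charset that comments are recoded to. */
const char *detect_charset(void)
{
    const char *cs = NULL;
#ifdef HAVE_LANGINFO_CODESET
    /* Compiled only when config.h was included above. */
    setlocale(LC_CTYPE, "");
    cs = nl_langinfo(CODESET);
#endif
    if (cs == NULL || *cs == '\0')
        cs = getenv("CHARSET");   /* the workaround found in this report */
    if (cs == NULL || *cs == '\0')
        cs = "US-ASCII";          /* assumed last-resort default */
    assert(cs != NULL);
    return cs;
}
```

Built without config.h, this function ignores LANG and LC_ALL entirely and only reacts to CHARSET, which matches the symptoms above: exporting LC_ALL=en_US.UTF-8 changes nothing, while exporting CHARSET=UTF-8 works.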
Created attachment 139640 [details]
Patch which fixes the charset conversion
This fixes the charset conversion and use of iconv() by properly including
config.h in some files that need it. This is a copy of a changeset which has
been applied upstream.
*** Bug 187649 has been marked as a duplicate of this bug. ***
I rebuilt vorbis-tools-1.1.1-2.src.rpm from Fedora Development with the patch in
comment 10; I can confirm that the patch fixes the problem:
$ vorbiscomment *01*
ALBUM=Operation Mindcrime II
LABEL=Rhino / WEA
LICENSE=all rights reserved
ENCODING=normalize 0.7.7 amplitude=0.30; OggEnc v1.0.2 quality=3
(Thanks for tracking this down, John--I never got back to this issue.)
vorbis-tools-1.1.1-3.fc6 has been pushed for fc6, which should resolve this issue. If these problems are still present in this version, then please make note of it in this bug report.
Works great, thanks for moving so quickly on this after I posted the patch!
Now if you could only do the same for bug 141592, where there's also a patch. :)