98816 – vorbiscomment can't write UTF-8 comments

Summary: vorbiscomment can't write UTF-8 comments

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	vorbis-tools
Sub Component:
Version:	6
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Monty
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	187649
Depends On:
Blocks:	FC6Update 212871

Reported:	2003-07-09 07:06 UTC by James Ralston
Modified:	2013-10-20 22:41 UTC
CC List:	3 users
Fixed In Version:	1.1.1-3
Clone Of:
Environment:
Last Closed:	2006-10-31 03:43:04 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
A sample UTF-8 encoded Ogg Vorbis file. (32.34 KB, application/octet-stream) 2003-07-09 07:09 UTC, James Ralston
Patch which fixes the charset conversion (1.27 KB, patch) 2006-10-28 16:19 UTC, John Thacker

Description James Ralston 2003-07-09 07:06:21 UTC

Description of problem:

According to Xiph.org Foundation:

http://xiph.org/ogg/vorbis/doc/v-comment.html

...a comment vector consists of an ASCII field name, a "=" character,
and "8 bit clean UTF-8 encoded field contents to the end of the field."

But libvorbis completely bombs if the encoded field contents contain
8-bit UTF-8-encoded data.
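
To make the quoted format concrete, here is a minimal sketch in C that splits one comment into field name and contents. The function name comment_name_length is hypothetical and is not part of libvorbis. The accepted name range (ASCII 0x20 through 0x7D, with "=" excluded) follows the Vorbis comment specification; rejecting an empty field name is a choice made for this sketch. The contents after "=" are left untouched because they are plain UTF-8 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length of the field name in a "NAME=value" comment,
 * or -1 if the comment has no '=' separator, has an empty name
 * (a restriction of this sketch), or has a name byte outside
 * ASCII 0x20..0x7D. Everything after the first '=' is the UTF-8
 * field contents and is not inspected here. */
long comment_name_length(const char *vec, size_t len)
{
    const char *eq = memchr(vec, '=', len);
    if (eq == NULL || eq == vec)
        return -1;

    for (const char *p = vec; p < eq; p++) {
        unsigned char c = (unsigned char)*p;
        if (c < 0x20 || c > 0x7D)
            return -1;
    }
    return (long)(eq - vec);
}
```

Because only the name is restricted to ASCII, a comment such as TITLE= followed by multi-byte UTF-8 text passes this check unchanged.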

Version-Release number of selected component (if applicable):

libvorbis-1.0-7

How reproducible:

Create an Ogg Vorbis file with text in a comment field which requires
UTF-8 encoding.  (You can't use vorbiscomment; it replaces all
characters greater than ASCII value 0x7F with "#".)  Try to play back
the Ogg Vorbis file.

Actual results:

You will get an error message similar to "Error opening [file] using
the oggvorbis module.  The file may be corrupted.".

Expected results:

libvorbis should correctly parse and decode the file.

Comment 1 James Ralston 2003-07-09 07:09:34 UTC

Created attachment 92817
A sample UTF-8 encoded Ogg Vorbis file.

Here's an example.  libvorbis will complain that this Ogg Vorbis file is
corrupted, but as far as I can tell, it is completely valid.

Comment 2 Colin Walters 2004-09-27 19:14:29 UTC

How did you create this file?

Comment 3 James Ralston 2004-10-21 01:25:19 UTC

Emacs, I think.  (It's been a while.)

But from using XMMS to test editing comments, I see now that other
data in the file changes when the comment changes.  (In particular, 4
bytes at offset 80 change.)  I'm guessing that the data in question is
a checksum of some sort that includes the comments in its calculation,
so by hand editing the comments, I corrupted the file.
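
A plausible reading of the changing bytes, not confirmed anywhere in this report: the Ogg container stores a CRC-32 of each page at byte offset 22 of the page header. If the first page is the typical 58-byte Vorbis identification page, the second page (which carries the comment header) would have its checksum at file offset 58 + 22 = 80. The sketch below computes that checksum as the Ogg format defines it (polynomial 0x04c11db7, initial value 0, no bit reflection, no final XOR, calculated over the page with the checksum field zeroed). The function name ogg_crc32 is hypothetical; libogg uses a table-driven equivalent.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with the parameters used for Ogg pages:
 * polynomial 0x04c11db7, initial value 0, most significant bit
 * first, no final XOR. The caller must zero the four checksum
 * bytes in the page before passing it in. */
uint32_t ogg_crc32(const unsigned char *data, size_t len)
{
    uint32_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80000000u)
                crc = (crc << 1) ^ 0x04c11db7u;
            else
                crc <<= 1;
        }
    }
    return crc;
}
```

Any change to the comment bytes changes this value, which would explain why a page edited by hand without updating the checksum is rejected as corrupted.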

By using Emacs to look at the file whose comments I edited with XMMS,
I see that the comments are indeed UTF-8 encoded, and libvorbis deals
with it just fine.  Therefore, the only real problem here is that
vorbiscomment isn't UTF-8 aware.

(Which is still a bug, but not a serious one.)

Comment 4 Colin Walters 2004-10-21 17:41:43 UTC

It looks to me like vorbiscomment has a --raw switch for passing UTF-8 directly,
instead of recoding for your locale.  Is your locale not UTF-8?

Comment 5 John Thacker 2006-10-28 15:18:17 UTC

This bug still occurs in FC6.  It seems that all the vorbis-tools "automatic
recoding for your locale" is broken.  Looks like it assumes that en_US means
ISO8859-1 or something.  I'll do some more digging.

$ env | grep LANG
LANG=en_US.UTF-8

$ vorbiscomment 01\ -\ 天使の休息.ogg 
TITLE=?????
ARTIST=????
TRACKNUMBER=1
TRACKTOTAL=4
ALBUM=?????
MUSICBRAINZ_SORTNAME=????

$ vorbiscomment -R 01\ -\ 天使の休息.ogg 
TITLE=天使の休息
ARTIST=奥井雅美
TRACKNUMBER=1
TRACKTOTAL=4
ALBUM=天使の休息
MUSICBRAINZ_SORTNAME=奥井雅美

Comment 6 John Thacker 2006-10-28 15:23:36 UTC

Exporting LC_ALL=en_US.UTF-8 makes no difference.

However, this works:
$ export CHARSET=UTF-8
$ vorbiscomment 01\ -\ 天使の休息.ogg 
TITLE=天使の休息
ARTIST=奥井雅美
TRACKNUMBER=1
TRACKTOTAL=4
ALBUM=天使の休息
MUSICBRAINZ_SORTNAME=奥井雅美

Definitely something broken in the automatic charset conversion.  Breaks for
ogg123 as a result too.
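
The behaviour above (CHARSET works, LANG and LC_ALL do not) is what one would expect from a lookup that tries an environment override first and the locale second, with the locale step not taking effect. The following is only a sketch of such a lookup order, not a copy of the vorbis-tools code: the function names pick_charset and detect_charset are hypothetical, and the "US-ASCII" fallback is an assumption.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>

/* Returns the first non-empty candidate, in priority order:
 * an explicit request, the CHARSET environment variable, the
 * locale's codeset. Falls back to "US-ASCII" (an assumption of
 * this sketch) when none is available. */
const char *pick_charset(const char *explicit_cs, const char *env_cs,
                         const char *locale_cs)
{
    if (explicit_cs && *explicit_cs)
        return explicit_cs;
    if (env_cs && *env_cs)
        return env_cs;
    if (locale_cs && *locale_cs)
        return locale_cs;
    return "US-ASCII";
}

/* POSIX-only wrapper: reads the real environment and locale. */
const char *detect_charset(void)
{
    setlocale(LC_CTYPE, "");  /* adopt LANG / LC_ALL from the environment */
    return pick_charset(NULL, getenv("CHARSET"), nl_langinfo(CODESET));
}
```

If the locale argument is never supplied, a UTF-8 locale with no CHARSET variable ends up at the ASCII fallback, which would account for non-ASCII characters being replaced on output.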

Comment 7 John Thacker 2006-10-28 15:50:46 UTC

Something's wrong with their configure and automagic stuff.  If I remove the
#ifdef HAVE_LANGINFO_CODSET lines from share/utf8.c, all works fine.  It looks
as though the configure script is picking it up correctly, but it's not being
compiled properly.

Comment 8 John Thacker 2006-10-28 16:00:17 UTC

Err, HAVE_LANGINFO_CODESET clearly.  But there definitely seems to be some sort
of mistake in the build script.  #define HAVE_LANGINFO_CODESET 1 is getting set
in config.h, but somehow not going through to the rest of the build.

Comment 9 John Thacker 2006-10-28 16:11:29 UTC

Ah, got it.  It's fixed upstream.
Several files, like utf8.c, need to have a

#ifdef HAVE_CONFIG_H
# include <config.h>
#endif

block in them, but don't.  See this link for the changeset:

https://trac.xiph.org/changeset/10080

and this upstream bug:

https://trac.xiph.org/ticket/685

A new version of vorbis-tools hasn't been released for a long time: these
fixes are a year old and still unreleased.
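
A small illustration of why the missing include fails silently, assuming (as comment 8 reports) that config.h is the only place HAVE_LANGINFO_CODESET gets defined. Testing an undefined macro with #ifdef is legal C, so a source file that never includes config.h compiles without any warning and simply takes the #else branch. The function name charset_source is hypothetical.

```c
/* Stand-in for what config.h would provide. Leaving this line
 * commented out simulates a source file that forgot to include
 * config.h; uncommenting it simulates the fixed file. */
/* #define HAVE_LANGINFO_CODESET 1 */

/* Reports which branch the preprocessor selected at compile time. */
const char *charset_source(void)
{
#ifdef HAVE_LANGINFO_CODESET
    return "nl_langinfo(CODESET)";
#else
    return "built-in default";
#endif
}
```

As written, the function reports the built-in default, even though a configure run on the same machine would have detected nl_langinfo support.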

Comment 10 John Thacker 2006-10-28 16:19:09 UTC

Created attachment 139640
Patch which fixes the charset conversion

This fixes the charset conversion and use of iconv() by properly including
config.h in some files that need it.  This is a copy of a changeset which has
been applied upstream.
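
For readers unfamiliar with iconv(), here is a minimal sketch of the kind of conversion involved: turning text in some source charset into UTF-8. It is not taken from the patch; the function name to_utf8 is hypothetical, and the set of charset names that iconv_open() accepts depends on the platform's iconv implementation.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Converts the NUL-terminated string `in`, encoded in charset
 * `from`, to UTF-8 in `out`. Returns 0 on success, or -1 if the
 * charset is unknown, the input is invalid, or `out` is too small. */
int to_utf8(const char *from, const char *in, char *out, size_t out_size)
{
    if (out_size == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;          /* iconv() takes char **, input is not modified */
    size_t src_left = strlen(in);
    char *dst = out;
    size_t dst_left = out_size - 1;  /* reserve one byte for the terminator */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *dst = '\0';
    return 0;
}
```

The source charset passed as `from` is exactly the value that has to be detected correctly; with the wrong charset the conversion either fails or produces substituted characters.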

Comment 11 John Thacker 2006-10-29 13:11:29 UTC

*** Bug 187649 has been marked as a duplicate of this bug. ***

Comment 12 James Ralston 2006-10-29 19:10:32 UTC

I rebuilt vorbis-tools-1.1.1-2.src.rpm from Fedora Development with the patch in
comment 10; I can confirm that the patch fixes the problem:

$ vorbiscomment *01*
TRACKNUMBER=1
ARTIST=Queensrÿche
ALBUM=Operation Mindcrime II
DATE=2006
GENRE=rock
TITLE=Freiheit Ouvertüre
LABEL=Rhino / WEA
LICENSE=all rights reserved
ENCODING=normalize 0.7.7 amplitude=0.30; OggEnc v1.0.2 quality=3

(Thanks for tracking this down, John--I never got back to this issue.)

Comment 14 Fedora Update System 2006-10-30 21:40:16 UTC

vorbis-tools-1.1.1-3.fc6 has been pushed for fc6, which should resolve this issue.  If these problems are still present in this version, then please make note of it in this bug report.

Comment 15 John Thacker 2006-10-31 03:43:04 UTC

Works great, thanks for moving so quickly on this after I posted the patch!
Now if you could only do the same for bug 141592, where there's also a patch. :)

Comment 16 Fedora Update System 2006-11-06 19:53:20 UTC

vorbis-tools-1.1.1-3.fc6 has been pushed for fc6, which should resolve this issue.  If these problems are still present in this version, then please make note of it in this bug report.
