Bug 476493

Summary: libid3tag doesn't add terminators to strings, and doesn't conform to the ID3 specifications.
Product: [Fedora] Fedora Reporter: kohen.d
Component: libid3tagAssignee: Todd Zullinger <tmz>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 9CC: kohen.d, tmz
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-07-14 18:24:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
A better patch that adds the terminator only on encoded strings. none

Description kohen.d 2008-12-15 09:09:54 UTC
When writing a tag with string, the library doesn't add the NULL terminator to the string if it's the last field of a frame, although all text fields are supposed to be null-terminated.

Version-Release number of selected component: 0.15.1b-6.fc9 (also in the original tarball)


How reproducible: Write a tag (tested with USLT) and see that the string isn't null terminated in a hex-editor.

 
Actual results: Strings end and than there is the next frame identifier.


Expected results: There should be a null character after the string in addition to the frame length indicator. 


Additional info: http://www.id3.org/id3v2.4.0-structure

I made a patch that fixes it and tested it on my machine, it's attached. (AMD64 machine)
This is a bug affecting the original package, but it seems that it is not maintained any more.

Comment 1 Todd Zullinger 2008-12-19 04:18:30 UTC
I'm not sure that libid3tag is doing anything incorrectly here.  (I'm also not sure that I'm not mistaken. ;)

As far as I understand the ID3 spec, the terminator is part of the "Content Descriptor" - not part of the text string.  Looking at the USLT frame description at http://www.id3.org/id3v2.4.0-frames:

  4.8.   Unsynchronised lyrics/text transcription
  
     This frame contains the lyrics of the song or a text transcription of
     other vocal activities. The head includes an encoding descriptor and
     a content descriptor. The body consists of the actual text. The
     'Content descriptor' is a terminated string. If no descriptor is
     entered, 'Content descriptor' is $00 (00) only. Newline characters
     are allowed in the text. There may be more than one 'Unsynchronised
     lyrics/text transcription' frame in each tag, but only one with the
     same language and content descriptor.
  
       <Header for 'Unsynchronised lyrics/text transcription', ID: "USLT">
       Text encoding        $xx
       Language             $xx xx xx
       Content descriptor   <text string according to encoding> $00 (00)
       Lyrics/text          <full text string according to encoding>

I've edited a few files using gtkpod to write out the tags with libid3tag and I seem to find the correct terminator in the Content descriptor.

With your patch, there is a null terminator added at the end of the text string, which I don't think it correct.  Feel free to show me how I am wrong. :)

Comment 2 kohen.d 2008-12-19 13:15:17 UTC
As I said in the gtkpod-devel mailing list this is fromhttp://www.id3.org/id3v2.4.0-structure:
<paste>
4.   ID3v2 frame overview
.
.
.
Frames that allow different types of text encoding contains a text
   encoding description byte. Possible encodings:

     $00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
     $01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
           strings in the same frame SHALL have the same byteorder.
           Terminated with $00 00.
     $02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
           Terminated with $00 00.
     $03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.

   Strings dependent on encoding are represented in frame descriptions
   as <text string according to encoding>, or <full text string
   according to encoding> if newlines are allowed. Any empty strings of
   type $01 which are NULL-terminated may have the Unicode BOM followed
   by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).
<end>
The strings should be null terminated, if there is a encoding option for the frame.
I assumed that all text frames have an encoding field but it turns out that the LINK frame does not. The patch should be modified to test whether the frame has an encoding field and if so, add a terminator.
In addition, it turns out that libid3 adds a terminator and the iPod lyrics option reads until it reaches a null terminator

Comment 3 Todd Zullinger 2008-12-19 14:04:11 UTC
(In reply to comment #2)
> As I said in the gtkpod-devel mailing list this is
> fromhttp://www.id3.org/id3v2.4.0-structure:
> <paste>
> 4.   ID3v2 frame overview
> .
> .
> .
> Frames that allow different types of text encoding contains a text
>    encoding description byte. Possible encodings:
> 
>      $00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
>      $01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
>            strings in the same frame SHALL have the same byteorder.
>            Terminated with $00 00.
>      $02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
>            Terminated with $00 00.
>      $03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.
> 
>    Strings dependent on encoding are represented in frame descriptions
>    as <text string according to encoding>, or <full text string
>    according to encoding> if newlines are allowed. Any empty strings of
>    type $01 which are NULL-terminated may have the Unicode BOM followed
>    by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).
> <end>
> The strings should be null terminated, if there is a encoding option for the
> frame.

Understood.  What I believe the spec refers to here with "encoding description byte" is part of what the USLT frame description I quoted refers to as the "Content descriptor."  The part that is terminated is the text string that describes the language of the frame.  This is exactly what libid3tag currently does, I think that's correct.  (It's also what eyeD3 does, FWIW.)

(This discussion illustrates why so many folks hate the ID3 spec.  It's far more complicated than it needs to be. ;)

> the iPod lyrics option reads until it reaches a null terminator

This does not match my experience.  I have lyrics on many of the tracks on my iPod and none of them have a null terminator at the end of the lyrics (based on the few that I poked and knowing that eyeD3 does not write any such terminator).  The iPod handles them just fine.

Comment 4 kohen.d 2008-12-19 15:29:06 UTC
Created attachment 327452 [details]
A better patch that adds the terminator only on encoded strings.

Comment 5 kohen.d 2008-12-19 15:30:13 UTC
After about 10 minutes of googling, I saw that eye3D uses id3lib, and I tested the same lib using Amarok's write_to_id3 plugin before I made this patch.
I tested it with a hex editor and saw the null terminator with my own eyes, and only with it i place did my iPod (Classic 80GB) ended the lyrics without a frame header visible in the lyrics page, but since the specifications are horrible, I wanted to see code with my own eyes... (How much I love the GPL!!!) 
So here is the code from id3lib that writes strings and text fields:
<code begin>
size_t io::writeString(ID3_Writer& writer, String data)
{
  size_t size = writeText(writer, data);
  writer.writeChar('\0');
  return size + 1;
}

size_t io::writeText(ID3_Writer& writer, String data)
{
  ID3_Writer::pos_type beg = writer.getCur();
  writer.writeChars(data.data(), data.size());
  return writer.getCur() - beg;
}

size_t io::writeUnicodeString(ID3_Writer& writer, String data, bool bom)
{
  size_t size = writeUnicodeText(writer, data, bom);
  unicode_t null = NULL_UNICODE;
  writer.writeChars((const unsigned char*) &null, 2);
  return size + 2;
}

size_t io::writeUnicodeText(ID3_Writer& writer, String data, bool bom)
{
  ID3_Writer::pos_type beg = writer.getCur();
  size_t size = (data.size() / 2) * 2;
  if (size == 0)
  {
    return 0;
  }
  if (bom)
  {
    // Write the BOM: 0xFEFF
    unicode_t BOM = 0xFEFF;
    writer.writeChars((const unsigned char*) &BOM, 2);
    for (size_t i = 0; i < size; i += 2)
    {
      unicode_t ch = (data[i] << 8) | data[i+1];
      writer.writeChars((const unsigned char*) &ch, 2);
    }
  }
  return writer.getCur() - beg;
}
<code end>
And since string size starts from one, it does write a null terminator, every time, like my previous patch.
I've attached a better patch that adds the null terminator only to frames with an encoding field, as per the specs.

Comment 6 Bug Zapper 2009-06-10 03:25:37 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Bug Zapper 2009-07-14 18:24:10 UTC
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.