Description of problem:

So if you get the file:

http://kojipkgs.fedoraproject.org/mash/branched-20121115/18/i386/debug/.repodata/filelists.xml.gz
http://kojipkgs.fedoraproject.org/mash/branched-20121116/18/i386/debug/.repodata/filelists.xml.gz
http://kojipkgs.fedoraproject.org/mash/branched-kevintest/i386-debug/.repodata/filelists.xml.gz

% sha256sum /tmp/filelists.xml.gz
d074ecd02835aab0a57cbb3fde077bdfa873d3ee566efb967051b9063c9e371e  /tmp/filelists.xml.gz

...then you can do this on an F18/i686 host:

xmllint filelists.xml.gz > /dev/null

...and it fails with parse errors similar to the ones yum gets when using the API, see:

http://kojipkgs.fedoraproject.org/mash/branched-20121115/logs/mash.log
http://kojipkgs.fedoraproject.org/mash/branched-20121116/logs/mash.log

...but if you do:

gunzip filelists.xml
xmllint filelists.xml > /dev/null

...then everything works (as does yum). Even better, if you then do:

gzip -9 filelists.xml
xmllint filelists.xml.gz > /dev/null

...it will still work!

The affected versions seem to be:

2.9.0
2.8.0

...and downgrading to:

libxml2-2.7.8-9.fc17

...also fixes the problem.
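As an independent cross-check that does not involve libxml2 at all, here is a minimal Python sketch that streams a gzip'ed XML file through the standard library only. `gzip.open` verifies the CRC-32/length footer when it reaches the end of the member, and `xml.etree.ElementTree.iterparse` (expat-based) parses the result. The helper name `count_xml_elements` is hypothetical; it only tells you whether the file itself is sound and will not reproduce the libxml2 failure.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def count_xml_elements(source) -> int:
    """Stream-parse a gzip'ed XML document and return its element count.

    `source` may be a path or a binary file object. gzip raises EOFError
    (truncated stream or missing footer) or an OSError subclass (bad CRC or
    length); ET.ParseError is raised if the XML itself is malformed.
    """
    count = 0
    with gzip.open(source, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            count += 1
            elem.clear()  # drop finished subtrees so a ~97 MB file stays cheap
    return count


if __name__ == "__main__":
    sample = gzip.compress(b"<filelists><package><file>/a</file></package></filelists>")
    print(count_xml_elements(io.BytesIO(sample)))  # 3
```

On the real file this would be called as `count_xml_elements("filelists.xml.gz")`.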
This prevents us from doing composes, which means we cannot make the Beta without it fixed.
Therefore, +1 blocker.
+1 blocker since it blocks compose
Okay, I can reproduce that with git head on my F17 box, which unfortunately means I don't have a fix for this issue yet <grin/>,

Daniel
It seems the problem was introduced when support for lzma compression was added:

http://git.gnome.org/browse/libxml2/commit/?id=eae52617790eb77a9109390651ff808f53a4cddb

Either I quickly find how it breaks and push a fix for it, or the alternative is to temporarily disable lzma support until it is found.

Daniel
Hum... on F17:

thinkpad:~/XML -> cp filelists.xml.gz.bak filelists.xml.gz
thinkpad:~/XML -> xmllint --noout filelists.xml.gz
filelists.xml.gz:1224249: parser error : Premature end of data in tag file line 1224249
<file>/usr/src/debug/
                     ^
filelists.xml.gz:1224249: parser error : Premature end of data in tag package line 1224042
<file>/usr/src/debug/
                     ^
filelists.xml.gz:1224249: parser error : Premature end of data in tag filelists line 2
<file>/usr/src/debug/
                     ^
thinkpad:~/XML -> gunzip filelists.xml.gz
thinkpad:~/XML -> ls -l filelists.xml
-rw-rw-r--. 1 veillard veillard 97018427 Nov 19 20:42 filelists.xml
thinkpad:~/XML -> gzip filelists.xml
thinkpad:~/XML -> ls -l filelists.xml.gz
-rw-rw-r--. 1 veillard veillard 8165215 Nov 19 20:42 filelists.xml.gz
thinkpad:~/XML -> xmllint --noout filelists.xml.gz
thinkpad:~/XML ->

The point is that the decoder added along with the new lzma support in libxml2 complains at the end of the stream: it tries to read a CRC and a LEN, the 8 bytes used to detect errors, and fails to extract them. When the file is compressed with gzip itself, both fields are found.

Wikipedia does not suggest the footer is optional (http://en.wikipedia.org/wiki/Gzip): "an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data". So there is something weird there. gunzip doesn't complain about the file, though.

Daniel
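For reference, the footer quoted above is defined in RFC 1952: the last 8 bytes of a single-member gzip file are the CRC-32 of the uncompressed data followed by ISIZE (its length modulo 2^32), both little-endian. A minimal sketch for reading and checking them, assuming a single-member file with no trailing padding; `read_footer` and `footer_matches` are hypothetical helper names:

```python
import gzip
import struct
import zlib
from typing import Tuple


def read_footer(gz: bytes) -> Tuple[int, int]:
    """Return (CRC-32, ISIZE) stored in the last 8 bytes of a gzip file."""
    if len(gz) < 18:  # 10-byte fixed header + 8-byte footer, at the very least
        raise ValueError("too short to be a gzip file")
    crc, isize = struct.unpack("<II", gz[-8:])  # two little-endian uint32 values
    return crc, isize


def footer_matches(gz: bytes, uncompressed: bytes) -> bool:
    """Compare the stored footer with the data it is supposed to describe."""
    crc, isize = read_footer(gz)
    return crc == zlib.crc32(uncompressed) and isize == (len(uncompressed) & 0xFFFFFFFF)


if __name__ == "__main__":
    doc = b"<node/>\n"
    print(footer_matches(gzip.compress(doc), doc))  # True
```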
Okay, it really seems to be related to the missing footer (checksum/len) in the given gzip'ed file. If I remove the error that the libxml2 module raises when the CRC or LEN cannot be found, the decompression and the parse on top of it succeed (which implies the stream itself is complete, otherwise the XML parser would raise a fatal error):

thinkpad:~/XML -> sha256sum filelists.xml.gz
d074ecd02835aab0a57cbb3fde077bdfa873d3ee566efb967051b9063c9e371e  filelists.xml.gz
thinkpad:~/XML -> ./xmllint --noout filelists.xml.gz
thinkpad:~/XML -> git diff
diff --git a/xzlib.c b/xzlib.c
index 928bd17..e08f3fc 100644
--- a/xzlib.c
+++ b/xzlib.c
@@ -552,17 +552,20 @@ xz_decomp(xz_statep state)
 #ifdef HAVE_ZLIB_H
         if (state->how == GZIP) {
             if (gz_next4(state, &crc) == -1 || gz_next4(state, &len) == -1) {
-                xz_error(state, LZMA_DATA_ERROR, "unexpected end of file");
-                return -1;
-            }
-            if (crc != state->zstrm.adler) {
-                xz_error(state, LZMA_DATA_ERROR, "incorrect data check");
-                return -1;
-            }
-            if (len != (state->zstrm.total_out & 0xffffffffL)) {
-                xz_error(state, LZMA_DATA_ERROR, "incorrect length check");
-                return -1;
-            }
+                /*
+                xz_error(state, LZMA_DATA_ERROR, "unexpected end of file");
+                return -1;
+                */
+            } else {
+                if (crc != state->zstrm.adler) {
+                    xz_error(state, LZMA_DATA_ERROR, "incorrect data check");
+                    return -1;
+                }
+                if (len != (state->zstrm.total_out & 0xffffffffL)) {
+                    xz_error(state, LZMA_DATA_ERROR, "incorrect length check");
+                    return -1;
+                }
+            }
             state->strm.avail_in = 0;
             state->strm.next_in = NULL;
             state->strm.avail_out = 0;

Is the footer somehow 'optional'? What actually generates the zipped file list? I wonder if there isn't a bug at the generation level instead!

Daniel
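To make the difference between the stock check and the experimental patch concrete, here is a Python sketch of the same end-of-member logic. It is an illustration, not libxml2 code: `inflate_member` and its `strict` flag are hypothetical, the error strings are copied from the `xz_error()` calls in the diff, and it assumes a header with no optional fields (FLG = 0), as written by `gzip -n` or Python's `gzip.compress`. With `strict=True` a missing footer is fatal (stock behaviour); with `strict=False` it is tolerated, but a footer that is present is still verified (patched behaviour).

```python
import gzip
import struct
import zlib

GZIP_HEADER_LEN = 10  # fixed header size when FLG == 0 (no FEXTRA/FNAME/FCOMMENT/FHCRC)


def inflate_member(gz: bytes, strict: bool = True) -> bytes:
    """Inflate one gzip member, then apply the end-of-member footer checks."""
    if len(gz) < GZIP_HEADER_LEN or gz[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a deflate-compressed gzip member")
    if gz[3] != 0:
        raise ValueError("optional header fields are not handled by this sketch")
    inflater = zlib.decompressobj(wbits=-zlib.MAX_WBITS)  # raw deflate, no wrapper
    out = inflater.decompress(gz[GZIP_HEADER_LEN:])
    if not inflater.eof:
        raise ValueError("deflate stream truncated")  # final block never seen
    trailer = inflater.unused_data  # whatever follows the deflate stream
    if len(trailer) < 8:
        if strict:
            raise ValueError("unexpected end of file")
        return out  # lenient: payload is complete, footer simply absent
    crc, isize = struct.unpack("<II", trailer[:8])
    if crc != zlib.crc32(out):
        raise ValueError("incorrect data check")
    if isize != (len(out) & 0xFFFFFFFF):
        raise ValueError("incorrect length check")
    return out


if __name__ == "__main__":
    doc = b"<node/>\n"
    no_footer = gzip.compress(doc)[:-8]
    print(inflate_member(no_footer, strict=False) == doc)  # True
    try:
        inflate_member(no_footer)
    except ValueError as exc:
        print("strict:", exc)  # strict: unexpected end of file
```

The `inflater.eof` test is what makes the lenient branch defensible: it only accepts a missing footer once the deflate stream has reached its final block, which is the same argument as "the XML parser would otherwise raise a fatal error".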
thinkpad:~/XML -> gzip tst.xml
thinkpad:~/XML -> ls -l tst.xml.gz
-rw-rw-r--. 1 veillard veillard 36 Nov 19 21:50 tst.xml.gz
thinkpad:~/XML -> dd if=tst.xml.gz of=tst2.xml.gz bs=1 count=28
28+0 records in
28+0 records out
28 bytes (28 B) copied, 0.00013959 s, 201 kB/s
thinkpad:~/XML -> ./xmllint tst2.xml.gz
<?xml version="1.0"?>
<node/>
thinkpad:~/XML -> /usr/bin/xmllint tst2.xml.gz
tst2.xml.gz:1: parser error : Document is empty
^
tst2.xml.gz:1: parser error : Start tag expected, '<' not found
^
thinkpad:~/XML -> gunzip tst2.xml.gz
gzip: tst2.xml.gz: unexpected end of file
thinkpad:~/XML ->

(./xmllint is the build with the patch from my previous comment, /usr/bin/xmllint is the stock one.)

It's getting really weird: if I hand-craft a small gzip file by removing the footer, gunzip *does* complain! So it seems to me the existing behaviour of libxml2 is right, but I don't understand why gunzip does not complain about filelists.xml.gz. There is something else going on...

Daniel
The compressed file is generated using Python's gzip.GzipFile() (with a hack that overrides _write_gzip_header to make the output look like gzip -n -9), which AFAIK mainly uses zlib underneath. See: /usr/lib/python2.7/site-packages/createrepo/utils.py
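For comparison only (this is not createrepo's code, and it does not override the private `_write_gzip_header`), a sketch of getting a `gzip -n -9`-like header from the public `gzip.GzipFile` arguments: an empty `filename` keeps FNAME out of the header, `mtime=0` zeroes the timestamp and `compresslevel=9` matches `-9`. One byte this does not try to match is the OS field: as far as I know CPython writes 255 ("unknown") there, while GNU gzip on Unix writes 3. Also worth keeping in mind: GzipFile only appends the CRC-32/ISIZE footer in `close()`, so a writer that is never closed yields a file without a footer. The helper name `gzip_like_gzip_n9` is hypothetical.

```python
import gzip
import io
import struct
import zlib


def gzip_like_gzip_n9(data: bytes) -> bytes:
    """Compress `data` with no file name, zero mtime and level 9 (cf. gzip -n -9)."""
    buf = io.BytesIO()
    # filename="" -> no FNAME field; mtime=0 -> like -n; compresslevel=9 -> like -9
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf,
                       compresslevel=9, mtime=0) as gz:
        gz.write(data)
    # Leaving the `with` block calls close(), which writes the 8-byte footer.
    # GzipFile does not close a fileobj it did not open, so buf is still usable.
    return buf.getvalue()


if __name__ == "__main__":
    payload = b"<node/>\n"
    blob = gzip_like_gzip_n9(payload)
    crc, isize = struct.unpack("<II", blob[-8:])  # footer written by close()
    print(crc == zlib.crc32(payload), isize == len(payload))  # True True
```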
Well, the problem is with the footer, not the header. So there are two options:
1) track the issue down on the machine that builds the filelists, since it seems possible to generate a correct file with a correct gzip (or does this happen everywhere, even with a working gzip, when using the library?);
2) use DV's patch so the CRC/len is not checked (ugly and dirty, but...).
Discussed at the 2012-11-19 QA meeting, acting as a blocker review meeting. Accepted as a blocker, clearly.
libxml2-2.9.0-3.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/libxml2-2.9.0-3.fc18
libxml2-2.9.0-3.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.
The diagnosis "test file is missing footer" might be incorrect. The last 8 bytes of the broken and the working compressed files are identical:

----------------------------
ametzler@argenau:/tmp/GZIP$ cat filelists.xml.gz | xmllint - > /dev/null
I/O error : Not enough space
-:1224249: parser error : Premature end of data in tag file line 1224249
<file>/usr/src/debug/
                     ^
-:1224249: parser error : Premature end of data in tag package line 1224042
<file>/usr/src/debug/
                     ^
-:1224249: parser error : Premature end of data in tag filelists line 2
<file>/usr/src/debug/
                     ^
ametzler@argenau:/tmp/GZIP$ zcat filelists.xml.gz | gzip -9 | xmllint - > /dev/null
ametzler@argenau:/tmp/GZIP$ tail --bytes=8 filelists.xml.gz | md5sum
97a8f077fa82919b70855654c9e102c7  -
ametzler@argenau:/tmp/GZIP$ zcat filelists.xml.gz | gzip -9 | tail --bytes=8 - | md5sum
97a8f077fa82919b70855654c9e102c7  -
----------------------------
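The identical checksums are what one would expect if the footer is present in both files: CRC-32 and ISIZE are computed over the uncompressed data only, so any correct compressor, at any level, must emit the same 8 bytes for the same input, even though the deflate payload in between differs. A small sketch checking that with the standard library, assuming single-member files without trailing padding; `expected_footer` is a hypothetical helper and the sample document is made up:

```python
import gzip
import struct
import zlib


def expected_footer(uncompressed: bytes) -> bytes:
    """RFC 1952 trailer: CRC-32 then ISIZE (length mod 2**32), little-endian."""
    return struct.pack("<II", zlib.crc32(uncompressed), len(uncompressed) & 0xFFFFFFFF)


if __name__ == "__main__":
    doc = b"<filelists>\n" + b"  <file>/usr/src/debug/</file>\n" * 1000 + b"</filelists>\n"
    fast = gzip.compress(doc, compresslevel=1)
    best = gzip.compress(doc, compresslevel=9)
    print(fast[-8:] == best[-8:] == expected_footer(doc))  # True
```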
It seems Mike Alexander actually managed to track that problem down and come up with a more correct patch!

https://bugzilla.gnome.org/show_bug.cgi?id=712528

Daniel