Bug 877567 - Very weird bug gzip decompression bug in "recent" libxml2 versions
Product: Fedora
Classification: Fedora
Component: libxml2
Assigned To: Daniel Veillard
QA Contact: Fedora Extras Quality Assurance
Blocks: F18Beta/F18BetaBlocker
Reported: 2012-11-16 17:00 EST by James Antill
Modified: 2013-11-28 11:40 EST
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2012-11-20 02:16:54 EST
Type: Bug

Attachments: None
Description James Antill 2012-11-16 17:00:34 EST
Description of problem:

So if you get the file:



% sha256sum /tmp/filelists.xml.gz
d074ecd02835aab0a57cbb3fde077bdfa873d3ee566efb967051b9063c9e371e  /tmp/filelists.xml.gz

...then you can do this on an F18/i686 host:

xmllint filelists.xml.gz > /dev/null

...and it fails with parse errors similar to what yum gets using the API at:


...but if you do:

gunzip filelists.xml.gz
xmllint filelists.xml > /dev/null

...then everything works (as does yum). Even better, if you then do:

gzip -9 filelists.xml
xmllint filelists.xml.gz > /dev/null

...it will still work!

 The versions affected seem to be:


...downgrading to:


...will also fix the problem.
Comment 1 Dennis Gilmore 2012-11-16 18:54:16 EST
This prevents us from doing composes, which means we cannot make the Beta without it fixed.
Comment 2 Adam Williamson 2012-11-16 18:57:18 EST
Therefore, +1 blocker.
Comment 3 Tim Flink 2012-11-16 20:08:36 EST
+1 blocker since it blocks compose
Comment 4 Daniel Veillard 2012-11-19 06:16:00 EST
Okay, I can reproduce that with git head on my F17 box, which unfortunately
means I don't have a fix yet for this issue <grin/>.

Comment 5 Daniel Veillard 2012-11-19 06:45:14 EST
It seems the problem was introduced when support for lzma compression was added.

Either I quickly find how it breaks and push a fix for it, or the alternative
is to temporarily disable lzma support until it is found.

Comment 6 Daniel Veillard 2012-11-19 08:04:54 EST
Hum ... on F17

thinkpad:~/XML -> cp filelists.xml.gz.bak filelists.xml.gz
thinkpad:~/XML -> xmllint --noout filelists.xml.gz
filelists.xml.gz:1224249: parser error : Premature end of data in tag file line 1224249
filelists.xml.gz:1224249: parser error : Premature end of data in tag package line 1224042
filelists.xml.gz:1224249: parser error : Premature end of data in tag filelists line 2
thinkpad:~/XML -> gunzip filelists.xml.gz
thinkpad:~/XML -> ls -l filelists.xml
-rw-rw-r--. 1 veillard veillard 97018427 Nov 19 20:42 filelists.xml
thinkpad:~/XML -> gzip filelists.xml
thinkpad:~/XML -> ls -l filelists.xml.gz
-rw-rw-r--. 1 veillard veillard 8165215 Nov 19 20:42 filelists.xml.gz
thinkpad:~/XML -> xmllint --noout filelists.xml.gz
thinkpad:~/XML -> 

  The point is that the decoder added with the new lzma support in
libxml2 complains at the end of decoding the stream: it tries to read
a CRC and a LEN, fails to extract those 8 bytes (which are used to
detect errors), and reports an error. When the file is recompressed with
gzip, those 2 fields are found. Wikipedia does not seem to suggest the footer is optional:

"    an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data"

  There is something weird there: gunzip doesn't complain about the file.

Comment 7 Daniel Veillard 2012-11-19 08:26:21 EST
Okay, it really seems to be related to the lack of the footer (checksum/len)
in the given gzip'ed file.

If I remove the error in the libxml2 module for the case where the crc or len
cannot be found, the decompression and the parsing on top of it succeed (which
implies the stream itself is complete, otherwise the XML parser would raise a fatal error):

thinkpad:~/XML -> sha256sum filelists.xml.gz
d074ecd02835aab0a57cbb3fde077bdfa873d3ee566efb967051b9063c9e371e  filelists.xml.gz
thinkpad:~/XML -> ./xmllint --noout filelists.xml.gz
thinkpad:~/XML -> git diff
diff --git a/xzlib.c b/xzlib.c
index 928bd17..e08f3fc 100644
--- a/xzlib.c
+++ b/xzlib.c
@@ -552,17 +552,20 @@ xz_decomp(xz_statep state)
 #ifdef HAVE_ZLIB_H
         if (state->how == GZIP) {
             if (gz_next4(state, &crc) == -1 || gz_next4(state, &len) == -1) {
-                xz_error(state, LZMA_DATA_ERROR, "unexpected end of file");
-                return -1;
-            }
-            if (crc != state->zstrm.adler) {
-                xz_error(state, LZMA_DATA_ERROR, "incorrect data check");
-                return -1;
-            }
-            if (len != (state->zstrm.total_out & 0xffffffffL)) {
-                xz_error(state, LZMA_DATA_ERROR, "incorrect length check");
-                return -1;
-            }
+                /*
+               xz_error(state, LZMA_DATA_ERROR, "unexpected end of file");
+               return -1;
+                */
+            } else {
+               if (crc != state->zstrm.adler) {
+                   xz_error(state, LZMA_DATA_ERROR, "incorrect data check");
+                   return -1;
+               }
+               if (len != (state->zstrm.total_out & 0xffffffffL)) {
+                   xz_error(state, LZMA_DATA_ERROR, "incorrect length check");
+                   return -1;
+               }
+           }
             state->strm.avail_in = 0;
             state->strm.next_in = NULL;
             state->strm.avail_out = 0;

Is the footer somehow 'optional'? What is actually generating the
zipped file list? I wonder if there isn't a bug at the generation level
instead!

Comment 8 Daniel Veillard 2012-11-19 09:01:13 EST
thinkpad:~/XML -> gzip tst.xml
thinkpad:~/XML -> ls -l tst.xml.gz
-rw-rw-r--. 1 veillard veillard 36 Nov 19 21:50 tst.xml.gz
thinkpad:~/XML -> dd if=tst.xml.gz of=tst2.xml.gz bs=1 count=28
28+0 records in
28+0 records out
28 bytes (28 B) copied, 0.00013959 s, 201 kB/s
thinkpad:~/XML -> ./xmllint tst2.xml.gz
<?xml version="1.0"?>
thinkpad:~/XML -> /usr/bin/xmllint tst2.xml.gz
tst2.xml.gz:1: parser error : Document is empty

tst2.xml.gz:1: parser error : Start tag expected, '<' not found

thinkpad:~/XML -> gunzip tst2.xml.gz

gzip: tst2.xml.gz: unexpected end of file
thinkpad:~/XML ->

  It's getting really weird: if I hand-craft a small gzip file by removing
the footer, gunzip *does* complain! So it seems to me the existing behaviour
of libxml2 is right, but I don't understand why gunzip does not complain
about filelists.xml.gz; there is something else going on...

Comment 9 James Antill 2012-11-19 09:40:48 EST
 The compressed file is generated using Python's gzip.GzipFile() (with a hack to override _write_gzip_header so the output looks like gzip -n -9),
 which AFAIK mainly uses zlib underneath.

 See: /usr/lib/python2.7/site-packages/createrepo/utils.py
Comment 10 Jaroslav Reznik 2012-11-19 10:01:56 EST
Well, the problem is with the footer, not the header.

So there are two options: 1) track down the issue on the machine that builds the filelists, as it seems possible to generate a correct one by using a correct gzip (or does this happen everywhere, even with a working gzip, when using the library?); 2) use DV's patch to skip the CRC/len check (ugly and dirty, but...).
Comment 11 Adam Williamson 2012-11-19 12:06:23 EST
Discussed at 2012-11-19 QA meeting, acting as a blocker review meeting. Accepted as a blocker, clearly.
Comment 12 Fedora Update System 2012-11-19 14:38:19 EST
libxml2-2.9.0-3.fc18 has been submitted as an update for Fedora 18.
Comment 13 Fedora Update System 2012-11-20 02:16:58 EST
libxml2-2.9.0-3.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 14 Andreas Metzler 2012-12-25 06:21:08 EST
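The diagnosis "test file is missing footer" might be incorrect. The last 8 bytes of the broken and the working compressed file are identical. Since the trailer holds only the CRC-32 and the length of the uncompressed data, this suggests the original file does end with a correct footer: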
ametzler@argenau:/tmp/GZIP$ cat filelists.xml.gz |  xmllint -  > /dev/null
I/O error : Not enough space
-:1224249: parser error : Premature end of data in tag file line 1224249
-:1224249: parser error : Premature end of data in tag package line 1224042
-:1224249: parser error : Premature end of data in tag filelists line 2
ametzler@argenau:/tmp/GZIP$ zcat filelists.xml.gz |  gzip -9 |  xmllint -  > /dev/null
ametzler@argenau:/tmp/GZIP$ tail --bytes=8 filelists.xml.gz | md5sum
97a8f077fa82919b70855654c9e102c7  -
ametzler@argenau:/tmp/GZIP$ zcat filelists.xml.gz |  gzip -9 | tail --bytes=8 - | md5sum
97a8f077fa82919b70855654c9e102c7  -
Comment 15 Daniel Veillard 2013-11-28 10:18:22 EST
It seems Mike Alexander actually managed to chase down that problem and come up
with a more correct patch!


