Description of problem:
This crash occurred with the same backup script as in the previous bug: data is piped into lz4 from dump. I will check to see if I can reliably reproduce the error again.

Version-Release number of selected component:
lz4-r124-1.fc20

Additional info:
reporter: libreport-2.2.3
backtrace_rating: 4
cmdline: lz4 -BD -9
crash_function: LZ4HC_InsertAndFindBestMatch
executable: /usr/bin/lz4
kernel: 3.16.3-200.fc20.x86_64
runlevel: N 5
type: CCpp
uid: 1000

Truncated backtrace:
Thread no. 1 (5 frames)
 #0 LZ4HC_InsertAndFindBestMatch at ../lz4hc.c:441
 #1 LZ4HC_compress_generic at ../lz4hc.c:630
 #2 LZ4_compressHC2_limitedOutput_continue at ../lz4hc.c:987
 #3 compress_file_blockDependency at lz4io.c:468
 #4 LZ4IO_compressFilename at lz4io.c:565
Created attachment 964154 [details] File: backtrace
Created attachment 964155 [details] File: cgroup
Created attachment 964156 [details] File: core_backtrace
Created attachment 964157 [details] File: dso_list
Created attachment 964158 [details] File: environ
Created attachment 964159 [details] File: exploitable
Created attachment 964160 [details] File: limits
Created attachment 964161 [details] File: maps
Created attachment 964162 [details] File: open_fds
Created attachment 964163 [details] File: proc_pid_status
Created attachment 964164 [details] File: var_log_messages
DUMP: Date of this level 2 dump: Tue Dec 2 23:01:09 2014
DUMP: Date of last level 0 dump: Fri Nov 28 23:06:43 2014
DUMP: Dumping /dev/mapper/vg0-root (/) to standard output
DUMP: Label: none
DUMP: Writing 10 Kilobyte records
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 8652782 blocks.
DUMP: Volume 1 started with block 1 at: Tue Dec 2 23:01:36 2014
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: Broken pipe
DUMP: The ENTIRE dump is aborted.

I don't think I'll be able to reliably reproduce the crash, since the filesystem has surely changed since then (broken pipe / see above).
Hello James, Thank you so much for reporting this issue and providing the attached debug files. I appreciate it. I've informed the upstream developer about the issue. We'll soon fix it. Thank you.
Thanks for the detailed report. The following trace is informative : matchIndex = 4294251273 This basically means the match finder went into negative territory. It should not happen, there are some special conditions to ensure it does not. So now, what would be great is a way to reproduce and observe the bug...
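To make the "negative" reading concrete, here is a minimal sketch, assuming matchIndex is stored as a 32-bit unsigned value (the helper names below are hypothetical and are not part of lz4). It shows that 4294251273 is what a small negative number looks like after wrapping modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of lz4: view a 32-bit unsigned index
 * as the signed quantity it would represent if it had wrapped around. */
static int64_t signed_view_of_u32(uint32_t index)
{
    int64_t result = (int64_t)index;

    /* Values at or above 2^31 are how small negative numbers appear
     * once reduced modulo 2^32. */
    if (index >= UINT32_C(0x80000000))
        result -= (INT64_C(1) << 32);

    /* Sanity check: converting back must give the original bits. */
    assert((uint32_t)result == index);
    return result;
}

/* Hypothetical helper: subtraction with 32-bit unsigned wraparound. */
static uint32_t wrapping_sub_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b);
}
```

Under that assumption, signed_view_of_u32(4294251273u) gives -716023, and wrapping_sub_u32(0, 716023) gives back 4294251273, i.e. the reported index sits 716023 positions below zero.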
This part is also interesting : dictLimit = 65536 lowLimit = 4393084 The way it's supposed to work is dictLimit >= lowLimit, in all circumstances. So the above situation is not normal. Something to look into... Once again, if a way to reproduce the bug exists, it will greatly help the investigation.
OK my bad, it's not the "same" lowLimit as the one saved into the tracking structure. Here lowLimit is a dynamic value, depending on the position of the input pointer within the stream. dictLimit is still at its initial value, 64KB. That's strange because lz4io is supposed to compress "block by block", with each block being 4 MB by default. The first block has been completed, because lowLimit > 4 MB. Therefore, the thread is currently compressing block 2. Between the two, a call to LZ4_saveDictHC() should have been completed : https://github.com/Cyan4973/lz4/blob/master/lz4hc.c#L937
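The block arithmetic behind that deduction can be sketched as follows. This assumes the reported lowLimit (4393084) can be read as a byte offset into the stream and that blocks are 4 MB = 4 * 1024 * 1024 bytes; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* 4 MB block size, the lz4io default mentioned in the thread. */
#define BLOCK_SIZE ((size_t)4 * 1024 * 1024)

/* Zero-based index of the block that contains stream offset pos. */
static size_t block_index(size_t pos, size_t block_size)
{
    assert(block_size > 0);
    return pos / block_size;
}

/* Offset of pos relative to the start of its block. */
static size_t offset_in_block(size_t pos, size_t block_size)
{
    assert(block_size > 0);
    return pos % block_size;
}
```

With these assumptions, block_index(4393084, BLOCK_SIZE) is 1 (the second block) and offset_in_block(4393084, BLOCK_SIZE) is 198780, which is consistent with the first block having been completed.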
OK, got it. r124's lz4io doesn't call LZ4_saveDictHC() between each block, but the deprecated LZ4_slideInputBufferHC() instead. That is no longer the proper way to use the API. LZ4_slideInputBufferHC() doesn't respect the new condition matchIndex >= 64 KB (this condition wasn't necessary before). Without this condition, an extraordinary set of circumstances can result in a matchIndex underflow. Unfortunately, it's really difficult to trigger such a problem, so it managed to get through hundreds of thousands of fuzzer runs. LZ4_slideInputBufferHC() wasn't updated to fit the new condition, which is a mistake.

Actually, r125 can't replicate the problem because lz4io has been completely reworked : its internal logic has been removed and wired instead into lz4frame, which of course uses LZ4_saveDictHC(). So everything is now completely different. You can have a look at the new version at : https://github.com/Cyan4973/lz4/tree/dev (Note that the directory structure has been updated too).

Now I'm puzzled about what to do with the deprecated LZ4_slideInputBufferHC() function. Either update it, or remove it entirely ?
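As a rough illustration of the "save the dictionary between blocks" idea, here is a toy model in plain C. It is not the lz4 API and not lz4's implementation: save_tail_as_dict is a hypothetical stand-in that only copies the trailing bytes of a finished block (at most 64 KB) into a separate buffer, so the next block can still reference them. The real LZ4_saveDictHC() also updates the compressor's internal state, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed window size: the 64 KB distance discussed in the thread. */
#define DICT_MAX ((size_t)64 * 1024)

/* Hypothetical stand-in, NOT the lz4 API: copy the last bytes of a
 * finished block into dict, capped at 64 KB and at dict_capacity.
 * Returns the number of bytes kept. */
static size_t save_tail_as_dict(const char *block, size_t block_len,
                                char *dict, size_t dict_capacity)
{
    size_t keep = block_len;

    assert(block != NULL && dict != NULL);
    if (keep > DICT_MAX)
        keep = DICT_MAX;
    if (keep > dict_capacity)
        keep = dict_capacity;

    /* The tail starts keep bytes before the end of the block. */
    memcpy(dict, block + (block_len - keep), keep);
    return keep;
}
```

In this model, a caller would invoke save_tail_as_dict() once after each block and before reusing the input buffer, which is the point in the loop where the thread says LZ4_saveDictHC() belongs.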
Here's an update: lz4 crashed again last night, so I decided to try to run it again this AM. I was able to gather an uncompressed dump that crashes lz4 when I pipe it into lz4 -BD -9. This is much different from the first bug I reported, where the error occurred with an essentially empty dump output file. In this case the crash happens about 80% of the way through the corpus, which is 9.6GiB. I have been able to reproduce it by replaying the last 2GiB (dd if=dumpfile skip=... | lz4 -BD -9 > dumpfile.lz4).

I do not want to share this file since it contains email, etc., but I am willing to run debug code or gather more information. I will try the new version from github when I next have time, but I will make one more note. The point at which the corpus is able to cause the crash is partway through my (large) thunderbird email archive. I see highly repetitive strings like the following, scattered in between the plaintext messages:

IiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIkF1dGhlbnRpY2F0aW9uIGZyb20gNDA6MzA6MDQ6NmY6
[ truncated ... ]
LCIiLCIiLCIiLCIiLCIxNzIuMTYuMzkuOTUiLCIiLCIiLCIiLCIiLCIiLCIiLCIiLCIiLCIiLCIiLCIi
[ truncated ... ]
IiwiIiwiIiwiMTAyIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiRHJv
[ truncated ... ]
IiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIs

Note that it doesn't crash immediately from that point, but eats through about 1GiB before crashing.

Thanks for looking into this!
--James
There is now an rc125 pre-release available for your testing at : https://github.com/Cyan4973/lz4/releases
Created attachment 969198 [details] 6MiB corpus that crashes lz4 r125

I had a few minutes to test. I experienced interesting results: the previous corpus did not cause a crash in the same spot, but rather much sooner in the file (with a totally different structure). I was able to tease out a small file you can use to test. The attachment causes r125 to crash, but not r124:

$ gunzip crashes.lz4.r125.gz
$ ~/bin/lz4 -BD -9 crashes.lz4.r125
Compressed filename will be : ./crashes.lz4.r125.lz4
Segmentation fault (core dumped)
$

Happy hunting!
Are you using GCC 4.9 to compile ?
I defaulted to what's in F20 - gcc --version shows that I'm using 4.8.3:

gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
OK, thanks for the check. I'm going to look into it. You seem to constantly find new interesting corner cases, James :)
No problem - I aim to please ;^)
Phew... It took me a while to find a way to reproduce the problem, but now that's done. It's safely reproduced within the updated C.I. test suite, and a fix comes with it. It's a stupid bug, but it requires some corner case conditions to be triggered, and so it escaped the previous test suite. You can test it at : https://github.com/Cyan4973/lz4/tree/dev Regards and thanks for the report !
Hi, There is now an LZ4 release candidate available, at : https://github.com/Cyan4973/lz4/releases It's supposed to fix this issue and a bunch of other minor ones. James : would you be so kind as to test this version on your log system ? Since you already found so many corner cases with this configuration, you deserve the title of ultimate torture tester. More seriously, I would prefer issues to be detected and corrected before r126 becomes the official release, if possible. Best regards
Hi Yann, I've had good luck with the December 19 r126 RC so far - it went through the previous corpora that caused crashes, this time without any problems. I am making it the system default (automated torture), so if I see any more errors, I'll let you know. Thanks again! --James
Excellent ! I'm planning to release r126 sometime next week, so if everything goes properly, this release candidate will simply become the next version. Regards
r126 is out
Yes, I'll build it tonight. Thank you James for reporting this issue and Yann for the new build. Wish you both happy new year ahead! :)
Happy New Year! Just a quick (&& happy) update - since December 21, lz4 r126 has processed over 260 GiB of real world data for me, with a largest single job of 142 GiB - all without issue. Thanks again :^D --James
Thanks for the feedback ! Happy new year James and RedHat / Fedora team !
lz4-r127-1.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/lz4-r127-1.fc21
lz4-r127-1.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/lz4-r127-1.fc20
lz4-r127-1.el5 has been submitted as an update for Fedora EPEL 5. https://admin.fedoraproject.org/updates/lz4-r127-1.el5
lz4-r127-1.el7 has been submitted as an update for Fedora EPEL 7. https://admin.fedoraproject.org/updates/lz4-r127-1.el7
lz4-r127-1.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/lz4-r127-1.el6
lz4-r127-1.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
lz4-r127-1.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
lz4-r127-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.
lz4-r127-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.
lz4-r127-1.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.