Bug 1170243 - [abrt] lz4: LZ4HC_InsertAndFindBestMatch(): lz4 killed by SIGSEGV
Summary: [abrt] lz4: LZ4HC_InsertAndFindBestMatch(): lz4 killed by SIGSEGV
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: lz4
Version: 20
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: pjp
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:827151bbc966aceeef6b67f5478...
Depends On:
Blocks:
Reported: 2014-12-03 14:55 UTC by James Boyle
Modified: 2015-01-26 20:14 UTC

Fixed In Version: lz4-r127-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-19 01:33:10 UTC


Attachments
File: backtrace (86.53 KB, text/plain)
2014-12-03 14:55 UTC, James Boyle
File: cgroup (184 bytes, text/plain)
2014-12-03 14:55 UTC, James Boyle
File: core_backtrace (1.28 KB, text/plain)
2014-12-03 14:55 UTC, James Boyle
File: dso_list (218 bytes, text/plain)
2014-12-03 14:55 UTC, James Boyle
File: environ (211 bytes, text/plain)
2014-12-03 14:55 UTC, James Boyle
File: exploitable (82 bytes, text/plain)
2014-12-03 14:55 UTC, James Boyle
File: limits (1.29 KB, text/plain)
2014-12-03 14:55 UTC, James Boyle
File: maps (1.42 KB, text/plain)
2014-12-03 14:55 UTC, James Boyle
File: open_fds (138 bytes, text/plain)
2014-12-03 14:55 UTC, James Boyle
File: proc_pid_status (984 bytes, text/plain)
2014-12-03 14:55 UTC, James Boyle
File: var_log_messages (7.95 KB, text/plain)
2014-12-03 14:55 UTC, James Boyle
6MiB corpus that crashes lz4 r125 (54.69 KB, application/octet-stream)
2014-12-15 17:27 UTC, James Boyle

Description James Boyle 2014-12-03 14:55:17 UTC
Description of problem:
This crash occurred with the same backup script as the last time/bug - data is piped into lz4 from dump.  I will check to see if I can reliably reproduce the error again.

Version-Release number of selected component:
lz4-r124-1.fc20

Additional info:
reporter:       libreport-2.2.3
backtrace_rating: 4
cmdline:        lz4 -BD -9
crash_function: LZ4HC_InsertAndFindBestMatch
executable:     /usr/bin/lz4
kernel:         3.16.3-200.fc20.x86_64
runlevel:       N 5
type:           CCpp
uid:            1000

Truncated backtrace:
Thread no. 1 (5 frames)
 #0 LZ4HC_InsertAndFindBestMatch at ../lz4hc.c:441
 #1 LZ4HC_compress_generic at ../lz4hc.c:630
 #2 LZ4_compressHC2_limitedOutput_continue at ../lz4hc.c:987
 #3 compress_file_blockDependency at lz4io.c:468
 #4 LZ4IO_compressFilename at lz4io.c:565

Comment 1 James Boyle 2014-12-03 14:55:19 UTC
Created attachment 964154
File: backtrace

Comment 2 James Boyle 2014-12-03 14:55:20 UTC
Created attachment 964155
File: cgroup

Comment 3 James Boyle 2014-12-03 14:55:21 UTC
Created attachment 964156
File: core_backtrace

Comment 4 James Boyle 2014-12-03 14:55:22 UTC
Created attachment 964157
File: dso_list

Comment 5 James Boyle 2014-12-03 14:55:22 UTC
Created attachment 964158
File: environ

Comment 6 James Boyle 2014-12-03 14:55:23 UTC
Created attachment 964159
File: exploitable

Comment 7 James Boyle 2014-12-03 14:55:24 UTC
Created attachment 964160
File: limits

Comment 8 James Boyle 2014-12-03 14:55:24 UTC
Created attachment 964161
File: maps

Comment 9 James Boyle 2014-12-03 14:55:25 UTC
Created attachment 964162
File: open_fds

Comment 10 James Boyle 2014-12-03 14:55:26 UTC
Created attachment 964163
File: proc_pid_status

Comment 11 James Boyle 2014-12-03 14:55:27 UTC
Created attachment 964164
File: var_log_messages

Comment 12 James Boyle 2014-12-03 16:12:04 UTC
  DUMP: Date of this level 2 dump: Tue Dec  2 23:01:09 2014
  DUMP: Date of last level 0 dump: Fri Nov 28 23:06:43 2014
  DUMP: Dumping /dev/mapper/vg0-root (/) to standard output
  DUMP: Label: none
  DUMP: Writing 10 Kilobyte records
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 8652782 blocks.
  DUMP: Volume 1 started with block 1 at: Tue Dec  2 23:01:36 2014
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: Broken pipe
  DUMP: The ENTIRE dump is aborted.

I don't think I'll be able to reliably reproduce the crash, since the filesystem has surely changed since then (broken pipe / see above).

Comment 13 pjp 2014-12-03 16:41:31 UTC
  Hello James,

Thank you so much for reporting this issue and providing the attached debug files. I appreciate it. I've informed the upstream developer about the issue. We'll soon fix it.

Thank you.

Comment 14 Yann Collet 2014-12-03 16:47:27 UTC
Thanks for the detailed report.

The following trace is informative :

matchIndex = 4294251273

This basically means the match finder went into negative territory, i.e. the index underflowed.

It should not happen, there are some special conditions to ensure it does not.

So now, what would be great is a way to reproduce and observe the bug...

Comment 15 Yann Collet 2014-12-03 17:30:24 UTC
This part is also interesting :

        dictLimit = 65536
        lowLimit = 4393084

The way it's supposed to work is 
dictLimit >= lowLimit, in all circumstances.
So the above situation is not normal.
Something to look into...

Once again, if a way to reproduce the bug exists, it will greatly help the investigation.

Comment 17 Yann Collet 2014-12-03 17:56:08 UTC
OK my bad, it's not the "same" lowLimit as the one saved into the tracking structure.

Here lowLimit is a dynamic value, depending on the position of input Ptr within the stream.

dictLimit is still at its initial value, 64KB.

That's strange because lz4io is supposed to compress "block by block"
with each block being 4 MB by default.

The first block has been completed, because lowLimit > 4 MB.
Therefore, the thread is currently compressing block 2.

Between the two blocks, a call to LZ4_saveDictHC() should have been completed :
https://github.com/Cyan4973/lz4/blob/master/lz4hc.c#L937

Comment 18 Yann Collet 2014-12-03 18:07:06 UTC
OK, got it.

r124's lz4io doesn't call LZ4_saveDictHC() between each block,
but the deprecated LZ4_slideInputBufferHC() instead.

It's no longer the proper way to use the API.

LZ4_slideInputBufferHC() doesn't respect the new condition regarding matchIndex >= 64 KB (this condition wasn't necessary before). Without this condition, an extraordinary set of circumstances could result in a matchIndex underflow. Unfortunately, it's really difficult to produce such a problem, so it managed to get through hundreds of thousands of fuzzer runs.

LZ4_slideInputBufferHC() wasn't updated to fit the new condition, which is a mistake.

Actually, r125 can't replicate the problem because it has been completely upgraded regarding lz4io : lz4io internal logic has been removed and wired instead into lz4frame, which of course uses LZ4_saveDictHC(). So everything is now completely different.

You can have a look at the new version at :
https://github.com/Cyan4973/lz4/tree/dev

(Note that directory structure has been updated too).



Now I'm puzzled about what to do about the deprecated LZ4_slideInputBufferHC() function. Either update it, or remove it entirely ?

Comment 19 James Boyle 2014-12-04 16:59:58 UTC
Here's an update: lz4 crashed again last night, so I decided to try to run it again this morning.  I was able to gather an uncompressed dump that crashes lz4 when I pipe it into lz4 -BD -9.  This is much different from the first bug I reported, where the error occurred from an essentially empty dump output file.  In this case the crash happens about 80% of the way through the corpus, which is 9.6GiB.  I have been able to reproduce it by replaying (dd if=dumpfile skip=... | lz4 -BD -9 > dumpfile.lz4) the last 2GiB.  I do not want to share this file since it contains email, etc., but I am willing to run debug code or gather more information.  I will try the new version from github when I next have time, but I will make one more note.  The point at which the corpus is able to cause the crash is partway through my (large) thunderbird email archive.  I see highly repetitive strings like the following, scattered in between the plaintext messages:

IiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIkF1dGhlbnRpY2F0aW9uIGZyb20gNDA6MzA6MDQ6NmY6
[ truncated ... ]
LCIiLCIiLCIiLCIiLCIxNzIuMTYuMzkuOTUiLCIiLCIiLCIiLCIiLCIiLCIiLCIiLCIiLCIiLCIiLCIi
[ truncated ... ]
IiwiIiwiIiwiMTAyIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiIiwiRHJv
[ truncated ... ]
IiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIsIiIs

Note that it doesn't crash immediately from that point, but eats through about 1GiB before crashing.

Thanks for looking into this!
--James

Comment 20 Yann Collet 2014-12-08 12:57:22 UTC
There is now an rc125 pre-release available for your testing at :
https://github.com/Cyan4973/lz4/releases

Comment 21 James Boyle 2014-12-15 17:27:15 UTC
Created attachment 969198
6MiB corpus that crashes lz4 r125

I had a few minutes to test. I experienced interesting results - the previous corpus did not cause a crash in the same spot, but rather much sooner in the file (with totally different structure).  I was able to tease out a small file you can use to test.  The attachment causes r125 to crash, but not 124:

$ gunzip crashes.lz4.r125.gz
$ ~/bin/lz4 -BD -9 crashes.lz4.r125
Compressed filename will be : ./crashes.lz4.r125.lz4 
Segmentation fault (core dumped)
$ 

Happy hunting!

Comment 22 Yann Collet 2014-12-15 17:28:53 UTC
Are you using GCC 4.9 to compile ?

Comment 23 James Boyle 2014-12-15 17:37:22 UTC
I defaulted to what's in F20 - gcc --version shows that I'm using 4.8.3:

gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Comment 24 Yann Collet 2014-12-15 17:41:28 UTC
OK, thanks for the check.
I'm going to look into it

You seem to constantly find new interesting corner cases James :)

Comment 25 James Boyle 2014-12-15 17:59:43 UTC
No problem - I aim to please ;^)

Comment 26 Yann Collet 2014-12-16 01:16:04 UTC
Phew...
It took me a while to find a way to reproduce the problem, but now that's done. It's safely reproduced within the updated CI test suite.

And a fix comes with it.
It's a stupid bug, but it requires some corner case conditions to be triggered, and so it evaded the previous test suite.

You can test it at :
https://github.com/Cyan4973/lz4/tree/dev


Regards and thanks for the report !

Comment 27 Yann Collet 2014-12-19 09:01:04 UTC
Hi

There is now an LZ4 release candidate available, at :
https://github.com/Cyan4973/lz4/releases

It's supposed to fix this issue and a bunch of other minor ones.

James : would you be so kind as to test this version on your log system ?
Since you already found so many corner cases with this configuration, you deserve the title of ultimate torture tester. More seriously, I would prefer issues to be detected and corrected before r126 becomes the official release, if possible.

Best regards

Comment 28 James Boyle 2014-12-21 19:05:37 UTC
Hi Yann,

I've had good luck with the December 19 r126 RC so far - it went through the previous corpora that caused crashes, but this time without any problems.  I am making it the system default (automated torture) so if I see any more errors, I'll let you know.

Thanks again!
--James

Comment 29 Yann Collet 2014-12-21 20:15:30 UTC
Excellent !

I'm planning to release r126 sometime next week, so if everything goes properly, this release candidate will simply become the next version.

Regards

Comment 30 Yann Collet 2014-12-25 19:32:39 UTC
r126 is out

Comment 31 pjp 2014-12-26 05:42:11 UTC
Yes, I'll build it tonight. Thank you James for reporting this issue and Yann for the new build.

Wishing you both a happy new year ahead! :)

Comment 32 James Boyle 2015-01-02 15:31:21 UTC
Happy New Year!

Just a quick (&& happy) update - since December 21, lz4 r126 has processed over 260 GiB of real world data for me, with a largest single job of 142 GiB - all without issue.

Thanks again :^D

--James

Comment 33 Yann Collet 2015-01-02 17:45:08 UTC
Thanks for the feedback !

Happy new year James and RedHat / Fedora team !

Comment 34 Fedora Update System 2015-01-08 14:14:17 UTC
lz4-r127-1.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/lz4-r127-1.fc21

Comment 35 Fedora Update System 2015-01-08 14:15:28 UTC
lz4-r127-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/lz4-r127-1.fc20

Comment 36 Fedora Update System 2015-01-08 14:15:34 UTC
lz4-r127-1.el5 has been submitted as an update for Fedora EPEL 5.
https://admin.fedoraproject.org/updates/lz4-r127-1.el5

Comment 37 Fedora Update System 2015-01-08 14:16:43 UTC
lz4-r127-1.el7 has been submitted as an update for Fedora EPEL 7.
https://admin.fedoraproject.org/updates/lz4-r127-1.el7

Comment 38 Fedora Update System 2015-01-08 14:16:50 UTC
lz4-r127-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/lz4-r127-1.el6

Comment 39 Fedora Update System 2015-01-19 01:33:10 UTC
lz4-r127-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 40 Fedora Update System 2015-01-26 02:34:49 UTC
lz4-r127-1.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 41 Fedora Update System 2015-01-26 20:11:43 UTC
lz4-r127-1.el5 has been pushed to the Fedora EPEL 5 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 42 Fedora Update System 2015-01-26 20:11:54 UTC
lz4-r127-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 43 Fedora Update System 2015-01-26 20:14:18 UTC
lz4-r127-1.el7 has been pushed to the Fedora EPEL 7 stable repository.  If problems still persist, please make note of it in this bug report.

