Hello,

would you mind trying out with 2.3 version? I built the RPM package for you.

http://koji.fedoraproject.org/koji/taskinfo?taskID=6061595

What platform are you trying this on? Is that Intel or ARM? Thanks.

I am upstream maintainer of lbzip2.

To be able to analyze the case I need a reproducer (compressed file which causes segfault during decompression). If the file is too big (larger than several MB) then you can use lbzrecover (from lbzip2-utils) to cut it to smaller pieces and attach one of the small files.

If for some reason (confidential data etc.) you cannot include reproducer (even send privately to me) then recompile lbzip2 with debugging information and with no optimisation and make sure that segfault can be reproduced consistently (happens on the same instruction). Then attach lbzip2 binary you used with detailed backtrace and register dump (the more information the better). I will try debugging the problem, but without reproducer it may be impossible.

(In reply to Lukas Zapletal from comment #1)
> Hello,
>
> would you mind trying out with 2.3 version? I built the RPM package for you.
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=6061595
>
> What platform are you trying this on? Is that Intel or ARM? Thanks.

It looks like 2.3 exhibits the same segfault.

(In reply to Mikolaj Izdebski from comment #2)
> I am upstream maintainer of lbzip2.
>
> To be able to analyze the case I need a reproducer (compressed file which
> causes segfault during decompression). If the file is too big (larger than
> several MB) then you can use lbzrecover (from lbzip2-utils) to cut it to
> smaller pieces and attach one of the small files.
>
> If for some reason (confidential data etc.) you cannot include reproducer
> (even send privately to me) then recompile lbzip2 with debugging information
> and with no optimisation and make sure that segfault can be reproduced
> consistently (happens on the same instruction). Then attach lbzip2 binary
> you used with detailed backtrace and register dump (the more information the
> better). I will try debugging the problem, but without reproducer it may be
> impossible.

Yes it is confidential data unfortunately. Process run time is on the order of 1 - 1.5 hours before it happens, and the input is a tar image that is ~230G in size.

I'll try and create a reproducer tar file with random data..

(In reply to Lukas Zapletal from comment #1)
> What platform are you trying this on? Is that Intel or ARM? Thanks.

Forgot to say.. Intel x86_64

(In reply to Mikolaj Izdebski from comment #2)
> I am upstream maintainer of lbzip2.
>
> To be able to analyze the case I need a reproducer (compressed file which
> causes segfault during decompression). If the file is too big (larger than
> several MB) then you can use lbzrecover (from lbzip2-utils) to cut it to
> smaller pieces and attach one of the small files.
>
> If for some reason (confidential data etc.) you cannot include reproducer
> (even send privately to me) then recompile lbzip2 with debugging information
> and with no optimisation and make sure that segfault can be reproduced
> consistently (happens on the same instruction). Then attach lbzip2 binary
> you used with detailed backtrace and register dump (the more information the
> better). I will try debugging the problem, but without reproducer it may be
> impossible.

So far I have been unable to reproduce this with random data files as input. I am trying to reproduce it again in gdb against the real data input.

Are there any specific gdb command outputs I can provide?

Created attachment 814353 [details]
Backtrace with more information
Reproduced again. More debugging info
Created attachment 814358 [details]
Dump of assembler code for function retrieve
Created attachment 814367 [details]
Dump of assembler code for function do_retrieve
Thank you for more detailed information.
I will analyze the data you provided and try to reproduce the crash.
Just to confirm, these dumps are for lbzip2-2.2-2.fc19.x86_64?

(In reply to Mikolaj Izdebski from comment #11)
> Thank you for more detailed information.
> I will analyze the data you provided and try to reproduce the crash.
> Just to confirm, these dumps are for lbzip2-2.2-2.fc19.x86_64?

Yes lbzip2-2.2-2.fc19.x86_64 and lbzip2-debuginfo-2.2-2.fc19.x86_64 are what I'm working with. Thanks

Created attachment 814852 [details]
GDB session better look at variables
I reproduced this again and have gdb attached to the process. When I evaluate the statement s = T->perm[T->count[k] + ((v - T->base[k]) >> (64 - k))]; with the current runtime values, GDB says it cannot access the memory.
Created attachment 814854 [details]
C to calculate value s
This C program is what I used to calculate s value. There is probably a way to make gdb do it, but this was faster for me.
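For readers without access to the attachment, a calculation of this kind can be sketched roughly as follows. This is a minimal illustration and not the attached program: the struct layout, the field widths, and the helper name perm_index are assumptions made for the example. It computes the index into T->perm from k, v, T->base[k] and T->count[k], and rejects k outside 1..20, the limit used here because bzip2 Huffman codes are at most 20 bits long.

```c
#include <stdint.h>

#define MAX_CODE_LENGTH 20 /* bzip2 Huffman codes are at most 20 bits long */

/* Hypothetical stand-in for the decoder table; the real lbzip2 layout
 * and field widths may differ. */
struct tree {
    uint64_t base[MAX_CODE_LENGTH + 2];
    uint32_t count[MAX_CODE_LENGTH + 1];
};

/* Compute T->count[k] + ((v - T->base[k]) >> (64 - k)), i.e. the index
 * that would be used to read T->perm[].
 * Returns 0 and stores the index in *out, or returns -1 when k is outside
 * 1..MAX_CODE_LENGTH. The check matters for two reasons: base[] and count[]
 * have no valid entry for such k, and a 64-bit shift by 64 or more
 * (k == 0 or k > 64) is undefined behaviour in C. */
int perm_index(const struct tree *T, unsigned k, uint64_t v, uint64_t *out)
{
    if (k < 1 || k > MAX_CODE_LENGTH)
        return -1;
    *out = T->count[k] + ((v - T->base[k]) >> (64 - k));
    return 0;
}
```

With made-up values base[3] = 0x4000000000000000, count[3] = 5 and v = 0x6000000000000000, the difference shifted right by 61 bits is 1, so the computed index is 6. An index in the billions, as seen in the gdb session, suggests that k or the table entries are not valid.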
(In reply to Will Bending from comment #14)
> Created attachment 814854 [details]
> C to calculate value s
>
> This C program is what I used to calculate s value. There is probably a way
> to make gdb do it, but this was faster for me.

Correction, to calculate the index for T->perm[T->count[k] + ((v - T->base[k]) >> (64 - k))] not s value, although this memory's contents would be assigned to variable s.

(gdb) print T->perm[2826676582]
Cannot access memory at address 0x7fb604f81670

Created attachment 814855 [details]
Process memory map

Here is the map of memory at time of the crash in attachment 814852 [details].

(In reply to Will Bending from comment #16)
> Created attachment 814855 [details]
> Process memory map
>
> Here is the map of memory at time of the crash in attachment 814852 [details]
> [details].

Another error correction: memory map is here in attachment 814855 [details]

I am working on reproducing again, but so far it seems like it is crashing reliably at the same place, although I have seen very different variable values in the scope of the frame where this illegal access happens.

Thank you for helping with this bug. I really would like to fix it, but I won't have time until Sunday.

(In reply to Mikolaj Izdebski from comment #18)
> Thank you for helping with this bug. I really would like to fix it, but I
> won't have time until Sunday.

I understand. It is not holding me up on this project as I changed to pigz, although I would prefer this algorithm. It has better compression ratio, and I can live with the performance loss because I can get more data on the LTO3 tape.

Let me know if I can get anything else useful out of GDB or /proc for you. I am recording this next crash so hopefully running it in reverse will show something more.

Created attachment 815533 [details]
non optimized srpm
srpm used to build the binary with optimizations turned off.
%configure \
--build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= \
--disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin \
--sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include \
--libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var \
--sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info
make %{?_smp_mflags} CFLAGS='-O0 -g -pipe -Wall -Wp,-fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' \
CXXFLAGS='-O0 -g -pipe -Wall -Wp,-fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' \
FFLAGS='-O0 -g -pipe -Wall -Wp,-fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -I/usr/lib64/gfortran/modules' \
FCFLAGS='-O0 -g -pipe -Wall -Wp,-fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -I/usr/lib64/gfortran/modules' \
LDFLAGS='-Wl,-z,relro '
Created attachment 815534 [details]
non optimized binary rpm
Created attachment 815535 [details]
non optimized debug symbols
Created attachment 815536 [details]
gdb session non optimized binary
This is a good gdb session of the non-optimized binary having the crash. All variables in frame 0 scope are shown.
Created attachment 815537 [details]
assembler for non optimized binary
The assembler code for the non optimized binary.
Created attachment 815538 [details]
process memory map non optimized binary
(In reply to Mikolaj Izdebski from comment #18)
> Thank you for helping with this bug. I really would like to fix it, but I
> won't have time until Sunday.

I went ahead and rebuilt lbzip2 with optimizations switched off like you asked originally. It reproduces in the same call, and the data structures are easy to look at.

So to me it would appear this has a very out-of-bounds k value when considering arrays base[] and count[].

I intend to leave this GDB session paused where it is if possible.. if not it is easy enough to reproduce this. Let me know if you have something you want looked at in GDB session.

Unfortunately reverse only takes me back to where it was interrupted by GDB attaching.. maybe I am using 'record' incorrectly.

(In reply to Will Bending from comment #26)
> So to me it would appear this has a very out-of-bounds k value when
> considering arrays base[] and count[].

That seems to be the problem. base[] array seems to have incorrect element base[21] which causes k to run out of bounds (1 <= k <= 20).

I have prepared a fix. Because I don't have reproducer I cannot verify the fix myself. Will, could you check if the following RPM fixes the problem for you?

http://mizdebsk.fedorapeople.org/lbzip2-2.2-3.fc21.0.1.x86_66.rpm
http://mizdebsk.fedorapeople.org/lbzip2-2.2-3.fc21.0.1.src.rpm

(In reply to Mikolaj Izdebski from comment #27)
> http://mizdebsk.fedorapeople.org/lbzip2-2.2-3.fc21.0.1.x86_66.rpm
> http://mizdebsk.fedorapeople.org/lbzip2-2.2-3.fc21.0.1.src.rpm

There is a typo. Of course I meant:

http://mizdebsk.fedorapeople.org/lbzip2-2.2-3.fc21.0.1.x86_64.rpm
http://mizdebsk.fedorapeople.org/lbzip2-2.2-3.fc21.0.1.src.rpm

(In reply to Mikolaj Izdebski from comment #28)
> (In reply to Mikolaj Izdebski from comment #27)
> > http://mizdebsk.fedorapeople.org/lbzip2-2.2-3.fc21.0.1.x86_66.rpm
> > http://mizdebsk.fedorapeople.org/lbzip2-2.2-3.fc21.0.1.src.rpm
>
> There is a typo. Of course I meant:
>
> http://mizdebsk.fedorapeople.org/lbzip2-2.2-3.fc21.0.1.x86_64.rpm
> http://mizdebsk.fedorapeople.org/lbzip2-2.2-3.fc21.0.1.src.rpm

Hey looks like you have a good fix. Nice work.

I built a binary from this srpm and confirm I am not seeing the crash. It looks like Amanda's amrecover completed successfully.

I will continue testing with other large restores from this data set and let you know results.

Thanks very much.

Fixed in lbzip2-2.2-4

(In reply to Will Bending from comment #29)
> Hey looks like you have a good fix. Nice work.
>
> I built a binary from this srpm and confirm I am not seeing the crash. It
> looks like Amanda's amrecover completed successfully.
>
> I will continue testing with other large restores from this data set and let
> you know results.
>
> Thanks very much.

I'm glad the fix works for you. I'll create an update soon.

Thank you for taking time in reporting this and providing all the details. Without backtrace and data structure dumps I wouldn't be able to fix it.

lbzip2-2.2-4.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/lbzip2-2.2-4.fc20

lbzip2-2.2-4.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/lbzip2-2.2-4.fc19

lbzip2-2.2-4.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/lbzip2-2.2-4.fc18

(In reply to Mikolaj Izdebski from comment #31)
> (In reply to Will Bending from comment #29)
> > Hey looks like you have a good fix. Nice work.
> >
> > I built a binary from this srpm and confirm I am not seeing the crash. It
> > looks like Amanda's amrecover completed successfully.
> >
> > I will continue testing with other large restores from this data set and let
> > you know results.
> >
> > Thanks very much.
>
> I'm glad the fix works for you. I'll create an update soon.
>
> Thank you for taking time in reporting this and providing all the details.
> Without backtrace and data structure dumps I wouldn't be able to fix it.

Glad to have helped. I have tested several more times today and cannot reproduce the crash with this patch applied. I will switch my backup compression method back to lbzip2 and finish evaluating Amanda. Thanks for the quick fix.

lbzip2-2.2-4.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.

lbzip2-2.2-4.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

I believe that this bug is fixed in lbzip2-2.2-4, which is available in updates for Fedora 19, so I am closing this bug now.

The build containing the fix can be found at Koji:
http://koji.fedoraproject.org/koji/buildinfo?buildID=474109

lbzip2-2.2-4.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
Created attachment 810741 [details]
backtrace

Description of problem:
lbzip2 worker thread crashing with signal 11. In my case this is during an attempted file restore using Amanda's amrecover command. I am using lbzip2 as the compression option in my Amanda configuration. It is being invoked by amrecover via Bash shell script wrapper with -n 5 to use 5 CPU cores.

Version-Release number of selected component (if applicable):
2.2-2.fc19

How reproducible:
100% in my Amanda configuration. Unsure how reproducible elsewhere.

Steps to Reproduce:
1. configure lbzip2 as Amanda's compression program with -n 5
2. perform backup testing
3. restore multiple files/large directories of jpg images with amrecover

Actual results:
lbzip2 segfaults aborting the restore

Expected results:
restore succeeds

Additional info:
backtrace attached