Bug 2267598
Summary: | Invalid writes regression in liblzma.so | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Nathan Scott <nathans> |
Component: | xz | Assignee: | Matej Mužila <mmuzila> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | | |
Version: | rawhide | CC: | awilliam, eblake, ehila, jnovy, mjw, mmuzila, pkubat, praiskup, qguo, rjones, sam, thomas.barbier |
Target Milestone: | --- | Keywords: | Regression |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | xz-5.6.1-1.fc41 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2024-03-09 14:26:18 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Nathan Scott
2024-03-04 03:17:21 UTC
I ran valgrind over some of the tests in the xz test suite and was not able to reproduce any error. There are also no commits upstream since 5.6.0 which would indicate any fix. So I think we'll need a reproducer.

> 1. Run PCP regression tests, which involve running programs under valgrind

Where are these tests, and how do I run them? The spec file itself doesn't mention valgrind.

I seem to have reproduced this in another project. My stack trace has some more symbols:

==746855== Invalid write of size 8
==746855==    at 0x52E8645: ??? (in /usr/lib64/liblzma.so.5.6.0)
==746855==    by 0x52CA83B: _get_cpuid (in /usr/lib64/liblzma.so.5.6.0)
==746855==    by 0x6: ???
==746855==    by 0x1FFEFFF4AF: ???
==746855==    by 0x77AD31E59B84CFFF: ???
==746855==    by 0x1FFEFFF4AF: ???
==746855==    by 0x400F253: elf_machine_rela (dl-machine.h:314)
==746855==    by 0x400F253: elf_dynamic_do_Rela (do-rel.h:147)
==746855==    by 0x400F253: _dl_relocate_object (dl-reloc.c:301)
==746855==    by 0x52015AF: ???
==746855==    by 0x5200B0F: ???
==746855==    by 0x1FFEFFF43F: ???
==746855==    by 0x1FFEFFF42F: ???
==746855==    by 0x53E6D17: ??? (in /usr/lib64/libffi.so.8.1.2)
==746855==  Address 0x1ffeffe538 is on thread 1's stack
==746855==  136 bytes below stack pointer

Even a trivial program that links with lzma reproduces this:

$ echo 'int main(){return 0;}' > test.c
$ gcc test.c -llzma -o test
$ valgrind ./test
==749691== Memcheck, a memory error detector
==749691== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==749691== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==749691== Command: ./test
==749691==
==749691== Invalid write of size 8
==749691==    at 0x4897645: ??? (in /usr/lib64/liblzma.so.5.6.0)
==749691==    by 0x487983B: _get_cpuid (in /usr/lib64/liblzma.so.5.6.0)
==749691==    by 0x6: ???
==749691==    by 0x1FFEFFF8DF: ???
==749691==    by 0xDD2A8041A0E922FF: ???
==749691==    by 0x1FFEFFF8DF: ???
==749691==    by 0x400F253: elf_machine_rela (dl-machine.h:314)
==749691==    by 0x400F253: elf_dynamic_do_Rela (do-rel.h:147)
==749691==    by 0x400F253: _dl_relocate_object (dl-reloc.c:301)
==749691==    by 0x483BA7F: ???
==749691==    by 0x483B56F: ???
==749691==    by 0x1FFEFFF86F: ???
==749691==    by 0x1FFEFFF85F: ???
==749691==    by 0x48CDD9F: ??? (in /usr/lib64/libc.so.6)
==749691==  Address 0x1ffeffe968 is on thread 1's stack
==749691==  136 bytes below stack pointer
==749691==
==749691==
==749691== HEAP SUMMARY:
==749691==     in use at exit: 0 bytes in 0 blocks
==749691==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==749691==
==749691== All heap blocks were freed -- no leaks are possible
==749691==
==749691== For lists of detected and suppressed errors, rerun with: -s
==749691== ERROR SUMMARY: 112 errors from 1 contexts (suppressed: 0 from 0)

(xz-libs-5.6.0-2.fc41.x86_64)

I changed xz to use ./configure --disable-ifunc which works around the problem:

https://src.fedoraproject.org/rpms/xz/c/6db19f2231927b4d93e9c021d32cb7433708e26f?branch=rawhide

Build for F40: https://koji.fedoraproject.org/koji/taskinfo?taskID=114461506
Build for Rawhide: https://koji.fedoraproject.org/koji/taskinfo?taskID=114461456

As this is only a workaround, let's keep this bug open.

Is the broken version in F40 stable currently? If so, should we propose this as an FE to ensure we don't ship the broken one in Beta?

These are the fixed packages:

F40: https://bodhi.fedoraproject.org/updates/FEDORA-2024-f5033032b8
F41: https://bodhi.fedoraproject.org/updates/FEDORA-2024-f0381d82b3

This is the broken update:

https://bodhi.fedoraproject.org/updates/FEDORA-2024-4417db3376

(Do we actually need to rebuild perl-Compress-Raw-Lzma on every *release*? Spec file seems to say only on version changes.)

Not sure what FE is, but it looks like the broken update is not in Fedora 40 right now, so we just need to make sure it doesn't get out.

Yes. We do. You can tell, because the tests failed. :D Those are the tests that always fail when the package can't be updated because the perl-Compress-Raw-Lzma dependency is broken.

It would have made more sense to just edit the new xz build into the existing F40 update, but now that ship has sailed :( We will need to bump and build perl-Compress-Raw-Lzma again and edit it into the new update.
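
When comparing builds (for example the 5.6.0 package against the --disable-ifunc rebuild), it can help to reduce each valgrind run from the reproducer above to a single number. The following is only a minimal sketch: the function names valgrind_error_count and valgrind_clean are hypothetical helpers invented for illustration, not part of valgrind, xz or the PCP test suite. They only parse the "ERROR SUMMARY" line that memcheck prints, as seen in the log quoted earlier.

```shell
# Hypothetical helper (an assumption for illustration, not an existing tool):
# read memcheck output on stdin and print the error count from the last
# "ERROR SUMMARY" line. Prints nothing if no such line is present.
valgrind_error_count() {
    sed -n 's/^==[0-9][0-9]*== ERROR SUMMARY: \([0-9][0-9]*\) error.*/\1/p' | tail -n 1
}

# Hypothetical helper: succeed only when the reported error count is exactly 0.
valgrind_clean() {
    [ "$(valgrind_error_count)" = "0" ]
}
```

With the test binary from this report, `valgrind ./test 2>&1 | valgrind_error_count` would be expected to print 112 on the affected build, matching the summary line in the log. valgrind's own `--error-exitcode=<number>` option is an alternative when only pass/fail is needed.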

Actually, let me qualify that - no, we don't need to rebuild it every *release*, only on version changes - but because https://bodhi.fedoraproject.org/updates/FEDORA-2024-4417db3376 never made it to stable, stable still has the old perl-Compress-Raw-Lzma that requires version 5.4.6 of xz-libs.

Hi Rich,

Sorry for the lack of detail - the failure rate is so high for us here that I wondered if every program using liblzma would exhibit the same problem - that's certainly the case across all our tools.

This is last night's run and shows the extent of the issue. I expect that the list of ~130 rawhide failures is every test we have that uses valgrind.

https://performancecopilot.github.io/qa-reports/reports/20240304_212241-a8847c35/

You can install the pcp-testsuite package to get at the PCP tests locally. They're shell scripts so pretty easy to follow (obviously, many of them invoke other tools like valgrind). The individual tests can also be perused here:

https://github.com/performancecopilot/pcp/tree/main/qa

cheers.

I'll do the perl-Compress-Raw-Lzma builds and clean up the update.

(In reply to Adam Williamson from comment #9)
> Actually, let me qualify that - no, we don't need to rebuild it every
> *release*, only on version changes - but because
> https://bodhi.fedoraproject.org/updates/FEDORA-2024-4417db3376 never made it
> to stable, stable still has the old perl-Compress-Raw-Lzma that requires
> version 5.4.6 of xz-libs.

Right yes, this above is the reason. In perl-Compress-Raw-Lzma it only depends on the liblzma version:

Requires: xz-libs%{?_isa} = %((pkg-config --modversion liblzma 2>/dev/null || echo 0) | tr -dc '[0-9.]')

where 'pkg-config --modversion liblzma' expands to '5.6.0'. So the comment is correct.

Confirming we're seeing goodness across the PCP tests running on rawhide once more. Thanks Rich!

This is fixed by https://github.com/tukaani-project/xz/commit/82ecc538193b380a21622aea02b0ba078e7ade92 included in xz 5.6.1.
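
To see what the Requires expression quoted from the perl-Compress-Raw-Lzma spec evaluates to at build time, the shell part inside %(...) can be run by hand. This is a rough sketch that just wraps that pipeline in two functions. The names sanitize_version and required_xz_libs_version are made up for illustration, and the result depends on whichever liblzma development files are installed on the machine.

```shell
# Hypothetical wrapper names; the pipeline itself is the one quoted from the spec.

# Keep only digits and dots (plus literal square brackets, which are part of
# the set as the spec writes it), dropping everything else including the
# trailing newline.
sanitize_version() {
    tr -dc '[0-9.]'
}

# Ask pkg-config for the liblzma version; fall back to "0" when pkg-config
# or the liblzma .pc file is not available.
required_xz_libs_version() {
    (pkg-config --modversion liblzma 2>/dev/null || echo 0) | sanitize_version
}
```

On a build root carrying xz 5.6.0, `required_xz_libs_version` would print 5.6.0, so the built package ends up requiring exactly that xz-libs version. That is why a rebuild is needed when the liblzma version changes.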

FEDORA-2024-7e9c14633a (perl-Compress-Raw-Lzma-2.209-5.fc41 and xz-5.6.1-1.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-7e9c14633a

FEDORA-2024-7e9c14633a (perl-Compress-Raw-Lzma-2.209-5.fc41 and xz-5.6.1-1.fc41) has been pushed to the Fedora 41 stable repository.
If problem still persists, please make note of it in this bug report.

Yikes - the author of the upstream patch used this bug to justify making his xz backdoor (CVE-2024-3094) even bigger. :(

https://github.com/tukaani-project/xz/commit/82ecc538193b380a21622aea02b0ba078e7ade92
https://www.openwall.com/lists/oss-security/2024/03/29/4

(In reply to Richard W.M. Jones from comment #14)
> This is fixed by
> https://github.com/tukaani-project/xz/commit/82ecc538193b380a21622aea02b0ba078e7ade92
> included in xz 5.6.1.

Unfortunately (or luckily?) github has disabled the project.
Is this commit available somewhere? I wonder how that "fix" worked around the valgrind memcheck errors.

(In reply to Mark Wielaard from comment #18)
> (In reply to Richard W.M. Jones from comment #14)
> > This is fixed by
> > https://github.com/tukaani-project/xz/commit/82ecc538193b380a21622aea02b0ba078e7ade92
> > included in xz 5.6.1.
>
> Unfortunately (or luckily?) github has disabled the project.
> Is this commit available somewhere? I wonder how that "fix" worked around
> the valgrind memcheck errors.

I was curious as well, the project website is still hosting the source. The commit seems to be here:
https://git.tukaani.org/?p=xz.git;a=commit;h=82ecc538193b380a21622aea02b0ba078e7ade92

(In reply to Ehila from comment #19)
> (In reply to Mark Wielaard from comment #18)
> > (In reply to Richard W.M. Jones from comment #14)
> > > This is fixed by
> > > https://github.com/tukaani-project/xz/commit/82ecc538193b380a21622aea02b0ba078e7ade92
> > > included in xz 5.6.1.
> >
> > Unfortunately (or luckily?) github has disabled the project.
> > Is this commit available somewhere? I wonder how that "fix" worked around
> > the valgrind memcheck errors.
>
> I was curious as well, the project website is still hosting the source. The
> commit seems to be here
> https://git.tukaani.org/?p=xz.git;a=commit;h=82ecc538193b380a21622aea02b0ba078e7ade92

Wow, thanks. That commit does sound somewhat plausible. And I doubt I would have recognized all this as suspicious. Although it should have because there is no real reason this would only show up under valgrind (valgrind does however have an issue where interception of an ifunc can misfire, so it isn't completely unreasonable to suspect a valgrind bug here).

But the "real" fix for this "valgrind issue" seems to come an hour later when some test files are updated. Which are then included in the next xz release.