A test in the coreos-installer build crashes with a segfault when built in a recent rawhide buildroot.

[sharkcz@devel10 rust-coreos-installer]$ /home/sharkcz/rust-coreos-installer/coreos-installer-0.7.0/target/release/deps/libcoreinst-e60dceed92ae9902

running 16 tests
test blockdev::tests::disk_sector_size_reader ... ok
test blockdev::tests::lsblk_split ... ok
test blockdev::tests::test_saved_partitions ... ok
test cmdline::tests::test_parse_partition_filters ... ok
test download::tests::test_image_copy_default_first_mb ... ok
test download::tests::test_write_image_limit ... Segmentation fault (SIGSEGV) (core dumped)

I suspect something is wrong in LLVM 11 ...

Version-Release number of selected component (if applicable):
BAD = rust-1.46.0-2.fc34 + llvm-libs-11.0.0-0.8.rc3.fc34
OK = rust-1.45.2-1.fc33 + llvm10-libs-10.0.0-9.fc34

How reproducible:
100%

Steps to Reproduce:
1. rebuild rust-coreos-installer
Backtrace from gdb:

(gdb) where
#0 0x000002aa0c3616c4 in core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#1 core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#2 core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#3 core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#4 core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#5 <libcoreinst::source::FileLocation as libcoreinst::source::ImageLocation>::sources (self=0x3ffcc3fc4f8) at src/source.rs:135
#6 libcoreinst::download::tests::test_write_image_limit () at src/download.rs:489
#7 0x000002aa0c36ccb2 in std::panicking::try::do_call ()
#8 0x000002aa0c3a224c in __rust_try ()
#9 0x000002aa0c39ecbe in test::run_test::run_test_inner::{{closure}} ()
#10 0x000002aa0c39e4b6 in test::run_test::run_test_inner ()
#11 0x000002aa0c39ce6e in test::run_test ()
#12 0x000002aa0c39559c in test::run_tests ()
#13 0x000002aa0c37ef4e in test::console::run_tests_console ()
#14 0x000002aa0c392a54 in test::test_main ()
#15 0x000002aa0c3943d4 in test::test_main_static ()
#16 0x000002aa0c310a72 in std::rt::lang_start::{{closure}} () at /builddir/build/BUILD/rustc-1.46.0-src/src/libstd/rt.rs:67
#17 0x000002aa0c664446 in std::panicking::try::do_call ()
#18 0x000002aa0c66daa4 in __rust_try ()
#19 0x000002aa0c665156 in std::rt::lang_start_internal ()
#20 0x000002aa0c310a58 in std::rt::lang_start (main=<optimized out>, argc=<optimized out>, argv=<optimized out>) at /builddir/build/BUILD/rustc-1.46.0-src/src/libstd/rt.rs:67
#21 0x000003ff9b8abbda in __libc_start_main () from /lib64/libc.so.6
#22 0x000002aa0c30cdf4 in _start ()
The build in Koji is https://koji.fedoraproject.org/koji/taskinfo?taskID=52413150
I have made a build of Rust 1.46 with llvm-10 (https://koji.fedoraproject.org/koji/taskinfo?taskID=52444549) and the test crashes there as well. So this could really be a Rust issue, not an LLVM one.

[sharkcz@devel10 rust-coreos-installer]$ /home/sharkcz/rust-coreos-installer/coreos-installer-0.7.0/target/release/deps/libcoreinst-135b46c0efd4c9e3

running 16 tests
test blockdev::tests::disk_sector_size_reader ... ok
test blockdev::tests::lsblk_split ... ok
test blockdev::tests::test_saved_partitions ... ok
test cmdline::tests::test_parse_partition_filters ... ok
test download::tests::test_image_copy_default_first_mb ... ok
test download::tests::test_write_image_limit ... Segmentation fault (core dumped)

For the record: my builds were done on a single-CPU system; the output is different on a multi-CPU system.
It looks like upstream coreos-installer has been dancing around s390x issues:

https://github.com/coreos/coreos-installer/pull/360
https://github.com/coreos/coreos-installer/issues/372
https://github.com/coreos/coreos-installer/pull/373

I don't know if those changes have anything to do with the test in question here.
All three of those issues came down to an LTO bug in Rust 1.43 and 1.44 (https://github.com/coreos/coreos-installer/issues/372#issuecomment-686424629). We didn't see it in FCOS, which was already on 1.45. We ended up making no net code changes for it (there's a PR plus a revert) and just disabled LTO in the RHCOS package. The issue reported here is not known to be related. I could try disabling LTO in the package.
Looks like it also fails with LTO disabled: https://koji.fedoraproject.org/koji/taskinfo?taskID=52479791
Ah, right, I think that s390x LTO issue was bug 1837660. That was a build failure, but I guess it could break in other ways. Current LLVM shouldn't be affected by that particular issue.
I can reproduce this, but only in optimized builds with "-Ccodegen-units=1", which rust-packaging %cargo_prep sets in ".cargo/config". So if you need an immediate workaround, you could edit that file to raise the value or remove the argument entirely. I believe the default is 16 in release builds. I'll try to bisect the Rust change, but the build is really slow on the Beaker machine I got...
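For illustration, the edit to ".cargo/config" might look roughly like the fragment below. This is only a sketch: it assumes the flag is passed through the "rustflags" key in the "[build]" table, and the other flags shown are placeholders, not the exact list that %cargo_prep writes. Check the generated file before changing it.

```toml
# .cargo/config (sketch; the real file generated by %cargo_prep has more
# entries, and the flag list shown here is an assumption)
[build]
# Before (assumed):
#   rustflags = ["-Copt-level=3", "-Cdebuginfo=2", "-Ccodegen-units=1"]
# After: the codegen-units flag is dropped, so rustc falls back to its default
rustflags = ["-Copt-level=3", "-Cdebuginfo=2"]
```

Replacing the flag with a larger value such as "-Ccodegen-units=16" instead of dropping it would be the "raise the value" variant of the same workaround.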
Josh, thanks for the workaround. Applied in rust-coreos-installer-0.7.0-2.fc34.
I've confirmed that the upstream LLVM patch fixes the problem here. https://reviews.llvm.org/D89034
This should be fixed in rawhide now -- do the coreos folks need this backported to stable branches too?
It'd be useful, but not required. Per comment 9 we have a workaround in the coreos-installer package. Thanks for all your work to track this down!
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle. Changing version to 34.
This message is a reminder that Fedora Linux 34 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '34'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 34 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07. Fedora Linux 34 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. Thank you for reporting this bug and we are sorry it could not be fixed.