Bug 1883457 - coreos-installer test segfaults with rust-1.46/llvm-11
Summary: coreos-installer test segfaults with rust-1.46/llvm-11
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: llvm
Version: 34
Hardware: s390x
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Tom Stellard
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ZedoraTracker
Reported: 2020-09-29 09:23 UTC by Dan Horák
Modified: 2022-06-08 00:59 UTC

Fixed In Version: llvm-11.0.0-1.fc34
Clone Of:
Environment:
Last Closed: 2022-06-08 00:59:48 UTC
Type: Bug
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | rust-lang/rust/issues/77382 | 0 | None | closed | coreos-installer test segfaults on s390x-unknown-linux-gnu | 2021-01-12 13:07:17 UTC
LLVM | 47736 | 0 | P | RESOLVED | SystemZ reordered store/compare clobbers CC | 2021-01-12 13:07:21 UTC

Description Dan Horák 2020-09-29 09:23:25 UTC
A test in the coreos-installer build crashes with a segfault when built in a recent rawhide buildroot.

[sharkcz@devel10 rust-coreos-installer]$ /home/sharkcz/rust-coreos-installer/coreos-installer-0.7.0/target/release/deps/libcoreinst-e60dceed92ae9902

running 16 tests
test blockdev::tests::disk_sector_size_reader ... ok
test blockdev::tests::lsblk_split ... ok
test blockdev::tests::test_saved_partitions ... ok
test cmdline::tests::test_parse_partition_filters ... ok
test download::tests::test_image_copy_default_first_mb ... ok
test download::tests::test_write_image_limit ... Segmentation fault (SIGSEGV) (core dumped)


I suspect something is wrong in llvm 11 ...


Version-Release number of selected component (if applicable):
BAD = rust-1.46.0-2.fc34 + llvm-libs-11.0.0-0.8.rc3.fc34
OK = rust-1.45.2-1.fc33 + llvm10-libs-10.0.0-9.fc34

How reproducible:
100%

Steps to Reproduce:
1. rebuild rust-coreos-installer

Comment 1 Dan Horák 2020-09-29 09:30:11 UTC
backtrace from gdb

(gdb) where
#0  0x000002aa0c3616c4 in core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#1  core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#2  core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#3  core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#4  core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#5  <libcoreinst::source::FileLocation as libcoreinst::source::ImageLocation>::sources (self=0x3ffcc3fc4f8) at src/source.rs:135
#6  libcoreinst::download::tests::test_write_image_limit () at src/download.rs:489
#7  0x000002aa0c36ccb2 in std::panicking::try::do_call ()
#8  0x000002aa0c3a224c in __rust_try ()
#9  0x000002aa0c39ecbe in test::run_test::run_test_inner::{{closure}} ()
#10 0x000002aa0c39e4b6 in test::run_test::run_test_inner ()
#11 0x000002aa0c39ce6e in test::run_test ()
#12 0x000002aa0c39559c in test::run_tests ()
#13 0x000002aa0c37ef4e in test::console::run_tests_console ()
#14 0x000002aa0c392a54 in test::test_main ()
#15 0x000002aa0c3943d4 in test::test_main_static ()
#16 0x000002aa0c310a72 in std::rt::lang_start::{{closure}} () at /builddir/build/BUILD/rustc-1.46.0-src/src/libstd/rt.rs:67
#17 0x000002aa0c664446 in std::panicking::try::do_call ()
#18 0x000002aa0c66daa4 in __rust_try ()
#19 0x000002aa0c665156 in std::rt::lang_start_internal ()
#20 0x000002aa0c310a58 in std::rt::lang_start (main=<optimized out>, argc=<optimized out>, argv=<optimized out>) at /builddir/build/BUILD/rustc-1.46.0-src/src/libstd/rt.rs:67
#21 0x000003ff9b8abbda in __libc_start_main () from /lib64/libc.so.6
#22 0x000002aa0c30cdf4 in _start ()

Comment 2 Dan Horák 2020-09-29 10:09:23 UTC
The Koji build is https://koji.fedoraproject.org/koji/taskinfo?taskID=52413150

Comment 3 Dan Horák 2020-09-29 14:22:42 UTC
I have made a build of rust 1.46 with llvm-10 (https://koji.fedoraproject.org/koji/taskinfo?taskID=52444549) and the test crashes there as well. So this could really be a Rust issue, not an LLVM one.

[sharkcz@devel10 rust-coreos-installer]$ /home/sharkcz/rust-coreos-installer/coreos-installer-0.7.0/target/release/deps/libcoreinst-135b46c0efd4c9e3

running 16 tests
test blockdev::tests::disk_sector_size_reader ... ok
test blockdev::tests::lsblk_split ... ok
test blockdev::tests::test_saved_partitions ... ok
test cmdline::tests::test_parse_partition_filters ... ok
test download::tests::test_image_copy_default_first_mb ... ok
test download::tests::test_write_image_limit ... Segmentation fault (core dumped)


For the record: my builds were done on a single-CPU system; the output is different on a multi-CPU system.

Comment 4 Josh Stone 2020-09-29 21:19:16 UTC
It looks like upstream coreos-installer has been dancing around s390x issues:

https://github.com/coreos/coreos-installer/pull/360
https://github.com/coreos/coreos-installer/issues/372
https://github.com/coreos/coreos-installer/pull/373

I don't know if those changes have anything to do with the test in question here.

Comment 5 Benjamin Gilbert 2020-09-29 22:02:43 UTC
All three of those issues came down to an LTO bug in Rust 1.43 and 1.44: https://github.com/coreos/coreos-installer/issues/372#issuecomment-686424629.  We didn't see it in FCOS, which was already on 1.45.  We ended up making net no code changes for it (there's a PR plus a revert) and just disabled LTO in the RHCOS package.

The issue reported here is not known to be related.  I could try disabling LTO in the package.

Comment 6 Benjamin Gilbert 2020-09-29 22:17:33 UTC
Looks like it also fails with LTO disabled: https://koji.fedoraproject.org/koji/taskinfo?taskID=52479791

Comment 7 Josh Stone 2020-09-29 22:19:28 UTC
Ah, right, I think that s390x LTO issue was bug 1837660. That was a build failure, but I guess it could break in other ways. Current LLVM shouldn't be affected by that particular issue.

Comment 8 Josh Stone 2020-09-30 17:45:35 UTC
I can reproduce this, but only in optimized builds with "-Ccodegen-units=1", which rust-packaging %cargo_prep sets in ".cargo/config". So if you need an immediate workaround, you could edit that file to increase or remove that argument. I believe the default is 16 in release builds.
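
As a rough illustration of that workaround, the fragment below shows what the relevant part of ".cargo/config" might look like after editing. The exact layout written by %cargo_prep is an assumption here, and any other flags it sets are omitted; only the "-Ccodegen-units" flag comes from this report, and 16 is the release default mentioned above.

```toml
# .cargo/config, as written by %cargo_prep and then edited by hand.
# Assumption: the flags live under [build].rustflags; other flags omitted.
[build]
# Before the workaround this entry was "-Ccodegen-units=1".
rustflags = ["-Ccodegen-units=16"]
```

Removing the entry entirely should have a similar effect, since Cargo would then fall back to its own default for the profile.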

I'll try to bisect the Rust change, but the build is really slow on the Beaker machine I got...

Comment 9 Benjamin Gilbert 2020-09-30 22:58:21 UTC
Josh, thanks for the workaround.  Applied in rust-coreos-installer-0.7.0-2.fc34.

Comment 10 Josh Stone 2020-10-15 00:28:23 UTC
I've confirmed that the upstream LLVM patch fixes the problem here.
https://reviews.llvm.org/D89034

Comment 11 Josh Stone 2020-10-19 17:23:45 UTC
This should be fixed in rawhide now -- do the coreos folks need this backported to stable branches too?

Comment 12 Benjamin Gilbert 2020-10-19 17:48:53 UTC
It'd be useful, but not required.  Per comment 9 we have a workaround in the coreos-installer package.

Thanks for all your work to track this down!

Comment 13 Ben Cotton 2021-02-09 15:19:10 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

Comment 14 Ben Cotton 2022-05-12 16:39:46 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 15 Ben Cotton 2022-06-08 00:59:48 UTC
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.

