The package for curve25519-dalek failed during the F39 mass rebuild with Rust 1.71 on s390x only, with a test crashing with SIGSEGV. This happens on both rawhide and fedora 38, but not fedora 37 (which still has Rust 1.70).

Running cargo test with "--test-threads 1" in mock / qemu I get the following culprit:

    test edwards::test::basepoint_tables ... qemu: uncaught target signal 11 (Segmentation fault) - core dumped

Reproducible: Always

Steps to Reproduce:
1. fedpkg clone rust-curve25519-dalek
2. cd rust-curve25519-dalek
3. fedpkg srpm
4. mock -r fedora-rawhide-s390x ./*.src.rpm

Actual Results:
Tests crash with a segmentation fault on s390x.

Expected Results:
Tests pass (as they did with Rust <= 1.70).
I can reproduce this on the latest curve25519-dalek in its git repo, with upstream builds of 1.71.0, 1.72-beta, and 1.73-nightly, but upstream 1.70.0 is fine.

    RUSTFLAGS=-Ccodegen-units=1 cargo test --lib --release
cargo-bisect-rustc found this:

    searched nightlies: from nightly-2023-02-28 to nightly-2023-07-27
    regressed nightly: nightly-2023-05-09
    searched commit range: https://github.com/rust-lang/rust/compare/c4190f2d3a46a59f435f7b42f58bc22b2f4d6917...2f2c438dce75d8cc532c3baa849eeddc0901802c
    regressed commit: https://github.com/rust-lang/rust/commit/dfe31889e10e36eed53327d1ca624fbf21b475a5

But it's pretty surprising that turning OFF an optimization would cause problems, unless it was masking something else.
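For readers unfamiliar with how a nightly bisection narrows things down, here is a minimal sketch of the general idea in Python. It is not cargo-bisect-rustc's actual implementation. The `is_bad` callback is a hypothetical stand-in for "install that nightly and run the failing test", and the sketch assumes the start date is good, the end date is bad, and the result flips exactly once in between.

```python
from datetime import date, timedelta


def first_bad(start, end, is_bad):
    """Return the first date in (start, end] for which is_bad(date) is True.

    Assumes is_bad(start) is False, is_bad(end) is True, and that the
    result flips only once in the range (a single regression point).
    """
    lo, hi = start, end  # invariant: lo is good, hi is bad
    while (hi - lo).days > 1:
        mid = lo + timedelta(days=(hi - lo).days // 2)
        if is_bad(mid):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return hi


if __name__ == "__main__":
    # Hypothetical predicate that mimics the result reported above.
    regressed = date(2023, 5, 9)
    print(first_bad(date(2023, 2, 28), date(2023, 7, 27), lambda d: d >= regressed))
```

With a predicate that really builds and tests each toolchain, the same loop needs only about eight test runs to cover the roughly five months searched here.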
Thanks for investigating! That seems suspicious, yes. Looking at the code for the failing test, I don't see anything suspicious though ... https://github.com/dalek-cryptography/curve25519-dalek/blob/3.2.1/src/edwards.rs#L1408-L1432
Well, I can confirm that forcing that pass on with -Zmir-enable-passes=+RenameReturnPlace fixes the test on all toolchains, and likewise forcing it off (-) breaks it with earlier toolchains that were working. I'll try more testing with that off to see if there's an underlying regression point.
With -Zmir-opt-level=0 (because -Zmir-enable-passes isn't old enough), I bisected to nightly-2022-03-10 working, nightly-2022-03-11 crashing.

    https://github.com/rust-lang/rust/compare/458262b1315e0de7be940fe95e111bb045e4a2a4...5f4e0677190b82e61dc507e3e72caf89da8e5e28

That includes commit 0c7d0a1 "Use new pass manager on s390x with LLVM 14", and indeed adding -Znew-llvm-pass-manager=no makes it work again. But current Rust and LLVM don't have that option anymore, and anyway I should stop black-boxing this and figure out what's actually changing in the codegen output to make this crash. :)
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle. Changing version to 39.
This is still happening with the latest version of rustc and LLVM in Rawhide and the latest version of curve25519-dalek. I actually need to build this package because something I'm working on needs a newer version, so I'll need to disable tests on s390x for now.
I think this is an issue in post RA pseudo expansion. We go from

    renamable $r2q = L128 $r15d, 14920, killed $r2d :: (load (s128) from %stack.13, align 8)

to

    $r2d = LG $r15d, 14920, $r2d :: (load (s128) from %stack.13, align 8)
    $r3d = LG $r15d, 14928, killed $r2d :: (load (s128) from %stack.13, align 8)

Note how $r2d gets over-written now.
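To make the clobber concrete, here is a toy Python model of the expansion quoted above. It is only a sketch: it reads the operands as base, displacement, index, treats registers and memory as dictionaries, and raises a hypothetical `Fault` where real hardware might segfault or silently load garbage. `expand_safe` shows one way to avoid the overwrite (load the non-aliasing half first); it is not necessarily how the LLVM fix works, and it does not handle the case where both halves alias address registers.

```python
class Fault(Exception):
    """Stands in for a bad memory access in this toy model."""


def lg(regs, mem, dst, base, disp, index):
    """Model a 64-bit load: dst = mem[base + disp + index]."""
    addr = regs[base] + disp + regs[index]
    if addr not in mem:
        raise Fault(hex(addr))
    regs[dst] = mem[addr]


def expand_naive(regs, mem, hi, lo, base, disp, index):
    """Expand a 128-bit load into two 64-bit loads, high half first."""
    lg(regs, mem, hi, base, disp, index)      # may overwrite an address register
    lg(regs, mem, lo, base, disp + 8, index)  # then computes a wrong address


def expand_safe(regs, mem, hi, lo, base, disp, index):
    """Same expansion, but load the low half first if the high half
    aliases a register that the address computation still needs."""
    if hi in (base, index):
        lg(regs, mem, lo, base, disp + 8, index)
        lg(regs, mem, hi, base, disp, index)
    else:
        expand_naive(regs, mem, hi, lo, base, disp, index)


def demo_state():
    """Made-up register and memory contents for illustration only."""
    regs = {"r15d": 1000, "r2d": 16, "r3d": 0}
    addr = regs["r15d"] + 14920 + regs["r2d"]
    mem = {addr: 0xAAAA, addr + 8: 0xBBBB}
    return regs, mem


if __name__ == "__main__":
    regs, mem = demo_state()
    try:
        expand_naive(regs, mem, "r2d", "r3d", "r15d", 14920, "r2d")
    except Fault as exc:
        print("naive expansion faulted at", exc)

    regs, mem = demo_state()
    expand_safe(regs, mem, "r2d", "r3d", "r15d", 14920, "r2d")
    print("safe expansion:", hex(regs["r2d"]), hex(regs["r3d"]))
```

In the naive order, the first load replaces the index in r2d with loaded data, so the second load computes its address from that data instead of the original index.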
I've filed https://github.com/llvm/llvm-project/issues/91437 with that finding for now.
Thank you! Looks like it's fixed in the development branch and on its way to be backported to the LLVM 18 branch :party:
As far as I can tell, the fix for this is in llvm 18.1.6+, which is available on Fedora 40+. Thanks!