Bug 2226905 - curve25519-dalek tests compiled with rust 1.71.0 crash with SIGSEGV on s390x
Summary: curve25519-dalek tests compiled with rust 1.71.0 crash with SIGSEGV on s390x
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: rust
Version: 39
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Rust SIG
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 2226527
Reported: 2023-07-26 22:29 UTC by Fabio Valentini
Modified: 2023-08-16 08:13 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:


Description Fabio Valentini 2023-07-26 22:29:15 UTC
The rust-curve25519-dalek package failed to build during the F39 mass rebuild with Rust 1.71, on s390x only, because a test crashes with SIGSEGV.

This happens on both Rawhide and Fedora 38, but not on Fedora 37 (which still has Rust 1.70).

Running the tests single-threaded ("cargo test -- --test-threads 1") in mock under QEMU emulation, I found the following culprit:

test edwards::test::basepoint_tables ... qemu: uncaught target signal 11 (Segmentation fault) - core dumped


Reproducible: Always

Steps to Reproduce:
1. fedpkg clone rust-curve25519-dalek
2. cd rust-curve25519-dalek
3. fedpkg srpm
4. mock -r fedora-rawhide-s390x ./*.src.rpm
Actual Results:  
Tests crash with a segmentation fault on s390x.

Expected Results:  
Tests pass (as they did with Rust <= 1.70).

Comment 1 Josh Stone 2023-07-27 19:50:20 UTC
I can reproduce this on the latest curve25519-dalek in its git repo, with upstream builds of 1.71.0, 1.72-beta, and 1.73-nightly, but upstream 1.70.0 is fine.

RUSTFLAGS=-Ccodegen-units=1 cargo test --lib --release

Comment 2 Josh Stone 2023-07-27 22:40:05 UTC
cargo-bisect-rustc found this:

searched nightlies: from nightly-2023-02-28 to nightly-2023-07-27
regressed nightly: nightly-2023-05-09
searched commit range: https://github.com/rust-lang/rust/compare/c4190f2d3a46a59f435f7b42f58bc22b2f4d6917...2f2c438dce75d8cc532c3baa849eeddc0901802c
regressed commit: https://github.com/rust-lang/rust/commit/dfe31889e10e36eed53327d1ca624fbf21b475a5

But it's pretty surprising that turning OFF an optimization would cause problems, unless the optimization was masking something else.

Comment 3 Fabio Valentini 2023-07-27 22:45:13 UTC
Thanks for investigating! That seems suspicious, yes.

Looking at the code for the failing test, I don't see anything suspicious, though:
https://github.com/dalek-cryptography/curve25519-dalek/blob/3.2.1/src/edwards.rs#L1408-L1432

Comment 4 Josh Stone 2023-07-28 00:42:36 UTC
Well, I can confirm that forcing that pass on with -Zmir-enable-passes=+RenameReturnPlace fixes the test on all toolchains, and likewise forcing it off with -Zmir-enable-passes=-RenameReturnPlace breaks it on the earlier toolchains that were working. I'll do more testing with the pass off to see whether there's an underlying regression point.

Comment 5 Josh Stone 2023-07-28 05:32:09 UTC
With -Zmir-opt-level=0 (because -Zmir-enable-passes doesn't exist in old enough toolchains), I bisected the crash to between nightly-2022-03-10 (working) and nightly-2022-03-11 (crashing).

https://github.com/rust-lang/rust/compare/458262b1315e0de7be940fe95e111bb045e4a2a4...5f4e0677190b82e61dc507e3e72caf89da8e5e28

That range includes commit 0c7d0a1 "Use new pass manager on s390x with LLVM 14", and indeed adding -Znew-llvm-pass-manager=no makes it work again. But current Rust and LLVM don't have that option anymore, and anyway I should stop black-boxing this and figure out what's actually changing in the codegen output to make this crash. :)

Comment 6 Fedora Release Engineering 2023-08-16 08:13:39 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.


Note You need to log in before you can comment on or make changes to this bug.