This rawhide build has been running for almost 5 hours now:
https://koji.fedoraproject.org/koji/taskinfo?taskID=83328158

The build for the same version on Fedora 36 with Rust 1.58.1 finished within 13 minutes, and that included the snail-speed armv7hl, which is no longer even built on rawhide.

I have built other crates on rawhide with Rust 1.59.0 and none of those had problems, but the build for rust-inferno-0.10.8-3.fc37 seems to have gotten stuck, with none of the builds of crate dependencies ever finishing (there's only log output for them getting started, but none of them finishing):

from the log (in %build):

+ /usr/bin/env CARGO_HOME=.cargo RUSTC_BOOTSTRAP=1 'RUSTFLAGS=-Copt-level=3 -Cdebuginfo=2 -Ccodegen-units=1 -Clink-arg=-Wl,-z,relro -Clink-arg=-Wl,-z,now -Clink-arg=-Wl,-dT,/builddir/build/BUILD/inferno-0.10.8/.package_note-rust-inferno-0.10.8-3.fc37.s390x.ld --cap-lints=warn' /usr/bin/cargo build -j2 -Z avoid-dev-deps --release
   Compiling version_check v0.9.4
   Compiling libc v0.2.119
   Compiling proc-macro2 v1.0.36
(..)
   Compiling clap v2.34.0
   Compiling dashmap v4.0.2
   Compiling structopt-derive v0.4.18
(and then nothing happens)
It happened again, with clap_derive:
https://koji.fedoraproject.org/koji/taskinfo?taskID=83366868

Looks like it might be related to compilation of procedural macros? inferno got stuck with structopt-derive, and clap_derive is also a proc-macro crate.
Looks like it also affects alacritty: https://koji.fedoraproject.org/koji/taskinfo?taskID=83353501
I think I'll rebuild with ExcludeArch for now. I seriously doubt that anyone is using alacritty on s390x.
CCing a few LLVM folks, as that's where it appears to be stuck, according to "perf top":

Overhead  Shared Object  Symbol
  33.23%  libLLVM-13.so  [.] llvm::CodeMetrics::collectEphemeralValues
  20.13%  libLLVM-13.so  [.] llvm::isSafeToSpeculativelyExecute
  19.12%  libLLVM-13.so  [.] llvm::SmallPtrSetImplBase::insert_imp_big
   8.35%  libLLVM-13.so  [.] llvm::SmallPtrSetImplBase::Grow
   6.71%  libLLVM-13.so  [.] llvm::SmallPtrSetImplBase::FindBucketFor
   2.09%  libLLVM-13.so  [.] llvm::CallGraphNode::removeCallEdgeFor
   1.24%  libLLVM-13.so  [.] 0x0000000002913e26

I'll refrain from re-assigning the component for now, as I'm not sure why this would only start with Rust 1.59...
I just confirmed that this is affecting rust 1.59 on fedora-{rawhide,36,35,34}-s390x. A good test case is the "rust-clap_derive" package (without my "revert building on s390x" commit from the rawhide branch), as it has few dependencies, and compiles relatively fast, even in QEMU (or at least, it *would* be fast, if it worked).

So it appears that LLVM 13 is not to blame, since Rust on Fedora 34 is using LLVM 12.
Using cargo-bisect-rustc found a regression point between 1.58.1 and 1.59.0.

---
searched nightlies: from nightly-2021-12-01 to nightly-2022-02-28
regressed nightly: nightly-2021-12-24
searched commit range: https://github.com/rust-lang/rust/compare/34926f0a1681458588a2d4240c0715ef9eff7d35...c09a9529c51cde41c1101e56049d418edb07bf71
regressed commit: https://github.com/rust-lang/rust/commit/e98309298d927307c5184f4869604bd068d26183

<details>
<summary>bisected with <a href='https://github.com/rust-lang/cargo-bisect-rustc'>cargo-bisect-rustc</a> v0.6.1</summary>

Host triple: s390x-unknown-linux-gnu
Reproduce with:
```bash
cargo bisect-rustc --access github --start 2021-12-01 --end 2022-02-28 --timeout 300 -- build --release
```
</details>
---

That merge commit is for https://github.com/rust-lang/rust/pull/90408

The change to ItemSortKey looks suspicious to me, though I'm not sure why that would get it stuck in LLVM, but I'm testing a possible fix for that now. If that doesn't work, I'll file a Rust issue and follow up that way.
@Josh Do you happen to have an IR reproducer that hangs opt? My first suspicion here was more NewPM catastrophic inlining (I believe that landed in 1.59), but we do disable that for both s390x and LLVM < 13, so that can't be it. Here's an old report for a hang in collectEphemeralValues(): https://github.com/rust-lang/rust/issues/66617
My fix does appear to solve it: https://github.com/rust-lang/rust/pull/94505
And a scratch build: https://koji.fedoraproject.org/koji/taskinfo?taskID=83519938

But I'm not entirely satisfied with how LLVM would be affected, so I still captured bitcode for it.
https://jistone.fedorapeople.org/bz2058803/

We don't need rpmbuild to reproduce the problem. In clean source of "clap_derive 3.12", for each compiler I ran:

$ RUSTFLAGS=-Ccodegen-units=1 cargo rustc --release -- -Csave-temps

For 1.59.0-1.fc37 and nightly-2022-03-01, the file I uploaded is the last one it wrote before hanging. Then I picked the equivalent no-opt.bc for 1.59.0-2.fc37 and "patched" (same nightly), both with my PR fix.

However, I haven't found any way to make opt hang on the "bad" ones...
> In clean source of "clap_derive 3.12",

Oops, typo, that should be 3.1.2.
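For anyone repeating the manual reproduction, it can help to run the build under a time limit so that a hang is reported instead of waited out. This is only a minimal sketch, assuming GNU coreutils `timeout` is installed. The helper name `run_with_limit` is hypothetical, and the 300-second limit in the usage note is just borrowed from the `--timeout 300` used in the bisection, not something the packaging defines.

```bash
#!/bin/bash
# Hypothetical helper: run a command under a time limit and print
# "ok", "fail", or "hang" depending on how it ended.
# GNU timeout exits with status 124 when the limit expires.
run_with_limit() {
    local limit="$1"; shift
    local status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "hang"
    elif [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "fail"
    fi
    return "$status"
}
```

In a clean clap_derive 3.1.2 source tree this could be invoked as `run_with_limit 300 env RUSTFLAGS=-Ccodegen-units=1 cargo rustc --release -- -Csave-temps`. A limit that is too short for a slow builder (QEMU, for example) would misreport a merely slow build as a hang, so the value needs adjusting per machine.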
FEDORA-2022-c9bd6f0053 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-c9bd6f0053
FEDORA-2022-c9bd6f0053 has been pushed to the Fedora 37 stable repository. If problem still persists, please make note of it in this bug report.
> However, I haven't found any way to make opt hang on the "bad" ones...

I can reproduce the hang with "opt -O2 -enable-new-pm=0" on current LLVM HEAD.
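To check several of the captured bitcode files at once, the opt invocation above can be looped under a time limit. This is a rough sketch, assuming GNU coreutils `timeout` and an `opt` that still accepts `-enable-new-pm=0`. The function name `list_hanging_bitcode` and the `OPT` override variable are hypothetical; `-disable-output` is only there to skip writing the optimized module.

```bash
#!/bin/bash
# OPT may be overridden to point at a specific opt binary (assumption:
# plain "opt" on PATH is the build under test).
OPT="${OPT:-opt}"

# Print each bitcode file on which the legacy-PM -O2 pipeline does not
# finish within the given time limit.
list_hanging_bitcode() {
    local limit="$1"; shift
    local bc status
    for bc in "$@"; do
        status=0
        timeout "$limit" "$OPT" -O2 -enable-new-pm=0 -disable-output "$bc" \
            >/dev/null 2>&1 || status=$?
        # GNU timeout reports an expired limit as exit status 124.
        if [ "$status" -eq 124 ]; then
            echo "$bc"
        fi
    done
}
```

For example, `list_hanging_bitcode 300 *.bc` in a directory holding the downloaded files would list only the ones that exceed five minutes. Files that opt rejects outright are not listed, since only the timeout status is treated as a hang.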
Based on the inlining debug log, I strongly suspect that this is the same catastrophic cross-SCC inlining issue that we've previously seen with the new pass manager (it's another instance involving recursive drop glue). This just happens to be a case where it occurs with the legacy pass manager, but not the new pass manager. I don't see an obvious way to port the NewPM fix from https://reviews.llvm.org/D120584 to the LegacyPM inliner, because we don't have a direct way to fetch the SCC for a CG node there. Though that should become a moot point soon(TM) anyway, with the legacy PM going away.
> I can reproduce the hang with "opt -O2 -enable-new-pm=0" on current LLVM HEAD.

Oh, is it default-enabled in opt now too? Maybe that option should be renamed, but it's going away anyway...

Too bad if this is the inlining thing again, because then the Rust change is just incidental bad luck. I do think that fix is still correct in its own right, but the LLVM problem will remain lurking until 15, I guess. Or if D120584 is feasible for the 14 release branch, maybe we can wean Rust from oldPM even on s390x with 14.