Description of problem:
=======================
Frequent crashes of rustc / llvm started happening with rust 1.65.0, LLVM 15, on powerpc64le.

- Older versions of Rust (up to 1.64.0) with LLVM 15 were not affected by this issue.
- Older branches of Fedora (with LLVM 14 or 13) are not affected by this issue.
- Other architectures do not seem to be affected by this specific issue.

So this particular manifestation of this crash is indeed specific to the combination of (Rust 1.65.0, LLVM 15, powerpc64le). However, the backtrace looks similar to previously reported rustc doctest compilation issues.

Previous, similar report that was specific to Rust 1.60, armv7hl, and LLVM 14:
https://bugzilla.redhat.com/show_bug.cgi?id=2086106

Example backtrace:

---- src/lib.rs - (line 48) stdout ----
/lib64/librustc_driver-51908432be916ebd.so(+0x7c5d0c)[0x7fff99bf5d0c]
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fff9deb0464]
/lib64/libLLVM-15.so(_ZN4llvm12LiveRegUnits10accumulateERKNS_12MachineInstrE+0xbc)[0x7fff92d32f7c]
/lib64/libLLVM-15.so(_ZN4llvm12RegScavenger25scavengeRegisterBackwardsERKNS_19TargetRegisterClassENS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEEbib+0x164)[0x7fff92fce264]
/lib64/libLLVM-15.so(+0x13cf3b0)[0x7fff92fcf3b0]
/lib64/libLLVM-15.so(+0x13cef0c)[0x7fff92fcef0c]
/lib64/libLLVM-15.so(_ZN4llvm24scavengeFrameVirtualRegsERNS_15MachineFunctionERNS_12RegScavengerE+0x90)[0x7fff92fce940]
/lib64/libLLVM-15.so(+0x133c9d4)[0x7fff92f3c9d4]
/lib64/libLLVM-15.so(_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE+0x320)[0x7fff92dd9480]
/lib64/libLLVM-15.so(_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE+0x820)[0x7fff92ad6d00]
/lib64/libLLVM-15.so(_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE+0x54)[0x7fff92adf6b4]
/lib64/libLLVM-15.so(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x514)[0x7fff92ad7714]
/lib64/libLLVM-15.so(_ZN4llvm6legacy11PassManager3runERNS_6ModuleE+0x1c)[0x7fff92adfbdc]
/lib64/librustc_driver-51908432be916ebd.so(+0x7d894c)[0x7fff99c0894c]
/lib64/librustc_driver-51908432be916ebd.so(+0xaf6564)[0x7fff99f26564]
/lib64/librustc_driver-51908432be916ebd.so(+0xafd618)[0x7fff99f2d618]
/lib64/librustc_driver-51908432be916ebd.so(+0xa2ee34)[0x7fff99e5ee34]
/lib64/librustc_driver-51908432be916ebd.so(+0xa29294)[0x7fff99e59294]
/lib64/librustc_driver-51908432be916ebd.so(+0x9fa964)[0x7fff99e2a964]
/lib64/librustc_driver-51908432be916ebd.so(+0xb21058)[0x7fff99f51058]
/lib64/libstd-502eee8307185671.so(rust_metadata_std_d868c97109ef4fde+0xd2fdc)[0x7fff99322fdc]
/lib64/libc.so.6(+0xb9af8)[0x7fff99099af8]
Couldn't compile the test.

Version-Release number of selected component (if applicable):
=============================================================
rust-1.65.0-1.fc38.ppc64le
cargo-1.65.0-1.fc38.ppc64le
llvm-libs-15.0.4-1.fc38.ppc64le

How reproducible:
=================
100% reproducible for affected crates:

- bat
- comrak
- convert_case
- deser-hjson
- jql
- json_value_merge
- lexical-core
- rpick
- rustc-demangle
- slog
- rust-yubibomb

Most of these crates ship applications, so disabling "--release" mode (or LTO) is not a good workaround, and I'll probably modify these packages to skip doctests on ppc64le entirely, instead ...

======================================================================

Feel free to re-assign to the correct component if the Rust 1.65.0 update itself is fine, but exposes some underlying issue with LLVM 15, instead.
I can reproduce this myself on Fedora even outside of rpmbuild, and rustc-demangle is probably the smallest example for this purpose. However, I cannot reproduce using any upstream toolchain binaries -- I tried 1.64.0, 1.65.0, beta (~1.66), and nightly (~1.67), and all of them pass.

One thing I discovered is that although rustdoc is getting the -Clto flag, it is not getting any regular optimization flags when compiling the doctests. When I set RUSTDOCFLAGS=-Copt-level=3, the doctests do pass! So, it still seems like there's a real bug if mismatched optimization crashes LLVM, but at least RUSTDOCFLAGS is a simple workaround.

I'll try to capture and reduce a standalone LLVM test for the crash.
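For anyone who wants to try the RUSTDOCFLAGS workaround only on the affected architecture, here is a minimal sketch. The helper name `rustdocflags_for_arch` is hypothetical (it is not part of Cargo or the Fedora packaging macros), and it assumes `uname -m` reports `ppc64le` on the affected machines.

```shell
#!/bin/sh
# Hypothetical helper: print the extra rustdoc flags to use for a given
# machine architecture. Only ppc64le gets the opt-level workaround.
rustdocflags_for_arch() {
    case "$1" in
        ppc64le) printf '%s\n' '-Copt-level=3' ;;
        *)       printf '\n' ;;
    esac
}

# Assumption: `uname -m` prints "ppc64le" on the affected builders.
RUSTDOCFLAGS="$(rustdocflags_for_arch "$(uname -m)")"
export RUSTDOCFLAGS
echo "RUSTDOCFLAGS=${RUSTDOCFLAGS}"
```

With the variable exported, the doctests can then be run as usual, for example with `cargo test --doc --release`. The exact invocation used by the package builds is an assumption here.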
I finally reproduced this in my own build -- the key difference was "./configure --debuginfo-level-std=2", whereas the upstream builds only use level 1. With LLVM assertions enabled, it does hit one, even on LLVM main, e.g.:

---- src/lib.rs - (line 14) stdout ----
Remaining virtual register operands
UNREACHABLE executed at /home/jistone/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:207!
Couldn't compile the test.

https://github.com/llvm/llvm-project/blob/8f104b806a2837f36463277f7d9162f53b595ebd/llvm/lib/CodeGen/MachineRegisterInfo.cpp#L207

I also checked with rustc -Zverify-llvm-ir=on, and that had no complaint.
Created attachment 1926758
reduced LLVM IR

This test crashes "llc -O0" with llvm-15.0.4-1.fc37. In a build on LLVM main with assertions enabled, it triggers here:

$ llc -O0 reduced.ll
Remaining virtual register operands
UNREACHABLE executed at /home/jistone/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:207!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: llc -O0 ../build-ndebug/reduced.ll
1. Running pass 'Function Pass Manager' on module '../build-ndebug/reduced.ll'.
2. Running pass 'Fast Register Allocator' on function '@"_ZN55_$LT$std..io..stdio..Stdin$u20$as$u20$std..io..Read$GT$11read_to_end17haba70a09681d41d3E"'
#0 0x0000000001f5adb4 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
#1 0x0000000001f584eb SignalHandler(int) Signals.cpp:0:0
#2 0x00007efcfe780b50 __restore_rt (/lib64/libc.so.6+0x3cb50)
#3 0x00007efcfe7d0e7c __pthread_kill_implementation (/lib64/libc.so.6+0x8ce7c)
#4 0x00007efcfe780aa6 gsignal (/lib64/libc.so.6+0x3caa6)
#5 0x00007efcfe76a7fc abort (/lib64/libc.so.6+0x267fc)
#6 0x0000000001eb7b8a (/home/jistone/llvm-project/build/bin/llc+0x1eb7b8a)
#7 0x00000000011ba332 llvm::MachineRegisterInfo::clearVirtRegs() (/home/jistone/llvm-project/build/bin/llc+0x11ba332)
#8 0x00000000012cca4c (anonymous namespace)::RegAllocFast::runOnMachineFunction(llvm::MachineFunction&) RegAllocFast.cpp:0:0
#9 0x000000000112883b llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#10 0x0000000001665c50 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/jistone/llvm-project/build/bin/llc+0x1665c50)
#11 0x0000000001665d91 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/jistone/llvm-project/build/bin/llc+0x1665d91)
#12 0x0000000001666657 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/jistone/llvm-project/build/bin/llc+0x1666657)
#13 0x0000000000657a66 main (/home/jistone/llvm-project/build/bin/llc+0x657a66)
#14 0x00007efcfe76b510 __libc_start_call_main (/lib64/libc.so.6+0x27510)
#15 0x00007efcfe76b5c9 __libc_start_main.5 (/lib64/libc.so.6+0x275c9)
#16 0x00000000006d7995 _start (/home/jistone/llvm-project/build/bin/llc+0x6d7995)
Aborted (core dumped)
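To compare several llc builds against the reduced IR, a small wrapper along these lines may help. It is only a sketch: the function name `check_llc_crash` is made up for illustration, and it assumes a POSIX-style shell that reports a signal-terminated child as exit status 128 plus the signal number (so an abort would show up as 134).

```shell
#!/bin/sh
# Hypothetical helper: run an llc binary at -O0 on an IR file and classify
# the result. Usage: check_llc_crash /path/to/llc reduced.ll
check_llc_crash() {
    llc_bin="$1"
    ir_file="$2"
    status=0
    # Discard the generated assembly and any diagnostics; only the exit
    # status matters for this check.
    "$llc_bin" -O0 -o /dev/null "$ir_file" >/dev/null 2>&1 || status=$?
    if [ "$status" -ge 128 ]; then
        # Assumption: status >= 128 means the process died from a signal.
        echo "crashed (signal $((status - 128)))"
    elif [ "$status" -ne 0 ]; then
        echo "failed (exit $status)"
    else
        echo "ok"
    fi
}
```

Running it once with the distribution llc and once with a patched build gives a quick before/after comparison without reading the full stack dump.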
I tried a mock build of llvm.spec with my upstream patch, and confirmed it does fix the rust doctests. https://github.com/llvm/llvm-project/issues/59172#issuecomment-1325789049
I've now added the simple "disable LTO" workaround to all library-only crates, where this has no effect on binary quality:

- convert_case
- deser-hjson
- json_value_merge
- lexical-core
- rustc-demangle
- slog

The packages that ship binaries will be updated with the more-annoying-to-implement "do not run doctests on ppc64le" workaround later, as necessary.
I've now pushed workarounds for this issue to the remaining packages to ensure they don't fail to build during the upcoming mass rebuild.
My LLVM patch landed in main (16), and is under consideration for backport to 15.x: https://github.com/llvm/llvm-project-release-prs/pull/228 If that doesn't happen, we can also consider patching Fedora LLVM ourselves.
Should be fixed on rawhide by https://bodhi.fedoraproject.org/updates/FEDORA-2023-387545a496.
As far as I can tell, the crash is fixed, yes. I tried re-enabling LTO for some of the affected packages and they now build without problems. Thanks!