Bug 2142648 - rustc crashes when compiling doctests with release mode + LTO on ppc64le with LLVM 15
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: llvm
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Tom Stellard
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-11-14 18:04 UTC by Fabio Valentini
Modified: 2023-02-02 21:15 UTC (History)

Fixed In Version: llvm-15.0.7-1.fc38
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-02 02:24:14 UTC
Type: Bug
Embargoed:


Attachments
reduced LLVM IR (22.52 KB, text/plain)
2022-11-23 18:28 UTC, Josh Stone


Links
System: Github llvm llvm-project issues
ID: 59172
Private: 0
Priority: None
Status: open
Summary: `RegAllocFast::runOnMachineFunction` misses a virtual register
Last Updated: 2022-11-23 18:29:05 UTC

Description Fabio Valentini 2022-11-14 18:04:51 UTC
Description of problem:
=======================

rustc / LLVM started crashing frequently with Rust 1.65.0 and LLVM 15 on powerpc64le.

- Older versions of Rust (up to 1.64.0) with LLVM 15 were not affected by this issue.
- Older branches of Fedora (with LLVM 14 or 13) are not affected by this issue.
- Other architectures do not seem to be affected by this specific issue.

So this particular manifestation of the crash appears to be specific to the combination of (Rust 1.65.0, LLVM 15, powerpc64le).

However, the backtrace looks similar to previously reported rustc doctest compilation issues.

Previous, similar report that was specific to Rust 1.60, armv7hl, and LLVM 14:
https://bugzilla.redhat.com/show_bug.cgi?id=2086106

Example backtrace:

---- src/lib.rs - (line 48) stdout ----
/lib64/librustc_driver-51908432be916ebd.so(+0x7c5d0c)[0x7fff99bf5d0c]
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fff9deb0464]
/lib64/libLLVM-15.so(_ZN4llvm12LiveRegUnits10accumulateERKNS_12MachineInstrE+0xbc)[0x7fff92d32f7c]
/lib64/libLLVM-15.so(_ZN4llvm12RegScavenger25scavengeRegisterBackwardsERKNS_19TargetRegisterClassENS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEEbib+0x164)[0x7fff92fce264]
/lib64/libLLVM-15.so(+0x13cf3b0)[0x7fff92fcf3b0]
/lib64/libLLVM-15.so(+0x13cef0c)[0x7fff92fcef0c]
/lib64/libLLVM-15.so(_ZN4llvm24scavengeFrameVirtualRegsERNS_15MachineFunctionERNS_12RegScavengerE+0x90)[0x7fff92fce940]
/lib64/libLLVM-15.so(+0x133c9d4)[0x7fff92f3c9d4]
/lib64/libLLVM-15.so(_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE+0x320)[0x7fff92dd9480]
/lib64/libLLVM-15.so(_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE+0x820)[0x7fff92ad6d00]
/lib64/libLLVM-15.so(_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE+0x54)[0x7fff92adf6b4]
/lib64/libLLVM-15.so(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x514)[0x7fff92ad7714]
/lib64/libLLVM-15.so(_ZN4llvm6legacy11PassManager3runERNS_6ModuleE+0x1c)[0x7fff92adfbdc]
/lib64/librustc_driver-51908432be916ebd.so(+0x7d894c)[0x7fff99c0894c]
/lib64/librustc_driver-51908432be916ebd.so(+0xaf6564)[0x7fff99f26564]
/lib64/librustc_driver-51908432be916ebd.so(+0xafd618)[0x7fff99f2d618]
/lib64/librustc_driver-51908432be916ebd.so(+0xa2ee34)[0x7fff99e5ee34]
/lib64/librustc_driver-51908432be916ebd.so(+0xa29294)[0x7fff99e59294]
/lib64/librustc_driver-51908432be916ebd.so(+0x9fa964)[0x7fff99e2a964]
/lib64/librustc_driver-51908432be916ebd.so(+0xb21058)[0x7fff99f51058]
/lib64/libstd-502eee8307185671.so(rust_metadata_std_d868c97109ef4fde+0xd2fdc)[0x7fff99322fdc]
/lib64/libc.so.6(+0xb9af8)[0x7fff99099af8]
Couldn't compile the test.
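
The mangled C++ symbols in a backtrace like this one can be made readable by passing them through a demangler. A minimal sketch, assuming binutils' c++filt is installed (the variable name is only illustrative). Frames that show nothing but an offset, such as +0x13cf3b0, would additionally need debuginfo to resolve:

```shell
# Two mangled frames copied from the backtrace above.
frames='_ZN4llvm12LiveRegUnits10accumulateERKNS_12MachineInstrE
_ZN4llvm24scavengeFrameVirtualRegsERNS_15MachineFunctionERNS_12RegScavengerE'

# c++filt reads mangled names on stdin and prints the demangled form.
if command -v c++filt >/dev/null 2>&1; then
    printf '%s\n' "$frames" | c++filt
else
    echo "c++filt not found; install binutils to demangle" >&2
fi
```

The first frame demangles to llvm::LiveRegUnits::accumulate(llvm::MachineInstr const&), i.e. the crash happens while the register scavenger is accumulating live register units.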

Version-Release number of selected component (if applicable):
=============================================================

rust-1.65.0-1.fc38.ppc64le
cargo-1.65.0-1.fc38.ppc64le
llvm-libs-15.0.4-1.fc38.ppc64le

How reproducible:
=================

100% reproducible for affected crates:

- bat
- comrak
- convert_case
- deser-hjson
- jql
- json_value_merge
- lexical-core
- rpick
- rustc-demangle
- slog
- rust-yubibomb

Most of these crates ship applications, so disabling "--release" mode (or LTO) is not a good workaround. Instead, I'll probably modify these packages to skip doctests on ppc64le entirely.

======================================================================

Feel free to reassign this to the correct component if the Rust 1.65.0 update itself is fine and merely exposes an underlying issue with LLVM 15.

Comment 1 Josh Stone 2022-11-18 01:31:11 UTC
I can reproduce this myself on Fedora even outside of rpmbuild, and rustc-demangle is probably the smallest example for this purpose. However, I cannot reproduce using any upstream toolchain binaries -- I tried 1.64.0, 1.65.0, beta (~1.66), and nightly (~1.67), and all of them pass.

One thing I discovered is that although rustdoc is getting the -Clto flag, it is not getting any regular optimization flags when compiling the doctests. When I set RUSTDOCFLAGS=-Copt-level=3, the doctests do pass!

So, it still seems like there's a real bug if mismatched optimization crashes LLVM, but at least RUSTDOCFLAGS is a simple workaround. I'll try to capture and reduce a standalone LLVM test for the crash.
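
The RUSTDOCFLAGS workaround can be sketched as a short shell snippet. It assumes the doctests are run through cargo and that appending to any RUSTDOCFLAGS already present in the environment is acceptable. How a particular package build would wire this in is not covered here:

```shell
# Give rustdoc an explicit optimization level for doctests, keeping any
# flags that are already set in the environment.
RUSTDOCFLAGS="${RUSTDOCFLAGS:+$RUSTDOCFLAGS }-Copt-level=3"
export RUSTDOCFLAGS
echo "RUSTDOCFLAGS=$RUSTDOCFLAGS"

# Run only the doctests in release mode when inside a cargo project.
if command -v cargo >/dev/null 2>&1 && [ -f Cargo.toml ]; then
    cargo test --release --doc
fi
```

This only changes how the doctests are compiled; the crate itself is still built with whatever flags the release profile specifies.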

Comment 2 Josh Stone 2022-11-22 20:29:51 UTC
I finally reproduced this in my own build -- the key difference was "./configure --debuginfo-level-std=2", whereas the upstream builds only use level 1.

With LLVM assertions enabled, it does hit one, even on LLVM main, e.g.:

---- src/lib.rs - (line 14) stdout ----
Remaining virtual register operands
UNREACHABLE executed at /home/jistone/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:207!
Couldn't compile the test.

https://github.com/llvm/llvm-project/blob/8f104b806a2837f36463277f7d9162f53b595ebd/llvm/lib/CodeGen/MachineRegisterInfo.cpp#L207

I also checked with rustc -Zverify-llvm-ir=on, and that had no complaint.

Comment 3 Josh Stone 2022-11-23 18:28:01 UTC
Created attachment 1926758
reduced LLVM IR

This test crashes "llc -O0" with llvm-15.0.4-1.fc37. In a build of LLVM main with assertions enabled, it hits this UNREACHABLE:

$ llc -O0 reduced.ll
Remaining virtual register operands
UNREACHABLE executed at /home/jistone/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:207!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc -O0 ../build-ndebug/reduced.ll
1.      Running pass 'Function Pass Manager' on module '../build-ndebug/reduced.ll'.
2.      Running pass 'Fast Register Allocator' on function '@"_ZN55_$LT$std..io..stdio..Stdin$u20$as$u20$std..io..Read$GT$11read_to_end17haba70a09681d41d3E"'
 #0 0x0000000001f5adb4 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x0000000001f584eb SignalHandler(int) Signals.cpp:0:0
 #2 0x00007efcfe780b50 __restore_rt (/lib64/libc.so.6+0x3cb50)
 #3 0x00007efcfe7d0e7c __pthread_kill_implementation (/lib64/libc.so.6+0x8ce7c)
 #4 0x00007efcfe780aa6 gsignal (/lib64/libc.so.6+0x3caa6)
 #5 0x00007efcfe76a7fc abort (/lib64/libc.so.6+0x267fc)
 #6 0x0000000001eb7b8a (/home/jistone/llvm-project/build/bin/llc+0x1eb7b8a)
 #7 0x00000000011ba332 llvm::MachineRegisterInfo::clearVirtRegs() (/home/jistone/llvm-project/build/bin/llc+0x11ba332)
 #8 0x00000000012cca4c (anonymous namespace)::RegAllocFast::runOnMachineFunction(llvm::MachineFunction&) RegAllocFast.cpp:0:0
 #9 0x000000000112883b llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#10 0x0000000001665c50 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/jistone/llvm-project/build/bin/llc+0x1665c50)
#11 0x0000000001665d91 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/jistone/llvm-project/build/bin/llc+0x1665d91)
#12 0x0000000001666657 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/jistone/llvm-project/build/bin/llc+0x1666657)
#13 0x0000000000657a66 main (/home/jistone/llvm-project/build/bin/llc+0x657a66)
#14 0x00007efcfe76b510 __libc_start_call_main (/lib64/libc.so.6+0x27510)
#15 0x00007efcfe76b5c9 __libc_start_main.5 (/lib64/libc.so.6+0x275c9)
#16 0x00000000006d7995 _start (/home/jistone/llvm-project/build/bin/llc+0x6d7995)
Aborted (core dumped)
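
To check whether a given llc build still crashes on the reduced test case, for instance before and after applying a candidate fix, the exit status can be inspected. This is a rough sketch: the helper name is hypothetical, the file name reduced.ll is taken from the command above, and it relies on the shell convention that a process killed by a signal reports a status above 128:

```shell
# Hypothetical helper: succeeds if llc dies from a signal (for example
# SIGABRT from an assertion, or SIGSEGV) while compiling the given file.
# A missing llc (status 127) or an ordinary error counts as "no crash".
llc_crashes() {
    llc -O0 "$1" -o /dev/null >/dev/null 2>&1
    [ "$?" -gt 128 ]
}

if [ -f reduced.ll ]; then
    if llc_crashes reduced.ll; then
        echo "llc still crashes on reduced.ll"
    else
        echo "no crash detected"
    fi
fi
```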

Comment 4 Josh Stone 2022-11-24 02:41:42 UTC
I tried a mock build of llvm.spec with my upstream patch, and confirmed it does fix the rust doctests.

https://github.com/llvm/llvm-project/issues/59172#issuecomment-1325789049

Comment 5 Fabio Valentini 2022-12-15 14:40:36 UTC
I've now added the simple "disable LTO" workaround to all library-only crates, where this has no effect on binary quality:

- convert_case
- deser-hjson
- json_value_merge
- lexical-core
- rustc-demangle
- slog
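
When running the tests by hand, one way to express a "disable LTO" workaround is Cargo's environment override for the release profile. This is only a sketch and assumes LTO is requested through the Cargo profile; if it is passed directly as a compiler flag (for example -Clto in RUSTFLAGS), that flag would have to be dropped instead. The actual packaging changes are not shown in this report:

```shell
# Override profile.release.lto for this shell session.
export CARGO_PROFILE_RELEASE_LTO=false
echo "CARGO_PROFILE_RELEASE_LTO=$CARGO_PROFILE_RELEASE_LTO"

# Run the full test suite, doctests included, when inside a cargo project.
if command -v cargo >/dev/null 2>&1 && [ -f Cargo.toml ]; then
    cargo test --release
fi
```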

The packages that ship binaries will be updated with the more-annoying-to-implement "do not run doctests on ppc64le" workaround later, as necessary.
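
A "do not run doctests on ppc64le" workaround could look roughly like the following when written as plain shell. The helper name is hypothetical; in an RPM spec file the same condition would more naturally be an %ifarch block:

```shell
# Hypothetical helper: choose `cargo test` arguments for an architecture
# (defaults to the current machine as reported by uname -m).
cargo_test_args() {
    arch="${1:-$(uname -m)}"
    case "$arch" in
        # --tests selects unit and integration tests, not doctests.
        ppc64le) echo "--release --tests" ;;
        *)       echo "--release" ;;
    esac
}

echo "would run: cargo test $(cargo_test_args)"
```

The result is meant to be used unquoted so that it splits into separate arguments, as in: cargo test $(cargo_test_args)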

Comment 6 Fabio Valentini 2023-01-11 19:21:40 UTC
I've now pushed workarounds for this issue to the remaining packages to ensure they don't fail to build during the upcoming mass rebuild.

Comment 7 Josh Stone 2023-01-11 19:51:25 UTC
My LLVM patch landed in main (16), and is under consideration for backport to 15.x:
https://github.com/llvm/llvm-project-release-prs/pull/228

If that doesn't happen, we can also consider patching Fedora LLVM ourselves.

Comment 8 Nikita Popov 2023-01-16 14:01:21 UTC
Should be fixed on rawhide by https://bodhi.fedoraproject.org/updates/FEDORA-2023-387545a496.

Comment 9 Fabio Valentini 2023-02-02 21:15:22 UTC
As far as I can tell, the crash is fixed, yes. I tried re-enabling LTO for some of the affected packages and they now build without problems. Thanks!

