While testing whether the patches from bug 2209635 fix the issue, I decided to run a more elaborate test using llvm-test-suite in order to guarantee that more complex executions of clang would work. Unfortunately, 19 tests crashed.

Reproducible: Always

Steps to Reproduce:
On a host where bug 2209635 is fixed, run:
1. podman run -it --rm --platform linux/s390x fedora:latest

Inside the container:
1. dnf install -y clang git /usr/bin/lscpu cmake ninja-build llvm-test-suite
2. git clone --depth 1 https://src.fedoraproject.org/rpms/llvm-test-suite.git llvm-test-suite
3. cd llvm-test-suite/tests/test-suite
4. ./runtest.sh

Actual Results:
Failed Tests (19):
  test-suite :: MicroBenchmarks/Builtins/Int128/Builtins.test
  test-suite :: MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test
  test-suite :: MicroBenchmarks/ImageProcessing/BilateralFiltering/BilateralFilter.test
  test-suite :: MicroBenchmarks/ImageProcessing/Blur/blur.test
  test-suite :: MicroBenchmarks/ImageProcessing/Dilate/Dilate.test
  test-suite :: MicroBenchmarks/ImageProcessing/Dither/Dither.test
  test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test
  test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test
  test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test
  test-suite :: MicroBenchmarks/LCALS/SubsetBLambdaLoops/lcalsBLambda.test
  test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test
  test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test
  test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test
  test-suite :: MicroBenchmarks/LoopInterchange/LoopInterchange.test
  test-suite :: MicroBenchmarks/LoopVectorization/LoopVectorizationBenchmarks.test
  test-suite :: MicroBenchmarks/MemFunctions/MemFunctions.test
  test-suite :: MicroBenchmarks/SLPVectorization/SLPVectorizationBenchmarks.test
  test-suite :: MicroBenchmarks/harris/harris.test
  test-suite :: MultiSource/Benchmarks/MiBench/automotive-basicmath/automotive-basicmath.test

Testing Time: 1847.08s
  Passed: 1970
  Failed: 19

The output from the failing tests is very similar. This is the sample output from one of the tests:

FAIL: test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test (115 of 1989)
******************** TEST 'test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test' FAILED ********************
/tmp/tmp.sElRYZ2yQ2/MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw --benchmark_format=json > /tmp/tmp.sElRYZ2yQ2/MicroBenchmarks/LCALS/SubsetCRawLoops/Output/lcalsCRaw.test.bench.json
terminate called after throwing an instance of 'std::out_of_range'
  what(): basic_string::substr: __pos (which is 10) > this->size() (which is 9)
/tmp/tmp.sElRYZ2yQ2/MicroBenchmarks/LCALS/SubsetCRawLoops/Output/lcalsCRaw.test_run.script: line 1: 56376 Aborted (core dumped) /tmp/tmp.sElRYZ2yQ2/MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw --benchmark_format=json > /tmp/tmp.sElRYZ2yQ2/MicroBenchmarks/LCALS/SubsetCRawLoops/Output/lcalsCRaw.test.bench.json

Expected Results:
No failures.
Notice we do not have any failures when running on real s390x hardware.
Ilya, could you help me with this too, please?
Hi, sure, I'll take a look. This one seems a bit more complex, since, unlike the previous one, I don't seem to be able to reproduce it with the s390x-on-s390x emulation (I tried `qemu-s390x ./lcalsCRaw --benchmark_format=json`). I will still run trace diffing overnight and will try on x86_64 tomorrow, but, just in case, could you please attach your lcalsCRaw binary and the corresponding core dump?
Also, is there something related in the output of "dmesg" ? ... if a program really crashes, you sometimes get some valuable information there...
In the meantime I could reproduce this with qemu-s390x running on x86_64. Here is the backtrace:

#0  0x0000004000c1fd0a in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x0000004000bcccb0 in raise () from /lib64/libc.so.6
#2  0x0000004000bad384 in abort () from /lib64/libc.so.6
#3  0x00000040009064d4 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#4  0x0000004000903ade in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#5  0x0000004000903b68 in std::terminate() () from /lib64/libstdc++.so.6
#6  0x0000004000903e76 in __cxa_throw () from /lib64/libstdc++.so.6
#7  0x0000004000938420 in std::__throw_out_of_range_fmt(char const*, ...) () from /lib64/libstdc++.so.6
#8  0x000000000108f894 in benchmark::CPUInfo::CPUInfo() ()
#9  0x000000000108d51c in benchmark::CPUInfo::Get() ()
#10 0x0000000001087514 in benchmark::BenchmarkReporter::Context::Context() ()
#11 0x0000000001036732 in benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#12 0x0000000001036158 in benchmark::RunSpecifiedBenchmarks() ()
#13 0x0000000001007956 in main ()
This seems to be a logic bug. I've extracted the following snippet from the benchmark:

    #include <fstream>
    #include <iostream>
    #include <cstdlib>

    int main() {
      int NumCPUs = 0;
      int MaxID = -1;
      std::ifstream f("/proc/cpuinfo");
      if (!f.is_open()) {
        std::cerr << "failed to open /proc/cpuinfo\n";
        return -1;
      }
      const std::string Key = "processor";
      std::string ln;
      while (std::getline(f, ln)) {
        if (ln.empty()) continue;
        size_t SplitIdx = ln.find(':');
        std::string value;
    #if defined(__s390__)
        // s390 has another format in /proc/cpuinfo
        // it needs to be parsed differently
        if (SplitIdx != std::string::npos) {
          std::cout << ln.size() << " " << ln << std::endl;
          value = ln.substr(Key.size() + 1, SplitIdx - Key.size() - 1);
        }
    #else
        if (SplitIdx != std::string::npos) value = ln.substr(SplitIdx + 1);
    #endif
        if (ln.size() >= Key.size() && ln.compare(0, Key.size(), Key) == 0) {
          NumCPUs++;
          if (!value.empty()) {
            int CurID = std::atoi(value.c_str());
            MaxID = std::max(CurID, MaxID);
          }
        }
      }
    }

and it dies on the following input line from /proc/cpuinfo:

    wp : yes

(note that there are tabs in it).

There are two problems here:

1) The parser in the benchmark is not resilient to malformed inputs.
2) qemu-user does not emulate /proc/cpuinfo on s390x (but there is support for sparc, hppa and riscv, so it's possible to implement).
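For illustration, the first problem can be avoided by checking the key and the position of the colon before slicing. With the tabs in place, the offending line is 9 characters long, while the s390 branch starts its substr at Key.size() + 1 = 10, which matches the out_of_range message in the report. The following is only a sketch, not the fix that went into the benchmark library: ParseS390ProcessorId is a hypothetical helper name, the "processor 0: version = FF" line format is an assumption about what s390x hosts print, and std::optional requires C++17.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper (not part of the benchmark library): extracts the CPU
// id from an s390-style line such as "processor 0: version = FF, ...".
// Returns std::nullopt for every line that does not have that shape,
// instead of throwing.
std::optional<int> ParseS390ProcessorId(const std::string &ln) {
  static const std::string Key = "processor";

  // Check the key first, so unrelated lines such as "wp\t\t: yes" are never
  // sliced at all.
  if (ln.compare(0, Key.size(), Key) != 0) return std::nullopt;

  // The colon must come after the key and its separator; this guarantees
  // that Key.size() + 1 <= SplitIdx < ln.size(), so substr cannot throw.
  const std::size_t SplitIdx = ln.find(':');
  if (SplitIdx == std::string::npos || SplitIdx <= Key.size())
    return std::nullopt;

  const std::string value =
      ln.substr(Key.size() + 1, SplitIdx - Key.size() - 1);

  // Reject fields without any digits (for example the x86-style
  // "processor\t: 0", where the id follows the colon instead).
  char *End = nullptr;
  const long Id = std::strtol(value.c_str(), &End, 10);
  if (End == value.c_str()) return std::nullopt;
  return static_cast<int>(Id);
}
```

A caller would count a CPU whenever the line starts with the key and update the maximum id only when the helper returns a value, which mirrors what the extracted loop does for well-formed input.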
(In reply to Ilya Leoshkevich from comment #5)
> 1) The parser in the benchmark is not resilient to malformed inputs.

Oh! Good finding!

> 2) qemu-user does not emulate /proc/cpuinfo on s390x (but there is support
> for sparc, hppa and riscv, so it's possible to implement).

My intention in reporting this bug was to guarantee that no instructions were being executed wrongly. You've just proved this is not happening and I'm happy with it.
With that said, I'm OK if this bug is closed as NOTABUG. I'll leave it to you (Ilya and the qemu packagers on Fedora) to decide what the best course of action is here.
(In reply to Thomas Huth from comment #3)
> Also, is there something related in the output of "dmesg" ? ... if a program
> really crashes, you sometimes get some valuable information there...

Thomas, I didn't notice any related output in dmesg.
However, a core dump is being generated, but it comes from qemu-s390x-static and is not very helpful for investigating the issue.
(In reply to Tulio Magno Quites Machado Filho from comment #7)
> Thomas, I didn't notice any related output in dmesg.
> However, the coredump is being generated, but the output is from
> qemu-s390x-static and is not very helpful to investigate the issue.

Ah, sorry, never mind, I did not read your description closely enough. I thought you were running with full system emulation (qemu-system-s390x), but you're using "userspace" emulation (qemu-s390x) instead. My comment only made sense for qemu-system-s390x. With qemu-s390x you of course do not get any useful output in the "dmesg" of the host kernel. Sorry for the confusion.
I've posted https://lore.kernel.org/qemu-devel/20230601162541.689621-1-iii@linux.ibm.com/ for the cpuinfo issue, but with that fix one more failure remains (it was distinct to begin with):

/tmp/tmp.nUUFsR5GwP/tools/timeit-target --limit-core 0 --limit-cpu 7200 --timeout 7200 --limit-file-size 104857600 --limit-rss-size 838860800 --append-exitstatus --redirect-output /tmp/tmp.nUUFsR5GwP/MultiSource/Benchmarks/MiBench/automotive-basicmath/Output/automotive-basicmath.test.out --redirect-input /dev/null --chdir /tmp/tmp.nUUFsR5GwP/MultiSource/Benchmarks/MiBench/automotive-basicmath --summary /tmp/tmp.nUUFsR5GwP/MultiSource/Benchmarks/MiBench/automotive-basicmath/Output/automotive-basicmath.test.time /tmp/tmp.nUUFsR5GwP/MultiSource/Benchmarks/MiBench/automotive-basicmath/automotive-basicmath
/tmp/tmp.nUUFsR5GwP/tools/HashProgramOutput.sh /tmp/tmp.nUUFsR5GwP/MultiSource/Benchmarks/MiBench/automotive-basicmath/Output/automotive-basicmath.test.out
/tmp/tmp.nUUFsR5GwP/tools/fpcmp-target /tmp/tmp.nUUFsR5GwP/MultiSource/Benchmarks/MiBench/automotive-basicmath/Output/automotive-basicmath.test.out /tmp/tmp.nUUFsR5GwP/MultiSource/Benchmarks/MiBench/automotive-basicmath/automotive-basicmath.reference_output

/tmp/tmp.nUUFsR5GwP/tools/fpcmp-target: Comparison failed, textual difference between 'a' and 'e'

I will have a look at it a bit later.
That one was another broken insn after all: https://lore.kernel.org/qemu-devel/20230601223027.795501-1-iii@linux.ibm.com/

Now the testsuite is finally green.
Assigning to Ilya and setting status. Note that there's no action which needs to be taken here. When we rebase qemu and include the fix we can just close this bug.
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle. Changing version to 39.
This message is a reminder that Fedora Linux 39 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 39 on 2024-11-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '39'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 39 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 39 entered end-of-life (EOL) status on 2024-11-26. Fedora Linux 39 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.