Bug 2035807
| Summary: | Valgrind crashes with illegal instruction error on s390x when trying to build snapd package for EPEL 9 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Neal Gompa <ngompa13> | ||||
| Component: | valgrind | Assignee: | Mark Wielaard <mjw> | ||||
| valgrind sub component: | system-version | QA Contact: | Jesus Checa <jchecahi> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||
| Severity: | unspecified | ||||||
| Priority: | unspecified | CC: | arnez, bstinson, fche, fweimer, jakub, jchecahi, jwboyer, maciek.borzecki, ohudlick, tdawson | ||||
| Version: | CentOS Stream | Keywords: | Patch, Triaged, Upstream | ||||
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ | ||||
| Target Release: | --- | ||||||
| Hardware: | s390x | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | valgrind-3.18.1-8.el9 | Doc Type: | No Doc Update | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2022-05-17 12:48:06 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
|
Description
Neal Gompa
2021-12-27 18:39:06 UTC
I am on vacation this week, back Jan 11. But let's see if we can make some progress anyway.

/usr/bin/valgrind ./libsnap-confine-private/unit-tests
==4036598== Memcheck, a memory error detector
==4036598== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4036598== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==4036598== Command: ./libsnap-confine-private/unit-tests
==4036598==
# random seed: R02Sbb6363cb17884bef5e524f51f99e4a24
1..138
ok 1 /fault-injection
vex s390->IR: specification exception: E700 0008 40C5
==4036598== valgrind: Unrecognised instruction at address 0x48d5ca4.
==4036598==    at 0x48D5CA4: ??? (in /usr/lib64/libglib-2.0.so.0.6800.4)
==4036598==    by 0x48D9B41: ??? (in /usr/lib64/libglib-2.0.so.0.6800.4)
==4036598==    by 0x48DA107: g_test_run_suite (in /usr/lib64/libglib-2.0.so.0.6800.4)
==4036598==    by 0x48DA147: g_test_run (in /usr/lib64/libglib-2.0.so.0.6800.4)
==4036598==    by 0x10BAD3: UnknownInlinedFun (unit-tests.c:28)
==4036598==    by 0x10BAD3: main (unit-tests-main.c:21)

- Does this only happen on centos9? I believe CentOS and Fedora have different base architecture defaults, so this could be a difference between z12 vs z13/
- Could you install debuginfo for libglib-2.0.so.0.6800.4 and/or could you disassemble the binary so we can see the instruction at 0x48d5ca4?

(In reply to Mark Wielaard from comment #1)
> - Does this only happen on centos9?
> I believe CentOS and Fedora have different base architecture defaults,
> so this could be a difference between z12 vs z13/

It only happens on CentOS Stream 9. Fedora Rawhide is fine still.

(In reply to Neal Gompa from comment #2)
> It only happens on CentOS Stream 9. Fedora Rawhide is fine still.

Thanks. So it likely is a z13 or z14 only issue. But without knowing the actual code at address 0x48d5ca4, it is hard to say what is going on without access to a z13/z14 capable s390x machine.

As far as I can tell E700 xxxx xxC5 is a VFLR instruction, which is an arch12 (z14) only instruction, but one that valgrind should implement. So I am not really clear on why it produces a specification exception. Do you have access to a machine where this fails? Could you disassemble /usr/lib64/libglib-2.0.so.0.6800.4 around address 0x48d5ca4? Also, could you make sure that the machine where you are running actually supports the z14 instruction set? I see you are building with -march=z196 but then using a library which seems to use z14 instructions.

I don't have access to s390x hardware. I only observe this in the Fedora Koji build system. I'm sorry, I can't provide any more detail. :(

Here's the Koji build task that failed: https://koji.fedoraproject.org/koji/taskinfo?taskID=80531908

(In reply to Neal Gompa from comment #6)
> Here's the Koji build task that failed:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=80531908

Thanks. It used glib2-2.68.4-3.el9.s390x. Unfortunately the valgrind output does not show the relative (in-object) offset, so I have to guess.

$ s390x-linux-gnu-objdump -d --reloc usr/lib64/libglib-2.0.so.0.6800.4 | grep ca4:.*e7
   7fca4:	e7 00 00 08 40 c5 	wflrx	%v0,%v0,0,0

That's the only hit. m3 is 4, so it's extended format. Extended format does not seem to be implemented:

static const HChar *
s390_irgen_VFLR(UChar v1, UChar v2, UChar m3, UChar m4, UChar m5)
{
   s390_insn_assert("vflr", m3 == 3 || (s390_host_has_vxe && m3 == 2));

   if (m3 == 3)
      s390_vector_fp_convert(Iop_F64toF32, Ity_F64, Ity_F32, True,
                             v1, v2, m3, m4, m5);
   else
      s390_vector_fp_convert(Iop_F128toF64, Ity_F128, Ity_F64, True,
                             v1, v2, m3, m4, m5);
   return "vflr";
}

This was added with Vector-enhancements facility 1 in the 12th edition (arch12, that is, z14), so I think it's valid for RHEL 9 binaries.

(In reply to Florian Weimer from comment #7)
> ...
> That's the only hit. m3 is 4, so it's extended format. Extended format does
> not seem to be implemented:
> ...

Actually, it is. But the code checks for the wrong format code. Instead of checking for 4, it checks for 2. This is a typo. I created a Valgrind bug for tracking and attached a possible fix: https://bugs.kde.org/show_bug.cgi?id=447991

The fix looks good, testing a Fedora Rawhide build with that patch now.

Did an rpmbuild --rebuild snapd-2.54.1-1.fc36.src.rpm with the old valgrind-3.18.1-6.el9.s390x.rpm, which replicated the issue. Then installed the new Fedora Rawhide valgrind-3.18.1-8.fc36.s390x.rpm, which contains the proposed fix, and the rpmbuild succeeded. All valgrind invocations in the build.log look fine with the patched valgrind.

The following snippet reproduces the issue in old build valgrind-3.18.1-6.el9:
int main(void)
{
    /* wflrx is the extended-format (m3 = 4) variant of VFLR */
    asm("wflrx %v0,%v0,0,0");
    return 0;
}
Verified that it doesn't reproduce with new build valgrind-3.18.1-8.el9. The wflrx instruction no longer causes valgrind to raise a SIGILL or report a specification exception.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (new packages: valgrind), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:2401 |