Bug 2035807 - Valgrind crashes with illegal instruction error on s390x when trying to build snapd package for EPEL 9
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: valgrind
Version: CentOS Stream
Hardware: s390x
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Mark Wielaard
QA Contact: Jesus Checa
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-27 18:39 UTC by Neal Gompa
Modified: 2022-05-17 13:11 UTC (History)

Fixed In Version: valgrind-3.18.1-8.el9
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-17 12:48:06 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
Mock build log of snapd-2.54.1-1.el9 failing with Valgrind crashing (4.03 MB, text/plain)
2021-12-27 18:39 UTC, Neal Gompa


Links
System ID Private Priority Status Summary Last Updated
KDE Software Compilation 447991 0 NOR UNCONFIRMED s390x: Valgrind indicates illegal instruction on wflrx 2022-01-05 22:12:51 UTC
Red Hat Issue Tracker RHELPLAN-106617 0 None None None 2021-12-27 18:40:20 UTC
Red Hat Product Errata RHBA-2022:2401 0 None None None 2022-05-17 12:48:22 UTC

Description Neal Gompa 2021-12-27 18:39:06 UTC
Created attachment 1847988
Mock build log of snapd-2.54.1-1.el9 failing with Valgrind crashing

Description of problem:
When trying to build snapd on s390x, Valgrind crashes with an illegal instruction error, causing the tests to fail.

Version-Release number of selected component (if applicable):
1:3.18.1-5.el9

How reproducible:
Always

Steps to Reproduce:
1. Build snapd for EPEL 9 on s390x at the following commit: https://src.fedoraproject.org/rpms/snapd/c/3d93cdbdc3dadedbf46e2ae5b13a358362549462

Actual results:
Valgrind dies with the following error: "==4036598== valgrind: Unrecognised instruction at address 0x48d5ca4."


Expected results:
Valgrind passes as it does on Fedora.

Comment 1 Mark Wielaard 2022-01-03 12:28:14 UTC
I am on vacation this week, back Jan 11. But let's see if we can make some progress anyway.

/usr/bin/valgrind ./libsnap-confine-private/unit-tests
==4036598== Memcheck, a memory error detector
==4036598== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4036598== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==4036598== Command: ./libsnap-confine-private/unit-tests
==4036598== 
# random seed: R02Sbb6363cb17884bef5e524f51f99e4a24
1..138
ok 1 /fault-injection
vex s390->IR: specification exception: E700 0008 40C5
==4036598== valgrind: Unrecognised instruction at address 0x48d5ca4.
==4036598==    at 0x48D5CA4: ??? (in /usr/lib64/libglib-2.0.so.0.6800.4)
==4036598==    by 0x48D9B41: ??? (in /usr/lib64/libglib-2.0.so.0.6800.4)
==4036598==    by 0x48DA107: g_test_run_suite (in /usr/lib64/libglib-2.0.so.0.6800.4)
==4036598==    by 0x48DA147: g_test_run (in /usr/lib64/libglib-2.0.so.0.6800.4)
==4036598==    by 0x10BAD3: UnknownInlinedFun (unit-tests.c:28)
==4036598==    by 0x10BAD3: main (unit-tests-main.c:21)

- Does this only happen on centos9?
  I believe CentOS and Fedora have different base architecture defaults,
  so this could be a difference between z12 vs z13.

- Could you install debuginfo for libglib-2.0.so.0.6800.4
  and/or could you disassemble the binary so we can see the instruction at 0x48d5ca4?

Comment 2 Neal Gompa 2022-01-03 13:09:10 UTC
(In reply to Mark Wielaard from comment #1)
> - Does this only happen on centos9?
>   I believe CentOS and Fedora have different base architecture defaults,
>   so this could be a difference between z12 vs z13.
> 

It only happens on CentOS Stream 9. Fedora Rawhide is fine still.

Comment 3 Mark Wielaard 2022-01-03 15:27:07 UTC
(In reply to Neal Gompa from comment #2)
> (In reply to Mark Wielaard from comment #1)
> > - Does this only happen on centos9?
> >   I believe CentOS and Fedora have different base architecture defaults,
> >   so this could be a difference between z12 vs z13.
> > 
> 
> It only happens on CentOS Stream 9. Fedora Rawhide is fine still.

Thanks. So it is likely a z13- or z14-only issue. But without knowing the actual code at address 0x48d5ca4, and without access to a z13/z14-capable s390x machine, it is hard to say what is going on.

As far as I can tell E700 xxxx xxC5 is a VFLR instruction, which is an arch12 (z14) only instruction.
But it is one that valgrind should implement, so it is not clear to me why it produces a specification exception.

Do you have access to a machine where this fails? Could you disassemble /usr/lib64/libglib-2.0.so.0.6800.4 around address 0x48d5ca4?

Comment 4 Mark Wielaard 2022-01-03 15:36:06 UTC
Also could you make sure that the machine where you are running does actually support the z14 instruction set?
I see you are building with -march=z196 but then using a library which seems to use z14 instructions.

Comment 5 Neal Gompa 2022-01-03 16:13:24 UTC
I don't have access to s390x hardware. I only observe this in the Fedora Koji build system. I'm sorry, I can't provide any more detail. :(

Comment 6 Neal Gompa 2022-01-03 16:14:52 UTC
Here's the Koji build task that failed: https://koji.fedoraproject.org/koji/taskinfo?taskID=80531908

Comment 7 Florian Weimer 2022-01-03 16:47:21 UTC
(In reply to Neal Gompa from comment #6)
> Here's the Koji build task that failed:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=80531908

Thanks. It used glib2-2.68.4-3.el9.s390x.

Unfortunately the valgrind output does not show the relative (in-object) offset, so I have to guess.

$ s390x-linux-gnu-objdump -d --reloc  usr/lib64/libglib-2.0.so.0.6800.4 | grep ca4:.*e7
   7fca4:	e7 00 00 08 40 c5 	wflrx	%v0,%v0,0,0

That's the only hit. m3 is 4, so it's extended format. Extended format does not seem to be implemented:

static const HChar *
s390_irgen_VFLR(UChar v1, UChar v2, UChar m3, UChar m4, UChar m5)
{
   s390_insn_assert("vflr", m3 == 3 || (s390_host_has_vxe && m3 == 2));

   if (m3 == 3)
      s390_vector_fp_convert(Iop_F64toF32, Ity_F64, Ity_F32, True,
                             v1, v2, m3, m4, m5);
   else
      s390_vector_fp_convert(Iop_F128toF64, Ity_F128, Ity_F64, True,
                             v1, v2, m3, m4, m5);

   return "vflr";
}

This was added with Vector-enhancements facility 1 in the 12th edition (arch12, that is, z14), so I think it's valid for RHEL 9 binaries.

Comment 8 Andreas Arnez 2022-01-05 19:29:35 UTC
(In reply to Florian Weimer from comment #7)
> ...
> That's the only hit. m3 is 4, so it's extended format. Extended format does
> not seem to be implemented:
> ...
Actually, it is. But the code checks for the wrong format code: it checks for 2 instead of 4. This is a typo.

I created a Valgrind Bug for tracking and attached a possible fix:
  https://bugs.kde.org/show_bug.cgi?id=447991

Comment 9 Mark Wielaard 2022-01-12 15:53:39 UTC
The fix looks good; I am testing a Fedora Rawhide build with that patch now.

Comment 10 Mark Wielaard 2022-01-12 17:18:36 UTC
I ran rpmbuild --rebuild snapd-2.54.1-1.fc36.src.rpm with the old valgrind-3.18.1-6.el9.s390x.rpm, which reproduced the issue.
I then installed the new Fedora Rawhide valgrind-3.18.1-8.fc36.s390x.rpm, which contains the proposed fix, and the rpmbuild succeeded.
All valgrind invocations in the build.log look fine with the patched valgrind.

Comment 11 Jesus Checa 2022-01-14 14:46:22 UTC
The following snippet reproduces the issue with the old build valgrind-3.18.1-6.el9:

int main(void)
{
    /* VFLR in extended format (m3 == 4), as found in libglib */
    asm("wflrx %v0,%v0,0,0");
    return 0;
}

Verified that it doesn't reproduce with the new build valgrind-3.18.1-8.el9. The wflrx instruction no longer causes valgrind to raise a SIGILL or report a specification exception.

Comment 16 errata-xmlrpc 2022-05-17 12:48:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: valgrind), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2401

