2028609 – Miscompilation of LLVM on s390x with -march=z13 -mtune=z14 in GCC 8.x

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 2028609 - Miscompilation of LLVM on s390x with -march=z13 -mtune=z14 in GCC 8.x

Summary: Miscompilation of LLVM on s390x with -march=z13 -mtune=z14 in GCC 8.x

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	gcc
Sub Component:
Version:	8.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Marek Polacek
QA Contact:	Václav Kadlčík
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2025710
TreeView+	depends on / blocked

Reported:	2021-12-02 19:03 UTC by Jakub Jelinek
Modified:	2023-07-18 14:19 UTC (History)
CC List:	8 users (show)
Fixed In Version:	gcc-8.5.0-10.el8
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-05-10 15:27:26 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
slpvec.C (2.79 KB, text/plain) 2021-12-02 19:03 UTC, Jakub Jelinek	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-104614	0	None	None	None	2021-12-02 19:07:14 UTC
Red Hat Product Errata	RHBA-2022:2072	0	None	None	None	2022-05-10 15:27:52 UTC

Description Jakub Jelinek 2021-12-02 19:03:50 UTC

Created attachment 1844506 [details]
slpvec.C

The following testcase is miscompiled on s390x with
g++ -fPIC -fvisibility-inlines-hidden -ffunction-sections -fdata-sections -O2 -fPIC  -fno-exceptions -fno-rtti -std=c++14 -mlong-double-128 -march=z13 -mtune=z14
both with the RHEL gcc 8.x and with upstream 8.5.0.
When miscompiled, it prints something like
__insertion_sort 0x3ffd74fd310 0x3ffd74fd348 0xdeadbeefcafebabe 0xdeadbeefcafebabe
__insertion_sort 0x3ffd74fd348 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe
rather than
__insertion_sort 0x3ffd74fd310 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe
__insertion_sort 0x3ffd74fd348 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe

The interesting part is below, .cfi_* directives removed for brevity.
On entry, this function has 3 pointers in %r2, %r3 and %r4 registers, and
%r5 is pointer to the 16-byte function_ref<decltype(foo)> - object with trivially copyable class
containing 2 8-byte members.
_ZSt24__merge_sort_with_bufferIPPvS1_N4llvm12function_refIFbS0_S0_EEEEvT_S6_T0_T1_:
        stmg    %r6,%r15,48(%r15)
        lgr     %r14,%r15
        lay     %r15,-248(%r15)
        aghi    %r14,-32
        std     %f8,0(%r14)
        std     %f12,8(%r14)
        std     %f14,16(%r14)
        std     %f9,24(%r14)
        sgrk    %r11,%r3,%r2
        lgr     %r1,%r4
        srag    %r13,%r11,3
        agr     %r1,%r11
        lmg     %r8,%r9,0(%r5)
        stmg    %r8,%r9,160(%r15)
! The above stores the whole 16-byte function_ref correctly to %r15+160
        cgijle  %r11,48,.L13
        vlvgp   %v0,%r8,%r9
        ldgr    %f9,%r1
        ldgr    %f12,%r4
        la      %r1,200(%r15)
        lgr     %r10,%r3
        stg     %r11,176(%r15)
        ldgr    %f8,%r2
        lgr     %r6,%r9
        vlgvg   %r7,%v0,1
        stmg    %r8,%r9,184(%r15)
! So does the above
        lgr     %r8,%r1
.L14:
        la      %r11,56(%r2)
        lgr     %r4,%r8
        lgr     %r3,%r11
        stmg    %r6,%r7,200(%r15)
! But this one actually stores both 8-byte words the same to %r15+160, and %r15+200 is passed as %r4 to the function
        brasl   %r14,_ZSt16__insertion_sortIPPvN4llvm12function_refIFbS0_S0_EEEEvT_S6_T0_@PLT

In *.postreload, we have still correct:
(insn 16 12 166 2 (set (reg/v:TI 16 %f0 [orig:69 __comp ] [69])
        (reg:TI 8 %r8)) 1268 {movti}
     (nil))
...
(insn 137 136 140 3 (set (reg/v:TI 6 %r6 [orig:69 __comp ] [69])
        (reg/v:TI 16 %f0 [orig:69 __comp ] [69])) 1268 {movti}
     (nil))
The code spills it to 128-bit %f0 register and loads it back from it.
Next, split2 pass splits the latter (but not the former) into:
(insn 167 136 168 3 (set (reg:DI 6 %r6 [ __comp ])
        (reg:DI 16 %f0)) 1269 {*movdi_64}
     (nil))
(insn 168 167 140 3 (set (reg:DI 7 %r7 [orig:69 __comp+8 ] [69])
        (unspec:DI [
                (reg:V2DI 16 %f0)
                (const_int 1 [0x1])
            ] UNSPEC_VEC_EXTRACT)) 402 {*vec_extractv2di}
     (nil))
and finally cprop_hardreg seeing
(insn 187 188 186 3 (set (reg/v:TI 16 %f0 [orig:69 __comp ] [69])
        (reg:TI 8 %r8)) 1268 {movti}
     (nil))
changes insn 167 to:
(insn 167 136 168 3 (set (reg:DI 6 %r6 [ __comp ])
        (reg:DI 9 %r9 [16])) 1269 {*movdi_64}
     (nil))
I'm not sure if this is a bug in the
; Split a VR -> GPR TImode move into 2 vector load GR from VR element.
; For the higher order bits we do simply a DImode move while the
; second part is done via vec extract.  Both will end up as vlgvg.
(define_split
  [(set (match_operand:TI 0 "register_operand" "")
        (match_operand:TI 1 "register_operand" ""))]
  "TARGET_VX && reload_completed
   && GENERAL_REG_P (operands[0])
   && VECTOR_REG_P (operands[1])"
  [(set (match_dup 2) (match_dup 4))
   (set (match_dup 3) (unspec:DI [(match_dup 5) (const_int 1)]
                                 UNSPEC_VEC_EXTRACT))]
{
  operands[2] = operand_subword (operands[0], 0, 0, TImode);
  operands[3] = operand_subword (operands[0], 1, 0, TImode);
  operands[4] = gen_rtx_REG (DImode, REGNO (operands[1]));
  operands[5] = gen_rtx_REG (V2DImode, REGNO (operands[1]));
})
splitter, in cprop_hardreg or the s390x representation of those TImodes in floating point registers.

In GCC 9 it got "fixed" with https://gcc.gnu.org/r9-3763-gef976be1a23a517 but that just means it went latent.
And I can't reproduce it even with upstream GCC 9 branch with r9-3763 reverted - some RA decisions changed.
But that doesn't mean the problem isn't latent even on the trunk, certainly the above splitter is still there.

Comment 1 Jakub Jelinek 2021-12-02 19:05:49 UTC

Andreas, do you think you could have a look, at least to point us in the right direction.
Thanks.

Comment 2 Andreas Krebbel 2021-12-03 09:15:34 UTC

I' wondering if it is correct that our back-end accepts TI->DI mode changes in TARGET_CAN_CHANGE_MODE_CLASS. Just doing that mode change while leaving the register number as is would result in dealing with the high word part of the TImode. So it probably would be wrong for all big endian targets to return true in that case.

When cprop records value copies it rejects such mode switches as part of the copy on big endian targets. When it replaces the source of an assignment it consults the TARGET_CAN_CHANGE_MODE_CLASS. Rejecting the TI->DI change here should fix the problem. I don't know how much fallout this will produce. I'll run a regression test with that.

Comment 3 Jakub Jelinek 2021-12-03 09:54:05 UTC

Just trying to understand what s390x is doing
__attribute__((noipa)) __int128_t
foo (__int128_t a)
{
  asm volatile ("" : "+f" (a));
  asm volatile ("" : "+r" (a));
  return a;
}

__attribute__((noipa)) long long
bar (__int128_t a)
{
  long long b;
  asm volatile ("" : "=f" (b) : "0" (a));
  asm volatile ("" : "+r" (b));
  return b;
}

int
main ()
{
  __int128_t a = ((__int128_t) 0x0102030405060708ULL << 64) | 0x090a0b0c0d0e0f00ULL;
  foo (a);
  bar (a);
  return 0;
}
it seems that when a TImode value appears in a floating point register, it contains V2DImode { high-64-bits, low-64-bits }, but when accessed using ldgr it reads the first half (i.e. high part), rather than what the compiler would expect (low part of the register).
So yes, dealing with it in s390_can_change_mode_class does look like a way to go, after all, it already has some
  /* Values residing in VRs are little-endian style.  All modes are
     placed left-aligned in an VR.  This means that we cannot allow
     switching between modes with differing sizes.  Also if the vector
     facility is available we still place TFmode values in VR register
     pairs, since the only instructions we have operating on TFmodes
     only deal with register pairs.  Therefore we have to allow DFmode
     subregs of TFmodes to enable the TFmode splitters.  */
code.

Comment 4 Andreas Krebbel 2021-12-06 19:57:23 UTC

Working around this problem by disabling TI->DI mode changes in can_change_mode didn't work out. The hook then would also disable all DImode subregs of TImode regs. But this is needed when splitting TImode operations.

What about doing the following in crop instead. Boostraps find on s390x without causing testsuite regressions:

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 18132425ab2..402478b8292 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -402,7 +402,10 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode,
 
   if (orig_mode == new_mode)
     return gen_raw_REG (new_mode, regno);
-  else if (mode_change_ok (orig_mode, new_mode, regno))
+  else if (mode_change_ok (orig_mode, new_mode, regno)
+          && (hard_regno_nregs (copy_regno, copy_mode) ==
+              hard_regno_nregs (regno, copy_mode)
+              || !maybe_ne (subreg_lowpart_offset (new_mode, copy_mode), 0U)))
     {
       int copy_nregs = hard_regno_nregs (copy_regno, copy_mode);
       int use_nregs = hard_regno_nregs (copy_regno, new_mode);

Comment 5 Vladimir Makarov 2022-01-12 13:55:47 UTC

Here are my thoughts about the problem.

Regprop was designed long time ago on assumption that narrow mode refers for low part of the value which is wrong for s390x vregs (at least when we are looking at the splitter).

We could modify regprop to work w/o the assumption.  But it is too risky change here as it might need many modifications and we can skip to change some regcprop code working on the assumption.

More doable solution is to check the assumption in the place of actual transformation and reject it when the assumption does not hold.  That is what Andreas patch is trying to do indirectly but actually prohibiting this optimization for all big endian targets.

The right solution here would be introducing an undocumented hook (something narrow_mode_refers_low_part_p(regno) with default value true) and use it in the same place proposed by Andreas. 

The hook should be temporary probably as the right solution in the long run should be a change of regcprop to work w/o the assumption.

Comment 6 Marek Polacek 2022-01-12 14:12:50 UTC

Thanks Vlad.  Andreas, will you be able to update the patch to use a hook?  I suppose it could go upstream first.

Comment 7 Andreas Krebbel 2022-01-13 17:44:12 UTC

(In reply to Marek Polacek from comment #6)
> Thanks Vlad.  Andreas, will you be able to update the patch to use a hook? 
> I suppose it could go upstream first.

Ok.
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588407.html

Comment 8 Marek Polacek 2022-01-13 17:47:43 UTC

(In reply to Andreas Krebbel from comment #7)
> (In reply to Marek Polacek from comment #6)
> > Thanks Vlad.  Andreas, will you be able to update the patch to use a hook? 
> > I suppose it could go upstream first.
> 
> Ok.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588407.html

Awesome, thanks.

Comment 9 Marek Polacek 2022-01-24 18:48:33 UTC

In today's meeting we concluded that we'll backport this version
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588407.html
into RHEL 8 GCC.

Comment 10 Andreas Krebbel 2022-01-24 19:01:49 UTC

That's the patch I've posted for upstream 2022-01-21:
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589036.html

Comment 11 Marek Polacek 2022-01-25 23:15:39 UTC

This is the patch I ended up committing after testing on s390x.
It fixes a typo: hook_bool_unit_true -> hook_bool_uint_true, and since that hook doesn't exist yet in GCC 8, I've added it, which looked obvious enough.

--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -10488,6 +10488,18 @@ s390_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
   return false;
 }
 
+/* Implement TARGET_NARROW_MODE_REFERS_LOW_PART_P.  */
+
+static bool
+s390_narrow_mode_refers_low_part_p (unsigned int regno)
+{
+  if (reg_classes_intersect_p (VEC_REGS, REGNO_REG_CLASS (regno)))
+    return false;
+
+  return true;
+}
+
+
 /* Implement TARGET_MODES_TIEABLE_P.  */
 
 static bool
@@ -16956,6 +16968,9 @@ s390_case_values_threshold (void)
 #undef TARGET_CASE_VALUES_THRESHOLD
 #define TARGET_CASE_VALUES_THRESHOLD s390_case_values_threshold
 
+#undef TARGET_NARROW_MODE_REFERS_LOW_PART_P
+#define TARGET_NARROW_MODE_REFERS_LOW_PART_P s390_narrow_mode_refers_low_part_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-s390.h"
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -426,7 +426,8 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode,
 
   if (orig_mode == new_mode)
     return gen_raw_REG (new_mode, regno);
-  else if (mode_change_ok (orig_mode, new_mode, regno))
+  else if (mode_change_ok (orig_mode, new_mode, regno)
+	   && targetm.narrow_mode_refers_low_part_p (regno))
     {
       int copy_nregs = hard_regno_nregs (copy_regno, copy_mode);
       int use_nregs = hard_regno_nregs (copy_regno, new_mode);
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5446,6 +5446,16 @@ value that the middle-end intended.",
  bool, (machine_mode from, machine_mode to, reg_class_t rclass),
  hook_bool_mode_mode_reg_class_t_true)
 
+/* This hook is used to work around a problem in regcprop. Hardcoded
+assumptions currently prevent it from working correctly for targets
+where the low part of a multi-word register doesn't align to accessing
+the register with a narrower mode.  */
+DEFHOOK_UNDOC
+(narrow_mode_refers_low_part_p,
+"",
+bool, (unsigned int regno),
+hook_bool_uint_true)
+
 /* Change pseudo allocno class calculated by IRA.  */
 DEFHOOK
 (ira_change_pseudo_allocno_class,
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -86,6 +86,7 @@ extern void hook_void_tree (tree);
 extern void hook_void_tree_treeptr (tree, tree *);
 extern void hook_void_int_int (int, int);
 extern void hook_void_gcc_optionsp (struct gcc_options *);
+extern bool hook_bool_uint_true (unsigned int);
 extern bool hook_bool_uint_uintp_false (unsigned int, unsigned int *);
 
 extern int hook_int_uint_mode_1 (unsigned int, machine_mode);
--- a/gcc/hooks.c
+++ b/gcc/hooks.c
@@ -498,6 +498,14 @@ hook_void_gcc_optionsp (struct gcc_optio
 {
 }
 
+/* Generic hook that takes an unsigned int and returns true.  */
+
+bool
+hook_bool_uint_true (unsigned int)
+{
+  return true;
+}
+
 /* Generic hook that takes an unsigned int, an unsigned int pointer and
    returns false.  */

Comment 13 Marek Polacek 2022-01-26 17:09:43 UTC

Sadly the patch in Comment 11 does not seem to work.  Andreas, do you have any ideas why?  The new hook s390_narrow_mode_refers_low_part_p *is* called, so I don't know what's going on.

Comment 14 Andreas Krebbel 2022-01-27 08:56:22 UTC

Sorry, my bad. The hook is invoked with the wrong reg actually. Please change it like so:

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index f457108b028..5585b759fb9 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -403,7 +403,7 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode,
   if (orig_mode == new_mode)
     return gen_raw_REG (new_mode, regno);
   else if (mode_change_ok (orig_mode, new_mode, regno)
-          && targetm.narrow_mode_refers_low_part_p (regno))
+          && targetm.narrow_mode_refers_low_part_p (copy_regno))
     {
       int copy_nregs = hard_regno_nregs (copy_regno, copy_mode);
       int use_nregs = hard_regno_nregs (copy_regno, new_mode);

I've verified that with that additional change the output for Jakubs testcase changes as expected:

<       lgr     %r6,%r9
---
>       lgdr    %r6,%f0

r9 contained the low part of the r8/r9 register pair while the new instruction loads the high part into r6 as expected.

Sorry for not testing this more carefully and making you do an extra round.

Comment 15 Marek Polacek 2022-01-27 21:32:19 UTC

(In reply to Andreas Krebbel from comment #14)
> Sorry for not testing this more carefully and making you do an extra round.

Thanks! and no worries.  It works now:

# g++ -fPIC -fvisibility-inlines-hidden -ffunction-sections -fdata-sections -O2 -fPIC  -fno-exceptions -fno-rtti -std=c++14 -mlong-double-128 -march=z13 -mtune=z14 slpvec.C 
# ./a.out 
__insertion_sort 0x3ffc76fd590 0x3ffc76fd5c8 0x10006b8 0xdeadbeefcafebabe
__insertion_sort 0x3ffc76fd5c8 0x3ffc76fd5c8 0x10006b8 0xdeadbeefcafebabe

Comment 20 errata-xmlrpc 2022-05-10 15:27:26 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (gcc bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2072

Note You need to log in before you can comment on or make changes to this bug.