Bug 2028609
Summary: | Miscompilation of LLVM on s390x with -march=z13 -mtune=z14 in GCC 8.x | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Jakub Jelinek <jakub> | ||||
Component: | gcc | Assignee: | Marek Polacek <mpolacek> | ||||
gcc sub component: | system-version | QA Contact: | Václav Kadlčík <vkadlcik> | ||||
Status: | CLOSED ERRATA | Docs Contact: | |||||
Severity: | unspecified | ||||||
Priority: | unspecified | CC: | ahajkova, dhorak, fweimer, jakub, krebbel1, ohudlick, tstellar, vmakarov | ||||
Version: | 8.5 | Keywords: | Bugfix, Triaged | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | gcc-8.5.0-10.el8 | Doc Type: | No Doc Update | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2022-05-10 15:27:26 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 2025710 | ||||||
Attachments: |
|
Description
Jakub Jelinek
2021-12-02 19:03:50 UTC
Andreas, do you think you could have a look, at least to point us in the right direction. Thanks. I' wondering if it is correct that our back-end accepts TI->DI mode changes in TARGET_CAN_CHANGE_MODE_CLASS. Just doing that mode change while leaving the register number as is would result in dealing with the high word part of the TImode. So it probably would be wrong for all big endian targets to return true in that case. When cprop records value copies it rejects such mode switches as part of the copy on big endian targets. When it replaces the source of an assignment it consults the TARGET_CAN_CHANGE_MODE_CLASS. Rejecting the TI->DI change here should fix the problem. I don't know how much fallout this will produce. I'll run a regression test with that. Just trying to understand what s390x is doing __attribute__((noipa)) __int128_t foo (__int128_t a) { asm volatile ("" : "+f" (a)); asm volatile ("" : "+r" (a)); return a; } __attribute__((noipa)) long long bar (__int128_t a) { long long b; asm volatile ("" : "=f" (b) : "0" (a)); asm volatile ("" : "+r" (b)); return b; } int main () { __int128_t a = ((__int128_t) 0x0102030405060708ULL << 64) | 0x090a0b0c0d0e0f00ULL; foo (a); bar (a); return 0; } it seems that when a TImode value appears in a floating point register, it contains V2DImode { high-64-bits, low-64-bits }, but when accessed using ldgr it reads the first half (i.e. high part), rather than what the compiler would expect (low part of the register). So yes, dealing with it in s390_can_change_mode_class does look like a way to go, after all, it already has some /* Values residing in VRs are little-endian style. All modes are placed left-aligned in an VR. This means that we cannot allow switching between modes with differing sizes. Also if the vector facility is available we still place TFmode values in VR register pairs, since the only instructions we have operating on TFmodes only deal with register pairs. Therefore we have to allow DFmode subregs of TFmodes to enable the TFmode splitters. */ code. Working around this problem by disabling TI->DI mode changes in can_change_mode didn't work out. The hook then would also disable all DImode subregs of TImode regs. But this is needed when splitting TImode operations. What about doing the following in crop instead. Boostraps find on s390x without causing testsuite regressions: diff --git a/gcc/regcprop.c b/gcc/regcprop.c index 18132425ab2..402478b8292 100644 --- a/gcc/regcprop.c +++ b/gcc/regcprop.c @@ -402,7 +402,10 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode, if (orig_mode == new_mode) return gen_raw_REG (new_mode, regno); - else if (mode_change_ok (orig_mode, new_mode, regno)) + else if (mode_change_ok (orig_mode, new_mode, regno) + && (hard_regno_nregs (copy_regno, copy_mode) == + hard_regno_nregs (regno, copy_mode) + || !maybe_ne (subreg_lowpart_offset (new_mode, copy_mode), 0U))) { int copy_nregs = hard_regno_nregs (copy_regno, copy_mode); int use_nregs = hard_regno_nregs (copy_regno, new_mode); Here are my thoughts about the problem. Regprop was designed long time ago on assumption that narrow mode refers for low part of the value which is wrong for s390x vregs (at least when we are looking at the splitter). We could modify regprop to work w/o the assumption. But it is too risky change here as it might need many modifications and we can skip to change some regcprop code working on the assumption. More doable solution is to check the assumption in the place of actual transformation and reject it when the assumption does not hold. That is what Andreas patch is trying to do indirectly but actually prohibiting this optimization for all big endian targets. The right solution here would be introducing an undocumented hook (something narrow_mode_refers_low_part_p(regno) with default value true) and use it in the same place proposed by Andreas. The hook should be temporary probably as the right solution in the long run should be a change of regcprop to work w/o the assumption. Thanks Vlad. Andreas, will you be able to update the patch to use a hook? I suppose it could go upstream first. (In reply to Marek Polacek from comment #6) > Thanks Vlad. Andreas, will you be able to update the patch to use a hook? > I suppose it could go upstream first. Ok. https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588407.html (In reply to Andreas Krebbel from comment #7) > (In reply to Marek Polacek from comment #6) > > Thanks Vlad. Andreas, will you be able to update the patch to use a hook? > > I suppose it could go upstream first. > > Ok. > https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588407.html Awesome, thanks. In today's meeting we concluded that we'll backport this version https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588407.html into RHEL 8 GCC. That's the patch I've posted for upstream 2022-01-21: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589036.html This is the patch I ended up committing after testing on s390x. It fixes a typo: hook_bool_unit_true -> hook_bool_uint_true, and since that hook doesn't exist yet in GCC 8, I've added it, which looked obvious enough. --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -10488,6 +10488,18 @@ s390_hard_regno_mode_ok (unsigned int regno, machine_mode mode) return false; } +/* Implement TARGET_NARROW_MODE_REFERS_LOW_PART_P. */ + +static bool +s390_narrow_mode_refers_low_part_p (unsigned int regno) +{ + if (reg_classes_intersect_p (VEC_REGS, REGNO_REG_CLASS (regno))) + return false; + + return true; +} + + /* Implement TARGET_MODES_TIEABLE_P. */ static bool @@ -16956,6 +16968,9 @@ s390_case_values_threshold (void) #undef TARGET_CASE_VALUES_THRESHOLD #define TARGET_CASE_VALUES_THRESHOLD s390_case_values_threshold +#undef TARGET_NARROW_MODE_REFERS_LOW_PART_P +#define TARGET_NARROW_MODE_REFERS_LOW_PART_P s390_narrow_mode_refers_low_part_p + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-s390.h" --- a/gcc/regcprop.c +++ b/gcc/regcprop.c @@ -426,7 +426,8 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode, if (orig_mode == new_mode) return gen_raw_REG (new_mode, regno); - else if (mode_change_ok (orig_mode, new_mode, regno)) + else if (mode_change_ok (orig_mode, new_mode, regno) + && targetm.narrow_mode_refers_low_part_p (regno)) { int copy_nregs = hard_regno_nregs (copy_regno, copy_mode); int use_nregs = hard_regno_nregs (copy_regno, new_mode); --- a/gcc/target.def +++ b/gcc/target.def @@ -5446,6 +5446,16 @@ value that the middle-end intended.", bool, (machine_mode from, machine_mode to, reg_class_t rclass), hook_bool_mode_mode_reg_class_t_true) +/* This hook is used to work around a problem in regcprop. Hardcoded +assumptions currently prevent it from working correctly for targets +where the low part of a multi-word register doesn't align to accessing +the register with a narrower mode. */ +DEFHOOK_UNDOC +(narrow_mode_refers_low_part_p, +"", +bool, (unsigned int regno), +hook_bool_uint_true) + /* Change pseudo allocno class calculated by IRA. */ DEFHOOK (ira_change_pseudo_allocno_class, --- a/gcc/hooks.h +++ b/gcc/hooks.h @@ -86,6 +86,7 @@ extern void hook_void_tree (tree); extern void hook_void_tree_treeptr (tree, tree *); extern void hook_void_int_int (int, int); extern void hook_void_gcc_optionsp (struct gcc_options *); +extern bool hook_bool_uint_true (unsigned int); extern bool hook_bool_uint_uintp_false (unsigned int, unsigned int *); extern int hook_int_uint_mode_1 (unsigned int, machine_mode); --- a/gcc/hooks.c +++ b/gcc/hooks.c @@ -498,6 +498,14 @@ hook_void_gcc_optionsp (struct gcc_optio { } +/* Generic hook that takes an unsigned int and returns true. */ + +bool +hook_bool_uint_true (unsigned int) +{ + return true; +} + /* Generic hook that takes an unsigned int, an unsigned int pointer and returns false. */ Sadly the patch in Comment 11 does not seem to work. Andreas, do you have any ideas why? The new hook s390_narrow_mode_refers_low_part_p *is* called, so I don't know what's going on. Sorry, my bad. The hook is invoked with the wrong reg actually. Please change it like so:
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index f457108b028..5585b759fb9 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -403,7 +403,7 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode,
if (orig_mode == new_mode)
return gen_raw_REG (new_mode, regno);
else if (mode_change_ok (orig_mode, new_mode, regno)
- && targetm.narrow_mode_refers_low_part_p (regno))
+ && targetm.narrow_mode_refers_low_part_p (copy_regno))
{
int copy_nregs = hard_regno_nregs (copy_regno, copy_mode);
int use_nregs = hard_regno_nregs (copy_regno, new_mode);
I've verified that with that additional change the output for Jakubs testcase changes as expected:
< lgr %r6,%r9
---
> lgdr %r6,%f0
r9 contained the low part of the r8/r9 register pair while the new instruction loads the high part into r6 as expected.
Sorry for not testing this more carefully and making you do an extra round.
(In reply to Andreas Krebbel from comment #14) > Sorry for not testing this more carefully and making you do an extra round. Thanks! and no worries. It works now: # g++ -fPIC -fvisibility-inlines-hidden -ffunction-sections -fdata-sections -O2 -fPIC -fno-exceptions -fno-rtti -std=c++14 -mlong-double-128 -march=z13 -mtune=z14 slpvec.C # ./a.out __insertion_sort 0x3ffc76fd590 0x3ffc76fd5c8 0x10006b8 0xdeadbeefcafebabe __insertion_sort 0x3ffc76fd5c8 0x3ffc76fd5c8 0x10006b8 0xdeadbeefcafebabe Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (gcc bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:2072 |