Bug 2273618

Summary: Optimizing with -O2 causes wrong results on s390x
Product: [Fedora] Fedora
Component: gcc
Version: 40
Hardware: s390x
OS: Linux
Status: CLOSED RAWHIDE
Severity: medium
Priority: unspecified
Reporter: Jonas Ådahl <jadahl>
Assignee: Jakub Jelinek <jakub>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: dan, dmalcolm, fweimer, jakub, jlaw, josmyers, jwakely, mcermak, mpolacek, msebor, nickc, nixuser, sipoyare
Fixed In Version: gcc-14.0.1-0.14.fc41
Last Closed: 2024-04-12 13:45:04 UTC
Bug Blocks: 467765
Attachments: Reproducer (flags: none)

Description Jonas Ådahl 2024-04-05 11:00:05 UTC
When investigating faulty rendering in GNOME Shell on s390x, I discovered that compiling mutter with -O0 made the issue go away.

I eventually narrowed it down to a function that does a memcpy from a local float array to a stack-allocated float array in a callee.

I could also work around it in three ways:

* Put #pragma GCC optimize ("O0") around the affected function.
* Mark the float array copied from as volatile.
* Replace the memcpy with a for loop.
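
For illustration, two of these workarounds might look roughly like the sketch below. This is not the mutter code or the attached reproducer: the function names, the array contents, and the use of push_options/pop_options to limit the pragma to one function are all assumptions made for the example. The volatile variant is left out because passing a volatile array to memcpy requires casting away the qualifier.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the affected function; the real one lives
   in mutter and is not shown in this report. */

/* Workaround 1: build only this function at -O0.
   push_options/pop_options restore the command-line options afterwards. */
#pragma GCC push_options
#pragma GCC optimize ("O0")
void
fill_with_memcpy_O0 (float *dst)
{
  float src[4] = { 0.0f, 0.0f, 1.0f, 1.0f };
  assert (dst != NULL);
  memcpy (dst, src, sizeof src);
}
#pragma GCC pop_options

/* Workaround 3: copy element by element instead of calling memcpy. */
void
fill_with_loop (float *dst)
{
  float src[4] = { 0.0f, 0.0f, 1.0f, 1.0f };
  assert (dst != NULL);
  for (size_t i = 0; i < sizeof src / sizeof src[0]; i++)
    dst[i] = src[i];
}

/* Returns 1 when dst holds the values both functions are meant to write. */
int
has_expected_values (const float *dst)
{
  return dst[0] == 0.0f && dst[1] == 0.0f && dst[2] == 1.0f && dst[3] == 1.0f;
}
```

On a correctly working compiler both functions leave the destination holding 0, 0, 1, 1, so has_expected_values returns 1 for either one.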

With that in mind, I took the relevant code and removed as much as I could while it still reproduced. The memcpy alone is not enough; the function needs some surrounding code ("noise") for the problem to appear.

I am attaching a C file that reproduces the problem. If the bug does not reproduce, the program exits cleanly. If it does reproduce, it prints:

1.000000 == 0.000000 failed
Aborted (core dumped)

The three discovered workarounds are included in the C file, hidden behind `#if 0`.
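
A check that produces a message in the format quoted above could be sketched as follows. The helper name, the argument order, and the choice of stderr are assumptions; the attached reproducer may do this differently.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: compares two floats and, on mismatch, prints a
   line shaped like the failure message quoted in this report.
   Returns 1 when the values are equal, 0 otherwise. */
int
check_float_eq (float left, float right)
{
  if (left == right)
    return 1;
  fprintf (stderr, "%f == %f failed\n", left, right);
  return 0;
}

/* Aborts the program when the comparison fails. */
#define REQUIRE_FLOAT_EQ(left, right) \
  do { if (!check_float_eq ((left), (right))) abort (); } while (0)
```

With such a helper, a failing REQUIRE_FLOAT_EQ prints the comparison and then calls abort (), which the shell typically reports as "Aborted (core dumped)".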

Reproducible: Always

Comment 1 Jonas Ådahl 2024-04-05 11:00:46 UTC
Created attachment 2025354
Reproducer

Comment 2 Dan Horák 2024-04-05 11:45:42 UTC
Jonas, could you also make the attachment public? Thanks.

Comment 3 Jonas Ådahl 2024-04-05 11:55:08 UTC
(In reply to Dan Horák from comment #2)
> Jonas, could you also make the attachment public? Thanks.

Done; sorry about that.

Comment 4 Dan Horák 2024-04-05 12:05:02 UTC
Thanks. For the record, it reproduces on z14 with gcc-14.0.1-0.13.fc41.s390x, but not with gcc-13.2.1-4.fc38.s390x.

Comment 5 Jakub Jelinek 2024-04-05 12:11:44 UTC
Simplified for -march=z13 -O0:

typedef struct { const float *a; int b, c; float *d; } S;

__attribute__((noipa)) void
bar (void)
{
}

__attribute__((noinline, optimize (2))) static void
foo (S *e)
{
  const float *f;
  float *g;
  float h[4] = { 0.0, 0.0, 1.0, 1.0 };
  if (!e->b)
    f = h;
  else
    f = e->a;
  g = &e->d[0];
  __builtin_memcpy (g, f, sizeof (float) * 4);
  bar ();
  if (!e->b)
    if (g[0] != 0.0 || g[1] != 0.0 || g[2] != 1.0 || g[3] != 1.0)
      __builtin_abort ();
}

int
main ()
{
  float d[4];
  S e = { .d = d };
  foo (&e);
  return 0;
}

Bisecting now.

Comment 6 Jakub Jelinek 2024-04-05 13:10:07 UTC
Bisected to https://gcc.gnu.org/r14-5831