Bug 1677602

Summary:

pseudo-RNG mis-compiled with gcc9

Product:

[Fedora] Fedora

Reporter:

Dan Horák <dan>

Component:

gcc

Assignee:

Jakub Jelinek <jakub>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Priority:

unspecified

Version:

rawhide

CC:

aoliva, bugproxy, davejohansen, dmalcolm, fweimer, hannsj_uhl, ingvar, jakub, jwakely, law, mpolacek, msebor, nickc

Hardware:

s390x

OS:

Linux

Fixed In Version:

gcc-9.0.1-0.6.fc30

Last Closed:

2019-03-01 13:02:29 UTC

Type:

Bug

Bug Blocks:

467765, 1675181

Attachments:

Description	Flags
reduced standalone test case	none
preprocessed reduced standalone test case	none

Description Dan Horák 2019-02-15 10:46:02 UTC

Created attachment 1535113 [details]
reduced standalone test case

Description of problem:
Looks like gcc9 mis-compiles a pseudo-RNG that's part of the jemalloc test suite (https://github.com/jemalloc/jemalloc/blob/dev/test/unit/SFMT.c and https://github.com/jemalloc/jemalloc/blob/dev/test/src/SFMT.c).

When the do_recursion() function in the attached source code is compiled with -O0, the check passes.


Version-Release number of selected component (if applicable):
gcc-9.0.1-0.4.fc30.s390x

How reproducible:


Steps to Reproduce:
1. gcc -o test -O2 -Wall test.c
2. ./test

Actual results:
Output mismatch for i=2
Output mismatch for i=6


Expected results:
no output

Comment 1 Dan Horák 2019-02-15 10:46:37 UTC

Created attachment 1535114 [details]
preprocessed reduced standalone test case

Comment 2 Jakub Jelinek 2019-02-15 13:39:59 UTC

Needs -march=zEC12 -mtune=z13 -O2 to reproduce (admittedly I haven't tried other arches; the default I had in my cross compiler didn't reproduce it).
Started with http://gcc.gnu.org/r266203 .  Let me bisect manually to find which function is affected.

Comment 3 Dan Horák 2019-02-15 13:49:41 UTC

do_recursion() breaks the generator

Comment 4 Jakub Jelinek 2019-02-15 13:52:02 UTC

Yeah, verified that too: taking the code produced by r266202 (which works) and patching in the r266203 do_recursion makes it fail, while the r266203 init_gen_rand and init_by_array are fine (they are the only two other changed functions).

Comment 5 Jakub Jelinek 2019-02-15 14:42:24 UTC

Reduced testcase:
#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 && __CHAR_BIT__ == 8
struct S { unsigned int u[4]; };

static void
foo (struct S *out, struct S const *in, int shift)
{
  unsigned long long th, tl, oh, ol;
  th = ((unsigned long long) in->u[3] << 32) | in->u[2];
  tl = ((unsigned long long) in->u[1] << 32) | in->u[0];
  oh = th >> (shift * 8);
  ol = tl >> (shift * 8);
  ol |= th << (64 - shift * 8);
  out->u[1] = ol >> 32;
  out->u[0] = ol;
  out->u[3] = oh >> 32;
  out->u[2] = oh;
}

static void
bar (struct S *out, struct S const *in, int shift)
{
  unsigned long long th, tl, oh, ol;
  th = ((unsigned long long) in->u[3] << 32) | in->u[2];
  tl = ((unsigned long long) in->u[1] << 32) | in->u[0];
  oh = th << (shift * 8);
  ol = tl << (shift * 8);
  oh |= tl >> (64 - shift * 8);
  out->u[1] = ol >> 32;
  out->u[0] = ol;
  out->u[3] = oh >> 32;
  out->u[2] = oh;
}

__attribute__((noipa)) static void
baz (struct S *r, struct S *a, struct S *b, struct S *c, struct S *d)
{
  struct S x, y;
  bar (&x, a, 1);
  foo (&y, c, 1);
  r->u[0] = a->u[0] ^ x.u[0] ^ ((b->u[0] >> 11) & 0xdfffffefU) ^ y.u[0] ^ (d->u[0] << 18);
  r->u[1] = a->u[1] ^ x.u[1] ^ ((b->u[1] >> 11) & 0xddfecb7fU) ^ y.u[1] ^ (d->u[1] << 18);
  r->u[2] = a->u[2] ^ x.u[2] ^ ((b->u[2] >> 11) & 0xbffaffffU) ^ y.u[2] ^ (d->u[2] << 18);
  r->u[3] = a->u[3] ^ x.u[3] ^ ((b->u[3] >> 11) & 0xbffffff6U) ^ y.u[3] ^ (d->u[3] << 18);
}

int
main ()
{
  struct S a[] = { { 0x000004d3, 0xbc5448db, 0xf22bde9f, 0xebb44f8f },
		   { 0x03a32799, 0x60be8246, 0xa2d266ed, 0x7aa18536 },
		   { 0x15a38518, 0xcf655ce1, 0xf3e09994, 0x50ef69fe },
		   { 0x88274b07, 0xe7c94866, 0xc0ea9f47, 0xb6a83c43 },
		   { 0xcd0d0032, 0x5d47f5d7, 0x5a0afbf6, 0xaea87b24 },
		   { 0, 0, 0, 0 } };
  baz (&a[5], &a[0], &a[1], &a[2], &a[3]);
  if (a[4].u[0] != a[5].u[0] || a[4].u[1] != a[5].u[1]
      || a[4].u[2] != a[5].u[2] || a[4].u[3] != a[5].u[3])
    __builtin_abort ();
  return 0;
}
#else
int
main ()
{
  return 0;
}
#endif
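
For readers following the testcase: foo and bar read as 128-bit right and left shifts by a whole number of bytes, with the value stored as four 32-bit words, least significant word first. The sketch below restates that arithmetic under hypothetical names (rshift128_bytes, lshift128_bytes, and the sample value are illustrative, not from the bug) and checks it on a value where the byte movement is easy to see. It assumes a shift of 1 to 7 bytes, since 0 or 8 would produce a 64-bit shift count.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit value as four 32-bit words, least significant word first
   (same layout as struct S in the testcase).  */
struct S { uint32_t u[4]; };

/* Hypothetical name; same arithmetic as foo () in the testcase.
   Shifts the 128-bit value right by BYTES bytes, valid for 1..7 only.  */
static void
rshift128_bytes (struct S *out, const struct S *in, int bytes)
{
  uint64_t th = ((uint64_t) in->u[3] << 32) | in->u[2];
  uint64_t tl = ((uint64_t) in->u[1] << 32) | in->u[0];
  uint64_t oh = th >> (bytes * 8);
  /* Bytes shifted out of the high half enter the low half from the top.  */
  uint64_t ol = (tl >> (bytes * 8)) | (th << (64 - bytes * 8));
  out->u[0] = (uint32_t) ol;
  out->u[1] = (uint32_t) (ol >> 32);
  out->u[2] = (uint32_t) oh;
  out->u[3] = (uint32_t) (oh >> 32);
}

/* Hypothetical name; same arithmetic as bar () in the testcase.
   Shifts the 128-bit value left by BYTES bytes, valid for 1..7 only.  */
static void
lshift128_bytes (struct S *out, const struct S *in, int bytes)
{
  uint64_t th = ((uint64_t) in->u[3] << 32) | in->u[2];
  uint64_t tl = ((uint64_t) in->u[1] << 32) | in->u[0];
  /* Bytes shifted out of the low half enter the high half from the bottom.  */
  uint64_t oh = (th << (bytes * 8)) | (tl >> (64 - bytes * 8));
  uint64_t ol = tl << (bytes * 8);
  out->u[0] = (uint32_t) ol;
  out->u[1] = (uint32_t) (ol >> 32);
  out->u[2] = (uint32_t) oh;
  out->u[3] = (uint32_t) (oh >> 32);
}

/* Checks both helpers on 0x00112233445566778899aabbccddeeff.  */
static void
check_shifts (void)
{
  const struct S v = { { 0xccddeeffu, 0x8899aabbu, 0x44556677u, 0x00112233u } };
  struct S r, l;
  rshift128_bytes (&r, &v, 1);
  lshift128_bytes (&l, &v, 1);
  /* Right by one byte: 0x0000112233445566778899aabbccddee.  */
  assert (r.u[3] == 0x00001122u && r.u[2] == 0x33445566u);
  assert (r.u[1] == 0x778899aau && r.u[0] == 0xbbccddeeu);
  /* Left by one byte: 0x112233445566778899aabbccddeeff00.  */
  assert (l.u[3] == 0x11223344u && l.u[2] == 0x55667788u);
  assert (l.u[1] == 0x99aabbccu && l.u[0] == 0xddeeff00u);
}
```

Calling check_shifts () from a main () is enough to run the checks; it is silent when both helpers behave as plain 128-bit byte shifts.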

Comment 6 Jakub Jelinek 2019-02-15 14:52:31 UTC

When miscompiled, a[5].u[2] is 0xa40afbf6 instead of the expected 0x5a0afbf6 (only the upper byte differs).
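
The two values differ by 0xfe000000, which matches the top byte of the y.u[2] term (0xfef3e099 for the testcase inputs). As a rough cross-check, the sketch below recomputes only the u[2] word of baz's result with plain fixed-width arithmetic. The helper name baz_word2 and its parameter are hypothetical; the constants are copied from the testcase.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: recomputes r->u[2] from baz () for the testcase
   inputs, taking the y.u[2] term as a parameter so that a damaged value
   can be substituted.  */
static uint32_t
baz_word2 (uint32_t y2)
{
  /* High and low 64-bit halves of a[0], as built in bar ().  */
  const uint64_t a_th = UINT64_C (0xebb44f8ff22bde9f);
  const uint64_t a_tl = UINT64_C (0xbc5448db000004d3);
  /* x.u[2] is the low word of the high half after a one-byte left shift.  */
  const uint32_t x2 = (uint32_t) ((a_th << 8) | (a_tl >> 56));
  const uint32_t a2 = 0xf22bde9fu;  /* a->u[2] */
  const uint32_t b2 = 0xa2d266edu;  /* b->u[2] */
  const uint32_t d2 = 0xc0ea9f47u;  /* d->u[2] */
  return a2 ^ x2 ^ ((b2 >> 11) & 0xbffaffffu) ^ y2 ^ (d2 << 18);
}

/* Compares the intact and the truncated y.u[2] term.  */
static void
check_word2 (void)
{
  /* High half of a[2], as built in foo ().  */
  const uint64_t c_th = UINT64_C (0x50ef69fef3e09994);
  /* y.u[2] is the low word of the high half after a one-byte right shift.  */
  const uint32_t y2 = (uint32_t) (c_th >> 8);
  assert (y2 == 0xfef3e099u);
  assert (baz_word2 (y2) == 0x5a0afbf6u);                /* expected */
  assert (baz_word2 (y2 & 0x00ffffffu) == 0xa40afbf6u);  /* top byte lost */
}
```

Dropping the top byte of that single term is enough to turn the expected word into the miscompiled one, so the remaining terms of the expression appear to be computed correctly.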

Comment 7 Jakub Jelinek 2019-02-15 17:14:12 UTC

So, I think the problem is in the
        rxsbg   %r1,%r11,40,63,56
instruction. %r11 holds the right value here, 0x50ef69fef3e09994ULL, and we want to perform %r1_SI ^= (SI) (%r11_DI >> 8), but instead of xoring in 0xfef3e099 it xors in 0xf3e099.
In *.final it is:
(insn 67 65 68 2 (parallel [
            (set (reg:SI 1 %r1 [189])
                (xor:SI (subreg:SI (zero_extract:DI (reg/v:DI 11 %r11 [orig:89 th ] [89])
                            (const_int 32 [0x20])
                            (const_int 24 [0x18])) 4)
                    (reg:SI 1 %r1 [187])))
            (clobber (reg:CC 33 %cc))
        ]) "rh1677602.c":42:73 1415 {*rxsbg_sidi_srl}
     (expr_list:REG_DEAD (reg/v:DI 11 %r11 [orig:89 th ] [89])
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
            (nil))))
which probably looks good: zero_extract counts the bits in memory order, so for a 64-bit big-endian number we want to skip the first 24 bits, then use 32 bits, and finally skip the last 8 bits.
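
To make the mismatch between the RTL and the emitted operands concrete, here is a small C model. It is a simplified reading of RXSBG (rotate the second operand left by the last operand, then xor the selected bit range into the first, with bit 0 as the most significant bit). It assumes start <= end and ignores the test-only control bit. The names rxsbg_model and zero_extract_msb are hypothetical, and nothing here is taken from the GCC sources.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of RXSBG r1,r2,start,end,rot.  Bits are numbered
   0 (most significant) to 63.  Assumes start <= end <= 63 and rot < 64.  */
static uint64_t
rxsbg_model (uint64_t r1, uint64_t r2, unsigned start, unsigned end,
             unsigned rot)
{
  uint64_t rotated = rot ? (r2 << rot) | (r2 >> (64 - rot)) : r2;
  uint64_t mask = (~UINT64_C (0) >> start) & (~UINT64_C (0) << (63 - end));
  return r1 ^ (rotated & mask);
}

/* Model of (zero_extract:DI x size pos) with POS counted from the most
   significant bit, as described above.  Assumes pos + size <= 64.  */
static uint64_t
zero_extract_msb (uint64_t x, unsigned size, unsigned pos)
{
  uint64_t mask = size == 64 ? ~UINT64_C (0) : (UINT64_C (1) << size) - 1;
  return (x >> (64 - pos - size)) & mask;
}

/* Compares what the RTL asks for with what the operands select.  */
static void
check_rxsbg (void)
{
  const uint64_t r11 = UINT64_C (0x50ef69fef3e09994);
  /* The RTL: 32 bits starting 24 bits below the top.  */
  assert (zero_extract_msb (r11, 32, 24) == 0xfef3e099u);
  /* The emitted operands 40,63,56 select only the low 24 bits.  */
  assert (rxsbg_model (0, r11, 40, 63, 56) == 0x00f3e099u);
  /* In this model a start bit of 32 would cover all 32 bits.  */
  assert (rxsbg_model (0, r11, 32, 63, 56) == 0xfef3e099u);
}
```

In this model the rotate amount of 56 (a right rotate by 8) is consistent with the RTL, and only the start bit is too high by 8. That is an observation about the model; the bug does not say what the actual fix changed.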

Comment 8 Jakub Jelinek 2019-03-01 13:02:29 UTC

Should be fixed in gcc-9.0.1-0.6.fc30.s390x and later.