Bug 1677602 - pseudo-RNG mis-compiled with gcc9
Summary: pseudo-RNG mis-compiled with gcc9
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: rawhide
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ZedoraTracker 1675181
Reported: 2019-02-15 10:46 UTC by Dan Horák
Modified: 2019-03-01 13:02 UTC (History)

Fixed In Version: gcc-9.0.1-0.6.fc30
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-01 13:02:29 UTC
Type: Bug
Embargoed:


Attachments
reduced standalone test case (13.32 KB, text/plain)
2019-02-15 10:46 UTC, Dan Horák
preprocessed reduced standalone test case (63.61 KB, text/plain)
2019-02-15 10:46 UTC, Dan Horák


Links
GNU Compiler Collection 89369 (P1, RESOLVED): [9 Regression] pseudo-RNG miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13 starting with r266203 (last updated 2020-02-05 13:41:56 UTC)
IBM Linux Technology Center 175584 (last updated 2019-08-07 11:14:09 UTC)

Description Dan Horák 2019-02-15 10:46:02 UTC
Created attachment 1535113: reduced standalone test case

Description of problem:
It looks like gcc9 miscompiles a pseudo-RNG that is part of the jemalloc test suite (https://github.com/jemalloc/jemalloc/blob/dev/test/unit/SFMT.c and https://github.com/jemalloc/jemalloc/blob/dev/test/src/SFMT.c).

When the do_recursion() function in the attached source code is compiled with -O0, the check passes.


Version-Release number of selected component (if applicable):
gcc-9.0.1-0.4.fc30.s390x

How reproducible:


Steps to Reproduce:
1. gcc -o test -O2 -Wall test.c
2. ./test

Actual results:
Output mismatch for i=2
Output mismatch for i=6


Expected results:
no output

Comment 1 Dan Horák 2019-02-15 10:46:37 UTC
Created attachment 1535114: preprocessed reduced standalone test case

Comment 2 Jakub Jelinek 2019-02-15 13:39:59 UTC
Needs -march=zEC12 -mtune=z13 -O2 to reproduce (admittedly I haven't tried other arches; the default I had in my cross compiler didn't reproduce it).
Started with http://gcc.gnu.org/r266203. Let me bisect manually which function is affected.

Comment 3 Dan Horák 2019-02-15 13:49:41 UTC
do_recursion() breaks the generator

Comment 4 Jakub Jelinek 2019-02-15 13:52:02 UTC
Yeah, verified that too: taking the code produced by r266202 (which works) and patching in the r266203 do_recursion makes it fail, while the r266203 init_gen_rand and init_by_array (the only two other changed functions) are fine.

Comment 5 Jakub Jelinek 2019-02-15 14:42:24 UTC
Reduced testcase:
#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 && __CHAR_BIT__ == 8
struct S { unsigned int u[4]; };

static void
foo (struct S *out, struct S const *in, int shift)
{
  unsigned long long th, tl, oh, ol;
  th = ((unsigned long long) in->u[3] << 32) | in->u[2];
  tl = ((unsigned long long) in->u[1] << 32) | in->u[0];
  oh = th >> (shift * 8);
  ol = tl >> (shift * 8);
  ol |= th << (64 - shift * 8);
  out->u[1] = ol >> 32;
  out->u[0] = ol;
  out->u[3] = oh >> 32;
  out->u[2] = oh;
}

static void
bar (struct S *out, struct S const *in, int shift)
{
  unsigned long long th, tl, oh, ol;
  th = ((unsigned long long) in->u[3] << 32) | in->u[2];
  tl = ((unsigned long long) in->u[1] << 32) | in->u[0];
  oh = th << (shift * 8);
  ol = tl << (shift * 8);
  oh |= tl >> (64 - shift * 8);
  out->u[1] = ol >> 32;
  out->u[0] = ol;
  out->u[3] = oh >> 32;
  out->u[2] = oh;
}

__attribute__((noipa)) static void
baz (struct S *r, struct S *a, struct S *b, struct S *c, struct S *d)
{
  struct S x, y;
  bar (&x, a, 1);
  foo (&y, c, 1);
  r->u[0] = a->u[0] ^ x.u[0] ^ ((b->u[0] >> 11) & 0xdfffffefU) ^ y.u[0] ^ (d->u[0] << 18);
  r->u[1] = a->u[1] ^ x.u[1] ^ ((b->u[1] >> 11) & 0xddfecb7fU) ^ y.u[1] ^ (d->u[1] << 18);
  r->u[2] = a->u[2] ^ x.u[2] ^ ((b->u[2] >> 11) & 0xbffaffffU) ^ y.u[2] ^ (d->u[2] << 18);
  r->u[3] = a->u[3] ^ x.u[3] ^ ((b->u[3] >> 11) & 0xbffffff6U) ^ y.u[3] ^ (d->u[3] << 18);
}

int
main ()
{
  struct S a[] = { { 0x000004d3, 0xbc5448db, 0xf22bde9f, 0xebb44f8f },
		   { 0x03a32799, 0x60be8246, 0xa2d266ed, 0x7aa18536 },
		   { 0x15a38518, 0xcf655ce1, 0xf3e09994, 0x50ef69fe },
		   { 0x88274b07, 0xe7c94866, 0xc0ea9f47, 0xb6a83c43 },
		   { 0xcd0d0032, 0x5d47f5d7, 0x5a0afbf6, 0xaea87b24 },
		   { 0, 0, 0, 0 } };
  baz (&a[5], &a[0], &a[1], &a[2], &a[3]);
  if (a[4].u[0] != a[5].u[0] || a[4].u[1] != a[5].u[1]
      || a[4].u[2] != a[5].u[2] || a[4].u[3] != a[5].u[3])
    __builtin_abort ();
  return 0;
}
#else
int
main ()
{
  return 0;
}
#endif

Comment 6 Jakub Jelinek 2019-02-15 14:52:31 UTC
When miscompiled, a[5].u[2] is 0xa40afbf6 instead of the expected 0x5a0afbf6 (only the upper byte differs).
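
The differing byte can be checked by hand against the reduced testcase. The sketch below is an illustration only: foo_u2 and u2_with_top_byte_dropped are hypothetical helper names, and the "top byte of the y.u[2] term is dropped" failure mode is an assumption made here to explain the observed value, not something taken from the compiler output at this point in the thread. foo_u2 mirrors the out->u[2] store in foo () for c = a[2].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the out->u[2] store in foo ():
   the low 32 bits of th >> (shift * 8).  */
static uint32_t
foo_u2 (uint32_t u3, uint32_t u2, int shift)
{
  assert (shift > 0 && shift < 8);
  uint64_t th = ((uint64_t) u3 << 32) | u2;
  return (uint32_t) (th >> (shift * 8));
}

/* Hypothetical helper: the result word if the top byte of the y.u[2]
   term were lost before the XOR.  This failure mode is an assumption
   used for illustration.  */
static uint32_t
u2_with_top_byte_dropped (uint32_t expected, uint32_t y_u2)
{
  /* Remove the full term, then XOR in only its low 24 bits.  */
  return expected ^ y_u2 ^ (y_u2 & 0x00ffffffU);
}
```

With c->u[3] = 0x50ef69fe and c->u[2] = 0xf3e09994, foo_u2 gives 0xfef3e099, and dropping its top byte (0xfe) turns the expected 0x5a0afbf6 into 0xa40afbf6, which matches the miscompiled value reported above.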

Comment 7 Jakub Jelinek 2019-02-15 17:14:12 UTC
So, I think the problem is in the
        rxsbg   %r1,%r11,40,63,56
instruction. %r11 holds the right value here, 0x50ef69fef3e09994ULL, and we want to perform %r1_SI ^= (SI) (%r11_DI >> 8), but instead of xoring in 0xfef3e099 it xors in 0xf3e099.
In *.final it is:
(insn 67 65 68 2 (parallel [
            (set (reg:SI 1 %r1 [189])
                (xor:SI (subreg:SI (zero_extract:DI (reg/v:DI 11 %r11 [orig:89 th ] [89])
                            (const_int 32 [0x20])
                            (const_int 24 [0x18])) 4)
                    (reg:SI 1 %r1 [187])))
            (clobber (reg:CC 33 %cc))
        ]) "rh1677602.c":42:73 1415 {*rxsbg_sidi_srl}
     (expr_list:REG_DEAD (reg/v:DI 11 %r11 [orig:89 th ] [89])
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
            (nil))))
which probably looks good: zero_extract counts the bits in memory order, so for a 64-bit big-endian number we want to skip the first 24 bits, then use 32 bits, and finally skip the last 8 bits.
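
The mismatch between the RTL and the emitted instruction can be sketched in C. This is a simplified model under stated assumptions, not the s390 backend's logic: zero_extract_be and rxsbg_model are hypothetical names, bit 0 is taken to be the most significant bit, and rxsbg_model assumes the instruction rotates the second operand left by the last operand and XORs bits i3..i4 of the rotated value into the first operand. It ignores the condition code, the test-only flag, and wrap-around bit ranges.

```c
#include <assert.h>
#include <stdint.h>

/* Model of (zero_extract:DI x (const_int len) (const_int pos)) with
   big-endian bit numbering: skip POS bits from the most significant
   end, then take LEN bits, right-justified.  */
static uint64_t
zero_extract_be (uint64_t x, int len, int pos)
{
  assert (len > 0 && len < 64 && pos >= 0 && pos + len <= 64);
  return (x >> (64 - pos - len)) & ((UINT64_C (1) << len) - 1);
}

/* Simplified model of "rxsbg r1,r2,i3,i4,i5" for i3 <= i4
   (assumption: bit 0 is the most significant bit).  */
static uint64_t
rxsbg_model (uint64_t r1, uint64_t r2, int i3, int i4, int i5)
{
  assert (0 <= i3 && i3 <= i4 && i4 <= 63 && 0 <= i5 && i5 <= 63);
  /* Rotate left by i5; guard i5 == 0 to avoid a shift by 64.  */
  uint64_t rot = i5 ? (r2 << i5) | (r2 >> (64 - i5)) : r2;
  int width = i4 - i3 + 1;
  uint64_t mask = width == 64 ? ~UINT64_C (0)
                              : ((UINT64_C (1) << width) - 1) << (63 - i4);
  return r1 ^ (rot & mask);
}
```

In this model, zero_extract_be (0x50ef69fef3e09994ULL, 32, 24) yields 0xfef3e099, the value the RTL asks for, while rxsbg_model with operands 40,63,56 selects only the low 24 bits and yields 0xf3e099. A start bit of 32 instead of 40 would select all 32 bits in this model.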

Comment 8 Jakub Jelinek 2019-03-01 13:02:29 UTC
Should be fixed in gcc-9.0.1-0.6.fc30.s390x and later.

