Created attachment 1535113 [details]
reduced standalone test case

Description of problem:
gcc 9 appears to miscompile a pseudo-RNG that is part of the jemalloc test suite (https://github.com/jemalloc/jemalloc/blob/dev/test/unit/SFMT.c and https://github.com/jemalloc/jemalloc/blob/dev/test/src/SFMT.c). When the do_recursion() function in the attached source code is compiled with -O0, the check passes.

Version-Release number of selected component (if applicable):
gcc-9.0.1-0.4.fc30.s390x

How reproducible:

Steps to Reproduce:
1. gcc -o test -O2 -Wall test.c
2. ./test

Actual results:
Output mismatch for i=2
Output mismatch for i=6

Expected results:
no output
Created attachment 1535114 [details] preprocessed reduced standalone test case
Reproducing this needs -march=zEC12 -mtune=z13 -O2 (admittedly I haven't tried other arches; the default I had in my cross compiler didn't reproduce it). Started with http://gcc.gnu.org/r266203. Let me bisect manually to find which function is affected.
do_recursion() breaks the generator
Yeah, verified that too: taking the code produced by r266202 (which works) and patching in the r266203 do_recursion makes it fail, while the r266203 versions of init_gen_rand and init_by_array are fine (those are the only two other functions that changed).
Reduced testcase:

#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 && __CHAR_BIT__ == 8
struct S { unsigned int u[4]; };

static void
foo (struct S *out, struct S const *in, int shift)
{
  unsigned long long th, tl, oh, ol;
  th = ((unsigned long long) in->u[3] << 32) | in->u[2];
  tl = ((unsigned long long) in->u[1] << 32) | in->u[0];
  oh = th >> (shift * 8);
  ol = tl >> (shift * 8);
  ol |= th << (64 - shift * 8);
  out->u[1] = ol >> 32;
  out->u[0] = ol;
  out->u[3] = oh >> 32;
  out->u[2] = oh;
}

static void
bar (struct S *out, struct S const *in, int shift)
{
  unsigned long long th, tl, oh, ol;
  th = ((unsigned long long) in->u[3] << 32) | in->u[2];
  tl = ((unsigned long long) in->u[1] << 32) | in->u[0];
  oh = th << (shift * 8);
  ol = tl << (shift * 8);
  oh |= tl >> (64 - shift * 8);
  out->u[1] = ol >> 32;
  out->u[0] = ol;
  out->u[3] = oh >> 32;
  out->u[2] = oh;
}

__attribute__((noipa)) static void
baz (struct S *r, struct S *a, struct S *b, struct S *c, struct S *d)
{
  struct S x, y;
  bar (&x, a, 1);
  foo (&y, c, 1);
  r->u[0] = a->u[0] ^ x.u[0] ^ ((b->u[0] >> 11) & 0xdfffffefU) ^ y.u[0] ^ (d->u[0] << 18);
  r->u[1] = a->u[1] ^ x.u[1] ^ ((b->u[1] >> 11) & 0xddfecb7fU) ^ y.u[1] ^ (d->u[1] << 18);
  r->u[2] = a->u[2] ^ x.u[2] ^ ((b->u[2] >> 11) & 0xbffaffffU) ^ y.u[2] ^ (d->u[2] << 18);
  r->u[3] = a->u[3] ^ x.u[3] ^ ((b->u[3] >> 11) & 0xbffffff6U) ^ y.u[3] ^ (d->u[3] << 18);
}

int
main ()
{
  struct S a[] = { { 0x000004d3, 0xbc5448db, 0xf22bde9f, 0xebb44f8f },
                   { 0x03a32799, 0x60be8246, 0xa2d266ed, 0x7aa18536 },
                   { 0x15a38518, 0xcf655ce1, 0xf3e09994, 0x50ef69fe },
                   { 0x88274b07, 0xe7c94866, 0xc0ea9f47, 0xb6a83c43 },
                   { 0xcd0d0032, 0x5d47f5d7, 0x5a0afbf6, 0xaea87b24 },
                   { 0, 0, 0, 0 } };
  baz (&a[5], &a[0], &a[1], &a[2], &a[3]);
  if (a[4].u[0] != a[5].u[0] || a[4].u[1] != a[5].u[1]
      || a[4].u[2] != a[5].u[2] || a[4].u[3] != a[5].u[3])
    __builtin_abort ();
  return 0;
}
#else
int
main ()
{
  return 0;
}
#endif
When miscompiled, a[5].u[2] is 0xa40afbf6 instead of expected 0x5a0afbf6 (different upper byte).
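One way to narrow this down is to compare the differing bits against the terms of the XOR chain in baz(). The sketch below is not part of the testcase; foo_u2 and differing_bits are hypothetical helper names introduced here for illustration. It recomputes the u[2] word that foo() stores for a[2] with shift = 1, and XORs the observed and expected result words.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the u[2] word that foo() in the reduced testcase
   stores, i.e. the low 32 bits of th >> (shift * 8).  */
static uint32_t foo_u2(uint32_t in_u3, uint32_t in_u2, int shift)
{
    assert(shift > 0 && shift < 8);
    uint64_t th = ((uint64_t) in_u3 << 32) | in_u2;
    return (uint32_t) (th >> (shift * 8));
}

/* Hypothetical helper: bits that differ between the observed and the
   expected result word.  */
static uint32_t differing_bits(uint32_t observed, uint32_t expected)
{
    return observed ^ expected;
}
```

Calling foo_u2(0x50ef69fe, 0xf3e09994, 1) gives y.u[2], and differing_bits(0xa40afbf6, 0x5a0afbf6) gives the bits that went wrong. The differing bits coincide with the upper byte of y.u[2], which suggests that byte is what goes missing from the XOR.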
So, I think the problem is in the
  rxsbg %r1,%r11,40,63,56
instruction. %r11 holds the right value here, 0x50ef69fef3e09994ULL, and we want to perform %r1_SI ^= (SI) (%r11_DI >> 8), but instead of xoring in 0xfef3e099 it xors in 0xf3e099.

In *.final it is:

(insn 67 65 68 2 (parallel [
            (set (reg:SI 1 %r1 [189])
                (xor:SI (subreg:SI (zero_extract:DI (reg/v:DI 11 %r11 [orig:89 th ] [89])
                            (const_int 32 [0x20])
                            (const_int 24 [0x18])) 4)
                    (reg:SI 1 %r1 [187])))
            (clobber (reg:CC 33 %cc))
        ]) "rh1677602.c":42:73 1415 {*rxsbg_sidi_srl}
     (expr_list:REG_DEAD (reg/v:DI 11 %r11 [orig:89 th ] [89])
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
            (nil))))

which probably looks good: zero_extract counts the bits in memory order, so for a 64-bit big-endian number we want to skip the first 24 bits, then use 32 bits, and finally skip the last 8 bits. So the RTL looks right, and it is the operands of the emitted rxsbg that select too few bits.
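The mismatch between the RTL and the emitted instruction can be illustrated with a small model. This is a simplified sketch, not GCC or hardware code: rxsbg_model and zero_extract_be are hypothetical names, the model assumes rxsbg rotates the second operand left by the last operand and XORs only bits start..end (bit 0 being the most significant) into the first operand, and it ignores the condition code and wrap-around bit ranges. The 32,63,56 operand triple used for comparison is an assumption about what would match the RTL, not something taken from the fix.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by n bits (n taken modulo 64).  */
static uint64_t rotl64(uint64_t v, unsigned n)
{
    n &= 63;
    return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Mask covering bits start..end inclusive, where bit 0 is the most
   significant bit.  Wrap-around ranges are not modelled.  */
static uint64_t be_mask(unsigned start, unsigned end)
{
    assert(start <= end && end <= 63);
    return (~UINT64_C(0) >> start) & (~UINT64_C(0) << (63 - end));
}

/* Simplified model (an assumption, see above) of
   rxsbg r1,r2,start,end,rot: r1 ^= rotl(r2, rot) within start..end.  */
static uint64_t rxsbg_model(uint64_t r1, uint64_t r2,
                            unsigned start, unsigned end, unsigned rot)
{
    return r1 ^ (rotl64(r2, rot) & be_mask(start, end));
}

/* Model of (zero_extract:DI v size pos) with pos counted from the most
   significant bit, as described in the comment above.  */
static uint64_t zero_extract_be(uint64_t v, unsigned size, unsigned pos)
{
    assert(size >= 1 && size <= 64 && pos + size <= 64);
    uint64_t field = v >> (64 - pos - size);
    return size == 64 ? field : field & ((UINT64_C(1) << size) - 1);
}
```

With th = 0x50ef69fef3e09994, zero_extract_be(th, 32, 24) yields the wanted 0xfef3e099, while rxsbg_model(0, th, 40, 63, 56) yields only 0xf3e099 because bits 40..63 cover just 24 bits. In this model, selecting bits 32..63 with the same rotate amount would give 0xfef3e099.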
Should be fixed in gcc-9.0.1-0.6.fc30.s390x and later.