Bug 1310467 - CORD__next() miscompiled on s390(x)
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: gcc
Version: 24
Hardware: s390x Unspecified
Priority: high
Severity: high
Assigned To: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: ZedoraTracker
Reported: 2016-02-21 17:54 EST by Dan Horák
Modified: 2016-03-03 04:15 EST

See Also:
Fixed In Version: gcc-6.0.0-0.14.fc24
Doc Type: Bug Fix
Last Closed: 2016-03-03 04:15:02 EST
Type: Bug


Attachments
preprocessed source file (171.13 KB, text/plain)
2016-02-26 04:42 EST, Dan Horák


External Trackers
Tracker                  ID     Priority  Status  Summary  Last Updated
GNU Compiler Collection  70025  None      None    None     2016-03-01 05:07 EST

Description Dan Horák 2016-02-21 17:54:35 EST
gc-7.4.2-5.fc24 fails to build with the f24 gcc because one test (cordtest) doesn't pass. I have identified
void CORD__next(register CORD_pos p)
from cord/cordbscs.c as possibly miscompiled (too many register variables?), because adding __attribute__((optimize(("O1")))) to this function makes the test pass.
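
For readers unfamiliar with the workaround, here is a minimal sketch of GCC's per-function optimize attribute. The function sum_prefix is a hypothetical stand-in and is not code from gc; only the attribute placement mirrors what was tried on CORD__next.

```c
/* Sketch only: sum_prefix is a hypothetical stand-in, not code from gc. */
#include <stddef.h>

/* GCC-specific: compile just this function at -O1, whatever -O level the
   rest of the translation unit is built with. */
__attribute__((optimize("O1")))
size_t sum_prefix(const unsigned char *buf, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

If the failure disappears once a single function is built this way, that points at the -O2 code generated for that function, although it does not prove a compiler bug by itself. GCC's documentation describes this attribute as a debugging aid rather than something to ship.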


some details about the failed test:
[sharkcz@devel10 gc]$ coredumpctl gdb 37090
           PID: 37090 (lt-cordtest)
           UID: 1000 (sharkcz)
           GID: 1000 (sharkcz)
        Signal: 6 (ABRT)
     Timestamp: Sun 2016-02-21 17:51:58 EST (25s ago)
  Command Line: /home/sharkcz/gc/gc-7.4.2/.libs/lt-cordtest
    Executable: /home/sharkcz/gc/gc-7.4.2/.libs/lt-cordtest
 Control Group: /user.slice/user-0.slice/session-3.scope
          Unit: session-3.scope
         Slice: user-0.slice
       Session: 3
     Owner UID: 0 (root)
       Boot ID: 770d208119c1493aa558d7c4dfd84244
    Machine ID: 4973d5bbe8e94395bdca6c6d5116d0f6
      Hostname: devel10.s390.bos.redhat.com
      Coredump: /var/lib/systemd/coredump/core.lt-cordtest.1000.770d208119c1493aa558d7c4dfd84244.37090.1456095118000000.lz4
       Message: Process 37090 (lt-cordtest) of user 1000 dumped core.
                
                Stack trace of thread 37090:
                #0  0x000003fffcecc37a __GI_raise (libc.so.6)
                #1  0x000003fffcecddfa __GI_abort (libc.so.6)
                #2  0x000002aa0066d8f0 test_basics (lt-cordtest)
                #3  0x000002aa0066d400 main (lt-cordtest)
                #4  0x000003fffceb278e __libc_start_main (libc.so.6)
                #5  0x000002aa0066d484 _start (lt-cordtest)

GNU gdb (GDB) Fedora 7.10.50.20160131-50.fc24
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "s390x-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/sharkcz/gc/gc-7.4.2/.libs/lt-cordtest...done.
[New LWP 37090]

warning: Could not load shared library symbols for linux-vdso64.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/sharkcz/gc/gc-7.4.2/.libs/lt-cordtest'.
Program terminated with signal SIGABRT, Aborted.
#0  0x000003fffcecc37a in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) where
#0  0x000003fffcecc37a in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x000003fffcecddfa in __GI_abort () at abort.c:89
#2  0x000002aa0066d8f0 in test_basics () at cord/tests/cordtest.c:62
#3  0x000002aa0066d400 in main () at cord/tests/cordtest.c:241
(gdb) up
#1  0x000003fffcecddfa in __GI_abort () at abort.c:89
89	      raise (SIGABRT);
(gdb) up
#2  0x000002aa0066d8f0 in test_basics () at cord/tests/cordtest.c:62
62	    if (!CORD_IS_STRING(x)) ABORT("short cord should usually be a string");
(gdb) l
57	    CORD y;
58	    CORD_pos p;
59	
60	    x = CORD_cat(x,x);
61	    if (x == CORD_EMPTY) ABORT("CORD_cat(x,x) returned empty cord");
62	    if (!CORD_IS_STRING(x)) ABORT("short cord should usually be a string");
63	    if (strcmp(x, "abab") != 0) ABORT("bad CORD_cat result");
64	
65	    for (i = 1; i < 16; i++) {
66	        x = CORD_cat(x,x);


Version-Release number of selected component (if applicable):
gcc-6.0.0-0.12.fc24.s390x
Comment 1 Jan Kurik 2016-02-24 10:47:26 EST
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle.
Changing version to '24'.

More information about this action and the reason for it is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase
Comment 2 Dan Horák 2016-02-26 04:42 EST
Created attachment 1130804
preprocessed source file

gcc -DHAVE_CONFIG_H -I./include -I./include -DUSE_GET_STACKBASE_FOR_MAIN -fexceptions -DGC_VISIBILITY_HIDDEN_SET -fvisibility=hidden -Wall -Wextra -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -march=z9-109 -mtune=z10 -fno-strict-aliasing -E cord/cordbscs.c  -fPIC -DPIC -o cord/cordbscs.i
Comment 3 Jakub Jelinek 2016-02-26 06:42:05 EST
I'm afraid that is not enough for a wrong-code, I'd need to know how exactly it is called and how to recognize good vs. bad.
Does it keep failing with O2 and working with O1 if you add __attribute__((noinline, noclone)) to CORD__next?  What about CORD__extend_path, does it have to be inlined for the wrong behavior?
Can you cook up a minimal main that just calls CORD__next, marked __attribute__((noinline, noclone)), with the right arguments on which it reproduces? Of course it is not just the argument structure itself but also whatever other pointers it uses; just malloc them, or set the pointers to automatic variables in main that are also initialized with the right values. Can you add code to main (or to CORD__extend_path if it doesn't have to be inlined) to detect the wrong vs. good behavior and call __builtin_abort () if it is wrong?
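
A rough skeleton of the kind of self-contained wrong-code testcase being asked for might look like the following. The names struct pos, advance, check_advance and run_testcase are hypothetical placeholders and do not come from cord/cordbscs.c; only the noinline/noclone attributes and the __builtin_abort () call are taken from the request above.

```c
/* Skeleton of a self-contained wrong-code testcase.  struct pos, advance,
   check_advance and run_testcase are hypothetical placeholders. */
struct pos {
    unsigned long cur;   /* current index */
    unsigned long end;   /* one past the last valid index */
};

/* GCC attributes: keep the function under test out of line so the caller
   cannot hide or change the suspect code generation. */
__attribute__((noinline, noclone))
void advance(struct pos *p)
{
    if (p->cur < p->end)
        p->cur++;
}

/* Returns 1 when the state after one step matches the expected value. */
int check_advance(unsigned long start, unsigned long end,
                  unsigned long expected)
{
    struct pos p = { start, end };
    advance(&p);
    return p.cur == expected;
}

/* A real testcase would call this from main: silent on good behavior,
   abort on bad, so good vs. bad is visible from the exit status alone. */
void run_testcase(void)
{
    if (!check_advance(0, 13, 1))
        __builtin_abort();
}
```

The point of this shape is that the same file can be compiled at -O1 and -O2 and bisected across compiler revisions without any dependency on the gc build.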
Comment 4 Dan Horák 2016-02-26 17:34:35 EST
(In reply to Jakub Jelinek from comment #3)
> I'm afraid that is not enough for a wrong-code, I'd need to know how exactly
> it is called and how to recognize good vs. bad.

I know; I'm still working on a better test case.

> Does it keep failing with O2 and working with O1 if you add
> __attribute__((noinline, noclone)) to CORD__next?  What about
> CORD__extend_path, does it have to be inlined for the wrong behavior?

It behaves incorrectly with -O2 even with __attribute__((noinline, noclone)) on both functions.

> Can you cook up a minimal main that just calls CORD__next, marked
> __attribute__((noinline, noclone)), with the right arguments on which it
> reproduces? Of course it is not just the argument structure itself but also
> whatever other pointers it uses; just malloc them, or set the pointers to
> automatic variables in main that are also initialized with the right values.
> Can you add code to main (or to CORD__extend_path if it doesn't have to be
> inlined) to detect the wrong vs. good behavior and call __builtin_abort ()
> if it is wrong?

The "main" looks like this (reduced from cord/tests/cordtest.c after I realized that gdb was reporting the wrong line numbers); it is also reproducible with upstream (https://github.com/ivmai/bdwgc):

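/* Assumed prelude, following upstream cord/tests/cordtest.c; it was not
   part of the reduced snippet as posted. */
#include <stdio.h>
#include <stdlib.h>
#include "gc.h"     /* for GC_INIT() */
#include "cord.h"

#define ABORT(string) \
    { fprintf(stderr, "FAILED: %s\n", string); abort(); }

/* Function node callback: character i of the cord is simply (char)i. */
char id_cord_fn(size_t i, void * client_data)
{
    if (client_data != 0) ABORT("id_cord_fn: bad client data");
    return((char)i);
}

void test_basics(void)
{
    register int i;
    char c;
    CORD y;
    CORD_pos p;

    /* Build a 13-character cord backed by id_cord_fn and walk it. */
    y = CORD_from_fn(id_cord_fn, 0, 13);
    i = 0;
    CORD_set_pos(p, y, i);
    while(CORD_pos_valid(p)) {
        c = CORD_pos_fetch(p);
        if(c != i) ABORT("Traversal of function node failed");
        CORD_next(p); i++;
    }
    if (i != 13) ABORT("Bad apparent length for function node");
}

int main(void)
{
    GC_INIT();
    test_basics();
    CORD_fprintf(stdout, "SUCCEEDED\n");
    return(0);
}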

The result is:
[sharkcz@devel10 bdwgc]$ ./cordtest 
FAILED: Traversal of function node failed
Aborted (core dumped)
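
To make the checked invariant explicit without depending on libgc, here is a small stand-alone model of the same traversal. It is only an illustration: struct fn_seq, id_fn, count_matching_prefix and traverse_identity are hypothetical names and are not part of the cord API.

```c
#include <stddef.h>

/* Hypothetical model of a function-backed sequence; not the cord API. */
typedef char (*char_fn)(size_t i, void *client_data);

struct fn_seq {
    char_fn fn;          /* produces the character at index i */
    void *client_data;   /* opaque argument passed through to fn */
    size_t len;          /* number of valid positions */
};

/* Identity generator: the character at index i is (char)i. */
static char id_fn(size_t i, void *client_data)
{
    (void)client_data;
    return (char)i;
}

/* Walks the sequence and returns how many leading positions hold a
   character equal to their own index, stopping at the first mismatch. */
size_t count_matching_prefix(const struct fn_seq *s)
{
    size_t i = 0;
    while (i < s->len && s->fn(i, s->client_data) == (char)i)
        i++;
    return i;
}

/* Builds an identity sequence of the given length and traverses it. */
size_t traverse_identity(size_t len)
{
    struct fn_seq s = { id_fn, NULL, len };
    return count_matching_prefix(&s);
}
```

With correct code generation, a traversal of length 13 should visit all 13 positions; the failing cordtest run corresponds to this walk seeing a character that differs from its index.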
Comment 5 Jakub Jelinek 2016-03-01 05:07:51 EST
I managed to create a small self-contained testcase and bisect it; the issue is tracked upstream as PR70025.
Comment 6 Dan Horák 2016-03-03 04:07:15 EST
I can confirm that the test suite passes with gcc-6.0.0-0.14.fc24.
Comment 7 Jakub Jelinek 2016-03-03 04:15:02 EST
Fixed then.
