Bug 1094975 - Using GCC 4.9.0-8 with opt flags to build Henry Spencer regex lib causes test failure on ppc64 and s390(x)
Using GCC 4.9.0-8 with opt flags to build Henry Spencer regex lib causes test...
Status: CLOSED EOL
Product: Fedora
Classification: Fedora
Component: gcc (Show other bugs)
22
ppc64 Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: Jakub Jelinek
Fedora Extras Quality Assurance
:
Depends On:
Blocks: ZedoraTracker PPCTracker
  Show dependency treegraph
 
Reported: 2014-05-06 15:43 EDT by Terje Røsten
Modified: 2016-07-19 07:28 EDT (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-19 07:28:33 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
regexec.i (119.00 KB, text/plain)
2014-06-11 09:39 EDT, Jakub Jelinek
no flags Details

  None (edit)
Description Terje Røsten 2014-05-06 15:43:59 EDT
Description of problem:

When building MySQL 5.6.17 with any kind of optimizations
(O1/O2/O3) and running 

make test VERBOSE=1

Test 12 (which I think executes mysql-5.6.17/build/regex/re -I)
just hangs.

When build without optimizations, make test and rest of
MySQL test suite passes.

Version-Release number of selected component (if applicable):

gcc-4.9.0-1
community-mysql-5.6.17-2

How reproduce:

Build MySQL 5.6.17 on ppc or ppc64 and run make test VERBOSE=1

Btw: gcc 4.8 works fine.
Comment 1 Terje Røsten 2014-05-07 14:03:06 EDT
Similar report: 
 https://bugzilla.redhat.com/show_bug.cgi?id=1095292
Comment 2 Jakub Jelinek 2014-05-12 08:34:24 EDT
Assigning this to gcc is way too premature, it is more likely a bug in the package itself, even the fact that with gcc 4.8 it works fine and with 4.9 it doesn't doesn't change anything on that.
Try to build it with -fsanitize=undefined and/or -fsanitize=address, try some of the -fno-aggressive-loop-optimizations -fno-strict-aliasing -fwrapv options, if none of these show any issues with it, see if it works e.g. with gcc 4.9 -O0 and if it does, bisect which *.o file matters for the test failure (always link some subset of object files built with -O0 and some built normally to narrow it down to a particular object file with which ideally it fails if it is built with e.g. -O2 and everything else is built with -O0, and it works when it is built with -O0 and everything else is built with -O2).
Look at -Wall -W warnings on that file, try -fno-inline, perhaps bisect with __attribute__((optimize (0))) on some functions, and/or debug in which function things go wrong.  If you haven't discovered any bugs on the package side with that, reassign back to gcc with all the details you've discovered.
Comment 3 Dan Horák 2014-05-29 03:16:43 EDT
same problem happens also on s390/s390x
Comment 4 Jeff Law 2014-05-29 12:16:48 EDT
I'd also suggest building with -fno-delete-null-pointer-checks.  We've seen many applications failing due to improvements in the code related to removing useless null pointer checks.

See "Null pointer checks may be optimized away more aggressively":

https://gcc.gnu.org/gcc-4.9/porting_to.html

We've seen several applications fail due to that issue.

Jeff
Comment 5 Terje Røsten 2014-05-29 15:13:32 EDT
Thanks Jeff, with -fno-delete-null-pointer-checks added I get segfaults for
tests hanging without -fno-delete-null-pointer-checks. 

Hard to say more without access to ppc hardware.
Comment 7 Jeff Law 2014-05-29 16:40:16 EDT
BTW, in regards to c#1, the referenced bug report likely has nothing to do with the problem we're seeing with mysql.  That referenced report really looks like a zorba problem -- most likely due to zorba mis-managing heap memory.

I'm quickly putting together some support in memstomp for detecting cases where code passes NULL to routines where that's not allowed by ISO.  Once I commit that change and rebuild memstomp, an interesting test would be to install the memstomp preload, then run the build/test of mysql and see if it flags anything.

Anyway, back to packing the update to memstomp :-)
Comment 8 Terje Røsten 2014-06-05 11:25:17 EDT
Hi guys,

the problematic file is regex/regexec.c, compiling this file when any optimizations (O1 or higher), the unit test fails.

What's strange is that compiling with all flags included in -O1 (according to
gcc manual page) things works, for example this works:

  -Wall -Werror \
  -fauto-inc-dec -fcompare-elim -fcprop-registers -fdce -fdefer-pop  \
  -fdse -fguess-branch-probability -fif-conversion2 \
  -fif-conversion -fipa-pure-const -fipa-profile -fipa-reference 
  -fmergeconstants \
  -fsplit-wide-types -ftree-bit-ccp \
  -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copyrename -ftree-dce \
  -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre \
  -ftree-phiprop -ftree-slsr -ftree-sra -ftree-pta -ftree-ter -funit-at-a-time \
  -Wextra -Wformat-security \
  -Wvla -Wwrite-strings -Wdeclaration-after-statement \
  -g -fabi-version=2 -fno-omit-frame-pointer -fno-strict-aliasing \
  -fPIC

while

   -Wall -Werror \
   -O1 \
  -Wextra -Wformat-security \
  -Wvla -Wwrite-strings -Wdeclaration-after-statement \
  -g -fabi-version=2 -fno-omit-frame-pointer -fno-strict-aliasing \
  -fPIC

make the unit test fail. When failing (using -O1), adding 

-fno-aggressive-loop-optimizations -fwrapv -fno-delete-null-pointer-checks
-fno-inline

have no effect, still trouble. memstomp did not find anything wrong.


Any ideas?


Note: when the test fails, the test driver segfaults, that's due to error in driver.
Simple fix to that issue:

--- regex/main.c
+++ regex/main.c
@@ -517,7 +517,6 @@
        }
 
        len = (size_t)(sub.rm_eo - sub.rm_so);
-       shlen = strlen(should);
        p = str + sub.rm_so;
 
        /* check for not supposed to match */
@@ -526,6 +525,8 @@
                return(grump);
        }
 
+       shlen = strlen(should);
+
        /* check for wrong match */
        if (len != shlen || strncmp(p, should, shlen) != 0) {
           sprintf(grump, "matched `%.*s' instead", (int)len, p);
Comment 9 Terje Røsten 2014-06-06 07:14:30 EDT
Did some more digging, using the __attribute__((optimize (0))) hint from Jacub,
I was able to pass the complete test suite by just adding patch below. Adding
patch to mysql package is a simple workaround to get mysql working on ppc and
s390 with gcc 4.9.0.

I am tempted to reassign bug to gcc? Agree?

--- regex/engine.c
+++ regex/engine.c
@@ -235,6 +235,14 @@
  == static char *dissect(register struct match *m, char *start, \
  ==    char *stop, sopno startst, sopno stopst);
  */
+
+static char * dissect(const CHARSET_INFO *charset,
+                     struct match *m,
+                     char *start,
+                     char *stop,
+                     sopno startst,
+                     sopno stopst) __attribute__((optimize (0)));
+
 static char *                  /* == stop (success) always */
 dissect(charset, m, start, stop, startst, stopst)
 const CHARSET_INFO *charset;
Comment 10 Terje Røsten 2014-06-11 05:57:11 EDT
The sources gcc have trouble with is from a regex lib, in fact the famous
Henry Spencer regex lib.

Using these sources a more simple reproducer of the issue is available.

Download the regex lib from http://www.arglist.com/regex/ , for example:

 http://www.arglist.com/regex/files/regex3.8a.tar.gz

Unpack and apply the following patch:


diff --git a/Makefile b/Makefile
index 3882b37..a33f4ca 100644
--- a/Makefile
+++ b/Makefile
@@ -4,7 +4,7 @@
 # Put -Dconst= in for a pre-ANSI compiler.
 # Do not take -DPOSIX_MISTAKE out.
 # REGCFLAGS isn't important to you (it's for my use in some special contexts).
-CFLAGS=-I. -DPOSIX_MISTAKE -DREDEBUG $(REGCFLAGS)
+CFLAGS= -O1 -I. -DPOSIX_MISTAKE -DREDEBUG $(REGCFLAGS)

diff --git a/failed.test b/failed.test
new file mode 100644
index 0000000..71f9c60
--- /dev/null
+++ b/failed.test
@@ -0,0 +1 @@
+[[:alnum:]]+   -       -%@a0X- a0X


Compile with "make", ignore test failure.

Run single problematic test case:

./re < failed.test

Note test failure.

Save re to re.bogus.

Apply the following patch on top:

diff --git a/engine.c b/engine.c
index 919fe3f..5743bc7 100644
--- a/engine.c
+++ b/engine.c
@@ -333,6 +333,10 @@ sopno stopst;
                        oldssp = ssp;
                        for (;;) {      /* find last match of innards */
                                sep = slow(m, ssp, rest, ssub, esub);
+                               if (ssp == NULL) {
+                                       printf("%s\n", "ssp NULL");
+                                       break;
+                               }
                                if (sep == NULL || sep == ssp)
                                        break;  /* failed or matched null */
                                oldssp = ssp;   /* on to next try */


Recompile with make. Try problematic test case again:

./re < failed.test

Now is works, however note added printf() is never reached.

This seems to be a gcc bug to me, changing component and title.
Comment 11 Jakub Jelinek 2014-06-11 09:39:49 EDT
Created attachment 907677 [details]
regexec.i

Bisected (with x86_64-linux -> s390x-linux cross-compiler) to
http://gcc.gnu.org/r205279
when regexec.i is compiled with -O1, copyrename3 is identical between r205278 and r205279, dom1 starts to differ in the sdissect function.  That of course doesn't tell anything where the bug is (not even if in gcc or engine.c), but could help us narrow it down.  Jeff, this was your patch, can you please have a look?
Comment 12 Jakub Jelinek 2014-06-11 11:09:27 EDT
The problematic case is the loop with the sslow call:

  <bb 36>:
  # start_11 = PHI <start_157(35), [engine.c : 339:9] start_74(38)>
  # start_14 = PHI <start_157(35), [engine.c : 338:12] start_11(38)>
  [engine.c : 335:9] start_74 = sslow (m_24(D), start_11, start_68, ssub_72, startst_3);
  [engine.c : 336:8] if (start_74 == 0B)
    goto <bb 39>;
  else
    goto <bb 37>;
  
  <bb 37>:
  [engine.c : 336:28] if (start_74 == start_11)
    goto <bb 39>;
  else
    goto <bb 38>;
  
  <bb 38>:
  [engine.c : 340 : 4] goto <bb 36>;

  <bb 39>:
  [engine.c : 341:7] if (start_74 == 0B)
    goto <bb 40>;
  else
    goto <bb 41>;
  
  <bb 40>:
  [engine.c : 343:9] start_75 = start_11;
  [engine.c : 344:9] start_76 = start_14;
  
  <bb 41>:
  # start_12 = PHI <start_11(39), [engine.c : 344:9] start_76(40)>
  # start_13 = PHI <[engine.c : 335:9] start_74(39), [engine.c : 343:9] start_75(40)>
...

in *.copyrename3, note the not so common PHIs where the second PHI depends on the SSA_NAME set by the first PHI, but I believe all PHIs are supposed to be evaluated concurrently and thus in this case it should mean start_14 remembers the value from the previous iteration, so e.g. on line 336 start_74 is the return value from sslow in the current iteration, start_11 from the previous iteration (or start_157 if it is the first iteration) and start_14 is sslow returned value from 2 iterations ago (or start_157 if it is 1st or 2nd iteration).

Now, dom1 r205279 makes:

  <bb 34>:
  # start_11 = PHI <start_157(33)>
  # start_14 = PHI <start_157(33)>
  [engine.c : 335:9] start_74 = sslow (m_24(D), start_11, start_68, ssub_72, startst_3);
  [engine.c : 336:8] if (start_74 == 0B)
    goto <bb 37>;
  else
    goto <bb 35>;

  <bb 35>:
  [engine.c : 336:28] if (start_74 == start_11)
    goto <bb 36>;
  else
    goto <bb 84>;

  <bb 84>:
  # start_1 = PHI <[engine.c : 335:9] start_74(35)>
  # start_2 = PHI <start_11(35)>
  [engine.c : 335:9] start_32 = sslow (m_24(D), start_1, start_68, ssub_72, startst_3);
  goto <bb 36>;

  <bb 36>:
  # start_162 = PHI <[engine.c : 335:9] start_1(84), start_11(35)>
  # start_164 = PHI <start_2(84), start_14(35)>
  # start_171 = PHI <[engine.c : 335:9] start_32(84), [engine.c : 335:9] start_74(35)>
  [engine.c : 341:7] if (start_171 == 0B)
    goto <bb 37>;
  else
    goto <bb 38>;

  <bb 37>:
  # start_163 = PHI <start_162(36), start_11(34)>
  # start_165 = PHI <start_164(36), start_14(34)>
  [engine.c : 343:9] start_75 = start_163;
  [engine.c : 344:9] start_76 = start_165;

  <bb 38>:
  # start_12 = PHI <start_162(36), start_165(37)>
  # start_13 = PHI <start_171(36), start_163(37)>
...

The loop is gone, instead of calling sslow until it returns NULL or returns a value equal to the second argument it will call sslow once, and if it doesn't return NULL or value equal to the second argument, it will call sslow once more.
Comment 13 Jeff Law 2014-06-11 12:41:29 EDT
There's a reasonable chance a recent fix to this code I installed on the trunk will fix this bug.  Let me do some testing.
Comment 14 Jeff Law 2014-06-11 16:40:46 EDT
This is effectively the same as upstream 61009.  It's not fixed by 61009 as this test shows another way we might process part of a block and as a result fail to process statements which would invalidate previously recorded equivalences.

Testing a fix now.
Comment 15 Jeff Law 2014-06-12 14:39:52 EDT
I've back ported the fix for this to the upstream 4.9 branch.  The next time Jakub syncs up gcc and pushes a build into koji this problem should be fixed.
Comment 16 Terje Røsten 2014-06-12 15:25:13 EDT
Thanks for the very quick analysis and fix!
Comment 17 Jaroslav Reznik 2015-03-03 10:46:44 EST
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22
Comment 18 Fedora End Of Life 2016-07-19 07:28:33 EDT
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.