Bug 432309 - Starting to see segfaults in qhull after GCC 4.3 compile
Starting to see segfaults in qhull after GCC 4.3 compile
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: qhull (Show other bugs)
9
All Linux
low Severity low
: ---
: ---
Assigned To: Ralf Corsepius
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2008-02-11 00:35 EST by Orion Poplawski
Modified: 2009-07-14 12:13 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-07-14 12:13:00 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Traceback from mockbuilding plplot (6.85 KB, text/plain)
2008-02-11 01:32 EST, Ralf Corsepius
no flags Details
patch (1.02 KB, patch)
2008-02-27 16:02 EST, Orion Poplawski
no flags Details | Diff
qh_setappend compiled without -fno-strict-aliasing (46.14 KB, text/plain)
2008-02-27 17:11 EST, Orion Poplawski
no flags Details
qh_setappend compiled with -fno-strict-aliasing (46.12 KB, application/octet-stream)
2008-02-27 17:11 EST, Orion Poplawski
no flags Details
qh_setappend.i preprocessed source. (44.19 KB, text/plain)
2008-02-27 18:49 EST, Orion Poplawski
no flags Details
spec file patch (1.66 KB, patch)
2008-02-29 16:08 EST, Orion Poplawski
no flags Details | Diff
Alias patch (637 bytes, patch)
2008-02-29 16:09 EST, Orion Poplawski
no flags Details | Diff
test script (Source1) (14.13 KB, text/plain)
2008-02-29 16:09 EST, Orion Poplawski
no flags Details

  None (edit)
Description Orion Poplawski 2008-02-11 00:35:41 EST
Description of problem:

I'm trying to build plplot 5.9.0 in devel, but the tests are segfaulting in
qhull.  Appears to be trying to free unallocated memory.  Don't have much info
yet.  Will be looking into it in more detail soon, but thought I'd file a bug
for tracking and to alert any other users.

Version-Release number of selected component (if applicable):
2003.1-9.f9
Comment 1 Ralf Corsepius 2008-02-11 01:01:22 EST
Interesting to see code that hasn't changed much for almost a decade and widely
been used on many platforms to start segfaulting :/
Comment 2 Ralf Corsepius 2008-02-11 01:32:40 EST
Created attachment 294541 [details]
Traceback from mockbuilding plplot
Comment 3 Orion Poplawski 2008-02-24 10:02:23 EST
Some more info:

- You can get a segfault just running some simple qhull commands like "rbox c |
qhull n"

- The segfault goes away if you compile without -O2.

Jakub -  perhaps you could lend a hand as this may be a GCC regression?
 
Comment 4 Orion Poplawski 2008-02-27 16:02:32 EST
Created attachment 296120 [details]
patch

* Wed Feb 27 2008 Orion Poplawski <orion@cora.nwra.com> - 2003.1-10
- Add -fno-strict-aliasing to avoid segfault
- Add simple test to check for segfaults

I can check in and build if you'd like.  Note that there were no warnings from
the compiler about aliasing issues.
Comment 5 Jakub Jelinek 2008-02-27 16:15:15 EST
This is not a fix, just a workaround.  You really should do a binary search
between -O2 and -O2 -fno-strict-aliasing compiled objects to find the
problematic one, then find out which function is a problem.  GCC doesn't (and
can't) warn about all possible aliasing violations.  If you don't see a bug
yourself, you can create a small self-contained testcase from it (add main which
calls the problematic function with the right args, stub anything that it
calls), attach here and I can have a look.
Comment 6 Orion Poplawski 2008-02-27 16:55:37 EST
This appears to be the culprit function.  If I split it out of qset.c and
compile only it with -fno-strict-aliasing, the program runs.

void qh_setappend(setT **setp, void *newelem) {
  int *sizep;
  void **endp;

  if (!newelem)
    return;
  if (!*setp || !*(sizep= SETsizeaddr_(*setp))) {
    qh_setlarger(setp);
    sizep= SETsizeaddr_(*setp);
  }
  *(endp= &((*setp)->e[(*sizep)++ - 1].p))= newelem;
  *(++endp)= NULL;
} /* setappend */

As I noted on the -devel list, I *appears* that the last line is the culprit -
that endp really doesn't get incremented.
Comment 7 Orion Poplawski 2008-02-27 17:11:11 EST
Created attachment 296131 [details]
qh_setappend compiled without -fno-strict-aliasing
Comment 8 Orion Poplawski 2008-02-27 17:11:54 EST
Created attachment 296132 [details]
qh_setappend compiled with -fno-strict-aliasing
Comment 9 Jakub Jelinek 2008-02-27 17:34:16 EST
SETsizeaddr_ must be either a macro or inlined function, a self-contained testcase
would at least need to have setT type definition (and any types that uses),
SETsizeaddr_ definition and qh_setlarget prototype.  Also, can you add main
which calls this with the right arguments to trigger it?  Is qh_setlarger called
or not?
Comment 10 Orion Poplawski 2008-02-27 18:49:00 EST
Created attachment 296142 [details]
qh_setappend.i preprocessed source.

It gets called:
qh_setappend(simplex, maxpoint) ->
qh_setappend ()

setlarger() is not called in this context.

This is with -fno-strict-aliasing:

gdb) print (**simplex)
$13 = {maxsize = 4, e = {{p = 0x8419208, i = 138514952}}}
(gdb) print (*simplex)->e[0].p
$14 = (void *) 0x8419208
(gdb) print (*simplex)->e[1].p
$15 = (void *) 0x8419268
(gdb) print (*simplex)->e[2].p
$16 = (void *) 0x8419238
(gdb) print (*simplex)->e[3].p
$17 = (void *) 0x0
(gdb) print (*simplex)->e[4].p
$18 = (void *) 0x4
(gdb) step
qh_setappend (setp=<value optimized out>, newelem=<value optimized out>) at
qset.c:132
132	  if (!newelem)
(gdb) next
128	void qh_setappend(setT **setp, void *newelem) {
(gdb)
132	  if (!newelem)
(gdb)
128	void qh_setappend(setT **setp, void *newelem) {
(gdb)
132	  if (!newelem)
(gdb)
134	  if (!*setp || !*(sizep= SETsizeaddr_(*setp))) {
(gdb)
138	  *(endp= &((*setp)->e[(*sizep)++ - 1].p))= newelem;
(gdb)
139	  *(++endp)= NULL;
(gdb) up
#1  0x0032bfed in qh_maxsimplex (dim=3, maxpoints=0x8419530, points=0x8419208,
    numpoints=8, simplex=0xbffd8874) at geom2.c:1174
1174	    qh_setappend(simplex, maxpoint);
(gdb) print (*simplex)->e[3].p
$19 = (void *) 0x8419220
(gdb) print (*simplex)->e[4].p
$20 = (void *) 0x4
(gdb) down
#0  qh_setappend (setp=<value optimized out>, newelem=<value optimized out>)
    at qset.c:139
139	  *(++endp)= NULL;
(gdb) next
138	  *(endp= &((*setp)->e[(*sizep)++ - 1].p))= newelem;
(gdb)
140	} /* setappend */
(gdb)
qh_maxsimplex (dim=3, maxpoints=0x8419530, points=0x8419208, numpoints=8,
    simplex=0xbffd8874) at geom2.c:1175
1175	    trace1((qh ferr, "qh_maxsimplex: selected point p%d for %d`th
initial vertex, det=%2.2g\n",
(gdb) print (*simplex)->e[4].p
$21 = (void *) 0x5
(gdb) print (*simplex)->e[3].p
$22 = (void *) 0x8419220


Now, this is with -fno-strict-aliasing:

(gdb) print (*simplex)->e[2].p
$4 = (void *) 0x814f238
(gdb) print (*simplex)->e[3].p
$5 = (void *) 0x0
(gdb) print (*simplex)->e[4].p
$6 = (void *) 0x4
(gdb) step
qh_setappend (setp=0xbffdf074, newelem=0x814f220) at qh_setappend.c:46
46	  if (!newelem)
(gdb)
42	void qh_setappend(setT **setp, void *newelem) {
(gdb)
46	  if (!newelem)
(gdb)
42	void qh_setappend(setT **setp, void *newelem) {
(gdb)
46	  if (!newelem)
(gdb)
48	  if (!*setp || !*(sizep= SETsizeaddr_(*setp))) {
(gdb)
52	  *(endp= &((*setp)->e[(*sizep)++ - 1].p))= newelem;
(gdb)
53	  *(++endp)= NULL;
(gdb)
54	} /* setappend */
(gdb)
qh_maxsimplex (dim=3, maxpoints=0x814f530, points=0x814f208, numpoints=8,
    simplex=0xbffdf074) at geom2.c:1175
1175	    trace1((qh ferr, "qh_maxsimplex: selected point p%d for %d`th
initial vertex, det=%2.2g\n",
(gdb) print (*simplex)->e[2].p
$7 = (void *) 0x814f238
(gdb) print (*simplex)->e[3].p
$8 = (void *) 0x814f220
(gdb) print (*simplex)->e[4].p
$9 = (void *) 0x0

Note that the last entry in the list is properly set to null here.
Comment 11 Jakub Jelinek 2008-02-28 03:10:39 EST
If so, then it is an obvious aliasing violation.  sizep is set to
&(*setp)->e[4].i (so *sizep is 4).  The code then increments *sizep using int
pointer, rather than union, sets endp to &(*setp)->e[3].p, sets endp[0] to newelem
and endp[1] to NULL (these 2 stores are done using void * pointer).
So (*setp)->e[4] is stored using both int and void * pointer, as these 2 types
cannot alias, the compiler can reorder the writes any way it likes, as it can
and does assume they don't alias.  I have no idea what the code really means to
do (either it wants to store NULL, and the increment of (*setp)->e[4].i is not
needed in this case, or the other way around), but the obvious fix is to do at
least one
of the writes using the union.  E.g. instead of:
  *(endp= &((*setp)->e[(*sizep)++ - 1].p))= newelem;
  *(++endp)= NULL;
do say:
  int end_idx = (*sizep)++ - 1;
  (*setp)->e[end_idx].p = newelem;
  (*setp)->e[end_idx + 1].p = (void *) 0;
Here the pointer stores are done using the int/void * union, which can alias int
and so *sizep = *sizep + 1 increment must not be stored after the NULL store.
Comment 12 Orion Poplawski 2008-02-29 16:08:38 EST
Created attachment 296414 [details]
spec file patch

Here's what I propose now.  The attached patch uses Jakub's suggestion and
seems to work.	I've also added a %check section and script which fails without
the patch.  This should hopefully find problems earlier in the future. 

I'd like to get this checked in ASAP (I can do it) so that I can build plplot. 
I'll do it on Tuesday March 4 if I don't hear anything before then.
Comment 13 Orion Poplawski 2008-02-29 16:09:13 EST
Created attachment 296416 [details]
Alias patch
Comment 14 Orion Poplawski 2008-02-29 16:09:43 EST
Created attachment 296417 [details]
test script (Source1)
Comment 15 Alex Lancaster 2008-03-04 06:37:45 EST
Please check in these fixes soon and rebuild as the freeze is approaching and
plplot is still broken in rawhide.
Comment 16 Ralf Corsepius 2008-03-04 07:19:29 EST
(In reply to comment #15)
> Please check in these fixes soon and rebuild as the freeze is approaching and
> plplot is still broken in rawhide.
Thanks for pushing me around ...
Comment 17 Ralf Corsepius 2008-03-04 08:04:09 EST
Koji is in limbo:

http://koji.fedoraproject.org/koji/taskinfo?taskID=490408
Result   BuildrootError: could not init mock buildroot, mock exited with status 30
Comment 18 Ralf Corsepius 2008-03-04 11:01:40 EST
Now that Alex L. is pushing around volunteers, I've pushed an untested package
containing Orion's patch to koji - For reasons unknown to me, this time, koji
didn't bomb out.
Comment 19 Orion Poplawski 2008-03-04 11:49:54 EST
Thank you Ralf.
Comment 20 Alex Lancaster 2008-03-04 14:50:57 EST
(In reply to comment #18)
> Now that Alex L. is pushing around volunteers

I'm not pushing anybody around (I'm a volunteer too, not a RHATer). I was merely
requesting that Orion push the package to avoid the broken deps that means that
users can't install plplot-perl: 

        plplot-perl-5.8.0-10.fc9.i386 requires perl(PDL::GSL::RNG)

but there's no *requirement* that anybody follow my request. ;)   

Orion suggested that he would do this on March 4 in any case.  
Comment 21 Bug Zapper 2008-05-14 01:07:48 EDT
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 22 Bug Zapper 2009-06-09 19:32:14 EDT
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 23 Bug Zapper 2009-07-14 12:13:00 EDT
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.