Hide Forgot
Description of problem: I'm trying to build plplot 5.9.0 in devel, but the tests are segfaulting in qhull. Appears to be trying to free unallocated memory. Don't have much info yet. Will be looking into it in more detail soon, but thought I'd file a bug for tracking and to alert any other users. Version-Release number of selected component (if applicable): 2003.1-9.f9
Interesting to see code that hasn't changed much for almost a decade and widely been used on many platforms to start segfaulting :/
Created attachment 294541 [details] Traceback from mockbuilding plplot
Some more info: - You can get a segfault just running some simple qhull commands like "rbox c | qhull n" - The segfault goes away if you compile without -O2. Jakub - perhaps you could lend a hand as this may be a GCC regression?
Created attachment 296120 [details] patch * Wed Feb 27 2008 Orion Poplawski <orion.com> - 2003.1-10 - Add -fno-strict-aliasing to avoid segfault - Add simple test to check for segfaults I can check in and build if you'd like. Note that there were no warnings from the compiler about aliasing issues.
This is not a fix, just a workaround. You really should do a binary search between -O2 and -O2 -fno-strict-aliasing compiled objects to find the problematic one, then find out which function is a problem. GCC doesn't (and can't) warn about all possible aliasing violations. If you don't see a bug yourself, you can create a small self-contained testcase from it (add main which calls the problematic function with the right args, stub anything that it calls), attach here and I can have a look.
This appears to be the culprit function. If I split it out of qset.c and compile only it with -fno-strict-aliasing, the program runs. void qh_setappend(setT **setp, void *newelem) { int *sizep; void **endp; if (!newelem) return; if (!*setp || !*(sizep= SETsizeaddr_(*setp))) { qh_setlarger(setp); sizep= SETsizeaddr_(*setp); } *(endp= &((*setp)->e[(*sizep)++ - 1].p))= newelem; *(++endp)= NULL; } /* setappend */ As I noted on the -devel list, I *appears* that the last line is the culprit - that endp really doesn't get incremented.
Created attachment 296131 [details] qh_setappend compiled without -fno-strict-aliasing
Created attachment 296132 [details] qh_setappend compiled with -fno-strict-aliasing
SETsizeaddr_ must be either a macro or inlined function, a self-contained testcase would at least need to have setT type definition (and any types that uses), SETsizeaddr_ definition and qh_setlarget prototype. Also, can you add main which calls this with the right arguments to trigger it? Is qh_setlarger called or not?
Created attachment 296142 [details] qh_setappend.i preprocessed source. It gets called: qh_setappend(simplex, maxpoint) -> qh_setappend () setlarger() is not called in this context. This is with -fno-strict-aliasing: gdb) print (**simplex) $13 = {maxsize = 4, e = {{p = 0x8419208, i = 138514952}}} (gdb) print (*simplex)->e[0].p $14 = (void *) 0x8419208 (gdb) print (*simplex)->e[1].p $15 = (void *) 0x8419268 (gdb) print (*simplex)->e[2].p $16 = (void *) 0x8419238 (gdb) print (*simplex)->e[3].p $17 = (void *) 0x0 (gdb) print (*simplex)->e[4].p $18 = (void *) 0x4 (gdb) step qh_setappend (setp=<value optimized out>, newelem=<value optimized out>) at qset.c:132 132 if (!newelem) (gdb) next 128 void qh_setappend(setT **setp, void *newelem) { (gdb) 132 if (!newelem) (gdb) 128 void qh_setappend(setT **setp, void *newelem) { (gdb) 132 if (!newelem) (gdb) 134 if (!*setp || !*(sizep= SETsizeaddr_(*setp))) { (gdb) 138 *(endp= &((*setp)->e[(*sizep)++ - 1].p))= newelem; (gdb) 139 *(++endp)= NULL; (gdb) up #1 0x0032bfed in qh_maxsimplex (dim=3, maxpoints=0x8419530, points=0x8419208, numpoints=8, simplex=0xbffd8874) at geom2.c:1174 1174 qh_setappend(simplex, maxpoint); (gdb) print (*simplex)->e[3].p $19 = (void *) 0x8419220 (gdb) print (*simplex)->e[4].p $20 = (void *) 0x4 (gdb) down #0 qh_setappend (setp=<value optimized out>, newelem=<value optimized out>) at qset.c:139 139 *(++endp)= NULL; (gdb) next 138 *(endp= &((*setp)->e[(*sizep)++ - 1].p))= newelem; (gdb) 140 } /* setappend */ (gdb) qh_maxsimplex (dim=3, maxpoints=0x8419530, points=0x8419208, numpoints=8, simplex=0xbffd8874) at geom2.c:1175 1175 trace1((qh ferr, "qh_maxsimplex: selected point p%d for %d`th initial vertex, det=%2.2g\n", (gdb) print (*simplex)->e[4].p $21 = (void *) 0x5 (gdb) print (*simplex)->e[3].p $22 = (void *) 0x8419220 Now, this is with -fno-strict-aliasing: (gdb) print (*simplex)->e[2].p $4 = (void *) 0x814f238 (gdb) print (*simplex)->e[3].p $5 = (void *) 0x0 (gdb) print (*simplex)->e[4].p $6 = (void *) 0x4 (gdb) step qh_setappend (setp=0xbffdf074, newelem=0x814f220) at qh_setappend.c:46 46 if (!newelem) (gdb) 42 void qh_setappend(setT **setp, void *newelem) { (gdb) 46 if (!newelem) (gdb) 42 void qh_setappend(setT **setp, void *newelem) { (gdb) 46 if (!newelem) (gdb) 48 if (!*setp || !*(sizep= SETsizeaddr_(*setp))) { (gdb) 52 *(endp= &((*setp)->e[(*sizep)++ - 1].p))= newelem; (gdb) 53 *(++endp)= NULL; (gdb) 54 } /* setappend */ (gdb) qh_maxsimplex (dim=3, maxpoints=0x814f530, points=0x814f208, numpoints=8, simplex=0xbffdf074) at geom2.c:1175 1175 trace1((qh ferr, "qh_maxsimplex: selected point p%d for %d`th initial vertex, det=%2.2g\n", (gdb) print (*simplex)->e[2].p $7 = (void *) 0x814f238 (gdb) print (*simplex)->e[3].p $8 = (void *) 0x814f220 (gdb) print (*simplex)->e[4].p $9 = (void *) 0x0 Note that the last entry in the list is properly set to null here.
If so, then it is an obvious aliasing violation. sizep is set to &(*setp)->e[4].i (so *sizep is 4). The code then increments *sizep using int pointer, rather than union, sets endp to &(*setp)->e[3].p, sets endp[0] to newelem and endp[1] to NULL (these 2 stores are done using void * pointer). So (*setp)->e[4] is stored using both int and void * pointer, as these 2 types cannot alias, the compiler can reorder the writes any way it likes, as it can and does assume they don't alias. I have no idea what the code really means to do (either it wants to store NULL, and the increment of (*setp)->e[4].i is not needed in this case, or the other way around), but the obvious fix is to do at least one of the writes using the union. E.g. instead of: *(endp= &((*setp)->e[(*sizep)++ - 1].p))= newelem; *(++endp)= NULL; do say: int end_idx = (*sizep)++ - 1; (*setp)->e[end_idx].p = newelem; (*setp)->e[end_idx + 1].p = (void *) 0; Here the pointer stores are done using the int/void * union, which can alias int and so *sizep = *sizep + 1 increment must not be stored after the NULL store.
Created attachment 296414 [details] spec file patch Here's what I propose now. The attached patch uses Jakub's suggestion and seems to work. I've also added a %check section and script which fails without the patch. This should hopefully find problems earlier in the future. I'd like to get this checked in ASAP (I can do it) so that I can build plplot. I'll do it on Tuesday March 4 if I don't hear anything before then.
Created attachment 296416 [details] Alias patch
Created attachment 296417 [details] test script (Source1)
Please check in these fixes soon and rebuild as the freeze is approaching and plplot is still broken in rawhide.
(In reply to comment #15) > Please check in these fixes soon and rebuild as the freeze is approaching and > plplot is still broken in rawhide. Thanks for pushing me around ...
Koji is in limbo: http://koji.fedoraproject.org/koji/taskinfo?taskID=490408 Result BuildrootError: could not init mock buildroot, mock exited with status 30
Now that Alex L. is pushing around volunteers, I've pushed an untested package containing Orion's patch to koji - For reasons unknown to me, this time, koji didn't bomb out.
Thank you Ralf.
(In reply to comment #18) > Now that Alex L. is pushing around volunteers I'm not pushing anybody around (I'm a volunteer too, not a RHATer). I was merely requesting that Orion push the package to avoid the broken deps that means that users can't install plplot-perl: plplot-perl-5.8.0-10.fc9.i386 requires perl(PDL::GSL::RNG) but there's no *requirement* that anybody follow my request. ;) Orion suggested that he would do this on March 4 in any case.
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.