Bug 1599550
| Summary: | Restraint crashes on startup on RHEL6 S/390 | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [Retired] Restraint | Reporter: | Matt Tyson 🤬 <mtyson> | ||||||
| Component: | general | Assignee: | beaker-dev-list | ||||||
| Status: | CLOSED NEXTRELEASE | QA Contact: | |||||||
| Severity: | unspecified | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 0.1.35 | CC: | asavkov, azelinka, bpeck, makopec, mastyk | ||||||
| Target Milestone: | 0.1.37 | Keywords: | Regression | ||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2019-01-11 09:53:40 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
After digging through some logs from jobs running on other arches, it seems that s390 / RHEL 6.9 was the only one with the error. s390 / RHEL 7.2 was fine. s390 / RHEL 5.11 was also fine. Other arches and distros are also fine (although I have not checked them all).

OK, so after using bkr job-logs and grepping every console log from the task, the assertions only ever happened on job 24669.

I'm beginning to suspect a compiler bug. This only happens on RHEL 6.9, s390x, gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18).

The crash happens in libsoup. In soup_server_listen_local() two GInetAddress objects are created. At the end of the function they are freed with calls to g_clear_object().

If libsoup is built with optimisations, g_clear_object() will be passed a NULL pointer.

> (gdb) s
> g_object_unref (_object=0x0) at gobject.c:3233

If libsoup is built without optimisations (CFLAGS=-O0), g_clear_object() will be passed a valid pointer. The same is true with -O1.

> (gdb) s
> g_object_unref (_object=0xb8ad55e0) at gobject.c:3233

This can be reproduced by a trivial libsoup stub program, so it's not restraint related.

The previous version of restraint (0.1.33) was built with gcc 4.4.4-13.el6 according to the koji logs. That build of restraint works fine.

Created attachment 1457997 [details]
stub program
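
For readers unfamiliar with the idiom, the following is a rough sketch of what a "clear pointer" helper such as g_clear_object() is expected to do. It is not the attached stub program and not glib's actual macro. `Obj`, `obj_new()`, `obj_unref()`, `clear_obj()` and `clear_demo()` are hypothetical names invented for illustration, and the sketch deliberately avoids glib and libsoup. The point is that the unref function should only ever be reached with the original non-NULL pointer, which is why `_object=0x0` in the gdb output above looks wrong.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object (not a GObject). */
typedef struct {
    int ref_count;
} Obj;

static int unref_calls = 0;       /* how many times obj_unref() ran */
static int unref_null_calls = 0;  /* how many of those calls received NULL */

static Obj *obj_new(void)
{
    Obj *o = malloc(sizeof *o);
    if (o == NULL)
        abort();
    o->ref_count = 1;
    return o;
}

static void obj_unref(Obj *o)
{
    unref_calls++;
    if (o == NULL) {
        /* Roughly the situation seen in the optimised build. */
        unref_null_calls++;
        return;
    }
    if (--o->ref_count == 0)
        free(o);
}

/* Simplified clear-pointer helper: read the pointer once, reset the
 * caller's variable to NULL, then release the saved value. */
static void clear_obj(Obj **pp)
{
    Obj *p = *pp;

    if (p != NULL) {
        *pp = NULL;
        obj_unref(p);
    }
}

/* Returns how many times obj_unref() was handed NULL; 0 is expected. */
int clear_demo(void)
{
    Obj *a = obj_new();
    Obj *b = obj_new();

    unref_calls = 0;
    unref_null_calls = 0;

    clear_obj(&a);
    clear_obj(&b);
    clear_obj(&a);  /* already NULL: must not reach obj_unref() */

    assert(a == NULL && b == NULL);
    assert(unref_calls == 2);
    return unref_null_calls;
}
```

With a helper written as a plain function like this, `clear_demo()` should return 0 at any optimisation level. The failing case in this bug goes through glib's macro instead, so this sketch only shows the intended behaviour and does not reproduce the crash.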
Filed bug 1600346 with a reproducer. Maybe the GCC team can shed some light on this.

*** Bug 1611583 has been marked as a duplicate of this bug. ***

We don't think this can be compiler related as we have always been building with the same gcc (RHEL6.0). This regressed in Restraint 0.1.33->0.1.35, during which period we upgraded glib. So most likely there is undefined behaviour in the new glib, or in libsoup's use of glib, that we are now triggering.

Matt found a trivial workaround, which is to compile with -O1 on RHEL6 S/390, so we should do that ASAP (for 0.1.37) so that at least we can ship working builds while we further narrow down this problem.

I've done some more digging. The crash seems to have been introduced with this commit:

https://gitlab.gnome.org/GNOME/glib/commit/b1dd594a22e3499caafdeccd7fa223a032b9e177

Backing this patch out of the version of glib we are using (2.56.1) is pretty easy and it fixes the crash on the RHEL6 s390 build. I'm still not sure what exactly is happening or why.

My testing shows that it's compiling libsoup with -O1 that fixes the issue. That would mean the macro g_clear_pointer() in glib/gmem.h is what is being invoked in this case.

I've pushed a workaround for RHEL6. This should get the s390 builds going until we figure out what the real problem is.

Restraint 0.1.37 has been released. |
Created attachment 1457649 [details]
glib assertions

Dan found restraint spewing a pile of glib assertion failures on beaker-devel. This happened on one of the s390 machines running restraint 0.1.35.

Stopping and starting the service resulted in the same glib assertion output. Installing restraint 0.1.33 resulted in the test completing successfully.