Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1599550

Summary: Restraint crashes on startup on RHEL6 S/390
Product: [Retired] Restraint
Reporter: Matt Tyson 🤬 <mtyson>
Component: general
Assignee: beaker-dev-list
Status: CLOSED NEXTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 0.1.35
CC: asavkov, azelinka, bpeck, makopec, mastyk
Target Milestone: 0.1.37
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-11 09:53:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description        Flags
glib assertions    none
stub program       none

Description Matt Tyson 🤬 2018-07-10 04:15:31 UTC
Created attachment 1457649: glib assertions

Dan found restraint spewing a pile of glib assertion failures on beaker-devel.

This happened on one of the s390 machines running restraint 0.1.35.

Stopping and starting the service resulted in the same glib assertion output.

Installing restraint 0.1.33 resulted in the test completing successfully.

Comment 2 Matt Tyson 🤬 2018-07-10 05:42:27 UTC
After digging through some logs from jobs running on other arches, it seems that s390 / RHEL 6.9 was the only one with the error.

s390 / RHEL 7.2 was fine.  s390 / RHEL 5.11 was also fine.  Other arches and distros are also fine (although I have not checked them all).

Comment 3 Matt Tyson 🤬 2018-07-10 06:15:04 UTC
OK, so after using bkr job-logs and grepping every console log from the task, the assertions only ever happened on job 24669.

Comment 4 Matt Tyson 🤬 2018-07-11 05:45:40 UTC
I'm beginning to suspect a compiler bug.

This only happens on RHEL 6.9, s390x, gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)

The crash happens in libsoup.

In soup_server_listen_local() two GInetAddress objects are created.  At the end of the function they are freed with calls to g_clear_object().

If libsoup is built with optimisations, g_clear_object() will be passed a NULL pointer.

> (gdb) s
> g_object_unref (_object=0x0) at gobject.c:3233

If libsoup is built without optimisations (CFLAGS=-O0), g_clear_object() will be passed a valid pointer.  The same is true with -O1.

> (gdb) s
> g_object_unref (_object=0xb8ad55e0) at gobject.c:3233
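
For readers unfamiliar with the pattern, here is a minimal plain-C sketch of what a clear-and-unref helper is expected to do. It is not GLib's or libsoup's code: DemoObject, demo_unref(), demo_clear_object() and demo_listen_local() are hypothetical stand-ins, and the sketch only assumes the general contract that the release function receives the saved, non-NULL pointer.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object such as GInetAddress. */
typedef struct {
    int ref_count;
} DemoObject;

/* Records what the release function was handed, like the gdb frames above. */
static DemoObject *last_unref_arg = NULL;
static int unref_calls = 0;

/* Hypothetical stand-in for g_object_unref(): drops one reference. */
static void demo_unref(DemoObject *obj)
{
    last_unref_arg = obj;
    unref_calls++;
    if (obj != NULL)
        obj->ref_count--;
}

/* Clear-and-release: read the pointer once, NULL the caller's variable,
 * then release the saved value. Nothing is called if it was already NULL. */
static void demo_clear_object(DemoObject **obj_ptr)
{
    DemoObject *saved = *obj_ptr;
    if (saved != NULL) {
        *obj_ptr = NULL;
        demo_unref(saved);
    }
}

/* Same shape as the function described above: two objects are held in
 * locals and both are cleared on the way out. Returns the number of
 * release calls that were made. */
static int demo_listen_local(DemoObject *first, DemoObject *second)
{
    DemoObject *addr_a = first;   /* hypothetical local names */
    DemoObject *addr_b = second;
    int before = unref_calls;

    demo_clear_object(&addr_a);
    demo_clear_object(&addr_b);
    return unref_calls - before;
}
```

With correct code generation, demo_unref() is never entered with a NULL argument here, which matches the -O0 trace rather than the optimised one.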

This can be reproduced by a trivial libsoup stub program, so it's not restraint related.

The previous version of restraint (0.1.33) was built with gcc 4.4.4-13.el6 according to the koji logs.  This build of restraint works fine.

Comment 5 Matt Tyson 🤬 2018-07-11 05:57:33 UTC
Created attachment 1457997: stub program

Comment 6 Matt Tyson 🤬 2018-07-12 03:58:20 UTC
Filed bug 1600346 with a reproducer.  Maybe the GCC team can shed some light on this.

Comment 7 Roman Joost 2018-08-02 23:44:37 UTC
*** Bug 1611583 has been marked as a duplicate of this bug. ***

Comment 8 Dan Callaghan 2018-08-30 03:13:01 UTC
We don't think this can be compiler related as we have always been building with the same gcc (RHEL6.0).

This regressed in Restraint 0.1.33->0.1.35, during which period we upgraded glib. So most likely there is undefined behaviour in new glib, or in libsoup's use of glib that we are now triggering.

Matt found a trivial workaround, which is to compile with -O1 on RHEL6 S/390. We should do that ASAP (for 0.1.37) so that we can at least ship working builds while we narrow down this problem further.

Comment 9 Matt Tyson 🤬 2018-08-30 06:16:01 UTC
I've done some more digging.  The crash seems to have been introduced with this commit: https://gitlab.gnome.org/GNOME/glib/commit/b1dd594a22e3499caafdeccd7fa223a032b9e177

Backing this patch out of the version of glib we are using (2.56.1) is pretty easy, and it fixes the crash on the RHEL6 s390 build.

I'm still not sure what exactly is happening or why.

My testing shows that it's compiling libsoup with -O1 that fixes the issue.  That would mean the macro g_clear_pointer() in glib/gmem.h is what is being invoked in this case.
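
The reasoning here rests on a general C property: when a header provides both a function and a function-like macro with the same name, ordinary calls expand the macro in the caller's translation unit, so they are compiled with the caller's flags. The sketch below illustrates only that property. All names (demo_clear_pointer, demo_destroy, the counters) are hypothetical, and the macro body is not a copy of the one in glib/gmem.h.

```c
#include <stddef.h>

typedef void (*demo_destroy_fn)(void *);

/* Counters that record which implementation actually ran. */
static int function_calls = 0;
static int macro_expansions = 0;
static int destroy_calls = 0;

/* Hypothetical destroy callback. */
static void demo_destroy(void *p)
{
    (void)p;
    destroy_calls++;
}

/* Out-of-line version: in a real library this is compiled once,
 * with the library's own optimisation flags. */
static void demo_clear_pointer(void **pp, demo_destroy_fn destroy)
{
    void *p = *pp;
    function_calls++;
    if (p != NULL) {
        *pp = NULL;
        destroy(p);
    }
}

/* Function-like macro with the same name: it is expanded into every
 * caller, so it is compiled with the caller's flags instead. */
#define demo_clear_pointer(pp, destroy)        \
    do {                                       \
        void **pp_ = (void **)(pp);            \
        void *p_ = *pp_;                       \
        macro_expansions++;                    \
        if (p_ != NULL) {                      \
            *pp_ = NULL;                       \
            (destroy)(p_);                     \
        }                                      \
    } while (0)

/* An ordinary call picks up the macro. */
static void call_via_macro(void **pp)
{
    demo_clear_pointer(pp, demo_destroy);
}

/* Parenthesising the name suppresses macro expansion,
 * so this reaches the out-of-line function. */
static void call_via_function(void **pp)
{
    (demo_clear_pointer)(pp, demo_destroy);
}
```

Under that assumption, changing only libsoup's optimisation level would change how the expanded macro body is compiled, while the function inside the glib library itself stays untouched.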

Comment 10 Matt Tyson 🤬 2018-09-04 02:56:27 UTC
I've pushed a workaround for RHEL6.  This should get the s390 builds going until we figure out what the real problem is.

Comment 12 Martin Styk 2019-01-28 10:09:57 UTC
Restraint 0.1.37 has been released.