Bug 1260577 - problem with signals with enum typed arguments
problem with signals with enum typed arguments
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: glib2 (Show other bugs)
7.2
Unspecified Unspecified
unspecified Severity unspecified
: rc
: ---
Assigned To: Colin Walters
Desktop QE
:
Depends On:
Blocks: 1062301
  Show dependency treegraph
 
Reported: 2015-09-07 05:54 EDT by Thomas Haller
Modified: 2015-11-19 03:19 EST (History)
5 users (show)

See Also:
Fixed In Version: glib2-2.42.2-5.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-19 03:19:31 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
test program to reproduce issue (4.17 KB, text/plain)
2015-09-07 05:54 EDT, Thomas Haller
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
GNOME Desktop 754882 None None None Never

  None (edit)
Description Thomas Haller 2015-09-07 05:54:40 EDT
Created attachment 1070906 [details]
test program to reproduce issue

On s390x and ppc64 architectures, we have an issue with NetworkManager that I think is a bug in glib ( https://bugzilla.redhat.com/show_bug.cgi?id=1062301#c38 )


It seems that when you have a glib signal
  - with several arguments
  - with a GEnum typed argument
  - with several subscribers,
then the passed enum value becomes zero.


The attached test program fails on s390x machine:

glib2: 2.42.2-4.el7

gcc -Og -g `pkg-config --cflags --libs glib-2.0 gobject-2.0` ./test-rh1062301.c 




Using host libthread_db library "/lib64/libthread_db.so.1".
** Message: (_signal_cb1) signal accepted
**
ERROR:./test-rh1062301.c:109:_signal_cb1: assertion failed (test_flags == *expected): (0 == 2)

Program received signal SIGABRT, Aborted.
0x000003fffdcbf8e0 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: debuginfo-install libffi-3.0.13-16.el7.s390x
(gdb) bt
#0  0x000003fffdcbf8e0 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x000003fffdcc1170 in __GI_abort () at abort.c:90
#2  0x000003fffdea115e in g_assertion_message (domain=0x0, file=0x80001378 "./test-rh1062301.c", line=<optimized out>, func=<optimized out>, message=<optimized out>) at gtestutils.c:2292
#3  0x000003fffdea154a in g_assertion_message_cmpnum (domain=0x0, file=0x80001378 "./test-rh1062301.c", line=<optimized out>, func=0x80001428 <__FUNCTION__.11051> "_signal_cb1", expr=<optimized out>, arg1=<optimized out>, cmp=0x800013a4 "==", arg2=<optimized out>, numtype=105 'i') at gtestutils.c:2340
#4  0x0000000080000f4a in _signal_cb1 (obj=<optimized out>, dummy1=<optimized out>, test_flags=<optimized out>, dummy2=<optimized out>, expected=<optimized out>) at ./test-rh1062301.c:109
#5  0x000003fffdc618b4 in ffi_call_SYSV () at /lib64/libffi.so.6
#6  0x000003fffdc6217e in ffi_call () at /lib64/libffi.so.6
#11 0x000003fffdfa3a62 in <emit signal ??? on instance 0x8000bc00 [MamanBar]> (instance=instance@entry=0x8000bc00, signal_id=<optimized out>, detail=detail@entry=0) at gsignal.c:3365
    #7  0x000003fffdf882e4 in g_cclosure_marshal_generic (closure=0x8000bb40, return_gvalue=0x0, n_param_values=<optimized out>, param_values=<optimized out>, invocation_hint=<optimized out>, marshal_data=0x0) at gclosure.c:1448
    #8  0x000003fffdf87984 in g_closure_invoke (closure=0x8000bb40, return_value=0x0, n_param_values=<optimized out>, param_values=0x3ffffffebb8, invocation_hint=0x3ffffffead8) at gclosure.c:768
    #9  0x000003fffdf9b62e in signal_emit_unlocked_R (node=node@entry=0x8000f310, detail=detail@entry=0, instance=instance@entry=0x8000bc00, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x3ffffffebb8) at gsignal.c:3553
    #10 0x000003fffdfa37bc in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=<optimized out>) at gsignal.c:3309
#12 0x0000000080000fa2 in _signal_emit (obj=obj@entry=0x8000bc00 [MamanBar], flags=flags@entry=TEST_FLAG_2) at ./test-rh1062301.c:123
#13 0x000000008000124a in main () at ./test-rh1062301.c:147
Comment 2 Thomas Haller 2015-09-07 06:05:26 EDT
(In reply to Thomas Haller from comment #0)

>   - with a GEnum typed argument

GFlags actually... didn't test with GEnum.
Comment 3 Lubomir Rintel 2015-09-07 11:43:02 EDT
This indeed looks like a glib or ffi bug. Could we do with something less invasive to work around it? Something along the lines of

diff --git a/src/nm-config.c b/src/nm-config.c
index d805948..c782616 100644
--- a/src/nm-config.c
+++ b/src/nm-config.c
@@ -1890,7 +1890,13 @@ nm_config_class_init (NMConfigClass *config_class)
                          G_SIGNAL_RUN_FIRST,
                          G_STRUCT_OFFSET (NMConfigClass, config_changed),
                          NULL, NULL, NULL,
-                         G_TYPE_NONE, 3, NM_TYPE_CONFIG_DATA, NM_TYPE_CONFIG_CHANGE_FLAGS, NM_TYPE_CONFIG_DATA);
+                         G_TYPE_NONE, 3, NM_TYPE_CONFIG_DATA,
+                         /* https://bugzilla.redhat.com/show_bug.cgi?id=1260577 */
+                         /* NM_TYPE_CONFIG_CHANGE_FLAGS, */
+                         G_TYPE_UINT,
+                         NM_TYPE_CONFIG_DATA);
+
+       G_STATIC_ASSERT (sizeof (guint) == sizeof (NMConfigChangeFlags));
 }
 
 static void

or maybe even ifdef'd with defined(__s390x__) || defined (__PPC64__) or whichever platforms are known broken; and maybe a test to make sure we didn't leave out any platform?
Comment 4 Colin Walters 2015-09-08 10:09:25 EDT
See https://bugzilla.gnome.org/show_bug.cgi?id=686662
Comment 5 Thomas Haller 2015-09-08 10:25:09 EDT
(In reply to Colin Walters from comment #4)
> See https://bugzilla.gnome.org/show_bug.cgi?id=686662

I don't think this is related.

Note that for both NetworkManager and the attached reproducer, the enum is of 4 bytes (same as gint/guint).

Also note, that the test doesn't fail, if only one handler is connected to the signal.
Comment 6 Colin Walters 2015-09-08 10:40:52 EDT
You could determine if it's FFI by using a generated marshaller instead of g_cclosure_marshal_generic.

Hardware watchpoints might help figure out when the variable is being changed?  Or alternatively, just add some debugging printfs into libgobject?
Comment 7 Thomas Haller 2015-09-09 06:03:10 EDT
(In reply to Colin Walters from comment #6)
> You could determine if it's FFI by using a generated marshaller instead of
> g_cclosure_marshal_generic.
> 
> Hardware watchpoints might help figure out when the variable is being
> changed?  Or alternatively, just add some debugging printfs into libgobject?

The test-program also fails with glib from upstream master.
Also, the failure is for GEnum too (not only GFlags).


While gclosure.c has some special handling for GEnum types, GFlags are treated just like a guint. As the error doesn't happen with G_TYPE_UINT/G_TYPE_INT as signal type, that hints that the error is not in libffi.

But since gclosure.c doesn't handly GFlags differently from guint, it seems to hint that the error isn't in gclosure either.


Anyway, I don't know...
Comment 8 Thomas Haller 2015-09-09 06:45:35 EDT
(In reply to Lubomir Rintel from comment #3)
> This indeed looks like a glib or ffi bug. Could we do with something less
> invasive to work around it? Something along the lines of

added this workaround to NM: http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=e7d66f1df61ebdc7652ba34a4e1baddcc86b9e26
Comment 9 Colin Walters 2015-09-16 15:30:58 EDT
Fixed upstream now: https://bugzilla.gnome.org/show_bug.cgi?id=754882

I can do the backport as soon as the flags come in.
Comment 11 Colin Walters 2015-09-29 11:57:55 EDT
This is a low risk fix.  Can someone please set the blocker flag?
Comment 16 Martin Simon 2015-10-15 09:38:48 EDT
I've verified this bug according the reproducer in comment 0 on all archs: x86_64, s390x, ppc64 and ppc64le. On all these archs the reproducer passed.

The reproducer has been added into our standard glib2 automation testset.
Comment 17 errata-xmlrpc 2015-11-19 03:19:31 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2116.html

Note You need to log in before you can comment on or make changes to this bug.