Bug 864196

Summary: GClosure C marshaller does not handle correctly unsigned enum values on ppc64
Product: [Fedora] Fedora Reporter: Gustavo Luiz Duarte <gustavold>
Component: glib2Assignee: Colin Walters <walters>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 18CC: ajax, bbaude, gustavold, karsten, mclasen
Target Milestone: ---   
Target Release: ---   
Hardware: ppc64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-01-20 15:51:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gustavo Luiz Duarte 2012-10-08 20:21:59 UTC
Description of problem:

The glib's test case for unsigned enums fails on ppc64 with the following error message:
GLib-GObject:ERROR:signals.c:415:on_generic_marshaller_2: assertion failed (v_uenum == TEST_UNSIGNED_ENUM_BAR): (-2147483648 == 2147483648)

This is caused by glib's incorrect handling of enums. GCC decides what type an enum is on compile time:
An enum *with* negative values is defined as *signed* int.
An enum with *no* negative values is defined as *unsigned* int.

Glib's C marshaller has the following code excerpt in the function va_to_ffi_type():

    case G_TYPE_ENUM:
      rettype = &ffi_type_sint;
      storage->_gint = va_arg (*va, gint);
      break;

Since Glib has no differentiation between signed and unsigned enums, at this point it is impossible to know what  is the type of the parameter and glib just incorrectly assumes it is a signed int.
Using va_arg() with a mismatching type has undefined behavior. See:
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/stdarg.h.html


Version-Release number of selected component (if applicable):
glib2-2.34.0-2.fc18

How reproducible:
Always


Steps to Reproduce:
1. glib-2.34.0/gobject/
2. make check
3.
  
Actual results:

[root@localhost glib-2.34.0]# ./gobject/tests/.libs/lt-signals
/gobject/signals/all-types: OK
/gobject/signals/variant: OK
/gobject/signals/generic-marshaller-1: OK
/gobject/signals/generic-marshaller-2: **
GLib-GObject:ERROR:signals.c:415:on_generic_marshaller_2: assertion failed (v_uenum == TEST_UNSIGNED_ENUM_BAR): (-2147483648 == 2147483648)
Aborted



(gdb) bt
#0  0x0000008054f78b10 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:63
#1  0x0000008054f7ab88 in __GI_abort () at abort.c:90
#2  0x00000fffb7e0a6e4 in g_assertion_message (domain=<optimized out>, file=0x10007128 "signals.c", line=<optimized out>, func=<optimized out>, 
    message=<optimized out>) at gtestutils.c:1877
#3  0x00000fffb7e0ac60 in g_assertion_message_cmpnum (domain=0x10007118 "GLib-GObject", file=0x10007128 "signals.c", line=<optimized out>, 
    func=0x10006e90 <__PRETTY_FUNCTION__.11540> "on_generic_marshaller_2", expr=<optimized out>, arg1=<optimized out>, cmp=<optimized out>, 
    arg2=<optimized out>, numtype=105 'i') at gtestutils.c:1911
#4  0x0000000010001c40 in on_generic_marshaller_2 (obj=<optimized out>, v_int1=<optimized out>, v_enum=<optimized out>, v_int2=<optimized out>, 
    v_uenum=<optimized out>, v_int3=<optimized out>) at signals.c:413
#5  0x0000008055d07f7c in .ffi_call_LINUX64 () at ../src/powerpc/linux64.S:103
#6  0x0000008055d0756c in ffi_call (cif=<error reading variable: value has been optimized out>, fn=<error reading variable: value has been optimized out>, 
    rvalue=<optimized out>, avalue=0xfffffffdf20) at ../src/powerpc/ffi.c:917
#7  0x00000fffb7f6cf24 in g_cclosure_marshal_generic_va (closure=<optimized out>, return_value=0x0, instance=0x10047040, args_list=<optimized out>, 
    marshal_data=0x10020108 <on_generic_marshaller_2>, n_params=<optimized out>, param_types=0x10040af0) at gclosure.c:1550
#8  0x00000fffb7f6be28 in _g_closure_invoke_va (closure=0x10051810, return_value=0x0, instance=0x10047040, args=0xfffffffe5d0 "", n_params=<optimized out>, 
    param_types=0x10040af0) at gclosure.c:840
#9  0x00000fffb7f8e334 in g_signal_emit_valist (instance=0x10047040, signal_id=<optimized out>, detail=<optimized out>, var_args=0xfffffffe5d0 "")
    at gsignal.c:3211
#10 0x00000fffb7f8f218 in g_signal_emit_by_name (instance=0x10047040, detailed_signal=0x100072d0 "generic-marshaller-2") at gsignal.c:3393
#11 0x00000000100036b4 in test_generic_marshaller_signal_2 () at signals.c:427
#12 0x00000fffb7e0affc in test_case_run (tc=0x1004a890) at gtestutils.c:1679
#13 g_test_run_suite_internal (suite=suite@entry=0x10046d20, path=<optimized out>) at gtestutils.c:1732
#14 0x00000fffb7e0b1d8 in g_test_run_suite_internal (suite=suite@entry=0x10046d00, path=<optimized out>) at gtestutils.c:1743
#15 0x00000fffb7e0b1d8 in g_test_run_suite_internal (suite=<optimized out>, path=<optimized out>) at gtestutils.c:1743
#16 0x00000fffb7e0b6a0 in g_test_run_suite (suite=0x10046c20) at gtestutils.c:1796
#17 0x00000fffb7e0b758 in g_test_run () at gtestutils.c:1308
#18 0x000000001000186c in main (argc=1, argv=0xfffffffefe8) at signals.c:823



Expected results:


Additional info:

Comment 1 Matthias Clasen 2012-10-16 13:25:24 UTC
Colin, can you take a look at this ? I know you've looked at problems with enums on bigendian platforms before.

Comment 2 Colin Walters 2012-10-16 15:16:29 UTC
See https://bugzilla.gnome.org/show_bug.cgi?id=659881

Did this re-regress?  Or is this a case that wasn't fixed by that patch?

> Since Glib has no differentiation between signed and unsigned enums, at this
> point it is impossible to know what  is the type of the parameter and glib just
> incorrectly assumes it is a signed int.

Right...the GLib enum handling is a major historical mistake.  We may eventually have to either:

1) Force callers to make enums signed
2) Force callers to explictly specify G_TYPE_ENUM_INT, G_TYPE_ENUM_UINT64, etc.

Both are pretty awful of course =/

But can you comment on the upstream bug above?  Does http://git.gnome.org/browse/glib/commit/?id=8e82225aedf81ea8a33deb3eb27a8878cd606521 actually help at all?

Comment 3 Gustavo Luiz Duarte 2012-10-16 22:39:01 UTC
(In reply to comment #2)
> See https://bugzilla.gnome.org/show_bug.cgi?id=659881
> 
> Did this re-regress?  Or is this a case that wasn't fixed by that patch?

It is a different issue. The issue addressed in that upstream bug is related to GValue encapsulated enum. GValue stores enums in a long so it can fit both signed or unsigned enums.

The issue here is when you use the varargs variant of the C marshaller. In this case, the parameter is always stored in a signed int using va_arg(), which has 2 issues:
1. va_arg() should not be used with mismatching types because it has undefined behavior.
2. A signed int cannot correctly store all possible values of an unsigned enum.
Blindly copying the 32 bits back and forth does not work because parameter passing on ppc64 involves sign extension into 64 bits registers, which renders totally different values for ((signed int)0x80000000) and ((unsigned int)0x80000000).

> > Since Glib has no differentiation between signed and unsigned enums, at this
> > point it is impossible to know what  is the type of the parameter and glib just
> > incorrectly assumes it is a signed int.
> 
> Right...the GLib enum handling is a major historical mistake.  We may
> eventually have to either:
> 
> 1) Force callers to make enums signed
> 2) Force callers to explictly specify G_TYPE_ENUM_INT, G_TYPE_ENUM_UINT64,
> etc.
> 
> Both are pretty awful of course =/

#1 doesn't seem practical specially for the GClosure since the caller may not have control over the callee function signature.

Can you comment on the effort needed to get #2 implemented? Where could we start?

If there is no reasonable short term solution, could we at least call enums not supported by the varargs variant of the C marshaller? Or a warning indicating the GValue variant as a substitute? (Assuming the GValue variant works fine for signed and unsigned enums).

> But can you comment on the upstream bug above?  Does
> http://git.gnome.org/browse/glib/commit/
> ?id=8e82225aedf81ea8a33deb3eb27a8878cd606521 actually help at all?

As for my comment above, it is a different issue.

Comment 4 Colin Walters 2012-10-17 16:50:07 UTC
If it's only the varargs marshaller, then this would be a regression introduced by http://git.gnome.org/browse/glib/commit/?h=a3e91088ce40ed201cfe1514b0adb252394027b1

I had thought that 8e8222 had actually been tested on ppc64 and confirmed to work, which would point more strongly to a3e9108 introducing the regression.

What I don't understand though is how it worked even before either of them (or maybe it didn't?).  In order to get into the GValue slow-path in the first place, we had to collect it from the va_list somehow too.  And looking at the source, genums.c has collect_format "i", which means G_VALUE_COLLECT_INIT is just going to call va_arg ((var_args), gint).

The only thing that would make sense to me here is that while it's true that using va_arg (l, int) on an unsigned int is undefined behavior, what's actually getting us into trouble is trying to *store* that result in an int.  If we were loading the whole thing into a long, we wouldn't hit the 32-bit sign extension issue?  I dunno...

Anyways, can you try this trivial patch, which should disable the fastpath, and we'll see whether the generic marshaller works?


diff --git a/gobject/tests/signals.c b/gobject/tests/signals.c
index b8cf26a..aee7ba2 100644
--- a/gobject/tests/signals.c
+++ b/gobject/tests/signals.c
@@ -128,7 +128,7 @@ test_class_init (TestClass *klass)
                 G_TYPE_CHAR, G_TYPE_UCHAR, G_TYPE_INT, G_TYPE_LONG, G_TYPE_POINTER, G_TYPE_DOUBLE, G_TYPE_FLOAT);
   g_signal_new ("generic-marshaller-2",
                 G_TYPE_FROM_CLASS (klass),
-                G_SIGNAL_RUN_LAST,
+                G_SIGNAL_RUN_LAST | G_SIGNAL_NO_RECURSE,
                 0,
                 NULL, NULL,
                 NULL,

Comment 5 Gustavo Luiz Duarte 2012-10-18 18:10:10 UTC
I tried your patch and it still fails the same way. I can confirm it is using the generic marshaller (not the varargs variant) from the gdb backtrace. So my assumption was wrong.

I can try to check 8e8222 and a3e9108 revisions to see if it ever worked.

Comment 6 Colin Walters 2012-10-18 18:56:48 UTC
(In reply to comment #5)
> I tried your patch and it still fails the same way. I can confirm it is
> using the generic marshaller (not the varargs variant) from the gdb
> backtrace. So my assumption was wrong.

That's reassuring in that it supports my reading of the code.

> I can try to check 8e8222 and a3e9108 revisions to see if it ever worked.

That'd be appreciated.  Also, is there a way I could get login access to a ppc64 machine with userspace sufficient to compile glib?

Comment 7 Colin Walters 2012-10-18 19:26:30 UTC
But if the bottom line here is that we're provoking undefined behavior using va_arg (int) on an unsigned int (enum), then we're in for some serious pain.

I don't have any data, but I'd hazard a guess that upwards of 80% of the enums in GNOME are unsigned.  It's the default, after all.

Let me refine the earlier options more precisely:

1) Change the va_arg () calls to use (unsigned int).  Add G_TYPE_ENUM_INT, G_TYPE_ENUM_UINT64. 

 Pro: Requires patching fewer GLib consumers.
 Con: Very subtle semantic change that would potentially break existing code.
 I can easily imagine us having to find and fix the modules using signed
 enums, since they'd break too on ppc64 right?

2) Add G_TYPE_ENUM_UINT, G_TYPE_ENUM_UINT64

 Pro: Won't break anything
 Con: Requires patching a *lot* of GLib-using modules.  In theory, they'd only
 have to do that to get ppc64 support though.

Comment 8 Gustavo Luiz Duarte 2012-10-18 19:32:54 UTC
Colin, could you join #fedora-ppc on freenode? We can get you access to a ppc64 machine.

Comment 9 Colin Walters 2012-10-18 21:42:41 UTC
Some new test code here:

https://github.com/cgwalters/ppcsignal

So...we discovered that optimization influences behavior here.  Concretely, the above test code works with -O0, and fails with -O2.  This dependence on optimization flags should explain some of the confusion above.  I bet at least some developers have been building with -O0, so you don't have to stare at <optimized out> in gdb and shake your fist.

Comment 10 Colin Walters 2012-10-18 21:55:23 UTC
From quick discussion with Gustavo - it's not sounding like there is an easy fix here.  Compiling gsignal.c or whatever with -O0 is pretty unappealing.  So that leaves us with the options from https://bugzilla.redhat.com/show_bug.cgi?id=864196#c7

Comment 11 Gustavo Luiz Duarte 2012-10-22 20:18:27 UTC
Our main issue here seems to be the use of varargs. I see that g_signal_emitv() does not use varargs, instead it uses GValue*, what is much more desirable because GValue stores enums in a long.
Changing the users of g_signal_emit_by_name() to use instead g_signal_emitv() would be much more easy than changing every reference to G_TYPE_ENUM.
I haven't tested yet, so varargs could be being used deeper in the stack.

The same counts for g_cclosure_marshal_generic_va() vs. g_cclosure_marshal_generic().



Regarding the options for a long term fix, I would add:

3) Document G_TYPE_ENUM as deprecated. Add G_TYPE_ENUM_UINT, G_TYPE_ENUM_SINT, G_TYPE_ENUM_UINT64, G_TYPE_ENUM_SINT64.

 Pro: Easier to find broken code. Smooth transition through deprecation.
 Con: Requires patching almost every glib-using module.

Also, I don't see this as a ppc64 only, as gcc just happen to work as "expected" and could break anytime after a compiler update.

Comment 12 Colin Walters 2012-10-22 20:46:25 UTC
(In reply to comment #11)
> Our main issue here seems to be the use of varargs. I see that
> g_signal_emitv() does not use varargs, instead it uses GValue*, what is much
> more desirable because GValue stores enums in a long.
> Changing the users of g_signal_emit_by_name() to use instead
> g_signal_emitv() would be much more easy than changing every reference to
> G_TYPE_ENUM.

Yes...though it's not ideal since you'd only have to do it for signals that use enums, and thus people would forget.  Also, packing stuff into GValues by hand is tedious.

We could find and fix some critical-path type users, but it's going to be a trap that we repeatedly fall into until we do the full deprecation of G_TYPE_ENUM =(

Actually, step 0 of this bug is really to find which G_TYPE_ENUM users are hitting this bug.  It almost certainly does make sense to fix some of them and not block on the full solution.

> I haven't tested yet, so varargs could be being used deeper in the stack.

Well, of course we need to unpack the GValue when invoking handlers.  So at present, the way this works is simply by assuming that we only have a 32 bit enum, even though the actual storage is long in the GValue.
 
> Also, I don't see this as a ppc64 only, as gcc just happen to work as
> "expected" and could break anytime after a compiler update.

Technically true, though in practice I think we're going to get away with the status quo on x86/x86_64.  On the other hand, I can easily imagine us hitting the same issues on AArch64.

Comment 13 Colin Walters 2012-10-22 21:10:21 UTC
I filed https://bugzilla.gnome.org/show_bug.cgi?id=686666 to track implementation of new enum types in GLib.

Comment 14 Colin Walters 2012-11-08 19:07:18 UTC
Correct bug link: https://bugzilla.gnome.org/show_bug.cgi?id=686662

Executive summary: We found no real-world users that were equivalent to this test case, so decided to just disable the test case for now.