Bug 864196
Summary: | GClosure C marshaller does not handle correctly unsigned enum values on ppc64 | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Gustavo Luiz Duarte <gustavold> |
Component: | glib2 | Assignee: | Colin Walters <walters> |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 18 | CC: | ajax, bbaude, gustavold, karsten, mclasen |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | ppc64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-01-20 15:51:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Gustavo Luiz Duarte
2012-10-08 20:21:59 UTC
Colin, can you take a look at this ? I know you've looked at problems with enums on bigendian platforms before. See https://bugzilla.gnome.org/show_bug.cgi?id=659881 Did this re-regress? Or is this a case that wasn't fixed by that patch? > Since Glib has no differentiation between signed and unsigned enums, at this > point it is impossible to know what is the type of the parameter and glib just > incorrectly assumes it is a signed int. Right...the GLib enum handling is a major historical mistake. We may eventually have to either: 1) Force callers to make enums signed 2) Force callers to explictly specify G_TYPE_ENUM_INT, G_TYPE_ENUM_UINT64, etc. Both are pretty awful of course =/ But can you comment on the upstream bug above? Does http://git.gnome.org/browse/glib/commit/?id=8e82225aedf81ea8a33deb3eb27a8878cd606521 actually help at all? (In reply to comment #2) > See https://bugzilla.gnome.org/show_bug.cgi?id=659881 > > Did this re-regress? Or is this a case that wasn't fixed by that patch? It is a different issue. The issue addressed in that upstream bug is related to GValue encapsulated enum. GValue stores enums in a long so it can fit both signed or unsigned enums. The issue here is when you use the varargs variant of the C marshaller. In this case, the parameter is always stored in a signed int using va_arg(), which has 2 issues: 1. va_arg() should not be used with mismatching types because it has undefined behavior. 2. A signed int cannot correctly store all possible values of an unsigned enum. Blindly copying the 32 bits back and forth does not work because parameter passing on ppc64 involves sign extension into 64 bits registers, which renders totally different values for ((signed int)0x80000000) and ((unsigned int)0x80000000). > > Since Glib has no differentiation between signed and unsigned enums, at this > > point it is impossible to know what is the type of the parameter and glib just > > incorrectly assumes it is a signed int. > > Right...the GLib enum handling is a major historical mistake. We may > eventually have to either: > > 1) Force callers to make enums signed > 2) Force callers to explictly specify G_TYPE_ENUM_INT, G_TYPE_ENUM_UINT64, > etc. > > Both are pretty awful of course =/ #1 doesn't seem practical specially for the GClosure since the caller may not have control over the callee function signature. Can you comment on the effort needed to get #2 implemented? Where could we start? If there is no reasonable short term solution, could we at least call enums not supported by the varargs variant of the C marshaller? Or a warning indicating the GValue variant as a substitute? (Assuming the GValue variant works fine for signed and unsigned enums). > But can you comment on the upstream bug above? Does > http://git.gnome.org/browse/glib/commit/ > ?id=8e82225aedf81ea8a33deb3eb27a8878cd606521 actually help at all? As for my comment above, it is a different issue. If it's only the varargs marshaller, then this would be a regression introduced by http://git.gnome.org/browse/glib/commit/?h=a3e91088ce40ed201cfe1514b0adb252394027b1 I had thought that 8e8222 had actually been tested on ppc64 and confirmed to work, which would point more strongly to a3e9108 introducing the regression. What I don't understand though is how it worked even before either of them (or maybe it didn't?). In order to get into the GValue slow-path in the first place, we had to collect it from the va_list somehow too. And looking at the source, genums.c has collect_format "i", which means G_VALUE_COLLECT_INIT is just going to call va_arg ((var_args), gint). The only thing that would make sense to me here is that while it's true that using va_arg (l, int) on an unsigned int is undefined behavior, what's actually getting us into trouble is trying to *store* that result in an int. If we were loading the whole thing into a long, we wouldn't hit the 32-bit sign extension issue? I dunno... Anyways, can you try this trivial patch, which should disable the fastpath, and we'll see whether the generic marshaller works? diff --git a/gobject/tests/signals.c b/gobject/tests/signals.c index b8cf26a..aee7ba2 100644 --- a/gobject/tests/signals.c +++ b/gobject/tests/signals.c @@ -128,7 +128,7 @@ test_class_init (TestClass *klass) G_TYPE_CHAR, G_TYPE_UCHAR, G_TYPE_INT, G_TYPE_LONG, G_TYPE_POINTER, G_TYPE_DOUBLE, G_TYPE_FLOAT); g_signal_new ("generic-marshaller-2", G_TYPE_FROM_CLASS (klass), - G_SIGNAL_RUN_LAST, + G_SIGNAL_RUN_LAST | G_SIGNAL_NO_RECURSE, 0, NULL, NULL, NULL, I tried your patch and it still fails the same way. I can confirm it is using the generic marshaller (not the varargs variant) from the gdb backtrace. So my assumption was wrong. I can try to check 8e8222 and a3e9108 revisions to see if it ever worked. (In reply to comment #5) > I tried your patch and it still fails the same way. I can confirm it is > using the generic marshaller (not the varargs variant) from the gdb > backtrace. So my assumption was wrong. That's reassuring in that it supports my reading of the code. > I can try to check 8e8222 and a3e9108 revisions to see if it ever worked. That'd be appreciated. Also, is there a way I could get login access to a ppc64 machine with userspace sufficient to compile glib? But if the bottom line here is that we're provoking undefined behavior using va_arg (int) on an unsigned int (enum), then we're in for some serious pain. I don't have any data, but I'd hazard a guess that upwards of 80% of the enums in GNOME are unsigned. It's the default, after all. Let me refine the earlier options more precisely: 1) Change the va_arg () calls to use (unsigned int). Add G_TYPE_ENUM_INT, G_TYPE_ENUM_UINT64. Pro: Requires patching fewer GLib consumers. Con: Very subtle semantic change that would potentially break existing code. I can easily imagine us having to find and fix the modules using signed enums, since they'd break too on ppc64 right? 2) Add G_TYPE_ENUM_UINT, G_TYPE_ENUM_UINT64 Pro: Won't break anything Con: Requires patching a *lot* of GLib-using modules. In theory, they'd only have to do that to get ppc64 support though. Colin, could you join #fedora-ppc on freenode? We can get you access to a ppc64 machine. Some new test code here: https://github.com/cgwalters/ppcsignal So...we discovered that optimization influences behavior here. Concretely, the above test code works with -O0, and fails with -O2. This dependence on optimization flags should explain some of the confusion above. I bet at least some developers have been building with -O0, so you don't have to stare at <optimized out> in gdb and shake your fist. From quick discussion with Gustavo - it's not sounding like there is an easy fix here. Compiling gsignal.c or whatever with -O0 is pretty unappealing. So that leaves us with the options from https://bugzilla.redhat.com/show_bug.cgi?id=864196#c7 Our main issue here seems to be the use of varargs. I see that g_signal_emitv() does not use varargs, instead it uses GValue*, what is much more desirable because GValue stores enums in a long. Changing the users of g_signal_emit_by_name() to use instead g_signal_emitv() would be much more easy than changing every reference to G_TYPE_ENUM. I haven't tested yet, so varargs could be being used deeper in the stack. The same counts for g_cclosure_marshal_generic_va() vs. g_cclosure_marshal_generic(). Regarding the options for a long term fix, I would add: 3) Document G_TYPE_ENUM as deprecated. Add G_TYPE_ENUM_UINT, G_TYPE_ENUM_SINT, G_TYPE_ENUM_UINT64, G_TYPE_ENUM_SINT64. Pro: Easier to find broken code. Smooth transition through deprecation. Con: Requires patching almost every glib-using module. Also, I don't see this as a ppc64 only, as gcc just happen to work as "expected" and could break anytime after a compiler update. (In reply to comment #11) > Our main issue here seems to be the use of varargs. I see that > g_signal_emitv() does not use varargs, instead it uses GValue*, what is much > more desirable because GValue stores enums in a long. > Changing the users of g_signal_emit_by_name() to use instead > g_signal_emitv() would be much more easy than changing every reference to > G_TYPE_ENUM. Yes...though it's not ideal since you'd only have to do it for signals that use enums, and thus people would forget. Also, packing stuff into GValues by hand is tedious. We could find and fix some critical-path type users, but it's going to be a trap that we repeatedly fall into until we do the full deprecation of G_TYPE_ENUM =( Actually, step 0 of this bug is really to find which G_TYPE_ENUM users are hitting this bug. It almost certainly does make sense to fix some of them and not block on the full solution. > I haven't tested yet, so varargs could be being used deeper in the stack. Well, of course we need to unpack the GValue when invoking handlers. So at present, the way this works is simply by assuming that we only have a 32 bit enum, even though the actual storage is long in the GValue. > Also, I don't see this as a ppc64 only, as gcc just happen to work as > "expected" and could break anytime after a compiler update. Technically true, though in practice I think we're going to get away with the status quo on x86/x86_64. On the other hand, I can easily imagine us hitting the same issues on AArch64. I filed https://bugzilla.gnome.org/show_bug.cgi?id=686666 to track implementation of new enum types in GLib. Correct bug link: https://bugzilla.gnome.org/show_bug.cgi?id=686662 Executive summary: We found no real-world users that were equivalent to this test case, so decided to just disable the test case for now. |