=Comment: #0=================================================
David L. Stevens <dlstevens.com> - 2008-04-22 19:15 EDT
---Problem Description---
32-bit setsockopt calls fail on 64-bit kernel

The default on 64-bit kernels is to compile 32-bit apps. The C compiler does not add padding between the leading u32 integer and a following structure, but the (64 bit) kernel version of the structure does have padding. So, the structure fields don't align and the socket option data is garbage.

Contact Information = dlstevens.com, sri.com

---uname output---
Linux elm3a136 2.6.18-prep #2 SMP Mon Apr 21 14:11:21 PDT 2008 ppc64 ppc64 ppc64 GNU/Linux

Machine Type = all ppc64

---Debugger---
A debugger is not configured

---Steps to Reproduce---
all MCAST_* socket options fail when a 32-bit binary calls them on a 64-bit kernel, due to added padding in the structs they pass

---Kernel - PPC Component Data---
Stack trace output: no
Oops output: no
System Dump Info: The system is not configured to capture a system dump.

*Additional Instructions for dlstevens.com, sri.com:
-Attach sysctl -a output to the bug.

=Comment: #1=================================================
David L. Stevens <dlstevens.com> - 2008-04-22 19:16 EDT
please mirror this to Redhat

=Comment: #3=================================================
David L. Stevens <dlstevens.com> - 2008-04-28 16:16 EDT
patch to add compat support for MCAST* functions
Created attachment 304104 [details] patch to add compat support for MCAST* functions
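For readers following the padding problem described in the report, here is a minimal sketch of the layout mismatch. It is not taken from the attached patch. The field names gr_interface and gr_group follow struct group_req as used with the MCAST_* options; the storage_align4/storage_align8 types and the helper functions are hypothetical stand-ins, and the 4-byte and 8-byte alignments are assumptions chosen to model a 32-bit and a 64-bit ABI inside one program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the socket address storage: 128 bytes each,
 * with the alignment forced explicitly so both ABIs can be modelled in a
 * single build. The alignment values are assumptions for illustration. */
struct storage_align4 { _Alignas(4) unsigned char data[128]; };
struct storage_align8 { _Alignas(8) unsigned char data[128]; };

/* Layout as a 32-bit caller would build it: no padding after the u32. */
struct group_req_32 {
    uint32_t gr_interface;
    struct storage_align4 gr_group;
};

/* Layout as a 64-bit kernel would read it: 4 bytes of padding after the u32. */
struct group_req_64 {
    uint32_t gr_interface;
    struct storage_align8 gr_group;
};

/* The address field starts at different offsets in the two layouts, so a
 * buffer filled in by 32-bit code is misread by code using the 64-bit one. */
static_assert(offsetof(struct group_req_32, gr_group) !=
              offsetof(struct group_req_64, gr_group),
              "layouts are expected to differ");

/* Helpers (hypothetical names) that report where gr_group begins. */
size_t group_offset_32(void) { return offsetof(struct group_req_32, gr_group); }
size_t group_offset_64(void) { return offsetof(struct group_req_64, gr_group); }
```

Under these assumptions a compat handler has to copy the fields one by one from the 32-bit layout into the native one instead of passing the user buffer straight through.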
can I please get a link to the upstream conversation on this patch? It looks fine to me, but I've been told this was submitted/accepted upstream, but I don't see it in Linus' git tree, nor do I see any reference to the conversation about this patch in the archives. Thanks!
------- Comment From dlstevens.com 2008-04-30 11:54 EDT-------
Neil,
The discussion was on netdev, and the fix is in Dave Miller's git tree. Here's the thread:
http://marc.info/?l=linux-netdev&m=120928365925019&w=2
There's another thread for the getsockopt portion. The patch I submitted there:
http://marc.info/?l=linux-netdev&m=120936134900789&w=2
is correct. Dave actually applied a split-up version from Yoshifuji, which introduced a bug that was found and fixed today. The patch in the bugzilla is the code I tested with both setsockopt() and getsockopt() support on RHEL5.2, and the end result of the 3 patches in the threads above is the same code for mainline kernel.
ok, thank you for that. I'll backport/post this shortly.
just FYI, this is going to take me just a little bit. There already exist compat functions for x86_64 that these seem to have some level of conflict with, which are changing ABI. I'll need to work around those.
------- Comment From dlstevens.com 2008-05-01 11:48 EDT------- Neil, can you go into more detail?
yeah, my initial concern regarding duplicate definitions was premature. It turns out it was a false positive. Our kabi check methodology in our build system computes the checksum of each exported symbol, expanding structure contents as it goes in the case of exported functions (via the C preprocessor). This is nominally sufficient; however, in some cases it can produce a false positive when the visibility of a member structure changes. I.e. if you have the following:
===============================
struct foo;

struct bar {
        struct foo *a;
        int b;
};

int func(struct bar *a)
{
        code;
}
EXPORT_SYMBOL(func);
=================================
func will have a given hash value when the abi checksums are computed. If struct foo is defined in a header file:

foo_def.h:
struct foo {
        int a;
        int b;
};

and you then include that file in the file that defines func, func's checksum will change because the full definition of struct foo is now visible instead of only the forward declaration, even though the function signature is actually still the same from an ABI standpoint. That's exactly what happened in this case: by including compat.h in ip_sockglue.c and ipv6_sockglue.c, the visibility of some parameter definitions changed, and as a result the checksum changed. We have the __GENKSYMS__ mechanism to deal with this, and I've already got it fixed in the build system. I'll be posting it shortly. Thanks!
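A minimal sketch of the __GENKSYMS__ guard mentioned above, using the hypothetical struct foo / struct bar / func names from the example rather than the actual RHEL change. It assumes the checksum pass preprocesses the source with __GENKSYMS__ defined, so that pass still sees only the forward declaration while the real compile sees the full definition.

```c
#include <assert.h>

/* Forward declaration: the only view of struct foo that the checksum pass
 * saw before the new header was included. */
struct foo;

#ifndef __GENKSYMS__
/* Stands in for the newly added '#include "foo_def.h"'. Because the
 * checksum pass is assumed to run with __GENKSYMS__ defined, it skips
 * this definition, while the compiler proper sees it. */
struct foo {
    int a;
    int b;
};
#endif

struct bar {
    struct foo *a;   /* pointer member, so a forward declaration suffices */
    int b;
};

/* Exported function from the example. Its signature is identical in both
 * views, which is why its checksum should not have to change. */
int func(struct bar *a)
{
    return a->b;
}
```

In a normal build the guard has no effect; it only keeps the expanded type seen by the checksum tool the same as before the include was added.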
------- Comment From dlstevens.com 2008-05-01 15:04 EDT------- Sounds good. Thanks for the detailed explanation!
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-93.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
------- Comment From dlstevens.com 2008-05-22 18:22 EDT------- I retested with your -93 kernel on PPC64 -- works. Thanks!
*** Bug 447923 has been marked as a duplicate of this bug. ***
Partners, this bug should be fixed in the latest RHEL 5.3 Snapshot. We believe that you have some interest in its correct functionality, so we're making a friendly request to send us some testing feedback. If you have a chance to test it, please share with us your findings. If you have successfully VERIFIED the fix, please add PartnerVerified to the Bugzilla keywords, along with a description of the results. Thanks!
This report is already closed on the IBM side. We validated that this feature is integrated into 2.6.18-91.1.6. Sorry for not reporting this on the Red Hat side.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html