Description of problem:
For a 32-bit binary running on a ppc64 kernel, calling setsockopt for the IPV6_V6ONLY option always enables the option regardless of the value passed.

Version-Release number of selected component (if applicable):
Linux pseries.cambridge.redhat.com 2.6.7-1.451.2.3 #1 SMP Wed Jul 14 17:48:42 EDT 2004 ppc64 ppc64 ppc64 GNU/Linux

How reproducible:
always

Actual results:
[root@pseries root]# gcc ./bind.c
[root@pseries root]# ./a.out
Setting V6ONLY to 0
V6ONLY setting is now: 1

Expected results:
as per -m64 build or any other kernel:
[root@pseries root]# gcc -m64 ./bind.c
[root@pseries root]# ./a.out
Setting V6ONLY to 0
V6ONLY setting is now: 0

Additional info:
Created attachment 102147 [details] repro case
(net effect of this bug is that httpd doesn't accept connections to IPv4 addresses in the default config on pseries in the current RHEL4 tree)
This works perfectly fine for a 32-bit binary running on sparc64, so this looks entirely like some kind of ppc64-specific problem with its compat layer handling. This should thus be assigned to a ppc64 expert.
Interesting you mention sparc64, we had a report upstream with exactly the same symptoms as this problem from a 2.6.7 user on SPARC. Maybe you can try the whole test case with the #if 0 -> #if 1 to check it really does accept connections to IPv4 addresses.
Yep, works perfectly fine under 2.6.8-rc2
davem@nuts:/disk1/BK/net-2.6$ uname -a
Linux nuts 2.6.8-rc2 #4 SMP Fri Jul 23 00:10:29 PDT 2004 sparc64 GNU/Linux
davem@nuts:/disk1/BK/net-2.6$ gcc -m32 -O2 -o bind bind.c
davem@nuts:/disk1/BK/net-2.6$ ./bind
Setting V6ONLY to 0
V6ONLY setting is now: 0
Accepted connection from ::ffff:127.0.0.1, port 32782
davem@nuts:/disk1/BK/net-2.6$
Bug still present in: Linux fish.cambridge.redhat.com 2.6.7-1.503 #1 SMP Mon Aug 2 14:05:20 EDT 2004 ppc64 ppc64 ppc64 GNU/Linux
Bug is in generic compat_sys_setsockopt(). It mangles the arguments if optname == SO_ATTACH_FILTER. SO_ATTACH_FILTER == 26 == IPV6_V6ONLY. Surely when deciding how to mangle we ought to be looking at level too, not just optname? I'm going to chase down bug #129905 first -- the mangling for some of the special cases also looks broken.
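The collision described above can be sketched in userspace C. This is only an illustration, not the kernel code or the attached patch: the two helper functions are hypothetical names, and the fallback defines assume the Linux value of 26 quoted in the comment.

```c
#include <stdbool.h>
#include <sys/socket.h>   /* SOL_SOCKET, SO_ATTACH_FILTER on Linux */
#include <netinet/in.h>   /* IPPROTO_IPV6, IPV6_V6ONLY */

/* Fallbacks for hosts whose headers lack these names
 * (assumption: both are 26, as stated in the comment above). */
#ifndef SO_ATTACH_FILTER
#define SO_ATTACH_FILTER 26
#endif
#ifndef IPV6_V6ONLY
#define IPV6_V6ONLY 26
#endif

/* Hypothetical model of the buggy decision: only optname is inspected,
 * so any option numbered 26 at any level gets its argument rewritten. */
static bool demo_needs_filter_fixup_buggy(int level, int optname)
{
    (void)level;  /* ignored, which is the bug */
    return optname == SO_ATTACH_FILTER;
}

/* Hypothetical model of a more selective decision: option numbers are
 * only unique within a level, so both values must match. */
static bool demo_needs_filter_fixup(int level, int optname)
{
    return level == SOL_SOCKET && optname == SO_ATTACH_FILTER;
}
```

With the Linux values, the first helper reports that an (IPPROTO_IPV6, IPV6_V6ONLY) call needs the socket-filter conversion, while the second does not.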
Created attachment 103442 [details] Patch to be more selective about when we mangle sockopt args. This makes compat_sys_setsockopt() a little more selective about when it mangles the arguments. It does fix the observed problem but it's still not right. Is do_netfilter_replace() actually correct to do the same thing for IPT_SO_SET_REPLACE and IPT6_SO_SET_REPLACE and anything else which happens to have optname 64? Should we move the struct-conversion to the actual sockopt handlers rather than having it in the syscall wrapper?
cf. bug #129905 -- when we _do_ munge, we seem to get it wrong somehow. Does _that_ work OK on sparc64?
If you want to move it into the sockopt handlers you'll need to somehow pass "this is compat sockopt" down the call chain. Every time I've suggested adding an "is_compat_task()" call, Andi Kleen and friends have shot it down. Don't get me started on do_netfilter_replace(), that area needs a lot of work. But it is true that each and every piece of code needs to check the socket level too; the sockopt numbers are only unique within a socket level space.
is_compat_task() might be nice in the general case but in this specific case we have optlen. Isn't that sufficient to tell which kind of userspace we're dealing with -- at least in the cases where it _matters_?
Not at all, there are cases where the structure size is the same between 32-bit and 64-bit, but due to alignment rules the actual layout is different. You have to pass a flag down to the routines if you want to do this. See MSG_CMSG_COMPAT in net/*.c for an idea about how we deal with this in other cases.
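To make the size-versus-layout point concrete, here is a minimal sketch with a made-up option structure. The struct names and fields are hypothetical, not taken from any kernel header. It assumes a 32-bit ABI where long is 4 bytes and 64-bit integers are 8-byte aligned (as on ppc32 and sparc32), and writes that ABI's implicit padding out explicitly so the example behaves the same on any build host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical option as a 64-bit kernel sees it: the "long" is 8 bytes. */
struct demo_opt_native {
    int64_t count;   /* long on a 64-bit ABI */
    int64_t limit;
};

/* The same option as a 32-bit caller lays it out (assumed ABI: 4-byte
 * long, 8-byte-aligned 64-bit integers). Same total size, other layout. */
struct demo_opt_compat {
    int32_t count;   /* long on the 32-bit ABI */
    int32_t pad;     /* padding the 32-bit ABI would insert implicitly */
    int64_t limit;
};

/* Decode a user buffer. optlen cannot select the branch because both
 * layouts are 16 bytes, so the caller has to say which one it is. */
static struct demo_opt_native demo_decode(const void *buf, int is_compat)
{
    struct demo_opt_native out;

    if (is_compat) {
        struct demo_opt_compat c;
        memcpy(&c, buf, sizeof(c));
        out.count = c.count;   /* widen the 32-bit field */
        out.limit = c.limit;
    } else {
        memcpy(&out, buf, sizeof(out));
    }
    return out;
}
```

Both structures occupy 16 bytes, so a check on optlen alone would accept either one and could read count with the wrong width.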
Patch accepted upstream. Should propagate to our kernel soon.
Hmm. MSG_CMSG_COMPAT seems to be required because that's not all happening in the same process. But is_compat_task() should be trivial to implement -- surely?
#define is_compat_task(tsk) (((tsk)->personality & PER_MASK) == PER_LINUX32)
What was Andi's objection?
Just because you're PER_LINUX32 doesn't mean you can't invoke the 64-bit system calls. And likewise, a 64-bit native task could use one of the 32-bit system calls for some reason. So compat'ness is an attribute of the code path we're in, not the overall state of the process. Frankly I think Andi's right. And that's why you have to pass flags around down into the call to implement stuff like this.
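A rough sketch of the "attribute of the code path" idea follows, under the assumption of a single shared handler reached from two entry points. All names here (DEMO_FLAG_COMPAT and the demo_* functions) are hypothetical and only loosely modelled on how MSG_CMSG_COMPAT is used; this is not the kernel's implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag, in the spirit of MSG_CMSG_COMPAT. */
#define DEMO_FLAG_COMPAT 0x1u

/* Shared handler: reads a "long" option value whose width depends on
 * which syscall entry point was used, not on the task's personality. */
static int64_t demo_handle_long_opt(const void *optval, unsigned int flags)
{
    if (flags & DEMO_FLAG_COMPAT) {
        int32_t v;                    /* 32-bit caller: long is 4 bytes */
        memcpy(&v, optval, sizeof(v));
        return v;
    } else {
        int64_t v;                    /* native caller: long is 8 bytes */
        memcpy(&v, optval, sizeof(v));
        return v;
    }
}

/* Native entry point: no flag. */
static int64_t demo_sys_setsockopt(const void *optval)
{
    return demo_handle_long_opt(optval, 0);
}

/* Compat entry point: sets the flag for everything it calls. */
static int64_t demo_compat_sys_setsockopt(const void *optval)
{
    return demo_handle_long_opt(optval, DEMO_FLAG_COMPAT);
}
```

Because the flag is set by the entry point, a task that mixes 32-bit and 64-bit system calls is still decoded correctly on each call.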
Is everyone happy with how this ended up in the RHEL4 kernel?
Yes, this is fixed.