=Comment: #0=================================================
Joy M. Latten <latten.com> - 2008-04-16 19:01 EDT

---Problem Description---
TAHI IPsec IPv6 Ready Logo testcase #16 fails on ppc64 when using ESP with the encryption algorithm set to 3des-cbc and authentication set to "null". I tried this manually using setkey and, sure enough, it works on i386 but fails on ppc64.

Contact Information = Joy Latten/latten.com

---uname output---
2.6.18-88.el5 #1 SMP Tue Apr 1 19:06:25 EDT 2008 ppc64 ppc64 ppc64 GNU/Linux

Machine Type = p520

---Steps to Reproduce---
Use the following configuration for setkey:

flush;
spdflush;
add <ipv6addrA> <ipv6addrB> esp 0x1000 -m transport -E 3des-cbc "ipv6readylogo3descbcin01" -A null;
add <ipv6addrB> <ipv6addrA> esp 0x1458 -m transport -E 3des-cbc "ipv6readylogo3descbcin01" -A null;
spdadd <ipv6addrA> <ipv6addrB> any -P out ipsec esp/transport//require;
spdadd <ipv6addrB> <ipv6addrA> any -P in ipsec esp/transport//require;

On machine A, do a ping6 <ipv6addrB>. The ping fails on ppc64.
------- Comment From latten.com 2008-04-17 00:15 EDT------- It seems that authentication fails verification on ppc64 when "null" integrity is used... so packets are dropped.
Joy, can you give me a packet dump taken on the receiving host (tcpdump -s 1600 -w <file> should do the trick)? Alternatively, if Linda can get me onto a ppc test machine, I can try this myself. BTW, does it fail if you transmit from i386 to ppc64 and vice versa?
Created attachment 302766 [details]
tcpdump when pinging from ppc64 to i386

Herbert, yes, it fails in both directions. I will include a dump for each direction. This is the dump from ppc64 to i386. I added the following to my setkey config file because I didn't want to be confused by ICMPv6 neighbor solicitation pkts, so I don't apply ipsec to them:

spdadd ::/0 ::/0 icmp6 135,0 -P in none;
spdadd ::/0 ::/0 icmp6 135,0 -P out none;

The weird thing is: are those ESP pkts really ICMPv6 echo reply pkts? And if so, where are the echo requests from ppc64? I think I will do a tcpdump on the ppc64 too, to see if any pkts go out.
Created attachment 302767 [details]
tcpdump when pinging from i386 to ppc64

Here is the tcpdump when pinging from i386 to ppc64.
Created attachment 302787 [details]
tcpdump using nc command to send from ppc64 to i386.

The policy below only prevents neighbor solicitation packets from being ESP'd, not the neighbor advertisements. So in the dump_i386, those ESP packets are probably not echo replies but perhaps advertisements.

spdadd ::/0 ::/0 icmp6 135,0 -P in none;
spdadd ::/0 ::/0 icmp6 135,0 -P out none;

So I removed the above lines from my setkey config file and decided not to use ping or icmp for debugging. Instead I used the setkey setup below, which changes the protocol from "any" to "tcp", and used the "nc" command to generate some tcp traffic. The new attachment is the tcpdump from ppc64 to i386. nc listens on i386 and sends on ppc64. fc00:0:0:105::64 is the ppc64 ipv6 address and fc00:0:0:105::35 is the i386 ipv6 address. nc fails to establish a connection on the i386 box.

My setkey.conf looks like:

flush;
spdflush;
add fc00:0:0:105::35 fc00:0:0:105::64 esp 0x1000 -m transport -E 3des-cbc "ipv6readylogo3descbcin01" -A null;
add fc00:0:0:105::64 fc00:0:0:105::35 esp 0x1458 -m transport -E 3des-cbc "ipv6readylogo3descbcin01" -A null;
spdadd fc00:0:0:105::35 fc00:0:0:105::64 tcp -P out ipsec esp/transport//require;
spdadd fc00:0:0:105::64 fc00:0:0:105::35 tcp -P in ipsec esp/transport//require;
Created attachment 302788 [details] tcpdump using nc command to send tcp traffic from i386 to ppc64
------- Comment From latten.com 2008-04-17 18:49 EDT------- Odd thing is, if you remove "-A null;" from ppc64 setkey config, this works. Because of the way we call authenc(), specifying "-A null" or just specifying an encryption algorithm should both result in authenc() being called with "hmac(digest_null)". Only difference is we will have an x->aalg and an aalg_desc when using "-A null"...
------- Comment From latten.com 2008-04-18 16:39 EDT------- According to the debugging I have done, the integrity check verifies OK. I am not sure what is going on... the only difference between saying "-A null" in ESP and not is that the first one gets an x->aalg and aalg_desc.
That's really strange. If only I had a ppc to test this on :) So what does ip -s x s say on the ppc? If I got the directions right then it would appear that the i386 can receive packets from the ppc but not vice versa. Is the integrity counter going up or perhaps the packets are being dropped somewhere else? Thanks!
------- Comment From latten.com 2008-04-21 14:06 EDT-------
Herbert, yes, the i386 appears to receive packets from the ppc OK. Just not vice versa. Thus nc doesn't work, since it needs to "handshake" or whatever to establish a connection. I wish you had access to a ppc64 too! :-)

According to my debugging, the integrity check succeeds. The output of "ip -s x s" below confirms it. So the packet must be getting dropped elsewhere. I will look further and see where. So far, xfrm_rcv_spi() appears to succeed and returns a "1". Confusing! And it works on i386... just not on ppc64. I wonder if memory somewhere needs to be bzero'd or something...

[root@nachos]# ip -s x s
src fc00:0:0:105::64 dst fc00:0:0:105::35
    proto esp spi 0x00001458(5208) reqid 0(0x00000000) mode transport
    replay-window 0 seq 0x00000000 flag (0x00000000)
    auth hmac(digest_null) 0x (0 bits)
    enc cbc(des3_ede) 0x6970763672656164796c6f676f33646573636263696e3031 (192 bits)
    lifetime config:
      limit: soft (INF)(bytes), hard (INF)(bytes)
      limit: soft (INF)(packets), hard (INF)(packets)
      expire add: soft 0(sec), hard 0(sec)
      expire use: soft 0(sec), hard 0(sec)
    lifetime current:
      0(bytes), 0(packets)
      add 2008-04-21 07:39:05 use -
    stats:
      replay-window 0 replay 0 failed 0
src fc00:0:0:105::35 dst fc00:0:0:105::64
    proto esp spi 0x00001000(4096) reqid 0(0x00000000) mode transport
    replay-window 0 seq 0x00000000 flag (0x00000000)
    auth hmac(digest_null) 0x (0 bits)
    enc cbc(des3_ede) 0x6970763672656164796c6f676f33646573636263696e3031 (192 bits)
    lifetime config:
      limit: soft (INF)(bytes), hard (INF)(bytes)
      limit: soft (INF)(packets), hard (INF)(packets)
      expire add: soft 0(sec), hard 0(sec)
      expire use: soft 0(sec), hard 0(sec)
    lifetime current:
      120(bytes), 3(packets)
      add 2008-04-21 07:39:05 use 2008-04-21 07:41:01
    stats:
      replay-window 0 replay 0 failed 0
------- Comment From latten.com 2008-04-21 15:02 EDT------- Hmmm... this looks like it is failing all the way up in the tcp layer... in xfrm6_policy_check() in tcp_v6.c...
------- Comment From latten.com 2008-04-21 18:37 EDT-------
It seems __xfrm_policy_check() fails. I was able to trace this down to a failure in xfrm_state_ok() when we check the auth algo.

trace on i386:
tmpl->aalgos = 4294967295
x->props.aalgo = 251
1<<x->props.aalgo = 134217728
tmpl->aalgos & (1<<x->props.aalgo) = 134217728

trace on ppc64:
tmpl->aalgos = 4294967295
x->props.aalgo = 251
1<<x->props.aalgo = 0
tmpl->aalgos & (1<<x->props.aalgo) = 0

I wrote a test program to verify:

#include <stdio.h>
#include <sys/types.h>

int main(int c, char **argv)
{
    u_int8_t p;

    p = 5;
    printf("p=5: 1<<p is %u\n", 1<<p);
    /* shift count >= width of int: the result differs per platform */
    p = 251;
    printf("p=251: 1<<p is %u\n", 1<<p);
    return 0;
}

On i386 I get:
p=5: 1<<p is 32
p=251: 1<<p is 134217728

On ppc64 I get:
p=5: 1<<p is 32
p=251: 1<<p is 0

That would explain why this only fails when the auth algo is "null": SADB_X_AALG_NULL = 251, while all the other auth algorithms have a value <= 9... I am not sure of the purpose of "1<<x->props.aalgo", so I don't know how to fix this. Herbert, do you have any ideas on how to fix this?
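One possible reading of the two outputs above, offered as a sketch rather than a confirmed diagnosis: in C, shifting a 32-bit int by 32 or more is undefined, so the compiler simply emits the native shift instruction and each CPU does its own thing. The model below assumes that the x86 shift keeps only the low 5 bits of the count (251 & 31 = 27) and that the 32-bit PowerPC shift looks at the low 6 bits and yields 0 for counts 32 to 63 (251 & 63 = 59). The function names are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed model of a 32-bit x86 left shift:
 * only the low 5 bits of the shift count are used. */
uint32_t shl32_x86_model(uint32_t value, unsigned count)
{
    return value << (count & 31u);
}

/* Assumed model of a 32-bit PowerPC left shift:
 * the low 6 bits of the count are used, and counts 32..63 give 0. */
uint32_t shl32_ppc_model(uint32_t value, unsigned count)
{
    count &= 63u;
    return count >= 32u ? 0u : value << count;
}

/* Print what each model yields for 1 << id. */
void show_shift(uint8_t id)
{
    printf("id=%u: x86 model %u, ppc model %u\n", (unsigned)id,
           (unsigned)shl32_x86_model(1u, id),
           (unsigned)shl32_ppc_model(1u, id));
}
```

Calling show_shift(5) and show_shift(251) should give the same pairs of values as the test program: both models agree for 5, and they diverge for 251. Neither model ever shifts by 32 or more, so the sketch itself stays within defined behaviour.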
Created attachment 303276 [details]
[IPSEC]: Fix catch-22 with algorithm IDs above 31

I just posted this patch upstream:

As it stands it's impossible to use any authentication algorithms with an ID above 31 portably. It just happens to work on x86 but fails miserably on ppc64.

The reason is that we're using a bit mask to check the algorithm ID but the mask is only 32 bits wide.

After looking at how this is used in the field, I have concluded that in the long term we should phase out state matching by IDs because this is made superfluous by the reqid feature. For current applications, the best solution IMHO is to allow all algorithms when the bit masks are all ~0.

The following patch does exactly that.

This bug was identified by IBM when testing on the ppc64 platform using the NULL authentication algorithm which has an ID of 251.

Signed-off-by: Herbert Xu <herbert.org.au>
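The idea described in the patch text can be sketched roughly as follows. This is not the attached patch: it is a simplified illustration that looks at a single 32-bit mask (the description speaks of all the bit masks being ~0), and algo_allowed and ALGO_MASK_BITS are hypothetical names.

```c
#include <stdint.h>

#define ALGO_MASK_BITS 32u  /* width of the algorithm bit mask */

/* Return 1 if algorithm `id` is acceptable under `mask`, else 0.
 * An all-ones mask is treated as "any algorithm", so IDs that do not
 * fit in the mask (such as 251) are still accepted in that case. */
int algo_allowed(uint32_t mask, uint8_t id)
{
    if (mask == UINT32_MAX)
        return 1;               /* wildcard: skip the bit test entirely */
    if (id >= ALGO_MASK_BITS)
        return 0;               /* cannot be represented in the mask */
    return (int)((mask >> id) & 1u);
}
```

With this shape the shift count is always below 32, so the result no longer depends on how a particular CPU handles oversized shift counts.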
------- Comment From latten.com 2008-04-22 10:52 EDT------- Thanks, Herbert!! :-)
Herbert, thank you for the patch. I'm good with it, but I wanted to check with you regarding ABI compatibility. In the patch you add a member to xfrm_tmpl. Even if this member were added to the end of the structure, it's embedded in xfrm_policy, which means the allocated size of that structure will change. It looks like all allocators of xfrm_policy in the kernel use xfrm_policy_alloc to create new instances of that structure, so that should be no problem. However, are you sufficiently confident that no out-of-tree modules will statically declare an xfrm_policy struct? If they do that, we have a potential memory corruptor with this patch. I'm fine with it if you feel that no one will really be doing that, but I wanted to check before I went ahead with this. Thanks!
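The size concern can be illustrated with stand-in structures. Everything below is hypothetical: the field names, the array length, and the layouts are invented for the example and are not the kernel's definitions of xfrm_tmpl or xfrm_policy.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TMPLS 6  /* illustrative array length, an assumption */

/* Stand-in for the template before and after gaining one member. */
struct tmpl_old { uint32_t aalgos, ealgos, calgos; };
struct tmpl_new { uint32_t aalgos, ealgos, calgos; uint8_t new_flag; };

/* Stand-in for a policy that embeds the templates by value. */
struct policy_old { uint32_t index; struct tmpl_old vec[MAX_TMPLS]; uint32_t trailer; };
struct policy_new { uint32_t index; struct tmpl_new vec[MAX_TMPLS]; uint32_t trailer; };

/* How many bytes the outer structure grows by. */
size_t policy_growth(void)
{
    return sizeof(struct policy_new) - sizeof(struct policy_old);
}

/* How far a field located after the embedded templates moves. */
size_t trailer_shift(void)
{
    return offsetof(struct policy_new, trailer)
         - offsetof(struct policy_old, trailer);
}
```

A module compiled against the old layout would reserve only sizeof(struct policy_old) bytes for a static instance and would look for later fields at the old offsets, which is the kind of mismatch the comment above is worried about.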
Created attachment 303597 [details]
RHEL5 back-port

That's a good point, Neil. I suggest that we do something like this for RHEL5.
Thanks, Herbert, this looks very close to what I was working on last night to ensure ABI compatibility. I'll smoke test it and get it posted for 5.3 today. Thanks!
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
------- Comment From tchicks.com 2008-04-30 17:19 EDT------- Red Hat - Can we get confirmation that a fix for this bug is targeted for the zstream release? Thanks!
in kernel-2.6.18-93.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
Partners, this bug should be fixed in the latest RHEL 5.3 Snapshot. We believe that you have some interest in its correct functionality, so we're making a friendly request to send us some testing feedback. If you have a chance to test it, please share with us your findings. If you have successfully VERIFIED the fix, please add PartnerVerified to the Bugzilla keywords, along with a description of the results. Thanks!
Joy says in an internal comment, "This has been successfully tested in 5.3". This bug is already closed on our side.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html