Bug 631833
Summary: | Big performance regression found on connect/request/response test through IPSEC (openswan) transport | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Adam Okuliar <aokuliar> |
Component: | kernel | Assignee: | Herbert Xu <herbert.xu> |
Status: | CLOSED ERRATA | QA Contact: | Network QE <network-qe> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 6.0 | CC: | aokuliar, jolsa, jpirko, jwest, kzhang, nhorman, rmusil, sgrubb, tgraf |
Target Milestone: | rc | Keywords: | Regression, ZStream |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | kernel-2.6.32-130.el6 | Doc Type: | Bug Fix |
Doc Text: | The XFRM_SUB_POLICY feature causes all bundles to be at the finest granularity possible. As a result of the data structure used to implement this, the system performance would drop considerably. This update disables a part of XFRM_SUB_POLICY, eliminating the poor performance at the cost of sub-IP address selection granularity in the policy. ||
Story Points: | --- |
Clone Of: | | Environment: |
Last Closed: | 2011-05-23 20:51:37 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 710645 | ||
Attachments: |
Description
Adam Okuliar
2010-09-08 13:45:04 UTC
Created attachment 446951 [details]
ipsec barf output to print rhel6 IKE establishment time.
Created attachment 446952 [details]
ipsec barf output to print rhel55 IKE establishment time.
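The attached dumps are Openswan's standard debug bundle; a dump like these can typically be regenerated on either host with Openswan's own collection command (the output file name below is just an example):

# collect Openswan/IKE state, logs, and configuration into one debug dump
ipsec barf > /tmp/ipsec-barf.txt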
Hi all, sorry for the long delay - we have new hardware for testing and I was dealing with hardware issues. Results from the new hardware are very similar to those from the old one - we still have a big performance loss, please see http://download.englab.brq.redhat.com/perf-results//netperf/rhel6_rc4/#crr4_172-16-20-11_to_172-16-20-21

I'm running netperf at a point where the SA is already established, and no test takes longer than 3600 seconds, so there is no need to renegotiate the SA. I suppose the problem will probably be in the NETKEY part. If you need any other info please let me know. Cheers! Adam

Can I get access to the setup to find out what is going on on the NETKEY side? I am not sure right now where the actual problem is. As I reported in comment #5, I do not see this performance issue being caused by Openswan's IKE protocol. One way to ensure that the problem is in NETKEY (kernel-side IPsec) is to set up the IPsec policies manually using "ip xfrm" commands and then do the performance verification. This eliminates the need for Openswan, and if you still see the performance degradation we can say for sure that NETKEY is the issue and transfer the bug to the kernel people to have a look. So please either do this yourself and let me know, or give me access to your hardware so that I can try it myself - I have tried several times on my local VMs but that is not useful at all here.

Hi, I need a little bit of help with this. I am using the following commands:

On machine A:

ip xfrm policy update dir in src 172.16.20.21/32 dst 172.16.20.11/32 proto any action allow priority 31798 tmpl src 172.16.20.21 dst 172.16.20.11 proto esp mode transport reqid 1 level required
ip xfrm policy update dir out src 172.16.20.11/32 dst 172.16.20.21/32 proto any action allow priority 31798 tmpl src 172.16.20.11 dst 172.16.20.21 proto esp mode transport reqid 1 level required
ip xfrm state add src 172.16.20.11 dst 172.16.20.21 proto esp spi 0x00000301 mode transport auth md5 0x96358c90783bbfa3d7b196ceabe0536b enc des3_ede 0xf6ddb555acfd9d77b03ea3843f2653255afe8eb5573965df
ip xfrm state add src 172.16.20.21 dst 172.16.20.11 proto esp spi 0x00000302 mode transport auth md5 0x96358c90783bbfa3d7b196ceabe0536b enc des3_ede 0xf6ddb555acfd9d77b03ea3843f2653255afe8eb5573965df

On machine B:

ip xfrm policy update dir in src 172.16.20.11/32 dst 172.16.20.21/32 proto any action allow priority 31798 tmpl src 172.16.20.11 dst 172.16.20.21 proto esp mode transport reqid 1 level required
ip xfrm policy update dir out src 172.16.20.21/32 dst 172.16.20.11/32 proto any action allow priority 31798 tmpl src 172.16.20.21 dst 172.16.20.11 proto esp mode transport reqid 1 level required
ip xfrm state add src 172.16.20.11 dst 172.16.20.21 proto esp spi 0x00000301 mode transport auth md5 0x96358c90783bbfa3d7b196ceabe0536b enc des3_ede 0xf6ddb555acfd9d77b03ea3843f2653255afe8eb5573965df
ip xfrm state add src 172.16.20.21 dst 172.16.20.11 proto esp spi 0x00000302 mode transport auth md5 0x96358c90783bbfa3d7b196ceabe0536b enc des3_ede 0xf6ddb555acfd9d77b03ea3843f2653255afe8eb5573965df

When trying to ping from A to B I always get:

ping: sendmsg: No such process

Do you have any idea what I am doing wrong? Thanks a lot, Adam

A few things you can try:
1. Check your firewall settings.
2. You have set the same keys (for both md5 and des3) for both directions (in and out); try using different keys.
3. Also check that the output of "ip xfrm policy" and "ip xfrm state" is the same as when you use IKE (Openswan), except for the keys, to make sure that your commands are fine (see the example below).
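For what it's worth, the check in point 3 is just a matter of dumping the SPD and SAD on both machines and comparing them against what Openswan installs; a minimal sketch (nothing below is specific to this setup, run as root):

# dump the installed policies and SAs for comparison with the IKE-negotiated ones
ip xfrm policy
ip xfrm state
# per-SA statistics (byte/packet counters) show whether traffic actually hits the SAs
ip -s xfrm state
# start from a clean slate before re-adding the policies/SAs by hand
ip xfrm state flush
ip xfrm policy flush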
Hi, I can give you access to those machines. Could you please investigate this? Thanks, Adam

Yes please, I can try.

Hi, any updates? I need those systems now, but if you need them for experimentation feel free to mail me or ping me on IRC and I can give you access right away. Thanks a lot, Adam

Not really. Yes, you can take them. Thanks.

Created attachment 474357 [details]
tcpdump of one test run
Hi, I posted a tcpdump of one run of the benchmark. There is only one ISAKMP negotiation, so I assume the problem is in the kernel IPsec implementation. Cheers, Adam

I just had a look and this test is good. You are right: because there is just one ISAKMP negotiation, Openswan is only creating the connection and doing nothing else on top of that, and this overhead is negligible. So this must be related to kernel IPsec or somewhere else. Because of this, I am now changing its component so that the kernel people can have a look at it.

Triage assignment. If you feel this bug doesn't belong to you, or that it cannot be handled in a timely fashion, please contact me for re-assignment.

Adam, do you have a comparative tcpdump from the RHEL5 system? The fastest way to see what's slowing the RHEL6 system down is to compare the same test on RHEL5 against the test on RHEL6. Looking at the RHEL6 tcpdump, I agree it looks like we only have one ISAKMP negotiation phase, but it appears to take at least ~10 seconds to complete. If this tcpdump records the entire test, then out of a 32-second test we're spending almost half the test time waiting to start transmitting ESP frames (I don't see the first ESP frame until 17 seconds into the trace).

Hi, this first negotiation actually happened before the test. It takes ~10 seconds because I ran 'service ipsec restart' manually on both machines after starting the capture, and there are a few seconds before the test included in this dump. I created another dump with the ISAKMP security association already established; in that dump there are no ISAKMP messages, only encrypted ESP test packets. Sorry for the confusion. I can also provide a comparative RHEL5 dump if you want.

Created attachment 475149 [details]
tcpdump of one test run with isakmp session established
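For reference, a capture like the attached ones - ESP traffic only, with any IKE exchange showing up separately - can be taken with something along these lines (the interface name and output file are placeholders, not taken from this setup):

# capture ESP plus IKE (ISAKMP, UDP port 500) during one netperf run;
# with a stable SA, only ESP packets should appear in the trace
tcpdump -i eth0 -w crr-run.pcap 'esp or udp port 500'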
Yes, please attach the RHEL5 trace for comparison, I'd appreciate it. Created attachment 475215 [details]
tcpdump of one test run with isakmp session established on rhel 5.5
Well, the tcpdumps make it pretty clear, I think. The packet rate on the RHEL6 box is really only about 30% of that of the RHEL5.5 box. That's bad. Of course I can't tell what the root cause is from that alone (not sure if the crypto layer is taking longer to encrypt packets, if the TCP layer is slower, or if the application just isn't running as often and is generating less data). To determine that, I'll really just need to poke about on these systems. Adam, is it possible for you to loan these systems to me so that I can dig into where our bottleneck is?

Will do, I'll start poking at this asap.

Interesting note to self: I ran the test in comment 29 iteratively a few times and came up with this:

[root@hp-dl380g7-01 ~]# netperf -H 172.16.20.21 -L 172.16.20.11 -t TCP_CRR
TCP Connect/Request/Response TEST from 172.16.20.11 (172.16.20.11) port 0 AF_INET to 172.16.20.21 (172.16.20.21) port 0 AF_INET : histogram : interval
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    1155.90
16384  87380

[root@hp-dl380g7-01 ~]# netperf -H 172.16.20.21 -L 172.16.20.11 -t TCP_CRR
TCP Connect/Request/Response TEST from 172.16.20.11 (172.16.20.11) port 0 AF_INET to 172.16.20.21 (172.16.20.21) port 0 AF_INET : histogram : interval
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    493.00
16384  87380

[root@hp-dl380g7-01 ~]# netperf -H 172.16.20.21 -L 172.16.20.11 -t TCP_CRR
TCP Connect/Request/Response TEST from 172.16.20.11 (172.16.20.11) port 0 AF_INET to 172.16.20.21 (172.16.20.21) port 0 AF_INET : histogram : interval
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    357.50
16384  87380

[root@hp-dl380g7-01 ~]# netperf -H 172.16.20.21 -L 172.16.20.11 -t TCP_CRR
TCP Connect/Request/Response TEST from 172.16.20.11 (172.16.20.11) port 0 AF_INET to 172.16.20.21 (172.16.20.21) port 0 AF_INET : histogram : interval
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    292.40
16384  87380

It seems that as I run the test, performance is collapsing rapidly. I have no idea what would cause behavior like that. Looking further.

perf data indicates the following:

46.81%  netperf  [kernel.kallsyms]  [k] xfrm_bundle_ok
38.94%  netperf  [kernel.kallsyms]  [k] __xfrm4_find_bundle

We're spending about 80% of our time in xfrm_bundle_ok and __xfrm4_find_bundle. I'd be willing to bet that that number (if I had previous perf samples) gets larger as time goes on. I expect we're looping over some list that has stale entries hanging around on it forever, or some such. I'll start digging in tomorrow and see if I can't find the root cause. Adam, feel free to reclaim the machines for testing purposes. I'll update the bz when I have a patch/theory that I need them to explore again.
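For anyone wanting to reproduce the profile above, the recipe is just repeated TCP_CRR runs with a system-wide perf sample underneath; a rough sketch using the same addresses as in comment 29 (the exact perf options are my choice, not necessarily what was used here):

# sample the whole system while one CRR run executes, then inspect the hot symbols
perf record -a -g -- netperf -H 172.16.20.21 -L 172.16.20.11 -t TCP_CRR
perf report --sort symbol
# run the test several times back to back; the run-over-run drop in transaction
# rate is the collapsing behaviour described above
for i in 1 2 3 4; do netperf -H 172.16.20.21 -L 172.16.20.11 -t TCP_CRR; done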
Note to self: based on the above testing, I think we need this commit: 80c802f3073e84c956846e921e8a0b02dfa3755f. I'm constructing a kernel with this patch backported now.

Ugh, looks like that helped but not enough. Given that xfrm only has 60 changelog entries and only a few minor ABI breakers, I'm inclined to just sync us with upstream.

Neil, I'm confused here. Did we test with the latest upstream kernel to conclude that it doesn't suffer from this regression? Also, any chance I can get access to these machines for a couple of days? Thanks!

We did not. Based on my testing in comments 31, 32 and 33, we had an upstream patch that fit the bill for what this bug looked like, so I figured it was faster to backport the patch than to grab the full upstream kernel, but the patch turned out to require more backporting than I initially thought. As for the machines, I can't give you access to them. Comment 29 has the system details, but they are owned/managed by Adam and shared between several people, so you'll need to ask him for time on them.

Thanks Neil! Adam, please let me know when it's convenient for me to test on those machines.

Hi Herbert, there are some tests scheduled on these systems today and over the weekend, but I can give you access next week. Thanks a lot, Adam

Hi Herbert, the machines are now ready - sorry for the delay, we were dealing with another urgent issue. Please use these systems:

hp-dl380g7-01.lab.eng.brq.redhat.com
hp-dl385g7-01.lab.eng.brq.redhat.com

The systems are provisioned with the latest 6.1, which also suffers from this regression. To set up the configuration, please run /root/prepare_sys.py. To test the actual performance, please use

netperf -L 172.16.20.11 -H 172.16.20.21 -t TCP_CRR

on hp-dl380g7-01.lab.eng.brq.redhat.com. Please notify me before you are going to use these systems - I just want to know when you are logged in, only because I don't want to interrupt your work with my experiments. Thanks, Adam

Hi Adam: Can you use those systems now?

Hi Herbert, in a few minutes they can be prepared for you. Please tell me your IRC login and I'll ping you when they are ready. Adam

It's "herbert". Thanks, Adam.

Created attachment 485560 [details]
Fix runaway bundles
OK, the problem is that we're creating a new bundle every time and the bundle list gets longer and longer.
Please apply this patch and try again.
Thanks!
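One way to see the runaway bundles directly is to watch the xfrm bundle slab while the CRR test loops; a sketch, assuming the xfrm_dst_cache slab name used by the xfrm code and a readable /proc/slabinfo (run as root):

# with the bug, the active-object count for the bundle slab keeps climbing
# for as long as the test runs; with the fix it should stay roughly flat
watch -n1 'grep xfrm_dst_cache /proc/slabinfo'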
Hi Herbert, I applied your patch on 2.6.32-112.el6 (https://brewweb.devel.redhat.com/buildinfo?buildID=159163). Unfortunately there is no improvement at all; CRR performance is the same as with the original kernel. I can provide you with the patched .srpm and the built kernel RPM, as well as access to our systems, for further investigation. Thanks, Adam

Thanks Adam. What I will try to do now is reproduce the problem on my machines, as what I've seen on yours has eliminated the hardware as the cause. I'll let you know how I go.

OK, I've tracked it down to CONFIG_XFRM_SUB_POLICY. Apparently we've completely broken it upstream during the policy/bundle cache rework, while RHEL6 still retains the original semantics, which are also stuffed. So my recommendation for RHEL6 is to disable this option. I just tried it here and it solved the problem.

Created attachment 487257 [details]
Disable granular bundles
Adam, please try this patch (without disabling XFRM_SUB_POLICY) and let me know if it fixes the problem.
Thanks!
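For the record, whether a given kernel build has the sub-policy code compiled in at all can be checked from the installed config (path assumes the usual /boot/config-<version> layout on RHEL):

# CONFIG_XFRM_SUB_POLICY=y means the granular-bundle code is built in;
# the patch above keeps the option enabled but disables the per-bundle granularity
grep XFRM_SUB_POLICY /boot/config-$(uname -r)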
Hi Herbert, this helped a lot. With this patch we get ~91% of the RHEL5.5 performance. There is still a ~9% regression, but I believe that is because of Bug 652311, which is a plain-text CRR regression via bnx2. I believe your patch resolved this IPsec problem. Thanks a lot, Adam

Patch(es) available on kernel-2.6.32-130.el6

Reproduced on: 2.6.32-125.el6.x86_64
Verified on: 2.6.32-130.el6.x86_64
Changing to verified status.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The XFRM_SUB_POLICY feature causes all bundles to be at the finest granularity possible. As a result of the data structure used to implement this, the system performance would drop considerably. This update disables a part of XFRM_SUB_POLICY, eliminating the poor performance at the cost of sub-IP address selection granularity in the policy.