Hide Forgot
Description of problem: When an IPsec tunnel between a RHEL 5.4 server (2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux) and an external gateway is take down for some reason. A subsequent connect() call to a socket on the other end of the openswan IPsec tunnel hangs for more than 30 seconds. The socket is set to non-blocking. Version-Release number of selected component (if applicable): RHEL Kernel: 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux Openswan version: openswan-2.6.21-5.el5_5.3 How reproducible: 100% reproducible. Happens every time. Steps to Reproduce: 1. Establish an IPsec tunnel between an RHEL server mentioned above and an external gateway 2. Take the tunnel down 3. Try to connect() to a socket on the other end of the tunnel Actual results: connect() hangs for over 30 secs. Expected results: The connect() should fail right away. The socket is set to non-blocking. Additional info: Output from ps command that shows that thread in question is stuck on the wait chan xfrm_l which is an ipsec kernel module. bmc-hp1-0-0-0:/root-# ps -aefLl | grep __http_stack_asr 1 S nxuser 10847 19118 10905 0 15 75 0 - 134939 xfrm_l 17:28 ? 00:00:00 /opt/proxy/bin/http_stack Http Stack Bmc__http_stack_asr_1 257 asr-1 10847 localhost 108 bmc-hp1-0-0-0:/root-# date Fri Jan 21 19:13:25 CET 2011 bmc-hp1-0-0-0:/root-# ps -aefLl | grep __http_stack_asr 1 S nxuser 10847 19118 10905 0 15 75 0 - 134939 xfrm_l 17:28 ? 00:00:00 /opt/proxy/bin/http_stack Http Stack Bmc__http_stack_asr_1 257 asr-1 10847 localhost 108 bmc-hp1-0-0-0:/root-# ps -aefLl | grep __http_stack_asr 1 S nxuser 10847 19118 10905 0 15 75 0 - 134939 xfrm_l 17:28 ? 00:00:00 /opt/proxy/bin/http_stack Http Stack Bmc__http_stack_asr_1 257 asr-1 10847 localhost 108 1 S nxuser 10847 19118 10905 0 15 75 0 - 134939 xfrm_l 17:28 ? 00:00:00 /opt/proxy/bin/http_stack Http Stack Bmc__http_stack_asr_1 257 asr-1 10847 localhost 108 bmc-hp1-0-0-0:/root-# date Fri Jan 21 19:13:48 CET 2011 bmc-hp1-0-0-0:/root-# ps -aefLl | grep __http_stack_asr 1 S nxuser 10847 19118 10905 0 15 75 0 - 134939 xfrm_l 17:28 ? 00:00:00 /opt/proxy/bin/http_stack Http Stack Bmc__http_stack_asr_1 257 asr-1 10847 localhost 108 bmc-hp1-0-0-0:/root-# date Fri Jan 21 19:13:54 CET 2011 bmc-hp1-0-0-0:/root-# date Fri Jan 21 19:13:56 CET 2011 bmc-hp1-0-0-0:/root-# ps -aefLl | grep __http_stack_asr 1 S nxuser 10847 19118 10905 0 15 75 0 - 134811 - 17:28 ? 00:00:00 /opt/proxy/bin/http_stack Http Stack Bmc__http_stack_asr_1 257 asr-1 10847 localhost 108 bmc-hp1-0-0-0:/root-# date Fri Jan 21 19:14:00 CET 2011 bmc-hp1-0-0-0:/root-# gdb info: > (gdb) up > #5 0xf7e16bac in stSanityCheck () at dispatch.c:1351 > 1351 in dispatch.c > (gdb) p d > $7 = (dispatcher_t *) 0x9be6ef4 > (gdb) p d->cpuInfo > $8 = {pid = 19205, cpu = 0, _idx = 0, _cpu = {0, 0, 0, 0, 0}, admissionRate = 100, selfAdmissionRate = 100, agentsAdmissionRate = 100, highTreshold = 90, > emergencyTreshold = 99, gainUpFactor = 1, gainDownFactor = 1, loadFactor = 0, count = 0, _oldtime = {tv_sec = 0, tv_usec = 0}, olduclicks = 0, oldsclicks = 0, > procFileFd = 0, procFileName = "/proc/19041/task/19205/stat", '\0' > <repeats 228 times>, alarmTimer = 0, alarmRaised = false} > (gdb) thread 8 > [Switching to thread 8 (Thread 4088118160 (LWP 19205))]#0 0xffffe410 > in __kernel_vsyscall () > (gdb) bt > #0 0xffffe410 in __kernel_vsyscall () > #1 0xf7bb3fa8 in connect () from /lib/libpthread.so.0 > #2 0xf7e2dfd2 in stCreateSock (dstAddr=0xf5903378, > srcAddr=0xf59032f8) at sock.c:519 > #3 0xf7e2ff65 in stNewSocket (dispatcher=0x9be6ef4, > dstAddr=0xf5903378, srcAddr=0xf59032f8, fd=-1) at socket.c:242 > #4 0xf7e2ec74 in stGetStreamSocketByAddr (stream=0xf59031ac, ssl=0) > at > sock.c:734 > #5 0xf7e2ef80 in stGetStreamSocket (stream=0xf59031ac, ssl=0) at > sock.c:813 > #6 0x0806438f in ?? () > #7 0xf59031ac in ?? () > #8 0x00000000 in ?? () > (gdb)
Heres some more info I gathered by strace on this process: I see that the socket is being set to NONBLOCK and then the connect() is being called to the IP that is the other end of the IPsec tunnel: [pid 20528] fcntl64(58, F_SETFL, O_RDWR|O_NONBLOCK <unfinished ...> [pid 20502] gettimeofday( <unfinished ...> [pid 20528] <... fcntl64 resumed> ) = 0 [pid 20502] <... gettimeofday resumed> {1295642530, 830768}, NULL) = 0 [pid 20528] connect(58, {sa_family=AF_INET, sin_port=htons(7772), sin_addr=inet_addr("10.155.199.101")}, 128 <unfinished ...> and then much later in the strace output, I see the same thread (20528) return from the connect() as such: [pid 20528] <... connect resumed> ) = -1 ESRCH (No such process) Time elapsed between these 2 strace segments was more than 30 seconds.
And heres the output of "ipsec auto --status". There are 2 tunnels configured but both are down: 000 "ag-10.155.199.101": 10.155.155.78<10.155.155.78>[C=US, ST=OH, O=ALU, OU=MSG, CN=BMC1, E=BMC1,+S=C]---10.155.155.5...10.155.199.101<10.155.199.101>[C=US, ST=OH, O=ALU, OU=MSG, CN=AG01, E=AG01,+S=C]; prospective erouted; eroute owner: #0 000 "ag-10.155.199.101": myip=unset; hisip=unset; mycert=bmc; 000 "ag-10.155.199.101": CAs: 'C=US, ST=OH, L=COLUMBUS, O=ALU, OU=MSG, CN=CA, E=CA'...'%any' 000 "ag-10.155.199.101": ike_life: 86400s; ipsec_life: 28800s; rekey_margin: 600s; rekey_fuzz: 100%; keyingtries: 0 000 "ag-10.155.199.101": policy: RSASIG+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW; prio: 32,32; interface: bond1:alrt; 000 "ag-10.155.199.101": newest ISAKMP SA: #0; newest IPsec SA: #0; 000 "ag-10.155.199.101": IKE algorithms wanted: AES_CBC(7)_128-SHA1(2)-MODP2048(14); flags=-strict 000 "ag-10.155.199.101": IKE algorithms found: AES_CBC(7)_128-SHA1(2)_160-14, 000 "ag-10.155.199.101": ESP algorithms wanted: AES(12)_128-SHA2_256(5); flags=-strict 000 "ag-10.155.199.101": ESP algorithms loaded: AES(12)_128-SHA2_256(5)_256 000 "ag-10.155.199.102": 10.155.155.78<10.155.155.78>[C=US, ST=OH, O=ALU, OU=MSG, CN=BMC1, E=BMC1,+S=C]---10.155.155.5...10.155.199.102<10.155.199.102>[C=US, ST=OH, O=ALU, OU=MSG, CN=AG02, E=AG02,+S=C]; prospective erouted; eroute owner: #0 000 "ag-10.155.199.102": myip=unset; hisip=unset; mycert=bmc; 000 "ag-10.155.199.102": CAs: 'C=US, ST=OH, L=COLUMBUS, O=ALU, OU=MSG, CN=CA, E=CA'...'%any' 000 "ag-10.155.199.102": ike_life: 86400s; ipsec_life: 28800s; rekey_margin: 600s; rekey_fuzz: 100%; keyingtries: 0 000 "ag-10.155.199.102": policy: RSASIG+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW; prio: 32,32; interface: bond1:alrt; 000 "ag-10.155.199.102": newest ISAKMP SA: #0; newest IPsec SA: #0; 000 "ag-10.155.199.102": IKE algorithms wanted: AES_CBC(7)_128-SHA1(2)-MODP2048(14); flags=-strict 000 "ag-10.155.199.102": IKE algorithms found: AES_CBC(7)_128-SHA1(2)_160-14, 000 "ag-10.155.199.102": ESP algorithms wanted: AES(12)_128-SHA2_256(5); flags=-strict 000 "ag-10.155.199.102": ESP algorithms loaded: AES(12)_128-SHA2_256(5)_256 000 000 #297: "ag-10.155.199.101":500 STATE_MAIN_I1 (sent MI1, expecting MR1); EVENT_RETRANSMIT in 6s; nodpd; idle; import:admin initiate 000 #297: pending Phase 2 for "ag-10.155.199.101" replacing #0 000 #296: "ag-10.155.199.102":500 STATE_MAIN_I1 (sent MI1, expecting MR1); EVENT_RETRANSMIT in 6s; nodpd; idle; import:admin initiate 000 #296: pending Phase 2 for "ag-10.155.199.102" replacing #0 000
Issue resolved in Customer Portal case 00410706. Closing NOTABUG. Feel free to open a case with Red Hat Support via the Customer Portal if this is still an issue.