Description of problem:
Establishing a network-to-network IPSec VPN connection between RHEL3 servers that run on ADSL (PPPoE) connections, where the IP addresses change, causes a kernel panic. This was originally logged in Bugzilla (<a href="https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166531">#166531</a> and <a href="https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164730">#164730</a>) as a kernel issue, but reading through the newsgroups and the ipsec-tools development mailing lists suggests that it may have been filed against the wrong component (the kernel still shouldn't panic, though, no?).

Version-Release number of selected component (if applicable):
ipsec-tools-0.2.5-0.7
Also tested with ipsec-tools-0.3.3-5.6 from http://people.redhat.com/notting/ipsec/

How reproducible:
Run a network-to-network IPSec tunnel between PPPoE servers; the servers usually lock up after roughly 2 days (the IP addresses change daily).

Steps to Reproduce:
1. Patch 'ifup-ipsec' and 'ifdown-ipsec' to support network-to-network tunnels (details in Bugzilla <a href="https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=150179">#150179</a>).
2. Configure /etc/sysconfig/network-scripts/ifcfg-ipsec0:

[-------------- /etc/sysconfig/network-scripts/ifcfg-ipsec0 --------------]
TYPE=IPsec
DEVICE=ipsec0
ONBOOT=no
IKE_METHOD=PSK
IKE_PSK=654321
SRCGW=192.168.4.1
DSTGW=192.168.0.1
SRCNET=192.168.4.0/22
DSTNET=192.168.0.0/22
DST=196.41.206.216
[-------------- /etc/sysconfig/network-scripts/ifcfg-ipsec0 --------------]

3. Establish ADSL (PPPoE) connections on the servers.
4.
Install the following script to reconfigure ifcfg-ipsec0 when the remote machine's IP address changes:

#!/bin/sh
##############################################################################
# Define variables                                                           #
vpncfg='/etc/sysconfig/network-scripts/ifcfg-ipsec1'
export vpnip=`dig vpn.gtct.co.za +tcp +short | tail -n 1`
vpncheckip='192.168.8.1'
racoondir='/etc/racoon'
vpndev=${vpncfg#/etc/sysconfig/network-scripts/}
# Done                                                                       #
##############################################################################

# Succeeds if the far end of the tunnel answers a ping (one quick try, then five).
vpnup () {
    ping -s 1 -c 1 $vpncheckip > /dev/null 2>&1 ||
        ping -s 1 -c 5 $vpncheckip > /dev/null 2>&1
}

# Succeeds if DNS now returns a different address from the DST= line in $vpncfg.
newvpngateway () {
    export oldip=`fgrep 'DST=' $vpncfg | perl -pe 's/.*DST=(\d+\.\d+\.\d+\.\d+).*/$1/'`
    [ "$vpnip" != "$oldip" ]
}

if ! vpnup; then
    if newvpngateway; then
        # Refuse to touch anything if either address is missing or malformed,
        # so a failed DNS lookup never leaves the tunnel down.
        [ -n "$oldip" ] || exit 1
        if ! echo "$vpnip" | grep '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}$' > /dev/null 2>&1; then
            exit 1
        fi
        ifdown $vpndev > /dev/null 2>&1
        # Drop racoon.conf lines that mention the old address (\Q..\E quotes the dots).
        perl -i -ne 'print unless /\Q$ENV{oldip}\E/' $racoondir/racoon.conf
        rm -f $racoondir/$oldip.conf > /dev/null 2>&1
        perl -i -pe 's/\Q$ENV{oldip}\E/$ENV{vpnip}/g' $vpncfg
        ifup $vpndev > /dev/null 2>&1
    fi
fi

5. The system normally locks up with a kernel panic shortly after performing a key exchange (when the old keys expire).
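Both checks in the script above are regex-based: extracting the old `DST=` address from the ifcfg file, and validating the new one, where a pattern of `[0-9]\{1,3\}` per octet still accepts values such as `999`. A minimal sketch of sturdier helpers follows. The names `ifcfg_get` and `is_ipv4` are hypothetical, and `ifcfg_get` assumes the ifcfg file is trusted: ifcfg files are plain shell `NAME=value` assignments, so it reads a value by sourcing the file, which would also run any other shell code in it.

```shell
# Hypothetical helpers for a tunnel-reconfiguration script (sketch only).

# ifcfg_get FILE NAME: print the value of variable NAME from an ifcfg file.
# The file is sourced, so only use this on trusted files.
# The body is a subshell, so the file's variables do not leak into the caller.
ifcfg_get() (
    cfg=$1
    name=$2
    case $name in
        ''|[0-9]*|*[!A-Za-z0-9_]*) exit 2 ;;   # not a valid variable name
    esac
    . "$cfg" || exit 1
    eval "printf '%s\n' \"\${$name}\""
)

# is_ipv4 ADDR: succeed only for a dotted quad whose octets are all 0-255.
is_ipv4() (
    addr=$1
    case $addr in
        ''|*[!0-9.]*|*..*|.*|*.) exit 1 ;;     # digits and single inner dots only
    esac
    IFS=.
    set -- $addr                            # split into octets on "."
    [ $# -eq 4 ] || exit 1
    for octet in "$@"; do
        [ ${#octet} -le 3 ] && [ "$octet" -le 255 ] || exit 1
    done
    exit 0
)
```

For example, `oldip=$(ifcfg_get /etc/sysconfig/network-scripts/ifcfg-ipsec0 DST)` followed by `is_ipv4 "$vpnip" || exit 1`; running the address check before `ifdown` means a failed or garbled DNS answer never leaves the tunnel down.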
Actual results:
Log entries showing the kernel panic:

Jul 27 17:24:32 unix-01 racoon: INFO: isakmp.c:1387:isakmp_open(): 10.0.0.1[500] used as isakmp port (fd=8)
Jul 27 17:24:32 unix-01 racoon: INFO: isakmp.c:1387:isakmp_open(): 192.168.4.1[500] used as isakmp port (fd=9)
Jul 27 17:24:32 unix-01 racoon: INFO: isakmp.c:1387:isakmp_open(): 127.0.0.1[500] used as isakmp port (fd=10)
Jul 27 17:24:35 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
Jul 27 17:24:35 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
Jul 27 17:24:35 unix-01 kernel: ------------[ cut here ]------------
Jul 27 17:24:35 unix-01 kernel: kernel BUG at xfrm_state.c:54!
Jul 27 17:24:35 unix-01 kernel: invalid operand: 0000
Jul 27 17:24:35 unix-01 kernel: esp4 ah4 cls_u32 sch_sfq sch_cbq ipt_TOS ipt_limit ip_nat_irc ppp_synctty ppp_async ppp_generic slhc ipt_state ipt_owner ipt_REDIRECT ipt_REJECT ipt_LOG iptab

And another:
Jul 30 05:59:07 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
<nothing else logged>

Yet another (syslog):
Aug 24 18:15:00 unix-01 modprobe: modprobe: Can't locate module ripemd160
Aug 24 18:15:00 unix-01 modprobe: modprobe: Can't locate module cast128
Aug 24 18:15:00 unix-01 modprobe: modprobe: Can't locate module lzs
Aug 24 18:15:01 unix-01 modprobe: modprobe: Can't locate module lzjh
Aug 24 18:15:01 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
Aug 25 08:00:12 unix-01 syslogd 1.4.1: restart.
Aug 25 08:00:12 unix-01 syslog: syslogd startup succeeded
Aug 25 08:00:12 unix-01 kernel: klogd 1.4.1, log source = /proc/kmsg started.

Screen:
Kernel bug at xfrm_state.c:54!
invalid operand: 0000
ide_cd cdrom esp4 ah4 cls_u32 sch_sfq sch_cbq ipt_TOS (I did not manage to write all of these down; there were a couple more)
CPU1
EIP: 0060 [<c028b15a>] Not tainted
EFLAGS:0010202
EIP is at xfrm_state_gc_destroy [KERNEL] 0x1a (2.4.21-32.0.1 Elmp /i686)
(Then there were a whole bunch of numbers)
Kernel panic: Fatal exception

Yet another:
Aug 25 07:46:14 unix-01 pppd[3077]: LCP terminated by peer
Aug 25 07:46:14 unix-01 pppoe[3078]: Session 4481 terminated -- received PADT from peer
Aug 25 07:46:14 unix-01 pppoe[3078]: Sent PADT
Aug 25 07:46:14 unix-01 pppd[3077]: Modem hangup
Aug 25 07:46:14 unix-01 pppd[3077]: Connection terminated.
Aug 25 07:46:14 unix-01 pppd[3077]: Connect time 1440.2 minutes.
Aug 25 07:46:14 unix-01 pppd[3077]: Sent 112961249 bytes, received 345804065 bytes.
Aug 25 07:46:14 unix-01 pppd[3077]: Exit.
Aug 25 07:46:14 unix-01 adsl-connect: ADSL connection lost; attempting re-connection.
Aug 25 07:46:14 unix-01 /etc/hotplug/net.agent: NET unregister event not supported
Aug 25 07:46:18 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
Aug 25 08:14:43 unix-01 syslogd 1.4.1: restart.
Aug 25 08:14:43 unix-01 syslog: syslogd startup succeeded
Aug 25 08:14:43 unix-01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
And another:
Sep 16 00:39:09 unix-01 kernel: Cleanup-rule:IN=ppp0 OUT= MAC= SRC=165.165.244.16 DST=165.165.168.214 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=36475 DF PROTO=TCP SPT=3362 DPT=445 WINDOW=32000 RES=0x00 SYN URGP=0
Sep 16 00:39:12 unix-01 kernel: Cleanup-rule:IN=ppp0 OUT= MAC= SRC=165.165.244.16 DST=165.165.168.214 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=36822 DF PROTO=TCP SPT=3362 DPT=445 WINDOW=32000 RES=0x00 SYN URGP=0
Sep 16 00:39:17 unix-01 mgetty[21962]: init chat failed, exiting...: Interrupted system call
Sep 16 00:39:17 unix-01 mgetty[21962]: failed in mg_init_data, dev=ttyS0, pid=21962
Sep 16 00:39:28 unix-01 racoon: INFO: IPsec-SA expired: AH/Tunnel 196.25.242.202->165.165.168.214 spi=204036047(0xc2957cf)
Sep 16 00:39:28 unix-01 racoon: INFO: IPsec-SA expired: ESP/Tunnel 196.25.242.202->165.165.168.214 spi=63135470(0x3c35eee)
Sep 16 00:39:28 unix-01 racoon: INFO: IPsec-SA expired: AH/Tunnel 165.165.168.214->196.25.242.202 spi=153459586(0x9259b82)
Sep 16 08:27:16 unix-01 syslogd 1.4.1: restart.
Sep 16 08:27:16 unix-01 syslog: syslogd startup succeeded
Sep 16 08:27:16 unix-01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Another:
Sep 15 05:17:45 unix-01 kernel: Cleanup-rule:IN=ppp0 OUT= MAC= SRC=165.165.36.205 DST=165.165.209.187 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=46845 DF PROTO=TCP SPT=3526 DPT=445 WINDOW=16384 RES=0x00 SYN URGP=0
Sep 15 05:17:48 unix-01 kernel: Cleanup-rule:IN=ppp0 OUT= MAC= SRC=165.165.36.205 DST=165.165.209.187 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=47176 DF PROTO=TCP SPT=3526 DPT=445 WINDOW=16384 RES=0x00 SYN URGP=0
Sep 15 05:17:53 unix-01 kernel: Cleanup-rule:IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:c0:9f:34:7a:99:08:00 SRC=192.168.12.2 DST=192.168.15.255 LEN=239 TOS=0x00 PREC=0x00 TTL=128 ID=33994 PROTO=UDP SPT=138 DPT=138 LEN=219
Sep 15 05:17:56 unix-01 racoon: INFO: IPsec-SA expired: AH/Tunnel 196.41.206.218->165.165.209.187 spi=119918304(0x725cee0)
Sep 15 05:17:56 unix-01 racoon: INFO: IPsec-SA expired: ESP/Tunnel 196.41.206.218->165.165.209.187 spi=40383847(0x2683567)
Sep 15 05:17:57 unix-01 racoon: INFO: IPsec-SA expired: AH/Tunnel 165.165.209.187->196.41.206.218 spi=19213497(0x1252cb9)
Sep 15 05:17:57 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
Sep 15 07:09:42 unix-01 syslogd 1.4.1: restart.
Sep 15 07:09:42 unix-01 syslog: syslogd startup succeeded
Sep 15 07:09:42 unix-01 kernel: klogd 1.4.1, log source = /proc/kmsg started.

Expected results:
The tunnel and the servers should remain up and operational.

Additional info:
NB: The same system works flawlessly when we switch the connection over to a fractional T1 link (diginet) instead of PPPoE (ADSL).
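In several of the logs above, the `xfrm_state.c(193)` assertion comes within seconds of a racoon event (an `IPsec-SA expired` message, or racoon opening its ISAKMP ports) or of the PPP link dropping. A rough way to check that pattern across a whole log is to print, for each assertion, the last racoon or pppd message before it. A minimal sketch, assuming a plain-text syslog in the format shown above; the function name `events_before_assertion` is hypothetical.

```shell
# Hypothetical helper: for every xfrm_state.c assertion in a syslog file,
# print the most recent racoon or pppd message that precedes it.
events_before_assertion() {
    awk '
        / racoon: / || /pppd\[[0-9]+\]: / { last = $0; next }
        /assertion \(x->km\.state == XFRM_STATE_DEAD\) failed/ {
            print "event:     " (last != "" ? last : "(no racoon/pppd message before it)")
            print "assertion: " $0
        }
    ' "$1"
}
```

Run it as `events_before_assertion /var/log/messages`. Crashes where the assertion itself never reached syslog (as in the Sep 16 log) will not show up.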
These Bugzilla cases sound similar:
Bugzilla 151044 - code too different to compare directly (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151044)
Bugzilla 118885 - appears to have been patched already (https://bugzilla.redhat.com/bugzilla/long_list.cgi?buglist=118885)

Sounds relevant to the following postings:
Kernel bug posting: http://www.uwsg.indiana.edu/hypermail/linux/net/0307.3/0030.html
IPSec deadlock: http://lists.openswan.org/pipermail/users/2005-April/004540.html

Previously entered as:
Bugzilla 164730 - x509 Certificate based IPSec VPN tunnels cause Kernel panic (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164730)
Bugzilla 166531 - IPSec VPN Tunnels cause kernel panic when run over PPPoE (ADSL) (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166531)
Assigning to kernel; the kernel should not panic, regardless of ipsec-tools.
DavidH, how is this bug report different from bug 166531?
No difference really; I had hoped to find a fix for bug 166531 as things evolved and simply wanted to post a clean start. Apologies for the duplication; I understand it may actually make it harder to resolve in the long run.

We've moved to RHEL4, which doesn't exhibit this problem, but we did find two workarounds:
1. Install certain packages from RHEL4 to get the 2.6.9 kernel running: see Bug 166531 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166531)
2. Set the machine to reboot automatically 10 seconds after it panics: echo "10" > /proc/sys/kernel/panic
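Writing to `/proc/sys/kernel/panic` only lasts until the next reboot. On RHEL the boot scripts apply `/etc/sysctl.conf`, so the second workaround can be made persistent with a `kernel.panic = 10` line there. A minimal sketch that sets that line idempotently, replacing any existing `kernel.panic` entry; the function name `set_panic_timeout` is hypothetical.

```shell
# Hypothetical helper: make a sysctl.conf-style file set kernel.panic to the
# given number of seconds, replacing any existing kernel.panic line.
# Other keys (including kernel.panic_on_oops) are left untouched.
set_panic_timeout() (
    conf=$1      # e.g. /etc/sysctl.conf
    secs=$2      # seconds to wait before rebooting after a panic
    case $secs in
        ''|*[!0-9]*) echo "set_panic_timeout: seconds must be a non-negative integer" >&2; exit 2 ;;
    esac
    work=$conf.tmp.$$
    # Copy every line except old kernel.panic settings (no match is fine).
    grep -v '^[[:space:]]*kernel\.panic[[:space:]]*=' "$conf" > "$work" 2>/dev/null || :
    printf 'kernel.panic = %s\n' "$secs" >> "$work"
    # Write back through the existing file so its owner and mode are kept.
    cat "$work" > "$conf" && rm -f "$work"
)
```

For example, `set_panic_timeout /etc/sysctl.conf 10` followed by `sysctl -p` (which loads `/etc/sysctl.conf` by default) applies the same 10-second setting immediately and keeps it across reboots. It only restarts the machine after a panic; it does not prevent one.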
This bug is filed against RHEL 3, which is in the maintenance phase. During the maintenance phase, only security errata and select mission-critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.