Bug 168497 - Kernel Panic: IPSec VPN Tunnels on PPPoE with dynamic IPs
Summary: Kernel Panic: IPSec VPN Tunnels on PPPoE with dynamic IPs
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Miller
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-09-16 18:04 UTC by David Herselman
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-19 18:54:15 UTC
Target Upstream Version:
Embargoed:



Description David Herselman 2005-09-16 18:04:54 UTC
Description of problem:
Establishing a network-to-network IPSec VPN connection between RHEL3 servers, 
when these are running on ADSL (PPPoE) connections, where the IP addresses 
change, causes a kernel panic.

Originally logged in Bugzilla as a kernel issue:
  #166531 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166531
  #164730 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164730
Reading through newsgroups and the ipsec-tools development mailing list logs 
gives me the indication that this may have been filed incorrectly 
(kernel still shouldn't panic though, no?)...



Version-Release number of selected component (if applicable):
ipsec-tools-0.2.5-0.7
  Also tested with ipsec-tools-0.3.3-5.6 from 
http://people.redhat.com/notting/ipsec/



How reproducible:
Run a network-to-network IPSec tunnel between PPPoE servers; the servers will 
usually lock up after roughly 2 days (the IP changes daily)...



Steps to Reproduce:
1. Patch 'ifup-ipsec' and 'ifdown-ipsec' to provide for network-network tunnels
   Details in Bugzilla #150179:
   https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=150179
2. Configure /etc/sysconfig/network-scripts/ifcfg-ipsec0:
   [-------------- /etc/sysconfig/network-scripts/ifcfg-ipsec0 --------------]
     TYPE=IPsec
     DEVICE=ipsec0
     ONBOOT=no
     IKE_METHOD=PSK
     IKE_PSK=654321
     SRCGW=192.168.4.1
     DSTGW=192.168.0.1
     SRCNET=192.168.4.0/22
     DSTNET=192.168.0.0/22
     DST=196.41.206.216
   [-------------- /etc/sysconfig/network-scripts/ifcfg-ipsec0 --------------]
3. Establish ADSL (PPPoE) connections on the server
4. Install the following script to reconfigure ifcfg-ipsec0 when the remote 
machine's IP changes:
   #!/bin/sh
   ##############################################################################
   # Define Variables                                                           #

   vpncfg='/etc/sysconfig/network-scripts/ifcfg-ipsec1';
   # Current address of the remote gateway, resolved via DNS
   export vpnip=`dig vpn.gtct.co.za +tcp +short | tail -n 1`;
   # Host on the remote network that is pinged to test the tunnel
   vpncheckip='192.168.8.1';

   racoondir='/etc/racoon';
   # Interface name derived from the config file name
   vpndev=${vpncfg#/etc/sysconfig/network-scripts/};
   # Done                                                                       #
   ##############################################################################

   # Succeeds if the remote host answers one ping, or any of five retries
   vpnup () {
           local VAR;
           VAR=`ping -s 1 -c 1 $vpncheckip > /dev/null 2>&1; echo $?`;
           if [ $VAR -ne 0 ]; then
                   VAR=`ping -s 1 -c 5 $vpncheckip > /dev/null 2>&1; echo $?`;
           fi
           return $VAR;
   }

   # Succeeds if the resolved address differs from DST= in the config file
   newvpngateway () {
           export oldip=`fgrep 'DST=' $vpncfg | perl -pe 's/.*DST=(\d+\.\d+\.\d+\.\d+).*/\1/g'`;
           if [ "$vpnip" = "$oldip" ]; then return 1; fi
           return 0;
   }

   if ! vpnup; then
           if newvpngateway; then
                   ifdown $vpndev > /dev/null 2>&1;
                   # Give up unless the resolved address looks like a dotted quad
                   if ! echo "$vpnip" | grep '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}$' > /dev/null 2>&1; then exit; fi
                   # Drop the old peer from racoon.conf, then point the tunnel at the new address
                   perl -i -pe 's/.*\Q$ENV{oldip}\E.*\n//g' $racoondir/racoon.conf;
                   rm -f $racoondir/$oldip.conf > /dev/null 2>&1;
                   perl -i -pe 's/\Q$ENV{oldip}\E/$ENV{vpnip}/g' $vpncfg;
                   ifup $vpndev > /dev/null 2>&1;
           fi
   fi
5. System normally locks up with a kernel panic shortly after performing a key 
exchange (old keys expired)...
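
The address check in the script in step 4 only tests the shape of the string, so a value such as 999.1.1.1 would still be written into the configuration. A minimal sketch of a stricter check, assuming a POSIX shell; is_ipv4 is a hypothetical helper name and not part of the original script:

```shell
#!/bin/sh
# is_ipv4 (hypothetical helper): succeed only for a dotted quad whose
# four octets are each in the range 0-255.
is_ipv4 () {
        # Reject anything that is not four dot-separated groups of 1-3 digits
        echo "$1" | grep -q '^[0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}$' || return 1
        # Split the argument on dots so each octet becomes a positional parameter
        old_ifs=$IFS; IFS=.
        set -- $1
        IFS=$old_ifs
        # Range-check each octet
        for octet in "$@"; do
                [ "$octet" -le 255 ] || return 1
        done
        return 0
}
```

It could be called as `is_ipv4 "$vpnip" || exit 1` before the tunnel is taken down, so that a failed DNS lookup leaves the existing configuration untouched.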



Actual results:
  Log entries showing kernel panic:
    Jul 27 17:24:32 unix-01 racoon: INFO: isakmp.c:1387:isakmp_open(): 10.0.0.1[500] used as isakmp port (fd=8)
    Jul 27 17:24:32 unix-01 racoon: INFO: isakmp.c:1387:isakmp_open(): 192.168.4.1[500] used as isakmp port (fd=9)
    Jul 27 17:24:32 unix-01 racoon: INFO: isakmp.c:1387:isakmp_open(): 127.0.0.1[500] used as isakmp port (fd=10)
    Jul 27 17:24:35 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
    Jul 27 17:24:35 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
    Jul 27 17:24:35 unix-01 kernel: ------------[ cut here ]------------
    Jul 27 17:24:35 unix-01 kernel: kernel BUG at xfrm_state.c:54!
    Jul 27 17:24:35 unix-01 kernel: invalid operand: 0000
    Jul 27 17:24:35 unix-01 kernel: esp4 ah4 cls_u32 sch_sfq sch_cbq ipt_TOS ipt_limit ip_nat_irc ppp_synctty ppp_async ppp_generic slhc ipt_state ipt_owner ipt_REDIRECT ipt_REJECT ipt_LOG iptab

  And another:
    Jul 30 05:59:07 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
    <nothing else logged>

  Yet another:
    syslog:
    Aug 24 18:15:00 unix-01 modprobe: modprobe: Can't locate module ripemd160
    Aug 24 18:15:00 unix-01 modprobe: modprobe: Can't locate module cast128
    Aug 24 18:15:00 unix-01 modprobe: modprobe: Can't locate module lzs
    Aug 24 18:15:01 unix-01 modprobe: modprobe: Can't locate module lzjh
    Aug 24 18:15:01 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
    Aug 25 08:00:12 unix-01 syslogd 1.4.1: restart.
    Aug 25 08:00:12 unix-01 syslog: syslogd startup succeeded
    Aug 25 08:00:12 unix-01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
    
    Screen (transcribed by hand):
    Kernel bug at xfrm_state.c:54!
    invalid operand : 0000
    ide_cd cdrom esp4 ah4 cls_u32 sch_sfq sch_cbq ipt_TOS (I did not finish
    writing all of these down; there were a couple more)
    
    CPU1
    EIP: 0060 [<c028b15a>] Not tainted
    EFLAGS:0010202
    
    EIP is at xfrm_state_gc_destroy [KERNEL] 0x1a (2.4.21-32.0.1 Elmp /i686)
    
    (Then there were a whole bunch of numbers)
    
    Kernel panic: Fatal exception


  Yet another:
    Aug 25 07:46:14 unix-01 pppd[3077]: LCP terminated by peer
    Aug 25 07:46:14 unix-01 pppoe[3078]: Session 4481 terminated -- received PADT from peer
    Aug 25 07:46:14 unix-01 pppoe[3078]: Sent PADT
    Aug 25 07:46:14 unix-01 pppd[3077]: Modem hangup
    Aug 25 07:46:14 unix-01 pppd[3077]: Connection terminated.
    Aug 25 07:46:14 unix-01 pppd[3077]: Connect time 1440.2 minutes.
    Aug 25 07:46:14 unix-01 pppd[3077]: Sent 112961249 bytes, received 345804065 bytes.
    Aug 25 07:46:14 unix-01 pppd[3077]: Exit.
    Aug 25 07:46:14 unix-01 adsl-connect: ADSL connection lost; attempting re-connection.
    Aug 25 07:46:14 unix-01 /etc/hotplug/net.agent: NET unregister event not supported
    Aug 25 07:46:18 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
    Aug 25 08:14:43 unix-01 syslogd 1.4.1: restart.
    Aug 25 08:14:43 unix-01 syslog: syslogd startup succeeded
    Aug 25 08:14:43 unix-01 kernel: klogd 1.4.1, log source = /proc/kmsg started.


  And another:
    Sep 16 00:39:09 unix-01 kernel: Cleanup-rule:IN=ppp0 OUT= MAC= SRC=165.165.244.16 DST=165.165.168.214 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=36475 DF PROTO=TCP SPT=3362 DPT=445 WINDOW=32000 RES=0x00 SYN URGP=0
    Sep 16 00:39:12 unix-01 kernel: Cleanup-rule:IN=ppp0 OUT= MAC= SRC=165.165.244.16 DST=165.165.168.214 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=36822 DF PROTO=TCP SPT=3362 DPT=445 WINDOW=32000 RES=0x00 SYN URGP=0
    Sep 16 00:39:17 unix-01 mgetty[21962]: init chat failed, exiting...: Interrupted system call
    Sep 16 00:39:17 unix-01 mgetty[21962]: failed in mg_init_data, dev=ttyS0, pid=21962
    Sep 16 00:39:28 unix-01 racoon: INFO: IPsec-SA expired: AH/Tunnel 196.25.242.202->165.165.168.214 spi=204036047(0xc2957cf)
    Sep 16 00:39:28 unix-01 racoon: INFO: IPsec-SA expired: ESP/Tunnel 196.25.242.202->165.165.168.214 spi=63135470(0x3c35eee)
    Sep 16 00:39:28 unix-01 racoon: INFO: IPsec-SA expired: AH/Tunnel 165.165.168.214->196.25.242.202 spi=153459586(0x9259b82)
    Sep 16 08:27:16 unix-01 syslogd 1.4.1: restart.
    Sep 16 08:27:16 unix-01 syslog: syslogd startup succeeded
    Sep 16 08:27:16 unix-01 kernel: klogd 1.4.1, log source = /proc/kmsg started.


  Another:
    Sep 15 05:17:45 unix-01 kernel: Cleanup-rule:IN=ppp0 OUT= MAC= SRC=165.165.36.205 DST=165.165.209.187 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=46845 DF PROTO=TCP SPT=3526 DPT=445 WINDOW=16384 RES=0x00 SYN URGP=0
    Sep 15 05:17:48 unix-01 kernel: Cleanup-rule:IN=ppp0 OUT= MAC= SRC=165.165.36.205 DST=165.165.209.187 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=47176 DF PROTO=TCP SPT=3526 DPT=445 WINDOW=16384 RES=0x00 SYN URGP=0
    Sep 15 05:17:53 unix-01 kernel: Cleanup-rule:IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:c0:9f:34:7a:99:08:00 SRC=192.168.12.2 DST=192.168.15.255 LEN=239 TOS=0x00 PREC=0x00 TTL=128 ID=33994 PROTO=UDP SPT=138 DPT=138 LEN=219
    Sep 15 05:17:56 unix-01 racoon: INFO: IPsec-SA expired: AH/Tunnel 196.41.206.218->165.165.209.187 spi=119918304(0x725cee0)
    Sep 15 05:17:56 unix-01 racoon: INFO: IPsec-SA expired: ESP/Tunnel 196.41.206.218->165.165.209.187 spi=40383847(0x2683567)
    Sep 15 05:17:57 unix-01 racoon: INFO: IPsec-SA expired: AH/Tunnel 165.165.209.187->196.41.206.218 spi=19213497(0x1252cb9)
    Sep 15 05:17:57 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
    Sep 15 07:09:42 unix-01 syslogd 1.4.1: restart.
    Sep 15 07:09:42 unix-01 syslog: syslogd startup succeeded
    Sep 15 07:09:42 unix-01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
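
To see how often the assertion fires and at what times, the log can be filtered for it. A small sketch, assuming syslog writes kernel messages to a file such as /var/log/messages with one message per line; xfrm_assertion_times is a hypothetical helper name:

```shell
#!/bin/sh
# xfrm_assertion_times (hypothetical helper): print the syslog timestamp
# (month, day, time) of every failed XFRM_STATE_DEAD assertion in a log file.
# Usage: xfrm_assertion_times /var/log/messages
xfrm_assertion_times () {
        # -F treats the pattern as a fixed string, so "(", ")" and "." are literal
        grep -F 'assertion (x->km.state == XFRM_STATE_DEAD) failed' "$1" |
                awk '{ print $1, $2, $3 }'
}
```

Comparing these timestamps with the pppd and racoon entries around them may help show whether each panic follows a PPPoE reconnect or an SA expiry.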



Expected results:
Should remain up/operational...



Additional info:
NB: The same system works flawlessly when we switch the connection over to a 
fractional T1 link (diginet) instead of using PPPoE (ADSL)...

These bugzilla cases sound similar:
 Bugzilla 151044 - Code too different to compare directly
    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151044
 Bugzilla 118885 - Appears to have been patched already
    https://bugzilla.redhat.com/bugzilla/long_list.cgi?buglist=118885

Sounds relevant to the following postings:
  Kernel Bug posting
    http://www.uwsg.indiana.edu/hypermail/linux/net/0307.3/0030.html
  IPSec deadlock:
    http://lists.openswan.org/pipermail/users/2005-April/004540.html

Previously entered as:
  Bugzilla 164730 - x509 Certificate based IPSec VPN tunnels cause Kernel panic
    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164730
  Bugzilla 166531 - IPSec VPN Tunnels cause kernel panic when run over PPPoE (ADSL)
    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166531

Comment 1 Bill Nottingham 2005-09-21 18:04:25 UTC
Assigning to kernel; the kernel should not panic, regardless of ipsec-tools.

Comment 2 Ernie Petrides 2005-09-21 20:06:50 UTC
DavidH, how is this bug report different from bug 166531?

Comment 3 David Herselman 2006-05-31 20:42:09 UTC
No difference really. I had hoped to have found a fix for bug 166531 as things 
evolved and simply wanted to post a clean start. Apologies for the 
duplication; I can understand it may actually make it harder to resolve in the 
long run.

We've moved to RHEL4, which doesn't exhibit this problem, but we did at least
find two workarounds:
  1. Install certain packages from RHEL4 to get 2.6.9 kernel running:
       Bug 166531 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166531)
  2. Set machine to automatically restart when it panics:
       echo "10" > /proc/sys/kernel/panic
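
Writing to /proc/sys/kernel/panic only lasts until the next reboot. A sketch of making the second workaround persistent through the kernel.panic sysctl key, assuming the stock /etc/sysctl.conf is read at boot; set_panic_timeout is a hypothetical helper, not an existing tool:

```shell
#!/bin/sh
# set_panic_timeout (hypothetical helper): make "kernel.panic = SECONDS" the
# only kernel.panic setting in a sysctl.conf-style file.
# Usage: set_panic_timeout SECONDS [FILE]   (FILE defaults to /etc/sysctl.conf)
set_panic_timeout () {
        secs=$1
        conf=${2:-/etc/sysctl.conf}
        tmp="$conf.tmp.$$"
        : > "$tmp"
        if [ -f "$conf" ]; then
                # Keep every line except an existing kernel.panic assignment;
                # grep exits non-zero when nothing is left, which is not an error here
                grep -v '^[[:space:]]*kernel\.panic[[:space:]]*=' "$conf" >> "$tmp" || true
        fi
        echo "kernel.panic = $secs" >> "$tmp"
        # Copy the contents back rather than renaming, so the file keeps its mode and owner
        cat "$tmp" > "$conf"
        rm -f "$tmp"
}
```

Running `set_panic_timeout 10` as root followed by `sysctl -p` should apply the value immediately as well.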

Comment 4 RHEL Program Management 2007-10-19 18:54:15 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.

