Bug 166531 - IPSec VPN Tunnels cause kernel panic when run over PPPoE (ADSL)
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: David Miller
QA Contact: Brian Brock
Reported: 2005-08-22 20:08 EDT by David Herselman
Modified: 2007-11-30 17:07 EST

Doc Type: Bug Fix
Last Closed: 2007-10-19 14:55:38 EDT


Attachments: None
Description David Herselman 2005-08-22 20:08:33 EDT
Description of problem:


Version-Release number of selected component (if applicable): 2.4.21-32.0.1.EL


How reproducible: Always (can take up to 3 days)


Steps to Reproduce:
1. Configure network-network IPSec VPN tunnel over PPPoE (ADSL)
2. High link utilisation is not required
3. Wait for up to a maximum of 3 days (usually 1-2 days)
  
Actual results: Kernel panic
Log entries showing kernel panic:
Jul 27 17:24:32 unix-01 racoon: INFO: isakmp.c:1387:isakmp_open(): 10.0.0.1[500] used as isakmp port (fd=8)
Jul 27 17:24:32 unix-01 racoon: INFO: isakmp.c:1387:isakmp_open(): 192.168.4.1[500] used as isakmp port (fd=9)
Jul 27 17:24:32 unix-01 racoon: INFO: isakmp.c:1387:isakmp_open(): 127.0.0.1[500] used as isakmp port (fd=10)
Jul 27 17:24:35 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
Jul 27 17:24:35 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
Jul 27 17:24:35 unix-01 kernel: ------------[ cut here ]------------
Jul 27 17:24:35 unix-01 kernel: kernel BUG at xfrm_state.c:54!
Jul 27 17:24:35 unix-01 kernel: invalid operand: 0000
Jul 27 17:24:35 unix-01 kernel: esp4 ah4 cls_u32 sch_sfq sch_cbq ipt_TOS ipt_limit ip_nat_irc ppp_synctty ppp_async ppp_generic slhc ipt_state ipt_owner ipt_REDIRECT ipt_REJECT ipt_LOG iptab


And another:
Jul 30 05:59:07 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
<nothing else logged>


Expected results:


Additional info:
I recompiled the stock 2.4.21-32.0.1.EL Red Hat kernel with the XFRM_State patch
from the 2.6.11.7 changelog, but the system still locks up after the same amount
of time (although the 'kernel BUG at xfrm_state.c:54' messages have disappeared):
ICMP frag. IPSec deadlock: http://lists.openswan.org/pipermail/users/2005-April/004540.html

Syslog from one of the machines affected by this problem:
Aug 18 02:45:18 unix-01 racoon: INFO: pfkey.c:1394:pk_recvexpire(): IPsec-SA expired: AH/Tunnel 196.25.242.202->165.146.30.88 spi=200953471(0xbfa4e7f)
Aug 18 02:45:18 unix-01 racoon: INFO: pfkey.c:1394:pk_recvexpire(): IPsec-SA expired: ESP/Tunnel 196.25.242.202->165.146.30.88 spi=847614(0xceefe)
Aug 18 02:45:18 unix-01 racoon: INFO: pfkey.c:1394:pk_recvexpire(): IPsec-SA expired: AH/Tunnel 165.146.30.88->196.25.242.202 spi=18140664(0x114cdf8)
Aug 18 08:07:37 unix-01 syslogd 1.4.1: restart.
Aug 18 08:07:37 unix-01 syslog: syslogd startup succeeded
Aug 18 08:07:37 unix-01 kernel: klogd 1.4.1, log source = /proc/kmsg started.


NB: The same system works flawlessly when we switch the connection over to a 
fractional T1 link (diginet) instead of using PPPoE (ADSL)...

These Bugzilla cases sound similar:
 151044 - Code too different to compare directly
    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151044
 118885 - Appears to have been patched already
    https://bugzilla.redhat.com/bugzilla/long_list.cgi?buglist=118885

Previously entered as Bugzilla 164730
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164730
Comment 1 David Herselman 2005-08-25 02:37:08 EDT
As a workaround, I now run a cron job every 3 hours to restart the tunnels. The
systems stay up longer now, but two of them crashed this morning with the
following:

syslog:
Aug 24 18:15:00 unix-01 modprobe: modprobe: Can't locate module ripemd160
Aug 24 18:15:00 unix-01 modprobe: modprobe: Can't locate module cast128
Aug 24 18:15:00 unix-01 modprobe: modprobe: Can't locate module lzs
Aug 24 18:15:01 unix-01 modprobe: modprobe: Can't locate module lzjh
Aug 24 18:15:01 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
Aug 25 08:00:12 unix-01 syslogd 1.4.1: restart.
Aug 25 08:00:12 unix-01 syslog: syslogd startup succeeded
Aug 25 08:00:12 unix-01 kernel: klogd 1.4.1, log source = /proc/kmsg started.

Screen:
kernel BUG at xfrm_state.c:54!
invalid operand: 0000
ide_cd cdrom esp4 ah4 cls_u32 sch_sfq sch_cbq ipt_TOS (I did not finish
copying these down; there were a couple more)

CPU1
EIP: 0060 [<c028b15a>] Not tainted
EFLAGS:0010202

EIP is at xfrm_state_gc_destroy [kernel] 0x1a (2.4.21-32.0.1.ELsmp/i686)

(followed by a whole bunch of numbers)

Kernel panic: Fatal exception



2nd system that crashed, also running IPSec network-to-network VPN over PPPoE:
Aug 25 07:46:14 unix-01 pppd[3077]: LCP terminated by peer
Aug 25 07:46:14 unix-01 pppoe[3078]: Session 4481 terminated -- received PADT from peer
Aug 25 07:46:14 unix-01 pppoe[3078]: Sent PADT
Aug 25 07:46:14 unix-01 pppd[3077]: Modem hangup
Aug 25 07:46:14 unix-01 pppd[3077]: Connection terminated.
Aug 25 07:46:14 unix-01 pppd[3077]: Connect time 1440.2 minutes.
Aug 25 07:46:14 unix-01 pppd[3077]: Sent 112961249 bytes, received 345804065 bytes.
Aug 25 07:46:14 unix-01 pppd[3077]: Exit.
Aug 25 07:46:14 unix-01 adsl-connect: ADSL connection lost; attempting re-connection.
Aug 25 07:46:14 unix-01 /etc/hotplug/net.agent: NET unregister event not supported
Aug 25 07:46:18 unix-01 kernel: KERNEL: assertion (x->km.state == XFRM_STATE_DEAD) failed at xfrm_state.c(193)
Aug 25 08:14:43 unix-01 syslogd 1.4.1: restart.
Aug 25 08:14:43 unix-01 syslog: syslogd startup succeeded
Aug 25 08:14:43 unix-01 kernel: klogd 1.4.1, log source = /proc/kmsg started.




This sounds highly relevant to the following kernel bug posting:
  http://www.uwsg.indiana.edu/hypermail/linux/net/0307.3/0030.html
Comment 2 David Herselman 2005-08-25 03:06:41 EDT
Herbert's patch from the above posting is already applied to the current
system's kernel. Again, this only affects systems running IPSec tunnels over
PPPoE connections: we switched one of the servers to its backup route
(fractional T1, i.e. diginet) and it hasn't locked up once.
Comment 3 David Herselman 2005-08-29 03:02:47 EDT
Is there any additional information I can supply to help resolve this
problem? We've set up a RHEL4 test server running the same config, so we'll
shortly see whether this is specific to RHEL3.

I wonder how many people are actually doing this (especially over dynamic-IP
PPPoE connections), given that:
  1. The ifup-ipsec and ifdown-ipsec scripts are broken for net-to-net VPNs
  2. Racoon is missing an init script
  3. One has to hack together a simple script to handle the changing IPs, which
     updates the 'DST=' entry in /etc/sysconfig/network-scripts/ifcfg-ipsec?
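
Item 3 above refers to a small script that rewrites the DST= line when the peer's address changes. A minimal Python sketch of just the rewrite step, assuming the ifcfg file holds a single unindented DST=<address> line; how the new address is obtained (for example by resolving the peer's dynamic DNS name) and how the tunnel is restarted afterwards are left out, and the function names are hypothetical.

```python
from __future__ import annotations

import ipaddress
import re
from pathlib import Path

# An unindented DST=... assignment line (commented-out lines are left alone).
_DST_LINE = re.compile(r"^DST=.*$", re.MULTILINE)


def update_dst(config_text: str, new_ip: str) -> tuple[str, bool]:
    """Return (new_text, changed) with the DST= line set to new_ip.

    Appends a DST= line if the file has none. Raises ValueError if
    new_ip is not a valid IPv4 address.
    """
    ipaddress.IPv4Address(new_ip)  # reject garbage before touching the file
    new_line = f"DST={new_ip}"
    if _DST_LINE.search(config_text):
        # Use a function so the replacement text is taken literally.
        new_text = _DST_LINE.sub(lambda _m: new_line, config_text, count=1)
    else:
        sep = "" if not config_text or config_text.endswith("\n") else "\n"
        new_text = f"{config_text}{sep}{new_line}\n"
    return new_text, new_text != config_text


def update_dst_file(path: Path, new_ip: str) -> bool:
    """Rewrite the ifcfg file in place only if the peer address changed."""
    old_text = path.read_text()
    new_text, changed = update_dst(old_text, new_ip)
    if changed:
        path.write_text(new_text)
    return changed
```

For example, update_dst_file(Path("/etc/sysconfig/network-scripts/ifcfg-ipsec0"), "196.25.242.202") returns True only when the address actually changed, so a cron wrapper could restart the tunnel only in that case (ifcfg-ipsec0 is just an example name matching the ifcfg-ipsec? pattern).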

Comment 4 David Herselman 2005-08-29 03:08:19 EDT
Could this possibly have something to do with IP addresses changing when the 
PPPoE connections re-establish?
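
This can be checked against the logs quoted above: on the second system in comment 1, the xfrm_state.c(193) assertion appears four seconds after pppd logs "Connection terminated.". A minimal Python sketch of that check over a whole syslog file, assuming classic one-line syslog entries (unwrapped, unlike the copies in this report) and matching on the pppd and assertion text exactly as quoted here; the function name, the assumed year and the five-minute window are arbitrary illustrations, not part of any existing tool.

```python
from __future__ import annotations

import re
from collections.abc import Iterable
from datetime import datetime, timedelta

# Classic syslog line: "Aug 25 07:46:14 host program[pid]: message".
# The timestamp has no year, so the caller supplies one.
_SYSLOG = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ (.*)$")

PPP_DOWN = "Connection terminated."     # logged by pppd when the link drops
XFRM_ASSERT = "XFRM_STATE_DEAD) failed"  # the xfrm_state.c(193) assertion


def assertions_after_ppp_drop(
    lines: Iterable[str],
    year: int = 2005,
    window: timedelta = timedelta(minutes=5),
) -> list[tuple[datetime, float | None]]:
    """For each xfrm assertion, return (time, seconds since the last PPP drop).

    The gap is None when no drop was seen within `window` before the assertion.
    """
    last_drop: datetime | None = None
    results: list[tuple[datetime, float | None]] = []
    for line in lines:
        m = _SYSLOG.match(line)
        if m is None:
            continue  # blank or otherwise non-syslog lines
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        message = m.group(2)
        if PPP_DOWN in message:
            last_drop = when
        elif XFRM_ASSERT in message:
            gap = None
            if last_drop is not None and timedelta(0) <= when - last_drop <= window:
                gap = (when - last_drop).total_seconds()
            results.append((when, gap))
    return results
```

A PPP drop is only a proxy for an address change; matching the addresses pppd logs when the link comes back up would be a more direct test.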
Comment 5 David Herselman 2005-09-25 11:41:17 EDT
There has been no feedback and I was under pressure to get this resolved, so I
got it working by installing the 2.6 kernel from RHEL4.1 on the RHEL3
servers.

Required packages:
kernel-2.6.9-11.EL.i686.rpm
lvm2-2.01.08-1.0.RHEL4.i386.rpm
depend/device-mapper-1.01.01-1.RHEL4.i386.rpm
depend/glibc-2.3.4-2.9.i686.rpm
depend/glibc-common-2.3.4-2.9.i386.rpm
depend/ipsec-tools-0.3.3-6.i386.rpm
depend/l2tpd-0.69-12jdl.i386.rpm
depend/libselinux-1.19.1-8.i386.rpm
depend/mkinitrd-4.2.1.3-1.i386.rpm
depend/module-init-tools-3.1-0.pre5.3.i386.rp
depend/nscd-2.3.4-2.9.i386.rpm


Installed like this:
rpm -e piranha
rpm -Uvh --nodeps lvm2-2.01.08-1.0.RHEL4.i386.rpm
rpm -Uvh depend/*.rpm
rpm -ivh kernel-2.6.9-11.EL.i686.rpm
vi /etc/lilo.conf
lilo
Comment 6 David Herselman 2005-09-25 11:43:07 EDT
Didn't get to test the following patch from Bugzilla #168458:
  http://sourceforge.net/mailarchive/forum.php?thread_id=3866075&forum_id=32000
Comment 7 RHEL Product and Program Management 2007-10-19 14:55:38 EDT
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
