Bug 116494 - Kernel panics for fatal exception in interrupt when using ipsec
Summary: Kernel panics for fatal exception in interrupt when using ipsec
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: athlon
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: David Miller
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-02-21 19:31 UTC by Kimmo Koivisto
Modified: 2007-11-30 22:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-03-08 19:11:29 UTC
Type: ---
Embargoed:


Attachments
screen shot from kernel panic from 2.6.3 (92.49 KB, image/jpeg)
2004-02-21 19:35 UTC, Kimmo Koivisto
screen shot from kernel panic 2.6.1-1.newer-than-37 (90.27 KB, image/jpeg)
2004-02-21 19:38 UTC, Kimmo Koivisto
rpm -qa (12.56 KB, text/plain)
2004-02-21 19:39 UTC, Kimmo Koivisto
Full oops from serial console (2.09 KB, text/plain)
2004-02-22 11:56 UTC, Kimmo Koivisto
oops with lucent wlan and ipsec (2.07 KB, text/plain)
2004-02-22 12:19 UTC, Kimmo Koivisto
oops with via rhine eth and ipsec (2.01 KB, text/plain)
2004-03-07 13:39 UTC, Kimmo Koivisto

Description Kimmo Koivisto 2004-02-21 19:31:42 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.2; fi, fi_FI@euro, fi_FI, fi_FI.UTF-8) (KHTML, like Gecko)

Description of problem:
The kernel panics after some time (from 30 seconds to a couple of minutes) when using the network (for example "wget something-big").

Fedora Core 1, some parts from fc1 development, some from test1.
Athlon 2100+, Motherboard with VIA chipset, integrated audio and LAN
Prism2 based PCMCIA WLAN
Buffalo PCMCIA/PCI adapter
Nova-t pci dvb-t tv-card
Bt878 based pci analog tv-card
IP: 192.168.2.2 (IPSec gateway is 192.168.2.1)

I have a WLAN and I'm using ipsec to protect all traffic. I think these panics could be related to ipsec, but I haven't tested this enough to be sure.
I've tested this with an IBM T23 laptop (prism2-wlan) with FC1 and kernel-2.6.1-newer-than-37, with the same results.

Screenshots from panics, rpm -qa and other system files attached.

Version-Release number of selected component (if applicable):
kernel-2.6.1-1.37, kernel-2.6.3-1.91, ipsec-tools-0.2.2-8

How reproducible:
Always

Steps to Reproduce:
1. Configure ipsec to encrypt all
2. Make some traffic

    

Actual Results:  Kernel panic - Fatal exception in interrupt

Additional info:

Comment 1 Kimmo Koivisto 2004-02-21 19:35:33 UTC
Created attachment 97914 [details]
screen shot from kernel panic from 2.6.3

Comment 2 Kimmo Koivisto 2004-02-21 19:38:46 UTC
Created attachment 97915 [details]
screen shot from kernel panic 2.6.1-1.newer-than-37

Comment 3 Kimmo Koivisto 2004-02-21 19:39:14 UTC
Created attachment 97916 [details]
rpm -qa

Comment 4 David Miller 2004-02-21 20:13:59 UTC
And if you do such a transfer without using IPSEC, it works just
fine right?

Unfortunately, the top of the OOPS log scrolled off the screen by the
time you took the screenshots, so the most important information is not
there.  If there is any chance to get the rest of the OOPS that would
help a lot, maybe even by making use of serial console.

Also, if you can try with something other than a prism2 card, or over
ethernet, that would eliminate the prism2 card driver as a culprit as
well.

The more we narrow this down, the better chance it will get fixed.


Comment 5 Kimmo Koivisto 2004-02-22 11:54:13 UTC
I'm not 100% sure about the IPSec; I made some tests earlier (a month ago)
but don't remember all the details.
 
I found the null modem cable and was able to get the full oops with serial 
console, I'll attach it to the bug.  
 
I'll try with a Lucent WLAN card and report whether it makes any difference.
 
 

Comment 6 Kimmo Koivisto 2004-02-22 11:56:33 UTC
Created attachment 97923 [details]
Full oops from serial console

Comment 7 Kimmo Koivisto 2004-02-22 12:18:07 UTC
And the Lucent WLAN card was no exception, same results (it's using the 
same orinoco driver... maybe I should try other cards). 
 
Oops with lucent card is attached.  
 
To be more exact, I'm using only ESP with AES128/SHA1, not AH at all. 
 

Comment 8 Kimmo Koivisto 2004-02-22 12:19:20 UTC
Created attachment 97924 [details]
oops with lucent wlan and ipsec

Comment 9 David Miller 2004-02-29 05:41:48 UTC
Both those cards use the orinoco driver, so we're not yet at the point
where the orinoco driver is not suspect.  I somehow think it is since I've
seen people using this kind of setup successfully without crashes over
other drivers.

Could you please try this over normal ethernet?


Comment 10 Christopher Johnson 2004-03-06 14:13:00 UTC
Here's your normal ethernet case.

I have recently been testing ipsec on 2.6 and hit this problem as
well. In my case it occurred on an athlon server with ethernet, as well
as on an athlon laptop with wireless.

kernel-2.6.3-1.91
ipsec-tools-0.2.2-8

This is a home test lab, so I don't have all kinds of resources but I
do have flexibility.  I can attest that with ipsec turned down on both
systems that kernel panics do not occur, and with corresponding ipsec
interfaces up in transport mode so all traffic is encrypted, a kernel
panic does occur after a brief time.

The ethernet interface on server uses r8169 module (RealTek RTL8169
Gigabit Ethernet).
The wireless interface on laptop uses orinoco and orinoco_cs modules
(Intersil PRISM2 11 Mbps Wireless Adapter).

I don't have a serial console configured, but I wrote down a few
messages from a panic on the server:
kernel/sched.c:1799:spin_lock(kernel/sched.c:c035f140) already locked
by kernel/sched.c:1799
(several lines of the same)
kernel/sched.c:291:spin_lock(kernel/sched.c:c04177e0) already locked
by kernel/sched.c:1634
(several lines of the same)
Kernel panic: Fatal exception in interrupt


Comment 11 Kimmo Koivisto 2004-03-07 13:38:49 UTC
I now managed to test this with wired cards and ipsec, it still panics.  
I used Via Rhine 2 (VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)) and 
exactly the same config. 
 
kernel-2.6.1-1.37 seems to work okay, like it did with wireless cards, but
kernel-2.6.3-1.91 panics. Capture from the panic is attached; the panic looks
much the same as with wireless cards.

Comment 12 Kimmo Koivisto 2004-03-07 13:39:59 UTC
Created attachment 98357 [details]
oops with via rhine eth and ipsec

Comment 13 Christopher Johnson 2004-03-07 19:30:40 UTC
Check out https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=117171

Per the posting on bug 117171 this appears to be fixed as of
kernel-2.6.3-2.1.238.

I updated to kernel-2.6.3-2.1.240 and have been testing for 5 hours
with no panic. To stress it a little I copied the .240 kernel rpm,
achieving ~3.8Mbps effective transfer rate over a wireless connection
encrypted by transport mode ipsec.  No panic.

Comment 14 Kimmo Koivisto 2004-03-07 21:13:38 UTC
Yes, now it seems to work ok with kernel-2.6.3-2.1.242.
I have now tested this for more than an hour without any problems. With
2.6.3-1.91 it took less than a minute to oops, so I think the bug is fixed.
 
Any idea where the bug was and what was fixed?
 
I noticed that tcpdump works quite differently now with 2.6.3-2.1.242. 
tcpdump used to look like this: 
192.168.2.2 > 192.168.2.1: ESP(spi=0x71412363,seq=0x217e) (DF) 
192.168.2.2 > 192.168.2.1: ESP(spi=0x71412363,seq=0x217f) (DF) 
192.168.2.1 > 192.168.2.2: ESP(spi=0x05989fc0,seq=0x211e) 
truncated-ip - 24 bytes missing! 192.168.2.1 > 192.168.2.2: truncated-ip - 
40764 bytes missing! 240.4.249.206 > 192.168.2.1: udp (frag 
17664:40876@672) [tos 0x98]  (ipip-proto-4) 
192.168.2.2 > 192.168.2.1: ESP(spi=0x71412363,seq=0x2180) (DF) 
192.168.2.1 > 192.168.2.2: ESP(spi=0x05989fc0,seq=0x211e) 
192.168.2.2 > 192.168.2.1: ESP(spi=0x71412363,seq=0x2181) (DF) 
 
but now the tcpdump looks like this: 
192.168.2.1 > 192.168.2.2: ESP(spi=0x04a44fad,seq=0x9c4) 
192.168.123.123 > 192.168.2.2: icmp 9: echo request seq 35841 
192.168.2.2 > 192.168.2.1: ESP(spi=0x3b2cdf1f,seq=0x989) 
192.168.2.1 > 192.168.2.2: ESP(spi=0x04a44fad,seq=0x9c5) 
192.168.123.123 > 192.168.2.2: icmp 9: echo request seq 36097 
192.168.2.2 > 192.168.2.1: ESP(spi=0x3b2cdf1f,seq=0x98a) 
 
I can see some of the traffic as plain text, but all the traffic is encrypted (I 
verified this with external sniffer, everything was ok). 
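
A minimal sketch, assuming tcpdump lines in the format pasted above, of how the ESP and cleartext views of the capture could be separated for comparison. The names `classify` and `summarize` are hypothetical helpers, not part of tcpdump or ipsec-tools, and the regular expressions only cover the line shapes that appear in this report.

```python
import re
from collections import Counter

# Matches lines such as:
#   192.168.2.1 > 192.168.2.2: ESP(spi=0x04a44fad,seq=0x9c4)
ESP_LINE = re.compile(
    r"^(?P<src>\S+) > (?P<dst>\S+): "
    r"ESP\(spi=(?P<spi>0x[0-9a-fA-F]+),seq=(?P<seq>0x[0-9a-fA-F]+)\)"
)
# Matches any other "src > dst: ..." line, e.g. a decrypted ICMP packet.
PLAIN_LINE = re.compile(r"^(?P<src>\S+) > (?P<dst>\S+): (?P<rest>.+)$")


def classify(line):
    """Return (kind, src, dst, spi, seq) for one tcpdump output line.

    kind is "esp" for encrypted packets, "clear" for packets printed in
    decoded form, and "other" for lines that fit neither pattern
    (for example the "truncated-ip" continuation lines).
    """
    line = line.strip()
    m = ESP_LINE.match(line)
    if m:
        return ("esp", m.group("src"), m.group("dst"),
                int(m.group("spi"), 16), int(m.group("seq"), 16))
    m = PLAIN_LINE.match(line)
    if m:
        return ("clear", m.group("src"), m.group("dst"), None, None)
    return ("other", None, None, None, None)


def summarize(lines):
    """Count how many lines of each kind appear in a capture."""
    return Counter(classify(line)[0] for line in lines)


# The capture from 2.6.3-2.1.242 quoted above.
NEW_CAPTURE = [
    "192.168.2.1 > 192.168.2.2: ESP(spi=0x04a44fad,seq=0x9c4)",
    "192.168.123.123 > 192.168.2.2: icmp 9: echo request seq 35841",
    "192.168.2.2 > 192.168.2.1: ESP(spi=0x3b2cdf1f,seq=0x989)",
    "192.168.2.1 > 192.168.2.2: ESP(spi=0x04a44fad,seq=0x9c5)",
    "192.168.123.123 > 192.168.2.2: icmp 9: echo request seq 36097",
    "192.168.2.2 > 192.168.2.1: ESP(spi=0x3b2cdf1f,seq=0x98a)",
]

if __name__ == "__main__":
    for kind, count in sorted(summarize(NEW_CAPTURE).items()):
        print(kind, count)
```

Run against the newer capture, this counts four ESP lines and two cleartext ICMP lines, which makes it easier to see which packets the local tcpdump prints in both forms.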

Comment 15 David Miller 2004-03-08 19:11:29 UTC
Traffic over tunnels is seen twice, and tcpdump is able to see the
traffic in both instances.  In the first case, pre-tunnel, the traffic
is not encrypted yet.  In the second case, after going into the tunnel,
the traffic is encrypted.

Anyway, I'm closing this bug now that it is fixed. And no, I have
no idea what fixed it; it was probably some random change that occurred
in 2.6.x development.


