From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.2; fi, fi_FI@euro, fi_FI, fi_FI.UTF-8) (KHTML, like Gecko)

Description of problem:
The kernel panics after some time (from 30 seconds to a couple of minutes) when the network is in use (for example "wget something-big"). The system is Fedora Core 1, with some parts from fc1 development and some from test1.

Athlon 2100+, Motherboard with VIA chipset, integrated audio and LAN
Prism2 based PCMCIA WLAN
Buffalo PCMCIA/PCI adapter
Nova-t pci dvb-t tv-card
Bt878 based pci analog tv-card
IP: 192.168.2.2 (IPSec gateway is 192.168.2.1)

I have a WLAN and I'm using ipsec to protect all traffic. I think these panics could be related to ipsec, but I haven't tested this enough to be sure. I've also tested this on an IBM T23 laptop (prism2-wlan) with FC1 and kernel-2.6.1-newer-than-37, with the same results. Screenshots from the panics, rpm -qa and other system files are attached.

Version-Release number of selected component (if applicable):
kernel-2.6.1-1.37, kernel-2.6.3-1.91, ipsec-tools-0.2.2-8

How reproducible:
Always

Steps to Reproduce:
1. Configure ipsec to encrypt all traffic
2. Make some traffic

Actual Results:
Kernel panic - Fatal exception in interrupt

Additional info:
Created attachment 97914: screen shot from kernel panic from 2.6.3
Created attachment 97915: screen shot from kernel panic 2.6.1-1.newer-than-37
Created attachment 97916: rpm -qa
And if you do such a transfer without using IPSEC, it works just fine, right? Unfortunately, the top of the OOPS log had scrolled off the screen by the time you took the screenshots, so the most important information is not there. If there is any chance of getting the rest of the OOPS, that would help a lot, perhaps by using a serial console. Also, if you can try with something other than a prism2 card, or over ethernet, that would eliminate the prism2 card driver as a culprit. The more we narrow this down, the better the chance it will get fixed.
I'm not 100% sure about the IPSec. I ran some tests earlier (a month ago) but don't remember all the details. I found the null modem cable and was able to get the full oops over the serial console; I'll attach it to the bug. I'll try with a Lucent WLAN card and report whether it makes any difference.
Created attachment 97923: Full oops from serial console
And the Lucent WLAN card was no exception: same results (it uses the same orinoco driver... maybe I should try other cards). The oops with the Lucent card is attached. To be more exact, I'm using only ESP with AES128/SHA1, not AH at all.
Created attachment 97924: oops with lucent wlan and ipsec
Both of those cards use the orinoco driver, so we can't yet rule the orinoco driver out. I suspect it is the culprit, since I've seen people run this kind of setup over other drivers without crashes. Could you please try this over normal ethernet?
Here's your normal ethernet case. I have recently been testing ipsec on 2.6 and hit this problem as well. In my case it occurred on an Athlon server with ethernet, as well as on an Athlon laptop with wireless.

kernel-2.6.3-1.91
ipsec-tools-0.2.2-8

This is a home test lab, so I don't have all kinds of resources, but I do have flexibility. I can attest that with ipsec turned down on both systems kernel panics do not occur, and that with the corresponding ipsec interfaces up in transport mode, so all traffic is encrypted, a kernel panic does occur after a brief time. The ethernet interface on the server uses the r8169 module (RealTek RTL8169 Gigabit Ethernet). The wireless interface on the laptop uses the orinoco and orinoco_cs modules (Intersil PRISM2 11 Mbps Wireless Adapter). I don't have a serial console configured, but I wrote down a few messages from a panic on the server:

kernel/sched.c:1799:spin_lock(kernel/sched.c:c035f140) already locked by kernel/sched.c:1799
(several lines of the same)
kernel/sched.c:291:spin_lock(kernel/sched.c:c04177e0) already locked by kernel/sched.c:1634
(several lines of the same)
Kernel panic: Fatal exception in interrupt
I have now managed to test this with wired cards and ipsec, and it still panics. I used a Via Rhine 2 (VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)) and exactly the same config. kernel-2.6.1-1.37 seems to work okay, as it did with the wireless cards, but kernel-2.6.3-1.91 panics. A capture of the panic is attached; it looks much the same as the one with the wireless cards.
Created attachment 98357: oops with via rhine eth and ipsec
Check out https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=117171
Per the posting on bug 117171, this appears to be fixed as of kernel-2.6.3-2.1.238. I updated to kernel-2.6.3-2.1.240 and have been testing for 5 hours with no panic. To stress it a little I copied the .240 kernel rpm, achieving an effective transfer rate of ~3.8Mbps over a wireless connection encrypted by transport mode ipsec. No panic.
Yes, it now seems to work ok with kernel-2.6.3-2.1.242. I have tested this for more than an hour without any problems. With 2.6.3-1.91 it took less than a minute to oops, so I think the bug is fixed. Any idea where the bug was and what was fixed?

I noticed that tcpdump works quite differently now with 2.6.3-2.1.242. tcpdump used to look like this:

192.168.2.2 > 192.168.2.1: ESP(spi=0x71412363,seq=0x217e) (DF)
192.168.2.2 > 192.168.2.1: ESP(spi=0x71412363,seq=0x217f) (DF)
192.168.2.1 > 192.168.2.2: ESP(spi=0x05989fc0,seq=0x211e) truncated-ip - 24 bytes missing!
192.168.2.1 > 192.168.2.2: truncated-ip - 40764 bytes missing! 240.4.249.206 > 192.168.2.1: udp (frag 17664:40876@672) [tos 0x98] (ipip-proto-4)
192.168.2.2 > 192.168.2.1: ESP(spi=0x71412363,seq=0x2180) (DF)
192.168.2.1 > 192.168.2.2: ESP(spi=0x05989fc0,seq=0x211e)
192.168.2.2 > 192.168.2.1: ESP(spi=0x71412363,seq=0x2181) (DF)

but now the tcpdump looks like this:

192.168.2.1 > 192.168.2.2: ESP(spi=0x04a44fad,seq=0x9c4)
192.168.123.123 > 192.168.2.2: icmp 9: echo request seq 35841
192.168.2.2 > 192.168.2.1: ESP(spi=0x3b2cdf1f,seq=0x989)
192.168.2.1 > 192.168.2.2: ESP(spi=0x04a44fad,seq=0x9c5)
192.168.123.123 > 192.168.2.2: icmp 9: echo request seq 36097
192.168.2.2 > 192.168.2.1: ESP(spi=0x3b2cdf1f,seq=0x98a)

I can see some of the traffic as plain text in tcpdump, but all the traffic is encrypted on the wire (I verified this with an external sniffer, everything was ok).
Traffic over tunnels is seen twice, and tcpdump is able to see the traffic in both instances. In the first case, pre-tunnel, the traffic is not encrypted yet. In the second case, after going into the tunnel, the traffic is encrypted. Anyway, I'm closing this bug now that it is fixed, and no, I have no idea what fixed it; probably some change that occurred in 2.6.x development.
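For anyone comparing captures like the ones above, here is a minimal sketch of how the two views of the traffic could be told apart in saved tcpdump output. It only assumes the one-line summary format quoted in this bug; the names `classify` and `summarize` are made up for illustration and are not part of tcpdump or ipsec-tools. A line counted as "clear" here is just what tcpdump showed on the local host, and says nothing about what actually went out on the wire.

```python
import re
from collections import Counter

# Matches the "src > dst:" prefix of a tcpdump summary line (IPv4 only).
PAIR_RE = re.compile(r"(\d+\.\d+\.\d+\.\d+) > (\d+\.\d+\.\d+\.\d+):")
# Matches the ESP summary, e.g. ESP(spi=0x04a44fad,seq=0x9c4).
ESP_RE = re.compile(r"ESP\(spi=0x([0-9a-fA-F]+),seq=0x([0-9a-fA-F]+)\)")


def classify(line):
    """Return 'esp', 'clear', or 'other' for one tcpdump summary line."""
    if not PAIR_RE.search(line):
        return "other"
    return "esp" if ESP_RE.search(line) else "clear"


def summarize(lines):
    """Count how many lines fall into each category."""
    return Counter(classify(line) for line in lines)


# Sample taken from the capture quoted in this bug (kernel-2.6.3-2.1.242).
sample = [
    "192.168.2.1 > 192.168.2.2: ESP(spi=0x04a44fad,seq=0x9c4)",
    "192.168.123.123 > 192.168.2.2: icmp 9: echo request seq 35841",
    "192.168.2.2 > 192.168.2.1: ESP(spi=0x3b2cdf1f,seq=0x989)",
    "192.168.2.1 > 192.168.2.2: ESP(spi=0x04a44fad,seq=0x9c5)",
    "192.168.123.123 > 192.168.2.2: icmp 9: echo request seq 36097",
    "192.168.2.2 > 192.168.2.1: ESP(spi=0x3b2cdf1f,seq=0x98a)",
]

if __name__ == "__main__":
    for category, count in sorted(summarize(sample).items()):
        print(category, count)
```

In the sample, each decrypted ICMP echo request appears next to the ESP packets that carry the protected traffic, which matches the "seen twice" explanation.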