Bug 135432

Summary: Kernel panic in ipv6_rcv
Product: [Fedora] Fedora
Reporter: Seth Nickell <snickell>
Component: kernel
Assignee: David Miller <davem>
Status: CLOSED WORKSFORME
Severity: high
Priority: high
Version: 5
CC: bclark, davem, dcbw, dff, jgarzik, jturner, kotte, linville, marius.andreiana, tburke, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Doc Type: Bug Fix
Last Closed: 2006-03-17 15:27:21 EST
Bug Depends On:    
Bug Blocks: 130887, 131589    
Attachments:
  Screen shot showing the kernel panic (flags: none)
  kernel panic on 640 (flags: none)

Description Seth Nickell 2004-10-12 13:45:11 EDT
Attached screen shot shows the panic. It's triggered by enabling
NetworkManager on at least two laptops (IBM T41 and a Latitude LS)
with a variety of wireless cards (Cisco Aironet mini-PCI & PCMCIA, and
Prism 54 cardbus). We have no idea if it's related to wireless or not,
though that's going to be where NetworkManager exercises the kernel
most heavily.

It's present in at least kernel builds 541 & 585, though it occurs *far*
more frequently with 541 (usually within several minutes of starting
NetworkManager).

This seriously hampers our ability to deploy NetworkManager for RHEL4
(and, for example, has made us unable to safely demo NetworkManager
for fear the kernel will die and make our product look terribly unstable).
Comment 1 Seth Nickell 2004-10-12 13:46:40 EDT
Created attachment 105084 [details]
Screen shot showing the kernel panic
Comment 2 Dan Williams 2004-10-12 15:23:28 EDT
Was also triggered just by using iwconfig on the Latitude LS with the
prism54 card.  So it's not just a NetworkManager problem, though
NetworkManager might expose the issue more due to the fact that it can
make the ioctl() and iwlib calls faster than a human can on the
command line.
Comment 3 Seth Nickell 2004-10-15 12:42:43 EDT
Any other information we can provide to help track this down?
Comment 4 Bill Nottingham 2004-10-20 01:17:25 EDT
Presuming it persists on .639?
Comment 5 Dan Williams 2004-10-21 15:33:34 EDT
Persists in .640
Comment 6 David Miller 2004-10-21 15:45:17 EDT
After a cursory scan of net/core/wireless.c I'd say that
get_spydata() looks very suspicious.  If the driver has
a NULL dev->wireless_data and dev->wireless_handlers->spy_offset
is some random value, this will spam the driver private data
in indeterminate ways during iwconfig calls.

Which exact iwconfig commands are you invoking?  This may help
us track this down.

In general, the wireless kernel APIs are a complete mess and
much more complicated than they should be IMHO.
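
To illustrate the hazard described above, here is a minimal model in
Python. It is not the kernel code: the private-area layout, the offsets,
and the function names are all assumptions made up for the sketch. It
only shows how falling back to an unvalidated spy_offset can overwrite
unrelated driver fields.

```python
import struct

# Hypothetical private-area layout (NOT a real driver's): two 4-byte
# counters followed by an 8-byte region reserved for spy data.
PRIV_FORMAT = "<II8s"
SPY_SIZE = 8


def spy_write_offset(wireless_data, spy_offset):
    """Model of the suspicious fallback: with no wireless_data,
    the caller-supplied spy_offset is trusted without any check."""
    if wireless_data is not None:
        return wireless_data
    return spy_offset


def demo(spy_offset):
    """Write spy data at the chosen offset and return the two counters."""
    priv = bytearray(struct.pack(PRIV_FORMAT, 1111, 2222, b"\x00" * SPY_SIZE))
    offset = spy_write_offset(None, spy_offset)
    # Same-length slice assignment: overwrites bytes in place.
    priv[offset:offset + SPY_SIZE] = b"\xff" * SPY_SIZE
    tx_count, rx_count, _spy = struct.unpack(PRIV_FORMAT, bytes(priv))
    return tx_count, rx_count
```

With the offset that matches the layout (8), demo() leaves both counters
intact. With a wrong offset (0), both counters are overwritten, which is
the kind of silent corruption that would only show up later.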
Comment 7 Bryan W Clark 2004-10-21 16:13:56 EDT
Created attachment 105607 [details]
kernel panic on 640
Comment 8 Dan Williams 2004-10-21 16:17:32 EDT
Any and all of the iwconfig commands.  NetworkManager (I've also had
the crash using iwconfig) uses wireless ioctl() commands quite a bit.
It's no particular command; it just usually freezes at various points
after you've been exercising the wireless driver (i.e., taking it down,
setting essid and key, bringing it up, etc).  You're unlikely to
reproduce this with iwconfig since you can't physically type the
commands fast enough.
Comment 9 Bryan W Clark 2004-10-28 14:56:16 EDT
still happening on 643 kernels
Comment 10 David Miller 2004-11-02 17:57:45 EST
Bryan, can you check against 648 please?  There are a bunch
of ipv6 device and route refcounting fixes in there which
might cure this.

The bug seems to have to do with downing and upping interfaces
which ipv6 runs over.  Do the NetworkManager scripts up and
down interfaces quite a lot?  If so, then the whole deal with
the wireless ioctl() was a total red herring and has no bearing
upon this bug report.
Comment 11 Dan Williams 2004-11-03 09:38:16 EST
Yes, NM used to up/down the interface quite a bit.  For example,
whenever it connected to another access point it would bring the
interface down, set the WEP key, essid, and mode, then bring it up.
While I still think this is the technically correct approach, it seems
to interfere with drivers that load firmware onto the card, so that
every time the interface goes up again a firmware hotplug is triggered,
at least on Prism54 cards.  Upstream CVS no longer up/downs the
interface nearly as much, and doesn't do it when switching wireless
networks, which seemed to be the trigger for the panic.
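
A rough way to replay that down/configure/up sequence faster than
anyone can type it is sketched below in Python. This is not what
NetworkManager does internally (it uses ioctl() directly); the function
names, the essid, and the key are placeholders chosen for the sketch,
and only standard ifconfig/iwconfig invocations are used.

```python
import subprocess


def cycle_commands(iface, essid, key, mode="Managed"):
    """Return the command sequence for one down/configure/up cycle."""
    return [
        ["ifconfig", iface, "down"],
        ["iwconfig", iface, "key", key],
        ["iwconfig", iface, "essid", essid],
        ["iwconfig", iface, "mode", mode],
        ["ifconfig", iface, "up"],
    ]


def stress(iface, essid, key, cycles, run=subprocess.call):
    """Run the cycle back to back with no pause.

    `run` is injectable so the sequence can be inspected without
    touching a real interface. Returns the number of commands issued.
    """
    issued = 0
    for _ in range(cycles):
        for cmd in cycle_commands(iface, essid, key):
            run(cmd)
            issued += 1
    return issued
```

Something like stress("eth1", "testnet", "0123456789", cycles=1000), run
as root on a machine you can afford to lock up, would exercise the same
path; passing run=print instead shows the commands without executing them.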
Comment 12 Tim Burke 2004-11-03 17:42:58 EST
So is this a bug to pursue in kernel space, or behavior to work around
at user level?
Comment 13 David Miller 2004-11-03 18:40:32 EST
It's a kernel side bug, we just need Bryan to,
as I stated in comment #10, retest with the most up to
date kernel images to see if it's been fixed or not.

I believe that upstream fixes that occurred going into
648 may have fixed this.  There has been a lot of churn
in ipv6 lately.
Comment 14 Dan Williams 2004-11-04 07:47:24 EST
Well the bug here appears to be in kernel space since the thing
panics.  However, while we _don't_ know yet, I think NetworkManager
should trigger this bug less.  We'll have to see though.
Comment 15 Bryan W Clark 2004-11-04 14:59:01 EST
I'm having a hard time reproducing this one; I'm using the 667 kernel
now.  I played with it for a while this morning and was able to make
it hang once.  However, I haven't been able to do it again and I didn't
get the kernel panic output from it.
Comment 17 Jay Turner 2004-12-06 07:42:05 EST
It's been over a month, so closing this out.  Please holler quickly and loudly
if this becomes an issue again.
Comment 18 Jon Orris 2005-01-03 16:37:11 EST
I can consistently reproduce this on my laptop with a prism54 card
using the 681 kernel. If NetworkManager is started as a service, the
laptop will lock up completely within minutes of booting.

Laptop is a Thinkpad T30 running FC3 updated as of 1/3/05.
Wireless card is a Netgear WG511. 

The following is what shows up in /var/log/messages before the system
freezes:

Jan  3 14:56:05 localhost NetworkManager: nm_create_device_and_add_to_list(): adding device 'eth1' (wireless)
Jan  3 14:56:05 localhost NetworkManager: nm_create_device_and_add_to_list(): adding device 'eth0' (wired)
Jan  3 14:56:05 localhost NetworkManager: AUTO: Best wired device = (null)
Jan  3 14:56:05 localhost NetworkManager: AUTO: Best wireless device = eth1  ()
Jan  3 14:56:05 localhost NetworkManager:     SWITCH: best device changed
Jan  3 14:56:05 localhost NetworkManager: nm_state_modification_monitor(): beginning activation for device 'eth1'
Jan  3 14:56:07 localhost NetworkManager: HAVELINK: act=0 && (dev_crypt=0 <= prev_crypt=0)
Jan  3 14:56:07 localhost NetworkManager: HAVELINK: act=0 && (dev_crypt=0 <= prev_crypt=0)
Jan  3 14:56:07 localhost NetworkManager: LINK: !HAVE=1, (best_ap=0x0 && (is_enc=0 && (!source=1 || !len_source=0)))
Jan  3 14:56:09 localhost NetworkManager: HAVELINK: act=0 && (dev_crypt=0 <= prev_crypt=0)
Jan  3 14:56:09 localhost NetworkManager: HAVELINK: act=0 && (dev_crypt=0 <= prev_crypt=0)
Jan  3 14:56:09 localhost NetworkManager: LINK: !HAVE=1, (best_ap=0x0 && (is_enc=0 && (!source=1 || !len_source=0)))
Jan  3 14:56:11 localhost NetworkManager: HAVELINK: act=0 && (dev_crypt=0 <= prev_crypt=0)
Jan  3 14:56:11 localhost NetworkManager: HAVELINK: act=0 && (dev_crypt=0 <= prev_crypt=0)
Jan  3 14:56:11 localhost NetworkManager: LINK: !HAVE=1, (best_ap=0x0 && (is_enc=0 && (!source=1 || !len_source=0)))
Jan  3 14:56:13 localhost kernel: eth1: timeout waiting for mgmt response
Jan  3 14:56:13 localhost NetworkManager: HAVELINK: act=0 && (dev_crypt=0 <= prev_crypt=0)
Jan  3 14:56:13 localhost NetworkManager: HAVELINK: act=0 && (dev_crypt=0 <= prev_crypt=0)
Jan  3 14:56:13 localhost NetworkManager: LINK: !HAVE=1, (best_ap=0x0 && (is_enc=0 && (!source=1 || !len_source=0)))
Jan  3 14:56:15 localhost NetworkManager: HAVELINK: act=0 && (dev_crypt=0 <= prev_crypt=0)
Jan  3 14:56:15 localhost NetworkManager: HAVELINK: act=0 && (dev_crypt=0 <= prev_crypt=0)
Jan  3 14:56:15 localhost NetworkManager: LINK: !HAVE=1, (best_ap=0x0 && (is_enc=0 && (!source=1 || !len_source=0)))
Jan  3 14:56:16 localhost kernel: ACPI: PCI interrupt 0000:01:00.0[A] -> GSI 11 (level, low) -> IRQ 11

The only way I can get NetworkManager to work is:
 1. Disable NM service startup.
 2. Start eth1 on boot.
 3. After logging in, start NetworkManager and NetworkManagerInfo.
 4. Immediately select a network that NM finds. Delays in selecting the
    network will cause the laptop to lock up.


Comment 20 John W. Linville 2005-01-04 10:18:37 EST
From comment 18, "Delays in selecting the network will cause the
laptop to lock up."  Interesting...?

Anyone (dcbw?) familiar enough w/ the workings of NetworkManager to
theorize on how its behaviour changes between before and after the
user selects a network?  Particularly w.r.t. how it interacts w/ the kernel?
Comment 21 Jon Orris 2005-01-04 11:49:59 EST
Sometime yesterday after posting my comments, the 724 kernel became
available. I installed it on my laptop and NetworkManager no longer
appears to lock the machine. It has run for about 15 minutes without a
hang. I'm letting it run for a longer span to make sure.

However, wireless with NM no longer works at all. The log is full of
'eth1: timeout waiting for mgmt response' messages interspersed with
NM messages from comment 18. If I shut down the NM service, reinsert
the card & start networking normally, wireless works fine.
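
For anyone who wants to quantify how often those timeouts show up, a
small Python sketch follows. The regular expression assumes the syslog
line format shown in comment 18; the function name is made up for the
example.

```python
import re

# Matches syslog lines of the form seen in comment 18, e.g.
# "Jan  3 14:56:13 localhost kernel: eth1: timeout waiting for mgmt response"
TIMEOUT_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (\w+): "
    r"timeout waiting for mgmt response"
)


def mgmt_timeouts(lines):
    """Return (timestamp, interface) for each mgmt-response timeout line."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits
```

Passing an open /var/log/messages file object to mgmt_timeouts() and
taking len() of the result gives a count that can be compared between
kernel builds.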

Comment 22 Jon Orris 2005-01-04 11:56:24 EST
Heh. I spoke too soon. The laptop still locks up running NM; this time
it did it in about 10 minutes.
Comment 23 John Devereaux 2005-01-10 10:54:34 EST
I started NetworkManagerInfo and FC3 locked up after I tried to connect
to a WEP-enabled access point. The application was attempting to
connect when it locked up the system. I could not switch to a different
console to reboot or kill the process.
Comment 25 Dan Williams 2005-02-10 08:09:37 EST
davem:  I've seen the airo patch that sets all the data to NULL after releasing
it, but I'm unsure what version of the kernel that patch is in.  Is it likely
that other drivers may need to be patched in the same way?
Comment 26 Dave Jones 2005-10-05 23:07:50 EDT
Is this still occurring in current Fedora kernels?
If it's also occurring in RHEL4, this bug should be cloned and
reclassified appropriately.
Comment 27 Dan Williams 2006-03-17 15:27:21 EST
seems to have gone away