Bug 440648 - system/kernel lockup on service NetworkManager start
Summary: system/kernel lockup on service NetworkManager start
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: http://www.smolts.org/show?uuid=pub_7...
Whiteboard:
Depends On:
Blocks:
Reported: 2008-04-04 12:05 UTC by Andrew Farris
Modified: 2008-04-30 02:29 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-04-10 23:27:30 UTC
Type: ---
Embargoed:

Description Andrew Farris 2008-04-04 12:05:33 UTC
Description of problem:
NetworkManager makes my system lock up immediately when I start the service.
The system does not respond to any input or network traffic.

I had previously switched to using legacy configuration for my network devices
(two wired, one unused wireless) on this desktop machine.  I tried to return to
NM config but this lockup is making it impossible to test/workaround, and the
machine must be hard-reset each time (a plus is I'm giving my luks ext4
filesystem a good lashing, it recovers the journal each time).

Version-Release number of selected component (if applicable):
NetworkManager-0.7.0-0.9.1.svn3521.fc9.i386

I've tried all these kernels:
kernel-2.6.25-0.121.rc5.git4.fc9.i686
kernel-2.6.25-0.172.rc7.git4.fc9.i686
kernel-2.6.25-0.167.rc7.git2.fc9.i686
kernel-2.6.25-0.177.rc7.git6.fc9.i686
kernel-2.6.25-0.185.rc7.git6.fc9.i686

How reproducible:
A sure bet.

Additional info:

I've got kernel debuginfo installed, but absolutely nothing gets printed to the
terminal after NetworkManager starts (I get the new root terminal prompt, then
the cursor stops blinking and that's it...).  I have NM turned off by chkconfig
and am manually requesting the start after the system has fully booted.

I have the configuration scripts set up like this:
$-> cat /etc/sysconfig/network-scripts/ifcfg-eth0
# ADMtek NC100 Network Everywhere Fast Ethernet 10/100
DEVICE=eth0
IPADDR=192.168.1.10
SUBNET=255.255.255.0
GATEWAY=192.168.1.1
HWADDR=00:04:5a:7d:9f:01
ONBOOT=no
BOOTPROTO=none
NETMASK=255.255.255.0
TYPE=Ethernet
USERCTL=yes
PEERDNS=no
IPV6INIT=yes
NM_CONTROLLED=no
DNS1=4.2.2.4
DNS2=4.2.2.1
DNS3=208.67.222.222

$-> cat /etc/sysconfig/network-scripts/ifcfg-eth1
# 3Com Corporation 3c905C-TX/TX-M [Tornado]
DEVICE=eth1
BOOTPROTO=dhcp
HWADDR=00:06:5b:de:54:07
ONBOOT=yes
DHCP_HOSTNAME=localhost.localdomain
TYPE=Ethernet
USERCTL=yes
PEERDNS=no
IPV6INIT=yes
NM_CONTROLLED=yes
DNS1=4.2.2.4
DNS2=4.2.2.1
DNS3=208.67.222.222

The tulip (Network Everywhere) card is turned off for NM because I know it has
an ifup/ifdown bug #431038, which causes a kernel stacktrace but never
hard-locks the system like this by itself.  Even with that card not NM
controlled, the system still locks up.  I can use that device with legacy
config, but it sometimes causes the oops.
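
To double-check which interfaces are actually handed to NM, the ifcfg files
can be summarized with something like the sketch below.  The function name
nm_controlled_report is made up for illustration, and the sketch assumes
unquoted KEY=value lines like the ones shown above.

```shell
#!/bin/bash
# Hypothetical helper (not part of initscripts or NetworkManager):
# print "DEVICE NM_CONTROLLED" for every ifcfg-* file in a directory.
# Assumes plain, unquoted KEY=value lines as in the files quoted above.
nm_controlled_report() {
    local dir="${1:-/etc/sysconfig/network-scripts}"
    local f dev nmc
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        # Take the first DEVICE= and NM_CONTROLLED= value found in the file.
        dev=$(sed -n 's/^DEVICE=//p' "$f" | head -n 1)
        nmc=$(sed -n 's/^NM_CONTROLLED=//p' "$f" | head -n 1)
        # A file with no NM_CONTROLLED line is reported as "unset".
        echo "${dev:-unknown} ${nmc:-unset}"
    done
}
```

Calling nm_controlled_report with no argument reads the default
network-scripts directory; passing a directory makes it easy to try against a
copy of the files.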

Any clues where to attack this one?  If I leave service network running and
just ifup eth0 or ifup eth1, the network works just fine.  If I stop network
and then start NM, I get a deadlocked kernel (same if I leave network running
and start NM).

I've tried getting NM started up without any configs in place and the same
thing happens.

Comment 1 Dan Williams 2008-04-04 18:32:20 UTC
Is the capslock light flashing to indicate a kernel panic?

Can you turn NM off with chkconfig, then boot up and switch to a VT, then from
there start NetworkManager to capture some of the panic?  If the machine
hardlocks, it's almost certainly a kernel driver bug.

Comment 2 Andrew Farris 2008-04-04 21:03:00 UTC
Capslock has not been flashing any of the times it has happened (none of the
LEDs were).

I have NM off via chkconfig already, and was doing the service start manually.
No output shows up other than the 'Starting NetworkManager..[ok]' and my next
terminal prompt (I use a complicated custom $PS1, and it finishes drawing
that).  Then, and only after the bash prompt, the cursor stops blinking.

I've tried in runlevel 3 and 5.  Right now I don't know if it's actually a
kernel lock or not, because I have no network response to check (it has no
active interface).

Comment 3 Andrew Farris 2008-04-04 21:06:56 UTC
BTW, until I turned off NM it was partially working at one time in rawhide, but
it was not correctly configuring the wired interfaces then.  It was at least
able to start the service, and it would use DHCP on both interfaces but without
any DNS.

I know it was working with kernel-2.6.25-0.121.rc5.git4.fc9.i686 because that's
the reason I've still got it installed.  It now fails with that kernel, so I'm
not sure how to decide if it's a kernel bug or not.

Comment 4 Fernando Atrio 2008-04-05 15:04:52 UTC
I see the exact same bug on a fedora 8 machine running the latest kernel update,
kernel-2.6.24.4-64.fc8.
Everything works fine if I go back to kernel-2.6.24.3-50.fc8.
This is my smolt profile:
http://www.smolts.org/client/show/pub_a29d08fe-f229-45fe-9559-887ca3105085


Comment 5 Andrew Farris 2008-04-05 15:30:25 UTC
That's interesting, Fernando.  It looks like we have a somewhat similar
wireless adapter, although yours is using the rt61 driver and mine is using the
rt2500pci driver.

Dan, I may be able to test using only the two wired adapters.  Other than
removing my wireless adapter or blacklisting the driver, how do I turn it off
or prevent NM from trying to configure it when the service starts?

Right now I do not have anything configured for that adapter in
system-config-network at all.  There are no config scripts in
/etc/sysconfig/network* for it either.  Would NM take any action trying to
auto-configure that adapter in this case?

Comment 6 Andrew Farris 2008-04-08 02:57:49 UTC
I have narrowed this down a little bit.  I used system-config-network to create
a configuration for ifcfg-wlan0 using the rt2500pci driver and setting the
adapter to be NM controlled.  The system now allows NetworkManager to start, and
control all the adapters.

The problem seems to be related to some attempt at auto configuration when the
kernel driver for the wireless device was loaded but no config was present yet.
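
For anyone trying to spot the same situation (driver loaded, interface present,
no config file), a rough check is sketched below.  The function name
unconfigured_ifaces is hypothetical, and the sketch assumes the usual
/sys/class/net and ifcfg-<name> layout; both directories are parameters only so
the check can be tried against test directories.

```shell
#!/bin/bash
# Hypothetical helper: list network interfaces the kernel knows about
# that have no matching ifcfg-<name> file.
unconfigured_ifaces() {
    local sysnet="${1:-/sys/class/net}"
    local cfgdir="${2:-/etc/sysconfig/network-scripts}"
    local path name
    for path in "$sysnet"/*; do
        [ -e "$path" ] || continue
        name=${path##*/}              # strip the directory part
        [ "$name" = "lo" ] && continue  # loopback is not interesting here
        # Print the interface name only when no config file exists for it.
        [ -f "$cfgdir/ifcfg-$name" ] || echo "$name"
    done
}
```

On a setup like the one described in this bug before the ifcfg-wlan0 file was
created, this would be expected to print the wireless interface name.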

Comment 7 Dan Williams 2008-04-10 16:16:47 UTC
Any chance you can revert to the broken setup and try sysrq on the box to see
where it's hanging?  It would be good to track down the hang even if it's not a
panic...

http://en.wikipedia.org/wiki/Magic_SysRq_key
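
A rough sketch of the sysrq preparation, assuming the kernel was built with
magic SysRq support.  The helper name sysrq_status is made up; it only reports
the current setting, and the path argument exists so it can be pointed at a
test file.  The steps that change system state are left as comments because
they need root and a console keyboard.

```shell
#!/bin/bash
# Hypothetical helper: report whether the magic SysRq key is enabled.
sysrq_status() {
    local f="${1:-/proc/sys/kernel/sysrq}"
    local v
    v=$(cat "$f" 2>/dev/null) || { echo "unknown"; return 2; }
    if [ "$v" = "0" ]; then
        echo "disabled"
        return 1
    fi
    echo "enabled ($v)"
}

# As root, before starting NetworkManager:
#   echo 1 > /proc/sys/kernel/sysrq
# After the hang, on the text console keyboard:
#   Alt+SysRq+t   dump the task list
#   Alt+SysRq+w   dump tasks in uninterruptible (blocked) state
#   Alt+SysRq+p   dump current registers and flags
# If the kernel still responds, the output goes to the console and kernel log.
```

If none of the key combinations produce any output, that in itself suggests a
hard lockup rather than a stuck process.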



Comment 8 Andrew Farris 2008-04-10 23:27:30 UTC
I removed the configuration for my wlan0 in /etc/sysconfig/network-scripts and
/etc/sysconfig/networking/{default,profiles/default}, then attempted to
reproduce the problem, and I cannot.

When starting from a fresh boot with no configuration scripts in place for
wlan, ifconfig shows an unconfigured adapter and NM starts up fine.

I had to blacklist the tulip module to test this, because of my tulip device
problems, but with that done I cannot get any hang now (only tested the .204
kernel).  Going to close until it can be reproduced.  I'll try with a fresh
Preview install.

Comment 9 Fernando Atrio 2008-04-30 02:29:44 UTC
Everything works fine here, running kernel-2.6.24.5-85.fc8 