Description of problem:
NetworkManager locks up my system as soon as I start the service. The system does not respond to any input or network traffic. I had previously switched to using legacy configuration for my network devices (two wired, one unused wireless) on this desktop machine. I tried to return to NM config, but this lockup makes it impossible to test or work around, and the machine must be hard-reset each time (a plus is that I'm giving my LUKS ext4 filesystem a good lashing; it recovers the journal each time).

Version-Release number of selected component (if applicable):
NetworkManager-0.7.0-0.9.1.svn3521.fc9.i386

I've tried all these kernels:
kernel-2.6.25-0.121.rc5.git4.fc9.i686
kernel-2.6.25-0.172.rc7.git4.fc9.i686
kernel-2.6.25-0.167.rc7.git2.fc9.i686
kernel-2.6.25-0.177.rc7.git6.fc9.i686
kernel-2.6.25-0.185.rc7.git6.fc9.i686

How reproducible:
A sure bet.

Additional info:
I've got kernel debuginfo installed, but absolutely nothing gets printed to the terminal after NetworkManager starts (I get the new root terminal prompt, then the cursor stops blinking and that's it...). I have NM turned off by chkconfig and am manually requesting start after the system has fully booted.
I have the configuration scripts set up like this:

-> cat /etc/sysconfig/network-scripts/ifcfg-eth0
# ADMtek NC100 Network Everywhere Fast Ethernet 10/100
DEVICE=eth0
IPADDR=192.168.1.10
SUBNET=255.255.255.0
GATEWAY=192.168.1.1
HWADDR=00:04:5a:7d:9f:01
ONBOOT=no
BOOTPROTO=none
NETMASK=255.255.255.0
TYPE=Ethernet
USERCTL=yes
PEERDNS=no
IPV6INIT=yes
NM_CONTROLLED=no
DNS1=4.2.2.4
DNS2=4.2.2.1
DNS3=208.67.222.222

$-> cat /etc/sysconfig/network-scripts/ifcfg-eth1
# 3Com Corporation 3c905C-TX/TX-M [Tornado]
DEVICE=eth1
BOOTPROTO=dhcp
HWADDR=00:06:5b:de:54:07
ONBOOT=yes
DHCP_HOSTNAME=localhost.localdomain
TYPE=Ethernet
USERCTL=yes
PEERDNS=no
IPV6INIT=yes
NM_CONTROLLED=yes
DNS1=4.2.2.4
DNS2=4.2.2.1
DNS3=208.67.222.222

The tulip (Network Everywhere) card is turned off for NM because I know it has an ifup/ifdown bug, #431038, which causes a kernel stack trace but never hard-locks the system like this by itself. Even with that card not NM controlled, the system locks up. I can use that device with legacy config, but it sometimes causes the oops.

Any clues where to attack this one? If I leave service network running and just ifup eth0 or ifup eth1, the network works just fine. If I stop network and then start NM, I get a deadlocked kernel (same if I leave network running and start NM). I've also tried starting NM without any configs in place, and the same thing happens.
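For anyone comparing setups, a quick way to see which interfaces are handed to NetworkManager is to list the NM_CONTROLLED value from each ifcfg file. This is only a sketch: the function name nm_controlled_report is made up for illustration, and it assumes plain unquoted KEY=value lines like the ones above.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetworkManager or initscripts):
# print DEVICE and NM_CONTROLLED for every ifcfg-* file in a directory.
# The directory defaults to the Fedora path used in this report.
nm_controlled_report() {
    dir="${1:-/etc/sysconfig/network-scripts}"
    for f in "$dir"/ifcfg-*; do
        # Skip the unexpanded glob when no ifcfg files exist.
        [ -f "$f" ] || continue
        # Only lines starting with the key match, so commented-out
        # settings are ignored; the last assignment wins.
        dev=$(sed -n 's/^DEVICE=//p' "$f" | tail -n 1)
        nmc=$(sed -n 's/^NM_CONTROLLED=//p' "$f" | tail -n 1)
        printf '%s NM_CONTROLLED=%s\n' "${dev:-unknown}" "${nmc:-unset}"
    done
}
```

Calling nm_controlled_report with no argument reads the default directory; a device with no NM_CONTROLLED line is reported as "unset".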
Is the capslock light flashing to indicate a kernel panic? Can you turn NM off with chkconfig, then boot up and switch to a VT, then from there start NetworkManager to capture some of the panic? If the machine hardlocks, it's almost certainly a kernel driver bug.
Capslock has not been flashing any time it's happened so far (none of the LEDs were). I have NM off via chkconfig already, and was doing the service start manually. No output shows up other than the 'Starting NetworkManager..[ok]' and my next terminal prompt (I use a complicated custom $PS1, and it finishes printing that). Then, and only after the bash prompt, the cursor stops blinking. I've tried in runlevel 3 and 5. Right now I don't know if it's actually a kernel lock or not, because I have no network response to check (it has no active interface).
BTW, until I turned off NM it was partially working at one time in rawhide, but it was not correctly configuring the wired interfaces then. It was at least able to start the service, and it would use DHCP on both interfaces but without any DNS. I know it was working with kernel-2.6.25-0.121.rc5.git4.fc9.i686 because that's the reason I've still got it installed. It now fails with that kernel, so I'm not sure how to decide if it's a kernel bug or not.
I see the exact same bug on a Fedora 8 machine running the latest kernel update, kernel-2.6.24.4-64.fc8. Everything works fine if I go back to kernel-2.6.24.3-50.fc8. This is my smolt profile: http://www.smolts.org/client/show/pub_a29d08fe-f229-45fe-9559-887ca3105085
That's interesting, Fernando. It looks like we have somewhat similar wireless adapters, although yours is using the rt61 driver and mine is using the rt2500pci driver. Dan, I may be able to test using only the two wired adapters. Other than removing my wireless adapter or blacklisting the driver, how do I turn it off or prevent NM from trying to configure it when the service starts? Right now I do not have anything configured for that adapter in system-config-network at all. There are no config scripts in /etc/sysconfig/network* for it either. Would NM take any action trying to auto-configure that adapter in this case?
I have narrowed this down a little bit. I used system-config-network to create a configuration for ifcfg-wlan0 using the rt2500pci driver and setting the adapter to be NM controlled. The system now allows NetworkManager to start, and control all the adapters. The problem seems to be related to some attempt at auto configuration when the kernel driver for the wireless device was loaded but no config was present yet.
Any chance you can revert to the broken setup and try sysrq on the box to see where it's hanging? It would be good to track down the hang even if it's not a panic... http://en.wikipedia.org/wiki/Magic_SysRq_key
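In case it helps, here is a rough sketch of the sysrq steps as shell functions. The /proc paths are the standard Linux ones; the function names are made up for illustration, and each takes an optional path argument only so it can be tried against an ordinary file first. On the real system these need root, and sysrq has to be enabled before starting NetworkManager, since nothing can be typed into a shell once the box hangs.

```shell
#!/bin/sh
# Hypothetical helper names; the default paths are the kernel's procfs entries.

# Show the current SysRq setting (0 = disabled, 1 = all functions enabled).
sysrq_status() {
    cat "${1:-/proc/sys/kernel/sysrq}"
}

# Enable all SysRq functions. Run this as root before starting NetworkManager.
sysrq_enable() {
    echo 1 > "${1:-/proc/sys/kernel/sysrq}"
}

# Ask the kernel to dump the state of all tasks to the kernel log.
# This is the shell equivalent of pressing Alt+SysRq+t on the console.
sysrq_dump_tasks() {
    echo t > "${1:-/proc/sysrq-trigger}"
}
```

Once the machine is hung, the keyboard combination (Alt+SysRq+t on a text VT) is the one to try, because sysrq_dump_tasks needs a working shell. The dump goes to the kernel log, so it is worth being on a VT where console messages are visible.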
I removed the configuration for my wlan0 in /etc/sysconfig/network-scripts and /etc/sysconfig/networking/{default,profiles/default}, then attempted to reproduce the problem, and I cannot. When starting from a fresh boot with no configuration scripts in place for wlan, ifconfig shows an unconfigured adapter and NM starts up fine. I had to blacklist the tulip module to test this, because of my tulip device problems, but with that done I cannot get any hang now (only tested the .204 kernel). Going to close until it can be reproduced. I'll try with a fresh Preview install.
Everything works fine here, running kernel-2.6.24.5-85.fc8