Red Hat Bugzilla – Bug 595043
Fedora 12 x86_64 KVM libvirtd Nokia USB connection kernel panic crash on reboot
Last modified: 2010-10-02 20:02:42 EDT
Created attachment 415910 [details]
Contents of abrt kerneloops directory after crash
Description of problem:
Fedora 12 crashes on boot when starting KVM (libvirtd --daemon) whilst a Nokia USB interface cable is connected.
Version-Release number of selected component (if applicable):
Fedora 12 x86_64 server with any of at least the following kernels:
* kernel 22.214.171.124-115.fc12.x86_64
* kernel 126.96.36.199-99.fc12.x86_64
* kernel 188.8.131.52-90.fc12.x86_64
How reproducible: ALWAYS
Steps to Reproduce:
1. Connect Nokia 6303 Classic (or similar) mobile phone via USB cable
2. Have libvirtd set to start (chkconfig libvirtd on)
3. Reboot server
Alternate method to reproduce:
1. Connect Nokia as before
2. Disable libvirtd auto-start (chkconfig libvirtd off)
3. Reboot server
4. Log in to server
5. Execute "service libvirtd start"
Actual results:
System crashes with references to various "net" related routines or entry-points.
Expected results:
System should start libvirtd/KVM services without crashing.
Additional info:
Plugging the Nokia USB cable in *AFTER* the KVM/libvirtd services have loaded and stabilised does not cause a crash - the crash only occurs when booting with the Nokia cable connected.
"ifconfig -a" after reboot with Nokia connected reports a "usbpn0" device.
This device seems to be transient as a later "ifconfig -a" no longer reports its presence.
It is very likely that it is the presence of this device (or possibly USB network devices in general) that is causing libvirtd/KVM to "chuck its guts".
Created attachment 415911 [details]
BZ2 Tarball containing various configuration information output
To assist in investigation I've also included a tarball containing configuration information such as list of installed packages, "dmesg" results, network configuration and output from "lsusb".
Oh, for those wondering why the Nokia phone is connected?
It's being used by Nagios to send SMS messages when certain critical events occur.
The problem still occurs after updating to:
Here's the ifconfig output pertaining to the mysterious usbpn0:
usbpn0    Link encap:UNSPEC  HWaddr 1B-00-00-00-00-00-80-4E-00-00-00-00-00-00-00-00
          POINTOPOINT NOARP  MTU:65541  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
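For anyone who wants to spot this interface from a script rather than by eye, here is a minimal Python sketch that pulls interface names out of classic net-tools "ifconfig -a" output and picks out the ones starting with "usbpn". The function names and the eth0 stanza in the sample text are made up for illustration; only the usbpn0 lines come from this report, and the "usbpn" prefix is an assumption based on that single device.

```python
import re

# Hypothetical helpers for illustration; not part of any tool named in this report.

def interface_names(ifconfig_text):
    """Return interface names found in net-tools style `ifconfig -a` output.

    Assumes each stanza starts at column 0 with the interface name,
    followed by spaces and "Link encap:", as in the usbpn0 listing.
    """
    return re.findall(r"^(\S+)[ \t]+Link encap:", ifconfig_text, flags=re.MULTILINE)


def phonet_interfaces(ifconfig_text, prefix="usbpn"):
    """Return only the names starting with `prefix` (assumed Phonet USB naming)."""
    return [name for name in interface_names(ifconfig_text) if name.startswith(prefix)]


# Sample text: the eth0 stanza is invented; the usbpn0 stanza is from this report.
SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
usbpn0    Link encap:UNSPEC  HWaddr 1B-00-00-00-00-00-80-4E-00-00-00-00-00-00-00-00
          POINTOPOINT NOARP  MTU:65541  Metric:1
"""

if __name__ == "__main__":
    print(phonet_interfaces(SAMPLE))
```

Since the device seems to be transient, a single check like this may well miss it; it is only meant as a quick way to confirm what "ifconfig -a" showed at a given moment.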
Created attachment 421205 [details]
Updated configuration information text files (tarballed)
Created attachment 421207 [details]
Most recent abrt Kerneloops data (crash #1)
Created attachment 421208 [details]
Most recent abrt Kerneloops data (crash #2)
Smolt profile for system:
Smolt wiki page for Nokia USB connection in above system:
Have updated to kernel 184.108.40.206-127.fc12.x86_64, however the crash still occurs.
Have the Nokia USB interface connected *BEFORE* starting libvirtd, and the system crashes.
If you start libvirtd and *THEN* plug in the Nokia USB cable, everything works fine.
This means that any sort of restart will cause a kernel crash if libvirtd is set to execute at system start and the Nokia USB cable is connected.
Unfortunately, it appears that due to the nature of the crash, abrt/kerneloops is unable to actually capture the traceback.
What I could record from some quick scribbling was:
Fatal exc in interrupt Pid=37
comm. netns Tainted G D W
Changed Bugzilla component entry from "kernel" to "libvirt".
It's hard to choose the right component here people, as we have three choices, all of which are correct at some level.
I initially thought it should be part of KVM, as we're using kvm_XXXX modules.
Then, given the fact that the entire kernel gets clagged, I thought maybe it should be "kernel".
I've now changed it to "libvirt" as it's likely that it's the interaction between the virtualisation libraries and the associated kernel loadable modules that's causing the crash.
It would be nice to be able to somehow mark a bug as impacting (or associated with) all 3 (or more) components.
That may help in assigning the bug to the correct team after review.
Also, why can't we change the priority when editing the bug details?
For me, this is a HIGH priority, not the default of "low".
Now that libvirt-0.7.1-18.fc12 has been pushed to Fedora 12 stable repository, I hope to test it this weekend.
Created attachment 430949 [details]
Screen shot of kernel panic traceback
libvirt-0.7.1-18.fc12 has been installed.
Unfortunately, the situation has not improved.
When the Nokia USB cable is left connected, and libvirtd is started, we get a kernel panic (screenshot of "panic" screen attached).
Start libvirtd WITHOUT the USB cable, and everything is fine.
Plug the USB cable in AFTER libvirtd is started, everything continues to operate correctly.
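Given the ordering observed above (crash only when the Nokia interface exists before libvirtd starts), a pre-start check could look something like the following Python sketch. This is not a fix and not anything libvirt provides; it only encodes the observation from this report. The function names are hypothetical, and the "usbpn" prefix is an assumption based on the usbpn0 device seen in "ifconfig -a".

```python
import os

# Assumption: Phonet USB interfaces are named "usbpn<N>", like usbpn0 in this report.
PHONET_PREFIX = "usbpn"


def list_net_interfaces(sysfs_dir="/sys/class/net"):
    """Return the entries under sysfs_dir, or [] if the directory is unavailable."""
    try:
        return sorted(os.listdir(sysfs_dir))
    except OSError:
        return []


def blocking_interfaces(names, prefix=PHONET_PREFIX):
    """Return the interface names that match the assumed Phonet prefix."""
    return [name for name in names if name.startswith(prefix)]


def safe_to_start_libvirtd(names):
    """True when no usbpn* interface is present among the given names."""
    return not blocking_interfaces(names)


if __name__ == "__main__":
    found = blocking_interfaces(list_net_interfaces())
    if found:
        print("Phonet interface(s) present, not starting libvirtd:", ", ".join(found))
    else:
        print("No usbpn* interface found; libvirtd can be started.")
```

A wrapper around "service libvirtd start" could run this first and bail out if it reports a usbpn device. Because the interface appears to come and go, a clean result is no guarantee; unplugging the cable before starting libvirtd remains the only workaround actually confirmed here.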
Created attachment 430950 [details]
Currently installed package list (sorted)
I've attached the (sorted) output from 'rpm -qa' to help provide a more complete picture of the installation environment.
We really need to see the entire oops report. Try starting the kernel with the option "vga=1" to put the console in 50-line mode, or "vga=791" for VESA mode. Or scroll back if you can do that.
Created attachment 431691 [details]
Updated oops/kernel panic screen shot
This is a screen shot of the kernel panic after executing "service libvirtd start" with the Nokia USB cable connected.
Kernel video mode set to vga=1 at boot load.
Screen captured via browser to HP iLO2 remote console facility.
I hope this helps more!
I suspect that the screen buffer in the iLO2 card got a little mangled (evidenced by the repeating sections of text from "phonedeon" on the far right side of the screen shot I just attached). I suspect that "phonedeon" is actually "phonet" and "radeon" squished over each other.
Unfortunately, it's the best I can offer.
However, the code call traces appear to be complete.
I see that kernel-220.127.116.11-141.fc12 has been released to the stable "updates" repository.
I am arranging another outage to install and activate the new kernel.
Once the new kernel is active, I'll try starting libvirtd again with the Nokia USB cable attached.
I should have the results in some 16 hours from time of this update.
Created attachment 432000 [details]
Another clipboard grab of remote console kernel panic traceback
This is the panic after starting libvirtd with the new kernel.
Whilst my caveat about the screen buffer in my previous note still holds true, the traceback sequences definitely point to some interaction between "phonet" and "netns".
I will attach the output from dmesg which shows the USB/cdc_phonet activity when plugging in the Nokia USB cable *after* libvirtd has started.
Created attachment 432001 [details]
"dmesg" output from kernel-18.104.22.168-141.fc12.x86_64
Here's the output from dmesg as promised.
Possibly fixed by:
(In reply to comment #20)
> Possibly fixed by:
Not sure when this patch will make it into the kernel, or even if it will remedy the problem.
We're now running kernel 22.214.171.124-166.fc12.x86_64 (with 4 older kernels available as boot time alternatives) and libvirt 0.8.2-1.fc12.x86_64; however, the problem still exists.
When I can organise another outage, I'll try to get an up-to-date crash/trace for the forum.
There's a patch here you may want to try btw.:
The stack trace in comment #25 of the chromium bug is very similar to comment #18 here.
Lemme know if it works please.
(In reply to comment #22)
> There's a patch here you may want to try btw.:
> The stack trace in comment #25 of the chromium bug is very similar to comment
> #18 here.
> Lemme know if it works please.
Hi Craig - very interesting reading, and thanks for providing the very useful link.
Yes, I suspect that this would indeed resolve the issue, as I *know* that libvirtd definitely interacts with the kernel network namespace functions during initialisation: it needs to set up and provide the bridge between the physical host network interface(s) and the network interface(s) of the virtual machine.
Unfortunately, I won't be able to try patching the kernel until 02-Oct.
The comment from Nokia, saying that it's difficult (impossible?) to fix in kernels <= 2.6.32, is interesting. Hopefully the Fedora upstream kernel maintainers can integrate this patch too, at least in the 2.6.32 series; otherwise I may have to wait for a kernel version > 2.6.32.
I've just run a quick query against the "rawhide" Fedora repository, and there's a kernel 2.6.36 available there (based on the 0.21.rc4.git1.fc15 release).
By the time I'm in a position to try this patch out, this new kernel may even be released as a stable update.
If not, I will try it out before attempting to integrate the Nokia patch to the current kernel.
I'll keep people posted here.
It appears that the boys & girls over in the Google chromium bug tracker have tested the v2.6.35 kernel, and aren't having any issues with system crashes with the Nokia connected (refer to http://code.google.com/p/chromium/issues/detail?id=54617#c32).
I see too that the "rawhide" repository has updated the release of the 2.6.36 kernel to 0.22.rc4.git2.fc15, so hopefully the v2.6.36 kernel should resolve this issue once it's officially released (based upon the fact that 2.6.35 appears to be okay with Nokia).
It would be nice to have this annoying problem resolved.
This bug will never get fixed in Fedora 12, since it's staying on 2.6.32 until EOL. So we can close it as CANTFIX, or you can update to Fedora 13 or 14 and we can change the version it's reported against to one of those.
Hmm - I strongly disagree with the attitude regarding the maintenance of kernel levels for FC12, given that Fedora 14 has not yet been officially released; however, there's not much that I can do about it out here in user-land.
I have upgraded to the FC13 versions of the Fedora kernel & libvirt packages (versions 126.96.36.199-56.fc13.x86_64 & 0.8.2-1.fc13.x86_64 respectively) and am happy to advise that the bug appears to have been removed/resolved in those versions.
Thanks to all for their time in reviewing this issue, along with the various suggestions offered.
(I draw the attention of people to the following link: http://fedoraproject.org/wiki/Releases/Schedule#Maintenance_Schedule
And I quote:
"This translates into:
* Fedora 12 will be maintained until 1 month after the release of Fedora 14.
* Fedora 13 will be maintained until 1 month after the release of Fedora 15."
Given that Fedora 14 is not scheduled for release until November 2010 [http://fedoraproject.org/wiki/Releases/14/Schedule], I would have expected that Fedora 12 would enjoy complete support, including kernel updates, until December 2010. )