Bug 595043 - Fedora 12 x86_64 KVM libvirtd Nokia USB connection kernel panic crash on reboot
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 12
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-05-23 02:47 UTC by Scott Marshall
Modified: 2010-10-03 00:02 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-10-03 00:02:42 UTC
Type: ---
Embargoed:


Attachments
Contents of abrt kerneloops directory after crash (1.37 KB, application/octet-stream)
2010-05-23 02:47 UTC, Scott Marshall
BZ2 Tarball containing various configuration information output (29.92 KB, application/octet-stream)
2010-05-23 02:51 UTC, Scott Marshall
Updated configuration information text files (tarballed) (29.69 KB, application/octet-stream)
2010-06-04 11:33 UTC, Scott Marshall
Most recent abrt Kerneloops data (crash #1) (1.47 KB, application/octet-stream)
2010-06-04 11:35 UTC, Scott Marshall
Most recent abrt Kerneloops data (crash #2) (1.46 KB, application/octet-stream)
2010-06-04 11:35 UTC, Scott Marshall
Screen shot of kernel panic traceback (169.73 KB, image/png)
2010-07-11 01:37 UTC, Scott Marshall
Currently installed package list (sorted) (59.64 KB, text/plain)
2010-07-11 01:47 UTC, Scott Marshall
Updated oops/kernel panic screen shot (87.70 KB, image/png)
2010-07-14 07:35 UTC, Scott Marshall
Another clipboard grab of remote console kernel panic traceback (88.45 KB, image/png)
2010-07-15 08:47 UTC, Scott Marshall
"dmesg" output from kernel-2.6.32.16-141.fc12.x86_64 (57.42 KB, text/plain)
2010-07-15 08:50 UTC, Scott Marshall

Description Scott Marshall 2010-05-23 02:47:40 UTC
Created attachment 415910 [details]
Contents of abrt kerneloops directory after crash

Description of problem:
Fedora 12 crashes on boot when starting KVM (libvirtd --daemon) whilst a Nokia USB interface cable is connected.

Version-Release number of selected component (if applicable):
Fedora 12 x86_64 server with any of at least the following kernels:
* kernel 2.6.32.12-115.fc12.x86_64
* kernel 2.6.32.11-99.fc12.x86_64
* kernel 2.6.32.10-90.fc12.x86_64

How reproducible: ALWAYS


Steps to Reproduce:
1.  Connect Nokia 6303 Classic (or similar) mobile phone via USB cable
2.  Have libvirtd set to start (chkconfig libvirtd on)
3.  Reboot server

Alternate method to reproduce:
1.  Connect Nokia as before
2.  Disable libvirtd auto-start (chkconfig libvirtd off)
3.  Reboot server
4.  Login to server
5.  execute "service libvirtd start"
  
Actual results:
System crash with references to various "net" related routines or entry-points.

Expected results:
System should start libvirtd/KVM services without crashing.

Additional info:
Plugging the Nokia USB cable in *AFTER* the KVM/libvirtd services have loaded and stabilised does not cause a crash - the crash only occurs when booting with the Nokia cable connected.

"ifconfig -a" after reboot with Nokia connected reports a "usbpn0" device.
This device seems to be transient as a later "ifconfig -a" no longer reports its presence.

It is very likely that it is the presence of this device (or possibly USB network devices in general) that is causing libvirtd/KVM to "chuck its guts".
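
Since the crash only shows up when the usbpn0 device already exists before libvirtd starts, one possible stopgap is to check for it first. The following is a minimal Python sketch, not something from the original report: the function names are hypothetical, and the assumption that Phonet USB interfaces always carry a "usbpn" prefix rests only on the single device name reported here. It lists interface names from /sys/class/net and reports any that match.

```python
import os


def find_phonet_interfaces(names, prefix="usbpn"):
    """Return the interface names that look like Phonet USB devices.

    The "usbpn" prefix is an assumption taken from the one device
    name (usbpn0) seen in this report.
    """
    return sorted(n for n in names if n.startswith(prefix))


def list_interfaces(sysfs_dir="/sys/class/net"):
    """List network interface names from sysfs; empty list if unreadable."""
    try:
        return os.listdir(sysfs_dir)
    except OSError:
        return []


if __name__ == "__main__":
    found = find_phonet_interfaces(list_interfaces())
    if found:
        print("Phonet-style interface(s) present: " + ", ".join(found))
    else:
        print("No usbpn* interface found")
```

A wrapper script could run this before "service libvirtd start" and hold off while a match is present. Whether that trade-off is acceptable depends on how much the SMS alerting matters during boot.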

Comment 1 Scott Marshall 2010-05-23 02:51:09 UTC
Created attachment 415911 [details]
BZ2 Tarball containing various configuration information output

To assist in investigation I've also included a tarball containing configuration information such as list of installed packages, "dmesg" results, network configuration and output from "lsusb".

Comment 2 Scott Marshall 2010-05-23 02:59:23 UTC
Oh, for those wondering why the Nokia phone is connected?
It's being used by Nagios to send SMS messages when certain critical events occur.

Comment 3 Scott Marshall 2010-06-04 10:37:15 UTC
The problem still occurs after updating to:
libvirt-client-0.7.1-16.fc12.x86_64
libvirt-0.7.1-16.fc12.x86_64       
libvirt-python-0.7.1-16.fc12.x86_64
net-tools-1.60-100.fc12.x86_64

Comment 4 Scott Marshall 2010-06-04 10:51:13 UTC
Here's the ifconfig output pertaining to the mysterious usbpn0:

usbpn0    Link encap:UNSPEC  HWaddr 1B-00-00-00-00-00-80-4E-00-00-00-00-00-00-00-00
          POINTOPOINT NOARP  MTU:65541  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:3
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
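
For comparing this against other systems, the block above can be reduced to a few fields. This is a minimal sketch in Python, assuming the classic net-tools ifconfig layout shown above; parse_ifconfig_block is a hypothetical helper, not part of any existing tool.

```python
import re

# Output copied from the ifconfig listing in this comment.
SAMPLE = """\
usbpn0    Link encap:UNSPEC  HWaddr 1B-00-00-00-00-00-80-4E-00-00-00-00-00-00-00-00
          POINTOPOINT NOARP  MTU:65541  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:3
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
"""


def parse_ifconfig_block(text):
    """Pull name, link encapsulation, MTU and flags out of one interface block."""
    lines = text.splitlines()
    info = {"name": lines[0].split()[0]}

    encap = re.search(r"Link encap:(\S+)", lines[0])
    info["encap"] = encap.group(1) if encap else None

    mtu = re.search(r"\bMTU:(\d+)", text)
    info["mtu"] = int(mtu.group(1)) if mtu else None

    # Flags are the bare upper-case words on the line that carries the MTU.
    info["flags"] = []
    for line in lines[1:]:
        if "MTU:" in line:
            info["flags"] = [w for w in line.split() if w.isalpha() and w.isupper()]
            break
    return info


if __name__ == "__main__":
    print(parse_ifconfig_block(SAMPLE))
```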

Comment 5 Scott Marshall 2010-06-04 11:33:28 UTC
Created attachment 421205 [details]
Updated configuration information text files (tarballed)

Comment 6 Scott Marshall 2010-06-04 11:35:00 UTC
Created attachment 421207 [details]
Most recent abrt Kerneloops data (crash #1)

Comment 7 Scott Marshall 2010-06-04 11:35:38 UTC
Created attachment 421208 [details]
Most recent abrt Kerneloops data (crash #2)

Comment 8 Scott Marshall 2010-06-14 13:55:12 UTC
Smolt profile for system:
http://www.smolts.org/client/show/pub_1b078633-90ba-48f2-b8b3-5019313c0e2c

Smolt wiki page for Nokia USB connection in above system:
http://smolts.org/smolt-wiki/usb/0421/01b0/0000/0000

Comment 9 Scott Marshall 2010-06-30 14:30:54 UTC
Have updated to kernel 2.6.32.14-127.fc12.x86_64, however the crash still occurs.

Sequence is:
	Have the Nokia USB interface connected *BEFORE* starting libvirtd, and the system crashes.

If you start libvirtd and *THEN* plug in the Nokia USB cable, everything works fine.

This means that any sort of restart will cause a kernel crash if libvirtd is set to execute at system start and the Nokia USB cable is connected.

Unfortunately, it appears that due to the nature of the crash, abrt/kerneloops is unable to actually capture the traceback.

What I could record from some quick scribbling was:
raw_notifier_call_chain
call_netdevice_notifiers
rollback_registered
unregister_netdevice
unregister_netdev
loopback_net_exit
cleanup_net
worker_thread
? cleanup_net
? autoremove_wake_function
? worker_thread
kthread
child_rip
? kthread
? child_rip
__phonet_get [phonet]
Fatal exc in interrupt Pid=37
comm. netns Tainted G      D W
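
To make hand-copied traces like this easier to compare with other reports, the frames can be split into reliable and uncertain ("?") entries, with any module name in square brackets pulled out. This is a rough Python sketch, assuming one frame per line as scribbled above; the helper names are hypothetical and only a subset of the frames is reproduced.

```python
import re

# A subset of the frames noted above, one per line.
SCRIBBLED_TRACE = """\
raw_notifier_call_chain
call_netdevice_notifiers
rollback_registered
unregister_netdevice
unregister_netdev
loopback_net_exit
cleanup_net
worker_thread
? cleanup_net
? kthread
__phonet_get [phonet]
"""

# Optional "?" marker, a symbol name, and an optional "[module]" suffix.
FRAME_RE = re.compile(
    r"^(?P<unsure>\?\s*)?(?P<symbol>[A-Za-z_]\w*)(?:\s+\[(?P<module>\w+)\])?$"
)


def parse_frames(text):
    """Return a list of frame dicts; lines that are not frames are skipped."""
    frames = []
    for raw in text.splitlines():
        match = FRAME_RE.match(raw.strip())
        if not match:
            continue
        frames.append({
            "symbol": match.group("symbol"),
            "module": match.group("module"),
            "reliable": match.group("unsure") is None,
        })
    return frames


def modules_involved(frames):
    """Return the sorted set of module names mentioned in the frames."""
    return sorted({f["module"] for f in frames if f["module"]})


if __name__ == "__main__":
    frames = parse_frames(SCRIBBLED_TRACE)
    print([f["symbol"] for f in frames if f["reliable"]])
    print(modules_involved(frames))
```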

Comment 10 Scott Marshall 2010-07-09 10:03:17 UTC
Changed Bugzilla component entry from "kernel" to "libvirt".
It's hard to choose the right component here people, as we have three choices, all of which are correct at some level.
I initially thought it should be part of KVM, as we're using kvm_XXXX modules.
Then, given the fact that the entire kernel gets clagged, I thought maybe it should be "kernel".

I've now changed it to "libvirt" as it's likely that it's the interaction between the virtualisation libraries and the associated kernel loadable modules that's causing the crash.

It would be nice to be able to somehow mark a bug as impacting (or associated with) all 3 (or more) components.

That may help in assigning the bug to the correct team after review.

Also, why can't we change the priority when editing the bug details?
For me, this is a HIGH priority, not the default of "low".

Comment 11 Scott Marshall 2010-07-09 10:05:28 UTC
Now that libvirt-0.7.1-18.fc12 has been pushed to Fedora 12 stable repository, I hope to test it this weekend.

Comment 12 Scott Marshall 2010-07-11 01:37:18 UTC
Created attachment 430949 [details]
Screen shot of kernel panic traceback

libvirt-0.7.1-18.fc12 has been installed.
Unfortunately, the situation has not improved.

When the Nokia USB cable is left connected, and libvirtd is started, we get a kernel panic (screenshot of "panic" screen attached).

Start libvirtd WITHOUT the USB cable, and everything is fine.
Plug the USB cable in AFTER libvirtd is started, everything continues to operate correctly.

Comment 13 Scott Marshall 2010-07-11 01:47:05 UTC
Created attachment 430950 [details]
Currently installed package list (sorted)

I've attached the (sorted) output from 'rpm -qa' to help provide a more complete picture of the installation environment.

Comment 14 Chuck Ebbert 2010-07-13 19:36:30 UTC
We really need to see the entire oops report. Try starting the kernel with the option "vga=1" to put the console in 50-line mode, or "vga=791" for VESA mode. Or scroll back if you can do that.

Comment 15 Scott Marshall 2010-07-14 07:35:36 UTC
Created attachment 431691 [details]
Updated oops/kernel panic screen shot

This is a screen shot of the kernel panic after executing "service libvirtd start" with the Nokia USB cable connected.

Kernel video mode set to vga=1 at boot load.

Screen captured via browser to HP iLO2 remote console facility.

I hope this helps more!

Cheers,
Scott

Comment 16 Scott Marshall 2010-07-14 07:50:03 UTC
I suspect that the screen buffer in the iLO2 card got a little mangled (evidenced by the repeating sections of text from "phonedeon" on the far right side of the screen shot I just attached).  I suspect that "phonedeon" is actually "phonet" and "radeon" squished over each other.

Unfortunately, it's the best I can offer.
However, the code call traces appear to be complete.

Comment 17 Scott Marshall 2010-07-14 15:25:20 UTC
I see that kernel-2.6.32.16-141.fc12 has been released to the stable "updates" repository.

I am arranging another outage to install and activate the new kernel.

Once the new kernel is active, I'll try starting libvirtd again with the Nokia USB cable attached.

I should have the results in some 16 hours from time of this update.

Comment 18 Scott Marshall 2010-07-15 08:47:55 UTC
Created attachment 432000 [details]
Another clipboard grab of remote console kernel panic traceback

This is the panic after starting libvirtd with the new kernel.

Whilst my caveat about the screen buffer in my previous note still holds true, the traceback sequences definitely point to some interaction between "phonet" and "netns".

I will attach the output from dmesg which shows the USB/cdc_phonet activity when plugging in the Nokia USB cable *after* libvirtd has started.

Comment 19 Scott Marshall 2010-07-15 08:50:37 UTC
Created attachment 432001 [details]
"dmesg" output from kernel-2.6.32.16-141.fc12.x86_64

Here's the output from dmesg as promised.

Comment 21 Scott Marshall 2010-09-16 09:06:26 UTC
(In reply to comment #20)
> Possibly fixed by:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=7dfde179c38056b91d51e60f3d50902387f27c84

Not sure when this patch will make it into the kernel, or even if it will remedy the problem.

We're now running kernel 2.6.32.21-166.fc12.x86_64 (with 4 older kernels available as boot time alternatives) and libvirt 0.8.2-1.fc12.x86_64; however, the problem still exists.

When I can organise another outage, I'll try to get an up-to-date crash/trace for the forum.

<hr>
kernel-2.6.32.11-99.fc12.x86_64
kernel-2.6.32.12-115.fc12.x86_64
kernel-2.6.32.14-127.fc12.x86_64
kernel-2.6.32.16-141.fc12.x86_64
kernel-2.6.32.21-166.fc12.x86_64
kernel-devel-2.6.32.11-99.fc12.x86_64
kernel-devel-2.6.32.12-115.fc12.x86_64
kernel-devel-2.6.32.14-127.fc12.x86_64
kernel-devel-2.6.32.16-141.fc12.x86_64
kernel-devel-2.6.32.21-166.fc12.x86_64
kernel-firmware-2.6.32.21-166.fc12.noarch
kernel-headers-2.6.32.21-166.fc12.x86_64
libvirt-0.8.2-1.fc12.x86_64
libvirt-client-0.8.2-1.fc12.x86_64
libvirt-python-0.8.2-1.fc12.x86_64
<hr>

Cheers!

Comment 22 Craig Schlenter 2010-09-16 11:07:31 UTC
There's a patch here you may want to try btw.:

http://code.google.com/p/chromium/issues/detail?id=54617#c30

The stack trace in comment #25 of the chromium bug is very similar to comment #18 here.

Lemme know if it works please.

Comment 23 Scott Marshall 2010-09-16 11:46:04 UTC
(In reply to comment #22)
> There's a patch here you may want to try btw.:
> 
> http://code.google.com/p/chromium/issues/detail?id=54617#c30
> 
> The stack trace in comment #25 of the chromium bug is very similar to comment
> #18 here.
> 
> Lemme know if it works please.

Hi Craig - very interesting reading, and thanks for providing the very useful link.

Yes, I suspect that this would indeed resolve the issue as I *know* that libvirtd definitely interacts with the kernel network namespace functions during initialisation, as it needs to setup & provide the bridge between the physical host network interface(s) and the network interface(s) of the virtual machine.

Unfortunately, I won't be able to try patching the kernel until 02-Oct.

The comment from Nokia saying that it's difficult (impossible?) to fix in kernels <= 2.6.32 is interesting.  Hopefully the Fedora upstream kernel maintainers can integrate this patch too, at least in the 2.6.32 series, otherwise I may have to wait for a kernel version > 2.6.32.

I've just run a quick query against the "rawhide" Fedora repository, and there's a kernel 2.6.36 available there (based on the 0.21.rc4.git1.fc15 release).
By the time I'm in a position to try this patch out, this new kernel may even be released as a stable update.
If not, I will try it out before attempting to integrate the Nokia patch to the current kernel.

I'll keep people posted here.

Cheers!

Comment 24 Scott Marshall 2010-09-19 12:22:19 UTC
It appears that the boys & girls over in the Google chromium bug tracker have tested the v2.6.35 kernel, and aren't having any issues with system crashes with the Nokia connected (refer http://code.google.com/p/chromium/issues/detail?id=54617#c32).

I see too that the "rawhide" repository has updated the release of the 2.6.36 kernel to 0.22.rc4.git2.fc15, so hopefully the v2.6.36 kernel should resolve this issue once it's officially released (based upon the fact that 2.6.35 appears to be okay with Nokia).

It would be nice to have this annoying problem resolved.

Comment 25 Chuck Ebbert 2010-09-20 03:41:49 UTC
This bug will never get fixed in Fedora 12, since it's staying on 2.6.32 until EOL. So we can close it as CANTFIX, or you can update to Fedora 13 or 14 and we can change the version it's reported against to one of those.
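
A quick way to see which installed kernels are still on the 2.6.32 series is to parse the package names. This is a small Python sketch under the assumption that names follow the kernel-<version>-<release> pattern used in this thread; the function names are made up for illustration, and the 2.6.34.7 package name is assembled from a version string quoted elsewhere in this bug.

```python
import re


def kernel_version(package):
    """Extract the upstream version as a tuple of ints from a kernel package name."""
    match = re.match(r"kernel-(\d+(?:\.\d+)*)-", package)
    if not match:
        raise ValueError("not a kernel package name: %r" % package)
    return tuple(int(part) for part in match.group(1).split("."))


def in_2_6_32_series(package):
    """True if the package's upstream version starts with 2.6.32."""
    return kernel_version(package)[:3] == (2, 6, 32)


if __name__ == "__main__":
    for name in (
        "kernel-2.6.32.21-166.fc12.x86_64",
        "kernel-2.6.34.7-56.fc13.x86_64",  # assembled name, see note above
    ):
        print(name, in_2_6_32_series(name))
```

This only classifies the version string; it says nothing about whether a given build actually carries the fix.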

Comment 26 Scott Marshall 2010-10-03 00:02:42 UTC
Hmm - I strongly disagree with the attitude regarding the maintenance of kernel levels for FC12, given that Fedora 14 has not yet been officially released, however there's not much that I can do about it out here in user-land.

I have upgraded to the FC13 versions of the Fedora kernel & libvirt packages (versions 2.6.34.7-56.fc13.x86_64 & 0.8.2-1.fc13.x86_64 respectively) and am happy to advise that the bug appears to have been removed/resolved in those versions.

Thanks to all for their time in reviewing this issue, along with the various suggestions offered.

Cheers!


(I draw the attention of people to the following link: http://fedoraproject.org/wiki/Releases/Schedule#Maintenance_Schedule

And I quote:
"This translates into:

    * Fedora 12 will be maintained until 1 month after the release of Fedora 14.
    * Fedora 13 will be maintained until 1 month after the release of Fedora 15." 

Given that Fedora 14 is not scheduled for release until November 2010 [http://fedoraproject.org/wiki/Releases/14/Schedule], I would have expected that Fedora 12 would enjoy complete support, including kernel updates, until December 2010. )

