Bug 480779
| Summary: | kvm/ kernel crash upon installation of guest (kvm mmu BUG) | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Dominick Grift <dominick.grift> |
| Component: | kvm | Assignee: | Marcelo Tosatti <mtosatti> |
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Priority: | low |
| Version: | 10 | CC: | berrange, clalance, gcosta, markmc, quintela, virt-maint |
| Hardware: | All | OS: | Linux |
| Doc Type: | Bug Fix | Last Closed: | 2009-02-03 07:43:20 UTC |
| Bug Blocks: | 480594 | | |
Description
Dominick Grift
2009-01-20 14:28:43 UTC
Marcelo: does this look familiar to you?

I thought it might be this: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=34d4cb8 http://www.mail-archive.com/kvm@vger.kernel.org/msg01312.html but no, that's in 2.6.27 and we're hitting a BUG() here, not a NULL deref.

Looks similar to: http://www.kerneloops.org/oops.php?number=133016

Can you please attach the output of "cat /proc/cpuinfo"?

Created attachment 329617 [details]
cpu info
My cpu info
domg472, can you reproduce this? If you can, there are a couple of parameters we can change to find more clues.

The shadow page tables which KVM maintains are corrupt:

notebook0 kernel: rmap_remove: ffff880102a6ee80 ff7ffffffffff001 0->BUG
                                                ^^^^^^^^^^^^^^^^

So a single bit is different from fffffffffffff001, which is the VMX trap PTE. The reports in kerneloops.org and SourceForge exhibit different corruption patterns, though.

Yes, I can reproduce it. It happens almost every time I try to install a guest. Only once did I manage to complete the installation, and even then it crashed when I tried to update the installed system.

OK. It is not as easy as tuning a parameter, as mentioned earlier, though. We will prepare kvm/kvm-intel modules with debugging. What kernel version are you using? (The output of uname -a would be helpful.) Thanks.

Linux notebook0.grift.internal 2.6.27.9-159.fc10.x86_64 #1 SMP Tue Dec 16 14:47:52 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

OK, can you please download the kvm.ko and kvm-intel.ko modules from http://people.redhat.com/~mtosatti/fc10-kvm-modules/ and insmod them (after unloading the ones shipped with FC10 via rmmod)? These will print additional information when the error occurs.

Will try this asap and report back (probably tomorrow evening).

OK, tried it. This time it crashed not only virt-manager but the whole system, and very early in the installation process. However, there is not much debugging info:

Jan 27 11:00:13 notebook0 kernel: loaded kvm module (kvm-83-633-g5326eee)
Jan 27 11:01:06 notebook0 kernel: Bridge firewalling registered
Jan 27 11:01:06 notebook0 kernel: virbr0: Dropping NETIF_F_UFO since no NETIF_F_HW_CSUM feature.
Jan 27 11:01:06 notebook0 kernel: virbr0: starting userspace STP failed, starting kernel STP
Jan 27 11:01:06 notebook0 avahi-daemon[2605]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Jan 27 11:01:06 notebook0 avahi-daemon[2605]: New relevant interface virbr0.IPv4 for mDNS.
Jan 27 11:01:06 notebook0 avahi-daemon[2605]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
Jan 27 11:01:07 notebook0 dnsmasq[15149]: started, version 2.45 cachesize 150
Jan 27 11:01:07 notebook0 dnsmasq[15149]: compile time options: IPv6 GNU-getopt no-ISC-leasefile DBus no-I18N TFTP
Jan 27 11:01:07 notebook0 dnsmasq[15149]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Jan 27 11:01:07 notebook0 dnsmasq[15149]: reading /etc/resolv.conf
Jan 27 11:01:07 notebook0 dnsmasq[15149]: using nameserver 82.197.196.183#53
Jan 27 11:01:07 notebook0 dnsmasq[15149]: using nameserver 82.197.196.182#53
Jan 27 11:01:07 notebook0 dnsmasq[15149]: read /etc/hosts - 7 addresses
Jan 27 11:01:07 notebook0 kernel: Not cloning cgroup for unused subsystem ns
Jan 27 11:01:08 notebook0 ntpd[2539]: Listening on interface #7 virbr0, 192.168.122.1#123 Enabled
Jan 27 11:01:08 notebook0 avahi-daemon[2605]: Registering new address record for fe80::2852:dbff:fef0:3e7e on virbr0.*.
Jan 27 11:01:11 notebook0 ntpd[2539]: Listening on interface #8 virbr0, fe80::2852:dbff:fef0:3e7e#123 Enabled
Jan 27 11:03:06 notebook0 kernel: tun: Universal TUN/TAP device driver, 1.6
Jan 27 11:03:06 notebook0 kernel: tun: (C) 1999-2004 Max Krasnyansky <maxk>
Jan 27 11:03:06 notebook0 kernel: device vnet0 entered promiscuous mode
Jan 27 11:03:07 notebook0 kernel: virbr0: topology change detected, propagating
Jan 27 11:03:07 notebook0 kernel: virbr0: port 1(vnet0) entering forwarding state
Jan 27 11:03:07 notebook0 NetworkManager: nm_device_ethernet_new: assertion `driver != NULL' failed
Jan 27 11:03:08 notebook0 avahi-daemon[2605]: Registering new address record for fe80::ecad:e6ff:fe7a:33c on vnet0.*.
Jan 27 11:03:10 notebook0 ntpd[2539]: Listening on interface #9 vnet0, fe80::ecad:e6ff:fe7a:33c#123 Enabled
Jan 27 11:03:11 notebook0 nm-system-settings: Adding default connection 'Auto vnet0' for /org/freedesktop/Hal/devices/net_ee_ad_e6_7a_03_3c
Jan 27 11:03:42 notebook0 dnsmasq[15149]: DHCPDISCOVER(virbr0) 54:52:00:2b:7a:f6
Jan 27 11:03:42 notebook0 dnsmasq[15149]: DHCPOFFER(virbr0) 192.168.122.166 54:52:00:2b:7a:f6
Jan 27 11:03:42 notebook0 dnsmasq[15149]: DHCPREQUEST(virbr0) 192.168.122.166 54:52:00:2b:7a:f6
Jan 27 11:03:42 notebook0 dnsmasq[15149]: DHCPACK(virbr0) 192.168.122.166 54:52:00:2b:7a:f6

This is all. There is no trace in dmesg either. I will try this again in a bit.

Tried it a couple more times. The first time virt-manager froze (CPU maxed out) while trying to unlock the encrypted disk. The second time the virt-manager guest installation froze (CPU maxed out) just as the first package was about to be installed. Worth noting: neither time did it crash my whole system; it only hung the virt-manager guest installation process. /var/log/messages did not show anything strange to me (see attached log). However, this time dmesg did show something strange: "vcpu not ready for apic_round_robin". Please see the attached dmesg snippet.

Created attachment 330076 [details]
"/var/log/messages" snippet
Created attachment 330078 [details]
"dmesg | less" snippet
domg, Hum, probably qemu-kvm is not forward compatible with the version shipped in FC10. I've put a tarball at http://people.redhat.com/~mtosatti/kvm-83-mmu-debug.tar.gz which you can decompress and build with "./configure; make". Then change your initialization script to use kvm-userspace-mmu-debug/qemu/x86_64-softmmu/qemu-system-x86_64 instead of /usr/bin/qemu-kvm. You can contact me directly via private email if you encounter problems with that.

Sorry about that. "Hum, probably qemu-kvm is not forward compatible with the version shipped in FC10." Scratch that: there _must be_ backward compatibility for older userspace, as noted by others. But please try to compile that tarball yourself (you'll have to run qemu manually, in a script, instead of through libvirt) so we can focus on this particular problem.

Now that I look again, the oops shows the nvidia binary blob is loaded. domg: please try to reproduce without the nvidia driver loaded. We have no ability to fix issues caused by a closed-source kernel module.

Unfortunately I guess I am out of luck then, because vesa only supports resolutions up to 800x600 here and virt-manager cannot handle that resolution. I tried it and it only displays part of the installation screen; there is no way for me to click the continue button.

(In reply to comment #18)
> Unfortunately i guess that i am out of luck then because vesa only supports
> resolution up to 800x600 here and virt-manager cannot handle that resolution.
>
> I tried it and it only displays part of the installation screen, there is no
> way for me to click the continue button.

Hmm, the bottom of the window is offscreen? How about if you Alt-leftmouse and move the window up?

Exactly. I have just tried it again and I can confirm that this does not work.

You could also use virt-install to kick off the installation from the command line.

We've tried to install without the nvidia module loaded and other problems have been encountered.
The attached PNG contains a screenshot of an exception during guest installation, which seems to indicate hardware problems. Dominick will try memtest on the machine and report back.

Thanks for the bug report.

Created attachment 330667 [details]
guest install exception
Dominick confirms memtest fails. Thanks guys, closing.
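The single-bit observation from the rmap_remove line (ff7ffffffffff001 logged where the VMX trap PTE fffffffffffff001 was expected) can be checked by XOR-ing the two values and listing the set bits of the result. The short Python sketch below does this; the helper name differing_bits is hypothetical and not taken from KVM or the kernel.

```python
# Compare the PTE value printed by rmap_remove against the expected
# VMX trap PTE and report which bit positions differ.
observed = 0xFF7FFFFFFFFFF001  # value from the rmap_remove BUG line
expected = 0xFFFFFFFFFFFFF001  # trap PTE value quoted in the analysis

def differing_bits(a: int, b: int) -> list:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]

bits = differing_bits(observed, expected)
print(f"xor = {observed ^ expected:#018x}, differing bits: {bits}")
# prints "xor = 0x0080000000000000, differing bits: [55]"
```

Only bit 55 differs (it is cleared in the logged value), which matches the "single bit is different" reading of the corrupted shadow PTE.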