Bug 480779

Summary: kvm/ kernel crash upon installation of guest (kvm mmu BUG)
Product: [Fedora] Fedora
Reporter: Dominick Grift <dominick.grift>
Component: kvm
Assignee: Marcelo Tosatti <mtosatti>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 10
CC: berrange, clalance, gcosta, markmc, quintela, virt-maint
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-02-03 02:43:20 EST
Bug Blocks: 480594
Attachments (Description: Flags):
cpu info: none
"/var/log/messages" snippet: none
"dmesg | less" snippet: none
guest install exception: none

Description Dominick Grift 2009-01-20 09:28:43 EST
Description of problem:
kvm crashes upon installation of guest

Jan 20 15:15:40 notebook0 kernel: rmap_remove: ffff880102a6ee80 ff7ffffffffff001 0->BUG
Jan 20 15:15:40 notebook0 kernel: ------------[ cut here ]------------
Jan 20 15:15:40 notebook0 kernel: kernel BUG at arch/x86/kvm/mmu.c:548!
Jan 20 15:15:40 notebook0 kernel: invalid opcode: 0000 [1] SMP
Jan 20 15:15:40 notebook0 kernel: CPU 0
Jan 20 15:15:40 notebook0 kernel: Modules linked in: tun ipt_MASQUERADE iptable_nat nf_nat bridge stp nls_utf8 vfat fat mmc_block usb_storage fuse sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm_intel kvm uinput nvidia(P) snd_hda_intel snd_seq_dummy snd_seq_oss arc4 ecb snd_seq_midi_event snd_seq snd_usb_audio snd_usb_lib snd_pcm_oss iwlagn snd_mixer_oss snd_pcm iwlcore snd_rawmidi snd_timer snd_seq_device snd_page_alloc snd_hwdep snd uvcvideo rfkill compat_ioctl32 i2c_i801 mac80211 video sdhci_pci joydev videodev sdhci firewire_ohci output pcspkr v4l1_compat serio_raw firewire_core mmc_core e1000e iTCO_wdt iTCO_vendor_support soundcore i2c_core ac cfg80211 battery crc_itu_t sha256_generic cbc aes_x86_64 aes_generic dm_crypt crypto_blkcipher [last unloaded: microcode]
Jan 20 15:15:40 notebook0 kernel: Pid: 11973, comm: qemu-kvm Tainted: P 2.6.27.9-159.fc10.x86_64 #1
Jan 20 15:15:40 notebook0 kernel: RIP: 0010:[<ffffffffa0a3c176>] [<ffffffffa0a3c176>] rmap_remove+0xc9/0x198 [kvm]
Jan 20 15:15:40 notebook0 kernel: RSP: 0018:ffff8800b18c9868 EFLAGS: 00010292
Jan 20 15:15:40 notebook0 kernel: RAX: 0000000000000039 RBX: 000000ffffffffff RCX: ffff8801389d5e50
Jan 20 15:15:40 notebook0 kernel: RDX: ffff8800b18c9678 RSI: ffff8800b18c9728 RDI: 0000000000000246
Jan 20 15:15:40 notebook0 kernel: RBP: ffff8800b18c9888 R08: ffff8800b18c96d8 R09: 0000000000000096
Jan 20 15:15:40 notebook0 kernel: R10: 00100574bb316352 R11: 0000000100000000 R12: ffff880102a6ee80
Jan 20 15:15:40 notebook0 kernel: R13: ffff88000192e420 R14: ffff8800a18e4000 R15: ffff8800a1479378
Jan 20 15:15:40 notebook0 kernel: FS: 00007f3432b98950(0000) GS:ffffffff8155e100(0000) knlGS:0000000000000000
Jan 20 15:15:40 notebook0 kernel: CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
Jan 20 15:15:40 notebook0 kernel: CR2: 0000000000dd68a0 CR3: 00000000a18d6000 CR4: 00000000000026e0
Jan 20 15:15:40 notebook0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 20 15:15:40 notebook0 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 20 15:15:40 notebook0 kernel: Process qemu-kvm (pid: 11973, threadinfo ffff8800b18c8000, task ffff8800a1579710)
Jan 20 15:15:40 notebook0 kernel: Stack: ffff880102a6ee80 ffff88000192e420 00000000000001d0 ffff8800a18e4000
Jan 20 15:15:40 notebook0 kernel: ffff8800b18c98c8 ffffffffa0a3c37c ffff8800b18c98c8 0000000000068882
Jan 20 15:15:40 notebook0 kernel: ffff8800a1478000 0000000068882d50 ffff8800a1478000 ffff8800a1479378
Jan 20 15:15:40 notebook0 kernel: Call Trace:
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a3c37c>] kvm_mmu_zap_page+0x8f/0x25d [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a3e546>] kvm_mmu_pte_write+0x339/0x7e2 [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a3819e>] emulator_write_phys+0x37/0x47 [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a3afb6>] emulator_write_emulated_onepage+0x71/0xf9 [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a3b0a3>] emulator_write_emulated+0x65/0x71 [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a424d3>] x86_emulate_insn+0x2352/0x2f6b [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffff810a168f>] ? copy_page_range+0x4e3/0x799
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a38089>] emulate_instruction+0x141/0x21f [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a3e1d0>] kvm_mmu_page_fault+0x49/0x86 [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a5972c>] handle_exception+0x1a4/0x25e [kvm_intel]
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a444fb>] ? apic_update_ppr+0x1c/0x4f [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a57d02>] kvm_handle_exit+0x116/0x136 [kvm_intel]
Jan 20 15:15:40 notebook0 kernel: [<ffffffff81031033>] ? need_resched+0x1e/0x28
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a39f63>] kvm_arch_vcpu_ioctl_run+0x50d/0x69f [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffff8101f864>] ? native_smp_send_reschedule+0x4d/0x4f
Jan 20 15:15:40 notebook0 kernel: [<ffffffff81031104>] ? resched_task+0x84/0x8c
Jan 20 15:15:40 notebook0 kernel: [<ffffffff81332cb4>] ? _spin_unlock_irqrestore+0x27/0x3e
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a3345b>] kvm_vcpu_ioctl+0xf7/0x3c4 [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffff81332aba>] ? _spin_lock+0x9/0xc
Jan 20 15:15:40 notebook0 kernel: [<ffffffff8113ee2f>] ? avc_has_perm+0x4e/0x60
Jan 20 15:15:40 notebook0 kernel: [<ffffffff81060937>] ? do_futex+0xb5/0x973
Jan 20 15:15:40 notebook0 kernel: [<ffffffff8114093d>] ? inode_has_perm+0x5b/0x61
Jan 20 15:15:40 notebook0 kernel: [<ffffffffa0a34ddd>] ? kvm_vm_ioctl+0x20e/0x228 [kvm]
Jan 20 15:15:40 notebook0 kernel: [<ffffffff81140a8b>] ? file_has_perm+0x83/0x8e
Jan 20 15:15:40 notebook0 kernel: [<ffffffff810cba52>] vfs_ioctl+0x2a/0x78
Jan 20 15:15:40 notebook0 kernel: [<ffffffff810cbcda>] do_vfs_ioctl+0x23a/0x24b
Jan 20 15:15:40 notebook0 kernel: [<ffffffff810cbd40>] sys_ioctl+0x55/0x79
Jan 20 15:15:40 notebook0 kernel: [<ffffffff8101024a>] system_call_fastpath+0x16/0x1b
Jan 20 15:15:40 notebook0 kernel:
Jan 20 15:15:40 notebook0 kernel:
Jan 20 15:15:40 notebook0 kernel: Code: 80 00 00 00 48 8b 34 c1 e8 0b ff ff ff 49 89 c1 48 8b 00 48 85 c0 75 17 49 8b 14 24 4c 89 e6 48 c7 c7 f0 82 a4 a0 e8 12 46 8f e0 <0f> 0b eb fe a8 01 75 2a 49 39 c4 74 19 49 8b 14 24 4c 89 e6 48
Jan 20 15:15:40 notebook0 kernel: RIP [<ffffffffa0a3c176>] rmap_remove+0xc9/0x198 [kvm]
Jan 20 15:15:40 notebook0 kernel: RSP <ffff8800b18c9868>
Jan 20 15:15:40 notebook0 kernel: ---[ end trace de0eb3fb2ac08118 ]---

Version-Release number of selected component (if applicable):
kvm-74-10.fc10.x86_64
kernel-2.6.27.9-159.fc10.x86_64

How reproducible:
Install a guest using virt-manager kvm/qemu
Comment 1 Mark McLoughlin 2009-01-20 10:50:29 EST
Marcelo: does this look familiar to you?

I thought it might be this:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=34d4cb8
  http://www.mail-archive.com/kvm@vger.kernel.org/msg01312.html

but no, that's in 2.6.27 and we're hitting a BUG() here, not a NULL deref
Comment 2 Mark McLoughlin 2009-01-20 10:55:40 EST
Looks similar to:

  http://www.kerneloops.org/oops.php?number=133016
Comment 3 Marcelo Tosatti 2009-01-21 08:37:13 EST
Can you please attach the output of "cat /proc/cpuinfo"?
Comment 4 Dominick Grift 2009-01-21 10:38:13 EST
Created attachment 329617 [details]
cpu info

My cpu info
Comment 5 Marcelo Tosatti 2009-01-26 10:27:14 EST
domg472,

Can you reproduce this? If you can, there are a couple of parameters
we can change to find more clues.

The shadow page tables which KVM maintains are corrupt. 

notebook0 kernel: rmap_remove: ffff880102a6ee80 ff7ffffffffff001 0->BUG
                                                ^^^^^^^^^^^^^^^^

So a single bit is different from fffffffffffff001, which is the VMX trap pte.

The reports in kerneloops.org and sourceforge exhibit different corruption
patterns though.
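
An illustrative check of the single-bit observation above, written as a minimal Python sketch. The two constants are copied from this comment; the helper name `differing_bits` is hypothetical and not part of KVM.

```python
# Values copied from the rmap_remove line and the comment text above.
OBSERVED_SPTE = 0xff7ffffffffff001   # value printed by rmap_remove
EXPECTED_SPTE = 0xfffffffffffff001   # VMX trap pte, per the comment


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


if __name__ == "__main__":
    bits = differing_bits(OBSERVED_SPTE, EXPECTED_SPTE)
    print(f"xor = {OBSERVED_SPTE ^ EXPECTED_SPTE:#018x}, differing bits = {bits}")
    # prints: xor = 0x0080000000000000, differing bits = [55]
```

The XOR has exactly one bit set (bit 55), which matches the "single bit is different" reading of the log line.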
Comment 6 Dominick Grift 2009-01-26 11:24:46 EST
Yes i can reproduce it. It happens almost every time i try to install a guest. Only once i managed to complete the installation and even then it crashed when i tried to update the installed system.
Comment 7 Marcelo Tosatti 2009-01-26 13:42:45 EST
OK.

However, it is not as simple as tuning a parameter, as I suggested earlier. We will
prepare kvm/kvm-intel modules with debugging. What kernel version are you using? (The output of uname -a would be helpful.)

Thanks
Comment 8 Dominick Grift 2009-01-26 15:52:22 EST
Linux notebook0.grift.internal 2.6.27.9-159.fc10.x86_64 #1 SMP Tue Dec 16 14:47:52 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
Comment 9 Marcelo Tosatti 2009-01-26 17:15:46 EST
OK, 

Can you please download kvm.ko and kvm-intel.ko from
http://people.redhat.com/~mtosatti/fc10-kvm-modules/ and load them with insmod
(after unloading the ones shipped with FC10 via rmmod)?

These will print additional information when the error occurs.
Comment 10 Dominick Grift 2009-01-26 17:38:07 EST
Will try this asap and report back. (probably tomorrow evening)
Comment 11 Dominick Grift 2009-01-27 05:29:58 EST
OK, tried it. This time not only did it crash virt-manager but the whole system, and very early in the installation process.

However, there is not much debugging info:

Jan 27 11:00:13 notebook0 kernel: loaded kvm module (kvm-83-633-g5326eee)
Jan 27 11:01:06 notebook0 kernel: Bridge firewalling registered
Jan 27 11:01:06 notebook0 kernel: virbr0: Dropping NETIF_F_UFO since no NETIF_F_HW_CSUM feature.
Jan 27 11:01:06 notebook0 kernel: virbr0: starting userspace STP failed, starting kernel STP
Jan 27 11:01:06 notebook0 avahi-daemon[2605]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Jan 27 11:01:06 notebook0 avahi-daemon[2605]: New relevant interface virbr0.IPv4 for mDNS.
Jan 27 11:01:06 notebook0 avahi-daemon[2605]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
Jan 27 11:01:07 notebook0 dnsmasq[15149]: started, version 2.45 cachesize 150
Jan 27 11:01:07 notebook0 dnsmasq[15149]: compile time options: IPv6 GNU-getopt no-ISC-leasefile DBus no-I18N TFTP
Jan 27 11:01:07 notebook0 dnsmasq[15149]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Jan 27 11:01:07 notebook0 dnsmasq[15149]: reading /etc/resolv.conf
Jan 27 11:01:07 notebook0 dnsmasq[15149]: using nameserver 82.197.196.183#53
Jan 27 11:01:07 notebook0 dnsmasq[15149]: using nameserver 82.197.196.182#53
Jan 27 11:01:07 notebook0 dnsmasq[15149]: read /etc/hosts - 7 addresses
Jan 27 11:01:07 notebook0 kernel: Not cloning cgroup for unused subsystem ns
Jan 27 11:01:08 notebook0 ntpd[2539]: Listening on interface #7 virbr0, 192.168.122.1#123 Enabled
Jan 27 11:01:08 notebook0 avahi-daemon[2605]: Registering new address record for fe80::2852:dbff:fef0:3e7e on virbr0.*.
Jan 27 11:01:11 notebook0 ntpd[2539]: Listening on interface #8 virbr0, fe80::2852:dbff:fef0:3e7e#123 Enabled
Jan 27 11:03:06 notebook0 kernel: tun: Universal TUN/TAP device driver, 1.6
Jan 27 11:03:06 notebook0 kernel: tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
Jan 27 11:03:06 notebook0 kernel: device vnet0 entered promiscuous mode
Jan 27 11:03:07 notebook0 kernel: virbr0: topology change detected, propagating
Jan 27 11:03:07 notebook0 kernel: virbr0: port 1(vnet0) entering forwarding state
Jan 27 11:03:07 notebook0 NetworkManager: nm_device_ethernet_new: assertion `driver != NULL' failed
Jan 27 11:03:08 notebook0 avahi-daemon[2605]: Registering new address record for fe80::ecad:e6ff:fe7a:33c on vnet0.*.
Jan 27 11:03:10 notebook0 ntpd[2539]: Listening on interface #9 vnet0, fe80::ecad:e6ff:fe7a:33c#123 Enabled
Jan 27 11:03:11 notebook0 nm-system-settings: Adding default connection 'Auto vnet0' for /org/freedesktop/Hal/devices/net_ee_ad_e6_7a_03_3c
Jan 27 11:03:42 notebook0 dnsmasq[15149]: DHCPDISCOVER(virbr0) 54:52:00:2b:7a:f6 
Jan 27 11:03:42 notebook0 dnsmasq[15149]: DHCPOFFER(virbr0) 192.168.122.166 54:52:00:2b:7a:f6 
Jan 27 11:03:42 notebook0 dnsmasq[15149]: DHCPREQUEST(virbr0) 192.168.122.166 54:52:00:2b:7a:f6 
Jan 27 11:03:42 notebook0 dnsmasq[15149]: DHCPACK(virbr0) 192.168.122.166 54:52:00:2b:7a:f6

This is all. Also in dmesg there is no trace at all.
I will try this again in a bit.
Comment 12 Dominick Grift 2009-01-27 06:32:51 EST
Tried it a couple more times. The first time virt-manager froze (cpu maxed out) while trying to unlock the encrypted disk. The second time virt-manager (guest installation) froze (cpu maxed out) when the first package was about to be installed. Worth noting that both times it did not crash my whole system but it just hung the virt-manager guest installation process.

/var/log/messages did not show anything strange to me. (see attached log)

However this time dmesg did show something strange:

"vcpu not ready for apic_round_robin"

Please see attached dmesg snippet.
Comment 13 Dominick Grift 2009-01-27 06:33:48 EST
Created attachment 330076 [details]
"/var/log/messages" snippet
Comment 14 Dominick Grift 2009-01-27 06:34:28 EST
Created attachment 330078 [details]
"dmesg | less" snippet
Comment 15 Marcelo Tosatti 2009-01-27 09:13:53 EST
domg, 

Hum, probably qemu-kvm is not forward compatible with the version shipped
in FC10.

I've put a tarball at http://people.redhat.com/~mtosatti/kvm-83-mmu-debug.tar.gz

Decompress it, run "./configure; make". Then change your initialization script to use kvm-userspace-mmu-debug/qemu/x86_64-softmmu/qemu-system-x86_64 instead of /usr/bin/qemu-kvm. You can contact me directly via private email if you encounter
problems with that.

Sorry about that.
Comment 16 Marcelo Tosatti 2009-01-27 09:54:08 EST
"Hum, probably qemu-kvm is not forward compatible with the version shipped
in FC10."

Scratch that, there _must be_ backward compatibility for older userspace, as noted by others.

But try to compile that tarball yourself please (you'll have to run qemu manually, in a script, instead of through libvirt) so we can focus on this particular problem.
Comment 17 Mark McLoughlin 2009-01-30 10:54:03 EST
Now that I look again, the oops shows the nvidia binary blob is loaded.

domg: please try to reproduce without the nvidia driver loaded. We have no ability to fix issues caused by a closed source kernel module.
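
For reference, the `nvidia(P)` entry in the "Modules linked in:" line of the oops is what marks the module as proprietary. A minimal Python sketch of spotting such entries follows; it assumes the `name(FLAGS)` token format seen in this oops, uses a shortened excerpt of the module line, and the function name `flagged_modules` is hypothetical.

```python
import re

# Shortened excerpt of the "Modules linked in:" line from the description.
MODULES_LINE = (
    "Modules linked in: tun ipt_MASQUERADE kvm_intel kvm uinput nvidia(P) "
    "snd_hda_intel [last unloaded: microcode]"
)


def flagged_modules(line):
    """Return {module: flags} for modules listed with a (FLAGS) suffix."""
    # Keep only the text after the "Modules linked in:" label and drop
    # the trailing "[last unloaded: ...]" note before splitting.
    body = line.split(":", 1)[1]
    body = re.sub(r"\[last unloaded:[^\]]*\]", "", body)
    found = {}
    for token in body.split():
        match = re.fullmatch(r"([\w-]+)\(([A-Z+\-]+)\)", token)
        if match:
            found[match.group(1)] = match.group(2)
    return found


if __name__ == "__main__":
    print(flagged_modules(MODULES_LINE))
```

Running it on a saved copy of the full oops line should list only the modules that carry a flag suffix, which makes it quick to see whether a closed source module was loaded when the BUG fired.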
Comment 18 Dominick Grift 2009-01-30 13:50:33 EST
Unfortunately i guess that i am out of luck then because vesa only supports resolution up to 800x600 here and virt-manager cannot handle that resolution.

I tried it and it only displays part of the installation screen, there is no way for me to click the continue button.
Comment 19 Mark McLoughlin 2009-01-31 14:21:42 EST
(In reply to comment #18)
> Unfortunately i guess that i am out of luck then because vesa only supports
> resolution up to 800x600 here and virt-manager cannot handle that resolution.
> 
> I tried it and it only displays part of the installation screen, there is no
> way for me to click the continue button.

Hmm, the bottom of the window is offscreen? How about if you Alt-leftmouse and move the window up?
Comment 20 Dominick Grift 2009-01-31 16:27:15 EST
Exactly. I have just tried it again and i can confirm that this does not work.
Comment 21 Mark McLoughlin 2009-02-01 14:07:59 EST
You could also use virt-install to kick off the installation from the command line.
Comment 22 Marcelo Tosatti 2009-02-02 15:11:18 EST
We've tried to install without the nvidia module loaded and other problems
have been encountered.

The attached png contains a screenshot of an exception during guest installation which seems to indicate hardware problems. 

Dominick will try memtest on the machine and report back. Thanks for the bug report.
Comment 23 Marcelo Tosatti 2009-02-02 15:16:28 EST
Created attachment 330667 [details]
guest install exception
Comment 24 Marcelo Tosatti 2009-02-02 15:18:00 EST
Dominick confirms memtest fails.
Comment 25 Mark McLoughlin 2009-02-03 02:43:20 EST
Thanks guys, closing