Bug 539745

Summary: Windows guests consistently blue screen on FC12
Product: Fedora
Reporter: Wylie Edwards <wylie>
Component: kvm
Assignee: Glauber Costa <gcosta>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 12
CC: berrange, clalance, ehabkost, gcosta, jforbes, markmc, quintela, virt-maint
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-11-23 18:28:32 UTC

Description Wylie Edwards 2009-11-20 23:10:42 UTC
Description of problem:

Upgraded from FC11 to FC12. Started existing Windows guests (Windows XP and Windows 2000 Server) that had previously been running without issue. These guests now either boot to the login screen and blue screen within about 15 minutes, or blue screen during boot.

Does not appear to affect Linux guests.

Version-Release number of selected component (if applicable):

kernel : 2.6.31.5-127.fc12.i686.PAE
qemu-system-x86-0.11.0-11.fc12.i686

How reproducible:

Start a previously working Windows guest with just the basic options, as follows:

qemu-kvm -m 2047 -smp 2 -usbdevice tablet -localtime -hda /var/lib/libvirt/images/disk1.vmdk -hdb /var/lib/libvirt/images/disk2.vmdk -net nic

  
Actual results:

The guest blue screens, usually either on boot or within 5-10 minutes of starting. Starting additional Windows guests makes the crash happen sooner.

A few seconds before the guest blue screens, the following is logged in /var/log/messages:

Nov 21 09:47:38 core kernel: BUG: Bad page state in process qemu-kvm  pfn:1a406
Nov 21 09:47:38 core kernel: page:c18f80c0 flags:40000004 count:0 mapcount:0 mapping:(null) index:b0ac9 (Tainted: G    B     )
Nov 21 09:47:38 core kernel: Pid: 11018, comm: qemu-kvm Tainted: G    B      2.6.31.5-127.fc12.i686.PAE #1
Nov 21 09:47:38 core kernel: Call Trace:
Nov 21 09:47:38 core kernel: [<c049ecc8>] bad_page+0xdf/0xf4
Nov 21 09:47:38 core kernel: [<c049fc51>] get_page_from_freelist+0x28d/0x364
Nov 21 09:47:38 core kernel: [<c049fe10>] __alloc_pages_nodemask+0xe8/0x447
Nov 21 09:47:38 core kernel: [<c04a20c7>] ? add_page_to_lru_list+0x3d/0x42
Nov 21 09:47:38 core kernel: [<c04a2380>] ? ____pagevec_lru_add+0xf7/0x105
Nov 21 09:47:38 core kernel: [<c04af062>] alloc_pages_node.clone.0+0x16/0x18
Nov 21 09:47:38 core kernel: [<c04afd63>] handle_mm_fault+0x1d7/0x8f2
Nov 21 09:47:38 core kernel: [<c04b0770>] __get_user_pages+0x2f2/0x3d1
Nov 21 09:47:38 core kernel: [<c04b087d>] get_user_pages+0x2e/0x35
Nov 21 09:47:38 core kernel: [<c04296f8>] get_user_pages_fast+0xef/0x116
Nov 21 09:47:38 core kernel: [<f848f3e4>] gfn_to_pfn+0x4c/0xfa [kvm]
Nov 21 09:47:38 core kernel: [<f849d24d>] paging32_page_fault+0xca/0x347 [kvm]
Nov 21 09:47:38 core kernel: [<c0500000>] ? proc_pid_cmdline+0x3f/0xcd
Nov 21 09:47:38 core kernel: [<f849b10f>] kvm_mmu_page_fault+0x1b/0x7a [kvm]
Nov 21 09:47:38 core kernel: [<c0500000>] ? proc_pid_cmdline+0x3f/0xcd
Nov 21 09:47:38 core kernel: [<c0500000>] ? proc_pid_cmdline+0x3f/0xcd
Nov 21 09:47:38 core kernel: [<f84ddda9>] handle_exception+0x15d/0x286 [kvm_intel]
Nov 21 09:47:38 core kernel: [<f84de0e9>] vmx_handle_exit+0x181/0x1b7 [kvm_intel]
Nov 21 09:47:38 core kernel: [<f84973dd>] kvm_arch_vcpu_ioctl_run+0x8b3/0xb47 [kvm]
Nov 21 09:47:38 core kernel: [<f848e4d8>] kvm_vcpu_ioctl+0xeb/0x4b6 [kvm]
Nov 21 09:47:38 core kernel: [<c056ce8d>] ? inode_has_perm+0x69/0x84
Nov 21 09:47:38 core kernel: [<c056cf31>] ? file_has_perm+0x89/0xa3
Nov 21 09:47:38 core kernel: [<f848e3ed>] ? kvm_vcpu_ioctl+0x0/0x4b6 [kvm]
Nov 21 09:47:38 core kernel: [<c04d338f>] vfs_ioctl+0x1d/0x76
Nov 21 09:47:38 core kernel: [<c04d392e>] do_vfs_ioctl+0x498/0x4d6
Nov 21 09:47:38 core kernel: [<c056d1d5>] ? selinux_file_ioctl+0x43/0x46
Nov 21 09:47:38 core kernel: [<c04d39b2>] sys_ioctl+0x46/0x66
Nov 21 09:47:38 core kernel: [<c0408f7b>] sysenter_do_call+0x12/0x28
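The trace above shows the host kernel reporting "Bad page state" in the qemu-kvm process shortly before each guest crash. To check whether another host is hitting the same report, a minimal Python sketch can scan a syslog file for these lines and pull out the page frame number. The regular expression is an assumption derived from the single excerpt in this report; other syslog configurations may format timestamps or hostnames differently, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line format, taken from the excerpt in this report:
#   "Nov 21 09:47:38 core kernel: BUG: Bad page state in process qemu-kvm  pfn:1a406"
BAD_PAGE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"BUG: Bad page state in process (?P<proc>\S+)\s+pfn:(?P<pfn>[0-9a-fA-F]+)"
)


def find_bad_page_reports(
    lines: Iterable[str], process: str = "qemu-kvm"
) -> List[Tuple[str, int]]:
    """Return (timestamp, pfn) pairs for 'Bad page state' reports from `process`."""
    reports = []
    for line in lines:
        match = BAD_PAGE_RE.match(line)
        if match and match.group("proc") == process:
            # The kernel prints the page frame number in hexadecimal.
            reports.append((match.group("stamp"), int(match.group("pfn"), 16)))
    return reports
```

For example, passing an open handle on /var/log/messages to `find_bad_page_reports` lists each report with its timestamp, which can then be compared against the times the guests blue screened.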

Expected results:

Windows guests run without crashing.

Comment 1 Justin M. Forbes 2009-11-23 18:28:32 UTC
This appears to be a duplicate of bug 532215, a KSM issue that should be resolved in the next kernel. You can test kernel-2.6.31.6-145.fc12 from updates-testing, or, once the system has booted, run 'sudo service ksmtuned stop; sudo service ksm stop' from the command line. Once the new kernel is installed, there will be no need to stop these services.

*** This bug has been marked as a duplicate of bug 532215 ***
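To confirm that the workaround in Comment 1 actually stopped KSM, one option is to read the kernel's KSM control file. The sketch below assumes the kernel exposes /sys/kernel/mm/ksm/run, as kernels built with KSM support do; the value meanings (0, 1, 2) follow the kernel's KSM documentation, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: KSM is controlled through this sysfs file, as on kernels
# built with CONFIG_KSM. Verify it exists on the host before relying on it.
KSM_RUN = Path("/sys/kernel/mm/ksm/run")

# Meanings of the values documented for /sys/kernel/mm/ksm/run.
KSM_STATES = {
    0: "stopped",
    1: "running",
    2: "stopped, unmerging all merged pages",
}


def parse_ksm_run(text: str) -> str:
    """Translate the raw contents of the 'run' file into a description."""
    value = int(text.strip())
    return KSM_STATES.get(value, f"unknown ({value})")


def ksm_state(path: Path = KSM_RUN) -> str:
    """Return the KSM state, or 'unavailable' if the control file is missing."""
    try:
        return parse_ksm_run(path.read_text())
    except FileNotFoundError:
        return "unavailable"
```

After the workaround, a value of 0 or 2 in the control file indicates that ksmd is no longer running.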

Comment 2 Wylie Edwards 2009-11-23 22:19:50 UTC
Just an update on this; I'm not sure if it helps at all, but over the past few days I have found what appears to be a resolution to my issue.

I removed all kvm / libvirt packages from this machine. I then scoured the file system for anything left behind, and found a number of libraries and XML files that don't seem to get removed when the packages are removed. I manually deleted these (files such as libvirt.so.0, and all XML files in /etc/libvirtd/ except the host definitions).

Then I reinstalled all the required packages, and the problem appears to have gone away. This suggests the issue might be related to the upgrade process not cleaning up properly, or not removing older libraries/config files before installing the new ones.

For the last 24 hours I have been running happily without these errors, with the ksm and ksmtuned services also running.