Description of problem:
Recently I haven't been able to boot my 32-bit Linux guests because kvm gets stuck while the guest kernel boots. KVM consumes 200% CPU when given 2 CPUs to use.

Version-Release number of selected component (if applicable):
qemu-kvm-0.14.0-7.fc15.x86_64
kernel-2.6.38.8-32.fc15.x86_64

How reproducible:
Most boots fail (~98%?).

Steps to Reproduce:
1. Download the RHEL or CentOS vmlinuz and initrd:
   wget ftp://ftp.funet.fi/pub/Linux/mirrors/centos/5.6/os/i386/isolinux/{vmlinuz,initrd.img}
2. Start the guest:
   qemu-kvm -M pc-0.14 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -name CentOS -kernel vmlinuz -initrd initrd.img

Actual results:
Booting stops around the time the kernel frees unused memory, though the last line printed may be slightly earlier or later. Where it stops depends on timing (a race) rather than on a specific console output line.

Expected results:
The guest boots successfully.

Additional info:
I tried to take a perf trace of it, but the CPU is too busy for perf to capture anything. I'm willing to take more traces if someone tells me exactly what is needed. The system is an up-to-date Fedora 15 with an Intel(R) Core(TM) i7 CPU X 000 @ 3.33GHz and 24GB of RAM. The same happened to me on an older Xeon too.
Some additions after talking with Ikke and doing some tests as well.

- The qemu-kvm line is for everybody's ease of testing. Ikke gets the same result when using virt-manager in clicky mode.
- RHEL 6.1 host works fine (did change the -M to rhel5.5.0 as there is no pc-0.14 on el6); did 10 successful tests in a row.
- F15 x86_64 also fails for me; roughly 2 out of 3 attempts fail.
- A test with the latest qemu and kernel from rawhide will follow shortly.

(Ikke; bc-blade-02 in the test network downstairs)
(In reply to comment #1)
> some additions after talking with Ikke and doing some tests as well.
>
> - RHEL 6.1 host works fine (did change the -M to rhel5.5.0 as there is no
> pc-0.14 on el6), did 10 successfuil tests in a row

Ignore the above. The 6.1 test was done on an IBM LS21, which has an AMD CPU, and further testing reveals that F15 x86_64 as a host also works on that hardware. Sorry.

> - F15 x86_64 also fails for me, roughly 2 out of 3 attempts fail

That remains as written. The test was done on an Intel CPU (my workstation), but I will not install RHEL 6.1 or rawhide on that box, as it's my main work tool.

Ikke has the same problem: he sees the failure on his workstation, so changing distro would hinder his other work too much.

I'll hunt for a box in the lab here that actually reproduces the issue under an F15 x86_64 host and report back (have an HP pizzabox in mind).
Re-testing done on an Intel CPU (Ikke; acpi4-15 in the test network).

- F15 host: same as reported by Ikke, ~half the boots fail.
- Pulled the F15 host to rawhide: 10 consecutive successful boots of the guest.

I guess we can CLOSE this RAWHIDE. Is that OK with you, Ikke?

Rawhide being rawhide, I would not blindly update my workstation as I have done with that test machine. YMMV.

Another question: want me to re-do the RHEL 6.1 test round on this box? (For now I guess leaving it on rawhide so you can also have a look is more reasonable for your use case.)
Thanks pcfe, I tried it several times on the box you set up. It does not seem to get stuck.

I also spent 1h searching the kvm mailing list for a regression fix, and finally found this one:
http://marc.info/?l=linux-kernel&m=129942743310538&w=4

For anyone else hitting the same problem, here is a workaround: add the kernel command line parameter "clocksource=acpi_pm" to the guest and it will boot \o/

Please upgrade the KVM in Fedora to fix the regression.
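For the command-line reproducer from the description, the workaround could be applied roughly as sketched below. This assumes the extra guest kernel parameter is passed with qemu's -append option, which goes together with -kernel. The workaround_cmd helper is hypothetical and only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Sketch only: print the reproducer command from the description with the
# clocksource workaround added to the guest kernel command line.
# Assumption: the parameter is handed to the guest kernel via -append.
# workaround_cmd is a hypothetical helper name, not part of qemu.
workaround_cmd() {
    echo "qemu-kvm -M pc-0.14 -enable-kvm -m 2048" \
         "-smp 2,sockets=2,cores=1,threads=1 -name CentOS" \
         "-kernel vmlinuz -initrd initrd.img" \
         "-append clocksource=acpi_pm"
}

# Show the full command line.
workaround_cmd
```

Once the guest is up, reading /sys/devices/system/clocksource/clocksource0/current_clocksource inside it should show which clocksource is active, on guest kernels that expose that file.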
This is the fix needed:

$ git show 1aa8ceef
commit 1aa8ceef0312a6aae7dd863a120a55f1637b361d
Author: Nikola Ciprich <extmaillist>
Date:   Wed Mar 9 23:36:51 2011 +0100

    KVM: fix kvmclock regression due to missing clock update

    commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved
    kvm_request_guest_time_update(vcpu), breaking 32bit SMP guests using
    kvm-clock. Fix this by moving (new) clock update function to proper
    place.

    Signed-off-by: Nikola Ciprich <nikola.ciprich>
    Acked-by: Zachary Amsden <zamsden>
    Signed-off-by: Avi Kivity <avi>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 01f08a6..f1e4025 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2127,8 +2127,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		if (check_tsc_unstable()) {
 			kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
 			vcpu->arch.tsc_catchup = 1;
-			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 		}
+		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 		if (vcpu->cpu != cpu)
 			kvm_migrate_timers(vcpu);
 		vcpu->cpu = cpu;
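To check whether a given upstream-style kernel git checkout already contains this commit, something like the following sketch may help. It assumes git is installed and that the tree shares history with mainline. A distribution kernel that carries the fix as a backported patch will have a different hash and will not match, so a negative result is not conclusive. The has_commit helper name and the checkout path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Commit id of the kvmclock fix quoted above.
FIX_COMMIT=1aa8ceef0312a6aae7dd863a120a55f1637b361d

# has_commit TREE [COMMIT] [REF]   (hypothetical helper)
# Succeeds (exit 0) if COMMIT is an ancestor of REF in the git checkout
# at TREE. COMMIT defaults to the fix above, REF defaults to HEAD.
# Any failure, including an unknown commit, yields a non-zero status.
has_commit() {
    git -C "$1" merge-base --is-ancestor "${2:-$FIX_COMMIT}" "${3:-HEAD}" 2>/dev/null
}

# Example usage (the path is an assumption):
#   if has_commit "$HOME/src/linux"; then echo "fix present"; fi
```

The helper only tests commit ancestry. It does not inspect the contents of arch/x86/kvm/x86.c.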
Thanks Ikke. Let's see what the owner of the bug thinks. Setting NEEDINFO on the bug owner.
This seems to be correct and will make the next update.
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
Verified this code is in f15 kernel git, so closing as CURRENTRELEASE