Bug 1278688 - Failed to start vm with kvm enabled, regression with kernel 4.2
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 23
Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Keywords: Patch
Reported: 2015-11-06 02:29 EST by lnie
Modified: 2015-11-30 18:21 EST
Fixed In Version: kernel-4.2.6-301.fc23 kernel-4.2.6-201.fc22
Doc Type: Bug Fix
Last Closed: 2015-11-26 15:55:24 EST
Type: Bug


Attachments:
screenshot (916.00 KB, image/png), 2015-11-06 02:29 EST, lnie

Description lnie 2015-11-06 02:29:46 EST
Created attachment 1090485 (screenshot)

Description of problem:
 As shown in the screenshot, I tried to start a VM by running "qemu-kvm -cdrom " on an F23 system, but it failed.
 

Version-Release number of selected component (if applicable):
 f23

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
 This bug seems to be hardware-related; the HP Z400 is affected.
Comment 1 Dangyi Liu 2015-11-06 04:34:53 EST
I installed both f22 and f23 on the same machine, and it turned out f23 failed to start QEMU but f22 succeeded.
Comment 2 Dangyi Liu 2015-11-08 21:43:43 EST
Hi, all.

On Fedora 23 I tested the following combinations:

kernel-4.2.3-300.fc23.x86_64 + qemu-system-x86_64 -enable-kvm: Failed
kernel-4.2.3-300.fc23.x86_64 + qemu-system-x86_64:             Succeeded
kernel-4.1.7-200.fc22.x86_64 + qemu-system-x86_64 -enable-kvm: Succeeded

So it's a kvm bug.

Dangyi
Comment 3 Cole Robinson 2015-11-10 16:09:25 EST
Moving back to component=qemu for now, the kvm component isn't used anymore

Does qemu print any error messages to the terminal?
Do any errors pop up in dmesg -w when running qemu?
Comment 4 Cole Robinson 2015-11-10 16:11:30 EST
Also, fedora 23 has kernel-4.2.5-300.fc23.x86_64 now too, can you give that a try?
Comment 5 Dangyi Liu 2015-11-11 00:36:22 EST
The bug still exists with kernel-4.2.5 (and upstream kernel v4.2). There's no error message on the terminal or in dmesg.

After I enabled all logs using "-d all" for qemu, the output shows:

$ qemu-system-x86_64 -d all -enable-kvm -cdrom Fedora-Live-Workstation-x86_64-23-10.iso -vga std -sdl
CPU Reset (CPU 0)
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0
ES =0000 00000000 00000000 00000000
CS =0000 00000000 00000000 00000000
SS =0000 00000000 00000000 00000000
DS =0000 00000000 00000000 00000000
FS =0000 00000000 00000000 00000000
GS =0000 00000000 00000000 00000000
LDT=0000 00000000 00000000 00000000
TR =0000 00000000 00000000 00000000
GDT=     00000000 00000000
IDT=     00000000 00000000
CR0=00000000 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=0000000000000000 DR7=0000000000000000
CCS=00000000 CCD=00000000 CCO=DYNAMIC 
EFER=0000000000000000
FCW=0000 FSW=0000 [ST=0] FTW=ff MXCSR=00000000
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
CR0 update: CR0=0x60000010
CPU Reset (CPU 0)
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=00000000 CCD=00000000 CCO=DYNAMIC 
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
CR0 update: CR0=0x60000010
CPU Reset (CPU 0)
EAX=00000011 EBX=0000000b ECX=00003000 EDX=000fd294
ESI=00000000 EDI=02000000 EBP=000f5225 ESP=00006ebc
EIP=000fd18d EFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
DS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
FS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
GS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f6ac0 00000037
IDT=     000f6afe 00000000
CR0=00000011 CR2=000fd18d CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=00000000 CCD=00000000 CCO=DYNAMIC 
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
CR0 update: CR0=0x60000010
====== hang here =======
Comment 6 Dangyi Liu 2015-11-11 00:46:49 EST
This is the dmesg output after I enabled all kvm dynamic debug flags.

[ 2458.613752] kvm:kvm_get_time_scale:1217: kvm_get_time_scale: base_khz 1000000 => 2666666, shift 2, mul 2863310814
[ 2458.613998] kvm:kvm_write_tsc:1407: kvm: new tsc generation 1, clock 0
[ 2458.625640] kvm:pit_load_count:382: pit: load_count val is 0, channel is 0
[ 2458.625645] kvm:pit_load_count:382: pit: load_count val is 0, channel is 1
[ 2458.625647] kvm:pit_load_count:382: pit: load_count val is 0, channel is 2
[ 2458.637413] kvm:kvm_write_tsc:1382: kvm: matched tsc offset for 0
[ 2458.699411] kvm:pit_load_count:382: pit: load_count val is 65536, channel is 0
[ 2458.699466] kvm:pit_load_count:382: pit: load_count val is 65536, channel is 0
[ 2458.699469] kvm:create_pit_timer:341: pit: create pit timer, interval is 54925447 nsec
[ 2458.699714] kvm:kvm_write_tsc:1382: kvm: matched tsc offset for 0
[ 2458.699842] kvm:pit_load_count:382: pit: load_count val is 65536, channel is 0
[ 2458.699845] kvm:create_pit_timer:341: pit: create pit timer, interval is 54925447 nsec
[ 2458.706728] kvm:pit_load_count:382: pit: load_count val is 65536, channel is 0
[ 2458.706732] kvm:create_pit_timer:341: pit: create pit timer, interval is 54925447 nsec
[ 2458.706738] kvm:pit_ioport_write:466: pit: write addr is 0x3, len is 1, val is 0xb0
[ 2458.706742] kvm:pit_ioport_write:466: pit: write addr is 0x2, len is 1, val is 0x8
[ 2458.706744] kvm:pit_load_count:382: pit: load_count val is 2048, channel is 2
[ 2458.708468] kvm:pit_load_count:382: pit: load_count val is 65536, channel is 0
[ 2458.708471] kvm:create_pit_timer:341: pit: create pit timer, interval is 54925447 nsec
[ 2458.708504] kvm:pit_ioport_write:466: pit: write addr is 0x3, len is 1, val is 0x34
[ 2458.708507] kvm:pit_load_count:382: pit: load_count val is 0, channel is 0
[ 2458.708509] kvm:create_pit_timer:341: pit: create pit timer, interval is 54925447 nsec
[ 2458.715212] kvm:pit_load_count:382: pit: load_count val is 65536, channel is 0
[ 2458.715216] kvm:create_pit_timer:341: pit: create pit timer, interval is 54925447 nsec
[ 2458.715260] kvm:pit_load_count:382: pit: load_count val is 65536, channel is 0
[ 2458.715262] kvm:create_pit_timer:341: pit: create pit timer, interval is 54925447 nsec
[ 2458.716771] kvm:kvm_write_tsc:1382: kvm: matched tsc offset for 0
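The comment does not say how the debug flags were enabled. One common way, sketched here as an assumption rather than the reporter's actual method, is the kernel's dynamic debug control file. The `module:function:line:` prefix in the log above is consistent with the flags `p` (print), `m` (module), `f` (function) and `l` (line). The helper names `dyndbg_query` and `enable` are hypothetical; the sketch needs root, a mounted debugfs and a kernel built with dynamic debug support.

```python
from pathlib import Path

# Standard dynamic debug control file (assumes debugfs is mounted here).
CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def dyndbg_query(module: str, flags: str = "+pmfl") -> str:
    """Build a dynamic debug query for every pr_debug() site in one module.

    p = print the message, m = prefix module name,
    f = prefix function name, l = prefix line number.
    """
    return f"module {module} {flags}"


def enable(modules=("kvm",), control: Path = CONTROL) -> None:
    """Write one query per module to the control file (requires root)."""
    for module in modules:
        control.write_text(dyndbg_query(module) + "\n")
```

Calling `enable()` as root and then watching `dmesg -w` while starting qemu should produce output of the same shape as the log above; `dyndbg_query("kvm", "-p")` builds the query that switches the messages off again.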
Comment 7 Cole Robinson 2015-11-11 14:46:24 EST
Can you provide /proc/cpuinfo ?
Another thing to try: yum install fedora-repos-rawhide; yum --enablerepo=rawhide update kernel, reboot and see if you still reproduce

If so I'll ping the kernel devs
Comment 8 Dangyi Liu 2015-11-12 00:15:04 EST
cpuinfo (last one):

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           W3520  @ 2.67GHz
stepping	: 5
microcode	: 0x19
cpu MHz		: 2661.000
cache size	: 8192 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bugs		:
bogomips	: 5333.52
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:


After I upgraded the kernel to 4.4.0-0.rc0.git6.1.fc24.x86_64, the problem was resolved. So it has been fixed upstream.
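For comparing other machines against the cpuinfo listing in this comment, the flags line can be checked programmatically. This is a minimal sketch; the function names are hypothetical. It only reports whether `vmx` and `ept` appear, as they do above. The listing says nothing about unrestricted guest support, so the sketch does not attempt to detect it.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first "flags" line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_vmx_and_ept(cpuinfo_text: str) -> bool:
    """True when both the vmx and ept flags are listed."""
    return {"vmx", "ept"} <= cpu_flags(cpuinfo_text)
```

On a live system the input would come from `open("/proc/cpuinfo").read()`.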
Comment 9 Dangyi Liu 2015-11-13 05:23:36 EST
Upstream commit 6d396b55203969ca61cc8f838db2e68433e13f7b introduced this bug.

Link: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6d396b55203969ca61cc8f838db2e68433e13f7b

But I didn't find any fix for this patch in the latest kernel.

Cc Paolo Bonzini <pbonzini@redhat.com> because he is the patch author.
Comment 10 Paolo Bonzini 2015-11-18 11:16:54 EST
These fixes are needed.  They are in the process of being backported to stable kernels.  Sorry for the breakage!

commit 25188b9986cf6b0cadcf1bc1d1693a2e9c50ed47
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Oct 14 15:51:08 2015 +0200

    KVM: x86: fix previous commit for 32-bit
    
    Unfortunately I only noticed this after pushing.
    
    Fixes: f0d648bdf0a5bbc91da6099d5282f77996558ea4
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit f0d648bdf0a5bbc91da6099d5282f77996558ea4
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Mon Oct 12 13:56:27 2015 +0200

    KVM: x86: map/unmap private slots in __x86_set_memory_region
    
    Otherwise, two copies (one of them never populated and thus bogus)
    are allocated for the regular and SMM address spaces.  This breaks
    SMM with EPT but without unrestricted guest support, because the
    SMM copy of the identity page map is all zeros.
    
    By moving the allocation to the caller we also remove the last
    vestiges of kernel-allocated memory regions (not accessible anymore
    in userspace since commit b74a07beed0e, "KVM: Remove kernel-allocated
    memory regions", 2010-06-21); that is a nice bonus.
    
    Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
    Cc: stable@vger.kernel.org
    Fixes: 9da0e4d5ac969909f6b435ce28ea28135a9cbd69
    Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit 1d8007bdee074fdffcf3539492d8a151a1fb3436
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Mon Oct 12 13:38:32 2015 +0200

    KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region
    
    The next patch will make x86_set_memory_region fill the
    userspace_addr.  Since the struct is not used untouched
    anymore, it makes sense to build it in x86_set_memory_region
    directly; it also simplifies the callers.
    
    Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
    Cc: stable@vger.kernel.org
    Fixes: 9da0e4d5ac969909f6b435ce28ea28135a9cbd69
    Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
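The commit message for f0d648bdf0a5 names the failing configuration: EPT in use, but no unrestricted guest support. A rough way to see whether a host matches it is to read the kvm_intel module parameters from sysfs. This is a sketch under the assumption that kvm_intel is loaded and exposes its `ept` and `unrestricted_guest` parameters as `Y`/`N`; the function names are hypothetical.

```python
from pathlib import Path

# Module parameters of kvm_intel (assumes the module is loaded).
PARAMS = Path("/sys/module/kvm_intel/parameters")


def read_param(name: str, base: Path = PARAMS):
    """Return True/False for a Y/N module parameter, or None if unreadable."""
    try:
        value = (base / name).read_text().strip()
    except OSError:
        return None
    return value in ("Y", "1")


def matches_affected_config(ept, unrestricted_guest) -> bool:
    """EPT enabled while unrestricted guest is unavailable."""
    return ept is True and unrestricted_guest is False
```

For example, `matches_affected_config(read_param("ept"), read_param("unrestricted_guest"))` returns False whenever either value cannot be read, so a False result on a host without kvm_intel is not evidence either way.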
Comment 11 Dusty Mabe 2015-11-19 12:20:52 EST
As noted in https://bugzilla.redhat.com/show_bug.cgi?id=1283666 I believe I can confirm the problem is fixed in the latest rawhide kernel:

Using kernel-4.4.0-0.rc1.git0.1.fc24.x86_64 it worked:

#on fedora 23:
dnf install fedora-repos-rawhide
dnf update kernel --enablerepo=rawhide
reboot
cd /root/
wget http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img
/usr/bin/qemu-kvm -m 1024 -name f15 -drive file=/root/cirros-0.3.4-x86_64-disk.img,if=virtio -nographic


Now we just need to get the changes backported to 4.2.
Comment 12 Adam Williamson 2015-11-19 17:52:32 EST
Just to add another voice asking for this to get backported: I just spent two days working out that we're also suffering from the same bug on the boxes we want to use for the official Fedora openQA deployment; all my tests were hanging due to this bug. We'd really like to have this in F23 ASAP so we don't have to run Rawhide kernels on the openQA boxes. Thanks!
Comment 13 Justin M. Forbes 2015-11-21 12:45:51 EST
Can someone give https://koji.fedoraproject.org/koji/buildinfo?buildID=700611 a test?  It has those patches backported, but I haven't heard from Paolo yet as to whether those are all that are required.
Comment 14 Adam Williamson 2015-11-23 15:30:27 EST
That seems to work for our case (openQA worker host box using a 2009-era Xeon CPU without unrestricted guest support).
Comment 15 Fedora Update System 2015-11-25 10:17:53 EST
kernel-4.2.6-301.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-f26dec73e9
Comment 16 Fedora Update System 2015-11-25 10:19:36 EST
kernel-4.2.6-201.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-912d8e4998
Comment 17 Fedora Update System 2015-11-25 21:24:47 EST
kernel-4.2.6-201.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-912d8e4998
Comment 18 Fedora Update System 2015-11-25 21:53:31 EST
kernel-4.2.6-301.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-f26dec73e9
Comment 19 lnie 2015-11-26 00:17:26 EST
kernel-4.2.6-301.fc23 works fine on Xeon  W3520
Comment 20 Fedora Update System 2015-11-26 15:54:57 EST
kernel-4.2.6-301.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
Comment 21 Fedora Update System 2015-11-30 18:21:25 EST
kernel-4.2.6-201.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
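To check whether a given Fedora kernel build predates the fix, the package release can be compared with the builds in the "Fixed In Version" field. This is a minimal sketch with hypothetical function names. It assumes that every Fedora 22/23 build from 4.2 up to, but not including, the fixed build is affected, and that later builds keep the fix; the report confirms this only for the specific builds tested in the comments.

```python
import re

# Fixed builds taken from the "Fixed In Version" field of this report.
FIXED = {"fc23": ((4, 2, 6), 301), "fc22": ((4, 2, 6), 201)}
FIRST_BAD = (4, 2)  # the regression appeared with kernel 4.2

# Matches "kernel-4.2.3-300.fc23.x86_64" and "4.2.3-300.fc23.x86_64".
_PATTERN = re.compile(r"^(?:kernel-)?(\d+(?:\.\d+)+)-(\d+)\.(fc\d+)")


def parse(release: str):
    """Split a release string into (version tuple, build number, dist)."""
    m = _PATTERN.match(release)
    if m is None:
        return None
    version = tuple(int(part) for part in m.group(1).split("."))
    return version, int(m.group(2)), m.group(3)


def is_affected(release: str):
    """True/False for Fedora 22/23 builds, None when the build is unknown."""
    parsed = parse(release)
    if parsed is None or parsed[2] not in FIXED:
        return None
    version, build, dist = parsed
    if version < FIRST_BAD:
        return False
    return (version, build) < FIXED[dist]
```

The argument can be the output of `uname -r`. Rawhide builds such as 4.4.0-0.rc0.git6.1.fc24.x86_64 fall outside the pattern and yield None rather than a guess.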