Bug 927032 - guest crash when installing RHEL6.3: KVM internal error. Suberror: 1
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 21
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-03-24 23:47 UTC by Dan Callaghan
Modified: 2015-01-12 07:05 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-12 07:05:04 UTC
Type: Bug


Attachments
libvirt XML for guest (2.51 KB, text/plain), 2013-09-12 07:58 UTC, Dan Callaghan
/proc/cpuinfo from host (6.58 KB, text/plain), 2013-09-12 22:25 UTC, Dan Callaghan

Description Dan Callaghan 2013-03-24 23:47:05 UTC
Description of problem:

KVM guest crashes with an error like the following (from libvirt logs). The guest had just finished installing RHEL 6.3 and Anaconda was rebooting.

2013-03-24 23:16:16.100+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.14 -enable-kvm -m 2048 -smp 4,sockets=4,cores=1,threads=1 -name beefyguest2 -uuid dad01161-602c-f695-e1da-d9a33806d3fe -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/beefyguest2.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/vg_test2/beefyguest2,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=20,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:9a:62:b1,bus=pci.0,addr=0x3,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:5 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
char device redirected to /dev/pts/5
KVM internal error. Suberror: 1
emulation failure
RAX=ffffffff81000122 RBX=0000000001f92000 RCX=0000000001d4e000 RDX=0000000001000000
RSI=0000000000093780 RDI=0000000001a8c000 RBP=0000000000000000 RSP=0000000002367040
R8 =0000000001a8c000 R9 =0000000000000001 R10=0000000000000038 R11=0000000000000038
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff81000122 RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00c00000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c00000
DS =0000 0000000000000000 ffffffff 00c00000
FS =0000 0000000000000000 ffffffff 00c00000
GS =0000 0000000000000000 ffffffff 00c00000
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0020 0000000000000000 00000fff 00808b00 DPL=0 TSS64-busy
GDT=     00000000004c9ff8 00000030
IDT=     0000000000000000 00000000
CR0=80000011 CR2=0000000000000000 CR3=000000009238e090 CR4=000000a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

Version-Release number of selected component (if applicable):

libvirt-0.9.11.7-1.fc17.x86_64
qemu-kvm-1.0.1-2.fc17.x86_64
kernel-3.6.9-2.fc17.x86_64

How reproducible:

Not sure exactly what triggers it, but it happens for me roughly once a week at random. It happens at some point during the Anaconda install in the guest (I do lots of Anaconda installs on these guests because they are used by Beaker).

Steps to Reproduce:
1. Run a whole bunch of Anaconda installs in the guest
  
Actual results:

Crash

Expected results:

No crash

Additional info:

The guest is currently paused by libvirt. I will leave it like that for a while in case any other info is needed from it before I restart it.

Comment 1 Cole Robinson 2013-04-01 23:08:39 UTC
Can you pull the latest f17 kernel from updates-testing and try to reproduce?

Comment 2 Dan Callaghan 2013-04-04 06:24:12 UTC
After upgrading to kernel-3.8.4-102.fc17.x86_64 I haven't been able to reproduce this. So I guess we can consider it fixed. I will re-open the bug if I see the error again.

Comment 3 Dan Callaghan 2013-09-11 22:07:20 UTC
This is still happening with:

kernel-3.10.10-200.fc19.x86_64
qemu-kvm-1.4.2-7.fc19.x86_64
libvirt-1.0.5.5-1.fc19.x86_64

It happens on about 50% of my installations now, which seems like a higher rate than previously. I can provide any core dumps, stack traces, or logs which might help to debug this.

2013-09-11 03:17:07.874+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name beefyguest1 -S -machine pc-0.14,accel=kvm,usb=off -m 2048 -smp 4,sockets=4,cores=1,threads=1 -uuid 3e35960b-3b9b-c6f5-4c7e-f6df56308e34 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/beefyguest1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot menu=off -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/vg_test2/beefyguest1,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=31,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:9a:62:b0,bus=pci.0,addr=0x3,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:5 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
char device redirected to /dev/pts/5 (label charserial0)
KVM internal error. Suberror: 1
emulation failure
RAX=ffffffff81000122 RBX=0000000001f8b000 RCX=0000000001d55000 RDX=0000000001000000
RSI=0000000000093780 RDI=0000000001a8c000 RBP=0000000000000000 RSP=000000000236e140
R8 =0000000001a8c000 R9 =0000000000000001 R10=0000000000000038 R11=0000000000000038
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff81000122 RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00c00000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c00000
DS =0000 0000000000000000 ffffffff 00c00000
FS =0000 0000000000000000 ffffffff 00c00000
GS =0000 0000000000000000 ffffffff 00c00000
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0020 0000000000000000 00000fff 00808b00 DPL=0 TSS64-busy
GDT=     00000000004d80d8 00000030
IDT=     0000000000000000 00000000
CR0=80000011 CR2=0000000000000000 CR3=000000009238e090 CR4=000000a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
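
Editorial note: the control-register values in the dump above can be decoded mechanically. The following is a minimal Python sketch, not part of the original report. The bit tables are partial and list only common architectural flag names; the helper name `decode` is hypothetical.

```python
# Partial bit-name tables for x86 control registers (architectural names).
# Only commonly seen flags are listed; this is an illustrative subset.
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
CR4_BITS = {0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
            6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT"}
EFER_BITS = {0: "SCE", 8: "LME", 10: "LMA", 11: "NXE"}


def decode(value, names):
    """Return the names of all known bits that are set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]


# Values copied from the register dump in this comment.
print("CR0 ", decode(0x80000011, CR0_BITS))
print("CR4 ", decode(0x000000A0, CR4_BITS))
print("EFER", decode(0x00000500, EFER_BITS))
```

With the values from the dump this reports protected mode with paging (PE, PG), PAE paging, and long mode enabled and active (LME, LMA), which is consistent with the CS64 code segment shown in the dump.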

Comment 4 Richard W.M. Jones 2013-09-12 07:51:34 UTC
Is there anything printed in the kernel log (dmesg) when
this error happens?

Comment 5 Dan Callaghan 2013-09-12 07:58:33 UTC
Created attachment 796665
libvirt XML for guest

(In reply to Richard W.M. Jones from comment #4)
> Is there anything printed in the kernel log (dmesg) when
> this error happens?

Yes, it looks like these two errors appear from the kernel for each crash:

[95864.764587] qemu-system-x86: sending ioctl 5326 to a partition!
[95864.764859] qemu-system-x86: sending ioctl 80200204 to a partition!

The guests all use LVM logical volumes for their virtual disks (virtio, format raw, cache none). Complete guest XML definition is attached.

Comment 6 Dan Callaghan 2013-09-12 08:05:45 UTC
(In reply to Dan Callaghan from comment #5)
> Yes, it looks like these two errors appear from the kernel for each crash:
> 
> [95864.764587] qemu-system-x86: sending ioctl 5326 to a partition!
> [95864.764859] qemu-system-x86: sending ioctl 80200204 to a partition!

Looking closer, I'm not at all certain that these correspond to crashes. They might just be when the guests start normally. It's a bit hard to tell when the guests crashed because the "KVM internal error" in the libvirt logs isn't timestamped. But I can extrapolate from the Beaker logs roughly when the crashes happened and the kernel messages don't line up.

There's nothing else of interest in kernel messages, only this type of thing which I'm sure is normal:

[95844.450285] br0: port 6(vnet4) entered forwarding state
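
Editorial note: lining up kernel messages with crash times requires converting the seconds-since-boot stamps to wall-clock time. The following is a minimal Python sketch under stated assumptions: the boot time used here is hypothetical (the thread does not give it), the function name `dmesg_line_to_utc` is made up for illustration, and the stamp is assumed to have exactly six fractional digits as in the lines quoted above. The result is only approximate on hosts that have been suspended since boot.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches "[95864.764587] message" as printed by the kernel log.
DMESG_RE = re.compile(r"\[\s*(\d+)\.(\d{6})\]\s*(.*)")


def dmesg_line_to_utc(line, boot_time):
    """Return (wall-clock time, message) for one dmesg line.

    boot_time must be a timezone-aware datetime for when the host booted.
    """
    match = DMESG_RE.match(line)
    if match is None:
        raise ValueError("not a timestamped dmesg line: %r" % line)
    seconds, micros, message = match.groups()
    offset = timedelta(seconds=int(seconds), microseconds=int(micros))
    return boot_time + offset, message


# Hypothetical boot time; on a real host it could be derived from /proc/uptime.
boot = datetime(2013, 9, 10, 0, 0, 0, tzinfo=timezone.utc)
when, msg = dmesg_line_to_utc(
    "[95864.764587] qemu-system-x86: sending ioctl 5326 to a partition!", boot)
print(when.isoformat(), msg)
```

The converted times can then be compared against the timestamped "starting up" lines in the libvirt log and the Beaker job logs.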

Comment 7 Dan Callaghan 2013-09-12 08:08:54 UTC
In case it matters, I'm also seeing this crash on an F18 box with slightly different hardware but very similar guest setup, running kernel-3.10.10-100.fc18.x86_64.

Comment 8 Richard W.M. Jones 2013-09-12 08:45:42 UTC
(In reply to Dan Callaghan from comment #5)
> Created attachment 796665
> libvirt XML for guest
> 
> (In reply to Richard W.M. Jones from comment #4)
> > Is there anything printed in the kernel log (dmesg) when
> > this error happens?
> 
> Yes, it looks like these two errors appear from the kernel for each crash:
> 
> [95864.764587] qemu-system-x86: sending ioctl 5326 to a partition!
> [95864.764859] qemu-system-x86: sending ioctl 80200204 to a partition!

These wouldn't be connected to this crash.
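
Editorial note: the two ioctl numbers in those kernel messages are printed in hex and can be split into their fields using the generic Linux ioctl encoding (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31). A minimal Python sketch follows; the function name `decode_ioctl` is hypothetical.

```python
def decode_ioctl(cmd):
    """Split a Linux ioctl request number into its encoded fields."""
    return {
        "dir": (cmd >> 30) & 0x3,      # 0 = none, 1 = write, 2 = read, 3 = read/write
        "size": (cmd >> 16) & 0x3FFF,  # size of the argument structure in bytes
        "type": (cmd >> 8) & 0xFF,     # subsystem "magic" number
        "nr": cmd & 0xFF,              # command number within the subsystem
    }


for cmd in (0x5326, 0x80200204):
    print(hex(cmd), decode_ioctl(cmd))
```

As far as I can tell, 0x5326 matches CDROM_DRIVE_STATUS and 0x80200204 matches FDGETPRM (type 2, nr 4, 32-byte argument). Under that reading, these are CD-ROM and floppy probes of the backing block device, not anything to do with the emulation failure.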

The error is 'KVM_INTERNAL_ERROR_EMULATION' which means different
things on Intel & AMD host processors.  Is the host processor Intel
or AMD (and what precise model?  /proc/cpuinfo would be useful here).

Comment 9 Dan Callaghan 2013-09-12 22:25:24 UTC
Created attachment 797053
/proc/cpuinfo from host

The host CPU is an Intel Core i7 870. /proc/cpuinfo for the host is attached.

The other box where I am seeing the same crash has a Xeon W3550; I can attach its /proc/cpuinfo too if needed.

Comment 10 Richard W.M. Jones 2013-09-13 09:06:48 UTC
It's some internal error deep inside KVM.  I have no idea why
it happens, but you could help by adjusting the libvirt
configuration to see if you can reliably make the error appear
and go away (eg. by adding/removing a particular device).
Otherwise I'd suggest posting a bug in the upstream qemu tracker.
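
Editorial note: to pick candidates for the add/remove experiment suggested above, the devices can be listed from the QEMU command line that libvirt logs. The following is a minimal Python sketch; the function name `list_devices` is hypothetical and the command line is abbreviated from the one in comment 3.

```python
import shlex


def list_devices(cmdline):
    """Return the driver name of every -device argument, in order."""
    args = shlex.split(cmdline)
    return [args[i + 1].split(",")[0]
            for i, arg in enumerate(args[:-1]) if arg == "-device"]


# Abbreviated from the command line logged in comment 3.
cmdline = (
    "/usr/bin/qemu-kvm -name beefyguest1 -S -machine pc-0.14,accel=kvm,usb=off "
    "-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 "
    "-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,"
    "drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 "
    "-device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:9a:62:b0,"
    "bus=pci.0,addr=0x3,bootindex=1 "
    "-device isa-serial,chardev=charserial0,id=serial0 "
    "-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5"
)
for driver in list_devices(cmdline):
    print(driver)
```

Each listed device corresponds to an element in the guest's libvirt XML, so one device can be removed or swapped per test run to see whether the failure rate changes.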

Comment 11 Cole Robinson 2013-10-31 20:30:04 UTC
Dan, are you still seeing this with the latest F19 packages? I'd also be interested to know whether kernel 3.12 makes any difference. Here are the commands; they may pull in other dependencies, but if so, don't worry:

sudo yum install fedora-release-rawhide
sudo yum --enablerepo=rawhide update kernel

Comment 12 Cole Robinson 2013-11-17 19:24:23 UTC
Latest rawhide kernel has quite a few emulation fixes for kvm, so I'm going to assume this is fixed in the 3.13 snapshots which will eventually end up in f19. Closing, please reopen if that's not the case.

Comment 13 Dan Callaghan 2014-02-03 02:33:28 UTC
This is still happening with kernel-3.13.0-0.rc8.git0.1.fc21.x86_64.

KVM internal error. Suberror: 1
emulation failure
RAX=ffffffff81000122 RBX=0000000001f8b000 RCX=0000000001d55000 RDX=0000000001000000
RSI=0000000000093780 RDI=0000000001a8c000 RBP=0000000000000000 RSP=000000000236e140
R8 =0000000001a8c000 R9 =0000000000000001 R10=0000000000000038 R11=0000000000000038
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff81000122 RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00c00000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c00000
DS =0000 0000000000000000 ffffffff 00c00000
FS =0000 0000000000000000 ffffffff 00c00000
GS =0000 0000000000000000 ffffffff 00c00000
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0020 0000000000000000 00000fff 00808b00 DPL=0 TSS64-busy
GDT=     00000000004d80d8 00000030
IDT=     0000000000000000 00000000
CR0=80000011 CR2=0000000000000000 CR3=000000009238e090 CR4=000000a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
qemu: terminating on signal 15 from pid 819

Comment 14 Cole Robinson 2014-09-08 13:17:54 UTC
Is anyone still hitting this with the latest packages? If so, please list the versions.

Comment 15 Dan Callaghan 2014-09-08 22:00:02 UTC
I'm still seeing this occasionally with:

kernel-3.15.6-200.fc20.x86_64
qemu-kvm-1.6.2-6.fc20.x86_64

Comment 16 Cole Robinson 2014-09-11 19:31:30 UTC
Paolo, is this a known issue? Is there any extra info Dan can provide?

Comment 17 Paolo Bonzini 2014-09-12 12:52:01 UTC
No, it's not known...

Comment 18 Fedora End Of Life 2015-01-09 17:48:56 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 19 Dan Callaghan 2015-01-12 07:05:04 UTC
I can't reproduce this anymore using kernel-3.17.8-300.fc21.x86_64 and qemu-kvm-2.1.2-7.fc21.x86_64.

