Bug 1013310 - KVM internal error. Suberror: 1 when booting guest
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: x86_64 Linux
Priority: low
Severity: low
Target Milestone: rc
Assigned To: Radim Krčmář
QA Contact: Virtualization Bugs

Reported: 2013-09-29 04:08 EDT by Xu Han
Modified: 2016-11-03 17:46 EDT
Doc Type: Bug Fix
Last Closed: 2016-11-03 17:46:25 EDT
Type: Bug


Attachments:
screen shot (59.05 KB, image/png), 2013-12-18 23:53 EST, Xu Han
KVM internal error. Suberror: 1 (6.53 KB, text/plain), 2014-01-29 02:10 EST, langfang

Description Xu Han 2013-09-29 04:08:54 EDT
Description of problem:
KVM internal error. Suberror: 1 when booting guest.
Not sure what directly causes this issue, but the scenarios below always reproduce it:
1. Use q35 and PCIe.
2. Use lots of devices.

Version-Release number of selected component (if applicable):
seabios-1.7.2.2-3.el7.x86_64
qemu-kvm-rhev-1.5.3-6.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest without a selected boot device:
/usr/libexec/qemu-kvm -nodefaults -M q35 -cpu SandyBridge -m 1G -smp 4,cores=2,threads=2,sockets=1 -vnc :1 -spice disable-ticketing,port=5931 -monitor stdio -qmp tcp:0:6666,server,nowait -vga qxl -boot menu=on -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
-device ioh3420,bus=pcie.0,id=root \
-device x3130-upstream,bus=root,id=upstream \
-device xio3130-downstream,bus=upstream,id=downstream1,chassis=1 \
-device xio3130-downstream,bus=upstream,id=downstream2,chassis=2 \
-netdev tap,id=netdev0,vhost=on \
-device virtio-net-pci,netdev=netdev0,id=net0,mac=00:1a:4a:42:0a:11 \
-drive file=/home/win7sp1-64.qcow2_v3,format=qcow2,id=guest-img,if=none,werror=stop,rerror=stop \
-device virtio-blk-pci,scsi=off,drive=guest-img,id=os-disk \
-chardev socket,id=isa-serial-1,path=/tmp/isa-serial-1,server,nowait \
-device isa-serial,chardev=isa-serial-1 \
-drive file=/home/init/floppy,if=none,id=drive-fdc0-0-0,format=raw \
-global isa-fdc.driveA=drive-fdc0-0-0 \
-netdev tap,id=hostnet1,vhost=on \
-device rtl8139,netdev=hostnet1,id=net1,mac=00:1a:4a:42:0a:0a,bus=downstream1 \
-netdev tap,id=hostnet2,vhost=on \
-device e1000,netdev=hostnet2,id=net2,mac=00:1a:4a:42:0a:0b,bus=downstream2 \
-usb \
-device usb-tablet,id=input1,port=1 \
-device usb-hub,port=2,id=hub \
-device usb-storage,port=2.4,drive=drive-usb-2-0,id=usb-2-0,removable=on \
-drive file=/home/init/usb,if=none,id=drive-usb-2-0,media=disk,format=qcow2,aio=threads \
-device usb-ehci,id=ehci \
-device usb-storage,drive=drive-usb-0-0,id=usb-0-0,removable=on,bus=ehci.0,port=1 \
-drive file=/home/init/usb2,if=none,id=drive-usb-0-0,media=disk,format=qcow2,aio=native \
-device virtio-balloon-pci,id=ballon_1 \
-drive file=/home/init/disk_ide,if=none,media=disk,id=drive-ide1-1-0,format=qcow2,werror=stop,rerror=stop \
-device ide-hd,drive=drive-ide1-1-0,id=ide-disk0 \
-device intel-hda,id=sound0 -device hda-duplex
2. Wait for the guest to boot.

Actual results:
(qemu) KVM internal error. Suberror: 1
emulation failure
EAX=00000085 EBX=00017b20 ECX=e8663451 EDX=00000000
ESI=c0185188 EDI=e86634a5 EBP=e8663451 ESP=e8663441
EIP=89e6e89f EFL=00010097 [--S-APC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 3ff1aa70 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 3ff1aa70 ffffffff 00c09f00 DPL=0 CS32 [CRA]
SS =0010 3ff1aa70 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 3ff1aa70 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 3ff1aa70 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 3ff1aa70 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 ffffffff 00c00000
TR =0030 00005cc4 00000067 00008b00 DPL=0 TSS32-busy
GDT=     0009cf30 00000037
IDT=     00000000 0000ffff
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

(qemu) info status 
VM status: paused (internal-error)
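
The `CR0` field in a dump like the one above shows which mode the vCPU was in when emulation failed. Below is a minimal Python sketch, assuming the dump is available as a string. The helper name `cpu_mode_from_dump` is hypothetical, and only the architectural CR0 bits PE (bit 0) and PG (bit 31) are inspected; long mode (EFER) is not distinguished.

```python
import re

CR0_PE = 1 << 0   # Protection Enable
CR0_PG = 1 << 31  # Paging

def cpu_mode_from_dump(dump: str) -> str:
    """Classify the CPU mode from the CR0 field of a QEMU register dump."""
    match = re.search(r"\bCR0=([0-9a-fA-F]+)", dump)
    if match is None:
        raise ValueError("no CR0 field found in dump")
    cr0 = int(match.group(1), 16)
    if not cr0 & CR0_PE:
        return "real mode"
    if cr0 & CR0_PG:
        return "protected mode, paging enabled"
    return "protected mode, paging disabled"

# CR0 line copied from the dump in this description.
print(cpu_mode_from_dump("CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000"))
```

For the `CR0=00000011` value here this reports protected mode without paging; a value of `00000010` would be classified as real mode.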


Expected results:


Additional info:
The issue also reproduces with seabios-1.7.2.2-2.el7.x86_64.
Comment 2 xhan 2013-12-05 03:03:51 EST
Hit this problem on:

qemu-kvm-rhev-1.5.3-20.el7.x86_64

KVM internal error. Suberror: 1
emulation failure
EAX=00000200 EBX=0000aa55 ECX=00000007 EDX=00000080
ESI=00007bd0 EDI=00000800 EBP=000007be ESP=00007be0
EIP=00000684 EFL=00003202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 ffffffff 00809300
CS =0000 00000000 ffffffff 00809b00
SS =0000 00000000 ffffffff 00809300
DS =0000 00000000 ffffffff 00809300
FS =0000 00000000 ffffffff 00809300
GS =0000 00000000 ffffffff 00809300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     000fd808 00000037
IDT=     00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=7c 68 01 00 68 10 00 b4 42 8a 56 00 8b f4 cd 13 9f 83 c4 10 <9e> eb 14 b8 01 02 bb 00 7c 8a 56 00 8a 76 01 8a 4e 02 8a 6e 03 cd 13 66 61 73 1c fe 4e 11


command line:
qemu \
    -S  \
    -name 'virt-tests-vm1'  \
    -sandbox on  \
    -M pc-q35-rhel7.0.0  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432 \
    -device intel-hda,bus=pcie.0,addr=02 \
    -device hda-duplex  \
    -monitor stdio \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20131203-203439-mlN3V9L2,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pcie.0,addr=03  \
    -chardev socket,id=devvs,path=/tmp/virtio_port-vs-20131203-203439-mlN3V9L2,server,nowait \
    -device virtserialport,chardev=devvs,name=vs,id=vs,bus=virtio_serial_pci0.0  \
    -chardev socket,id=seabioslog_id_20131203-203439-mlN3V9L2,path=/tmp/seabios-20131203-203439-mlN3V9L2,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20131203-203439-mlN3V9L2,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pcie.0,addr=04 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,addr=05 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=images/win7-64-virtio.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:5e:5f:60:61:62,id=idRRc9Jr,netdev=idOqrsQV,bus=pcie.0,addr=06  \
    -netdev tap,id=idOqrsQV,vhost=on,script= \
    -m 4096  \
    -smp 1,maxcpus=1,cores=1,threads=1,sockets=2  \
    -cpu 'Penryn',hv_relaxed \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,media=cdrom,file=isos/ISO/Windows7/en_windows_7_ultimate_with_sp1_x64_dvd_u_677332.iso \
    -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
    -drive id=drive_winutils,if=none,snapshot=off,aio=native,media=cdrom,file=isos/windows/winutils.iso \
    -device ide-cd,id=winutils,drive=drive_winutils,bus=ide.1,unit=0 \
    -drive id=drive_virtio,if=none,snapshot=off,aio=native,media=cdrom,file=isos/windows/virtio-win.latest_prewhql.iso \
    -device ide-cd,id=virtio,drive=drive_virtio,bus=ide.2,unit=0 \
    -drive id=drive_fl,if=none,cache=none,snapshot=off,readonly=off,aio=native,file=images/win7-64/answer.vfd \
    -global isa-fdc.driveA=drive_fl \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -spice port=3000,password=123456,addr=0,tls-port=3200,x509-dir=/tmp/spice_x509d,tls-channel=main,tls-channel=inputs,image-compression=auto_glz,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=d,menu=off \
    -enable-kvm
Comment 4 Xu Han 2013-12-18 23:53:11 EST
Created attachment 838762
screen shot

Re-tested with the following components:
qemu-kvm-rhev-1.5.3-21.el7.x86_64
seabios-1.7.2.2-6.el7.x86_64
kernel-3.10.0-64.el7.x86_64

Steps: same as comment 0.

Results:
qemu-kvm cannot boot from the guest image disk. After quitting the first NIC boot, the VM attempts to boot from the second NIC but hangs. This time the KVM internal error from comment 0 did not appear.

Disabling 'unrestricted_guest' or using scsi gives the same results as above.

If the AHCI disk is removed, the guest is able to boot from the OS image disk.
Comment 5 Radim Krčmář 2013-12-19 09:45:03 EST
Thanks, it is a different problem.

The minimized reproducer is:
 /usr/libexec/qemu-kvm -nodefaults -m 1G -vnc :1 -vga qxl \
  -netdev tap,id=netdev0 \
  -device virtio-net-pci,netdev=netdev0 \
  -netdev tap,id=hostnet1 \
  -device rtl8139,netdev=hostnet1

After the first pxe has been loaded, I write 'menu' and select '(local)'; it fails to boot and tries the second card, which usually causes a reboot in my case. (I have seen some hangs too)

The type of cards does not matter, two different ones cause a reboot/hang on the second one.
If the cards are of the same type (e.g. two virtio), the second pxe does not reboot, but it does not work either -- it wants to load pxelinux.cfg/0000-00... (the sysuuid, which is suspicious) instead of pxelinux.cfg/default.

Is this the bug you are hitting?
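
For context on the `pxelinux.cfg/0000-00...` request: as documented for PXELINUX, the loader first tries a config file named after the client UUID (if the PXE stack provides one), then one named after the MAC address, then progressively shorter prefixes of the hex-encoded IPv4 address, and only then `default`. The following is a rough Python sketch of that lookup order. The function name and the sample UUID, MAC and IP values are made up for illustration and do not come from this bug.

```python
def pxelinux_config_candidates(uuid, mac, ipv4):
    """Return config file names in the order PXELINUX is documented to try them."""
    names = []
    if uuid:
        # Client UUID in lowercase hex, only if the PXE stack supplies one.
        names.append(uuid.lower())
    # ARP hardware type 01 (Ethernet) followed by the MAC, dash-separated.
    names.append("01-" + mac.lower().replace(":", "-"))
    # IPv4 address as 8 uppercase hex digits, then one digit shorter per try.
    ip_hex = "".join(f"{int(octet):02X}" for octet in ipv4.split("."))
    for length in range(len(ip_hex), 0, -1):
        names.append(ip_hex[:length])
    names.append("default")
    return ["pxelinux.cfg/" + name for name in names]

# Hypothetical client values, for illustration only.
for path in pxelinux_config_candidates(
    "00000000-0000-0000-0000-000000000000", "52:54:00:12:34:56", "192.168.2.91"
):
    print(path)
```

Under this order a UUID-named request is normally the first attempt, with `default` as the last fallback.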
Comment 6 juzhang 2013-12-19 21:27:28 EST
Hi Han,

Could you have a look at comment 5 and update your comment?

Best Regards,
Junyi
Comment 8 Xu Han 2013-12-19 22:39:15 EST
(In reply to Radim Krčmář from comment #5)
> Thanks, it is a different problem.
> 
> The minimized reproducer is:
>  /usr/libexec/qemu-kvm -nodefaults -m 1G -vnc :1 -vga qxl \
>   -netdev tap,id=netdev0 \
>   -device virtio-net-pci,netdev=netdev0 \
>   -netdev tap,id=hostnet1 \
>   -device rtl8139,netdev=hostnet1
> 
> After the first pxe has been loaded, I write 'menu' and select '(local)'; it
> fails to boot and tries the second card, which usually causes a reboot in my
> case. (I have seen some hangs too)
> 
> The type of cards does not matter, two different ones cause a reboot/hang on
> the second one.
> If the cards are of the same type (eg. two virtio), the second pxe does not
> reboot, but it does not work either -- it want to load
> pxelinux.cfg/0000-00... (the sysuuid, which is suspicious) instead of
> pxelinux.cfg/default.
> 
> Is this the bug you are hitting?

Yes, and a few points to note:

1. If the extra NICs are removed (e.g. leaving just virtio-net), the guest reboots automatically in an endless loop after '(local)' is selected. Usually, it just shows 'no bootable device' and waits for user action.

2. There is a small difference between virtio-scsi and virtio-blk. With the former, after the first PXE attempt finishes, the qemu-kvm process hangs during the second PXE load; with the latter it does not.
Comment 9 langfang 2014-01-29 02:07:36 EST
I hit the problem when booting a guest with multiple usb-storage devices; see the attachment for the command line.
Version:
Host
# uname -r
3.10.0-79.el7.x86_64
# rpm -q qemu-kvm
qemu-kvm-1.5.3-41.el7.x86_64

Results:
QEMU 1.5.3 monitor - type 'help' for more information
(qemu) xhci: xhci_kick_ep for disabled endpoint 3,1
xhci: xhci_kick_ep for disabled endpoint 3,1
xhci: xhci_kick_ep for disabled endpoint 3,1
xhci: xhci_kick_ep for disabled endpoint 3,1
xhci: xhci_kick_ep for disabled endpoint 3,1
xhci: xhci_kick_ep for disabled endpoint 3,1
xhci: xhci_kick_ep for disabled endpoint 3,1
xhci: xhci_kick_ep for disabled endpoint 3,1
xhci: xhci_kick_ep for disabled endpoint 3,12
xhci: xhci_kick_ep for disabled endpoint 3,12
xhci: xhci_kick_ep for disabled endpoint 3,12
xhci: xhci_kick_ep for disabled endpoint 3,12
xhci: xhci_kick_ep for disabled endpoint 3,12
xhci: xhci_kick_ep for disabled endpoint 3,12
xhci: xhci_kick_ep for disabled endpoint 3,12
xhci: xhci_kick_ep for disabled endpoint 3,12
xhci: xhci_kick_ep for disabled endpoint 3,20
xhci: xhci_kick_ep for disabled endpoint 3,20
xhci: xhci_kick_ep for disabled endpoint 3,20
xhci: xhci_kick_ep for disabled endpoint 3,20
xhci: xhci_kick_ep for disabled endpoint 3,20
xhci: xhci_kick_ep for disabled endpoint 3,20
xhci: xhci_kick_ep for disabled endpoint 3,20
xhci: xhci_kick_ep for disabled endpoint 3,20

(qemu) 
(qemu) info status
VM status: running
(qemu) main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 246.876000 ms, bitrate 1115634 bps (1.063951 Mbps) LOW BANDWIDTH
inputs_connect: inputs channel client create
red_dispatcher_set_cursor_peer: 
xhci: reset while running!
KVM internal error. Suberror: 1
emulation failure
EAX=000f1a38 EBX=0000008c ECX=00000010 EDX=0000002c
ESI=00006f84 EDI=40000000 EBP=3fff0000 ESP=00006f68
EIP=a20e49d6 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000fd808 00000037
IDT=     000fd846 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

(qemu) info status
VM status: paused (internal-error)
(qemu)
Comment 10 langfang 2014-01-29 02:10:19 EST
Created attachment 856939
KVM internal error. Suberror: 1
Comment 11 Radim Krčmář 2014-01-29 13:21:53 EST
The suberrors usually look very similar and unhelpful. (comment #2 was one of the easy ones -- opcode <9e> wasn't emulated in kvm there)

The error in comment 9 is likely different than the original one, please create a new bug for it.

And please, always mention the whole reproducer; I figured out it is:
 1) boot the guest with the command from attachment 856939
 2) connect with spice
 3) send ctrl+alt+del

(Additional findings:
  qemu system_reset does not fail;
  fewer than 12 drives do not cause this issue)
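
As an aid for reading these dumps: QEMU marks the byte at the failing instruction pointer with `<..>` in the `Code=` line. Below is a small Python sketch that pulls that byte out, assuming the line has been copied as text; `faulting_byte` is a hypothetical helper name.

```python
import re

def faulting_byte(code_line: str) -> int:
    """Return the byte that QEMU marks with <..> in a 'Code=' dump line."""
    match = re.search(r"<([0-9a-fA-F]{2})>", code_line)
    if match is None:
        raise ValueError("no <..> marker found in Code= line")
    return int(match.group(1), 16)

# Shortened Code= line from comment 2.
line = "Code=7c 68 01 00 68 10 00 b4 42 8a 56 00 8b f4 cd 13 9f 83 c4 10 <9e> eb 14 b8 01 02"
print(hex(faulting_byte(line)))  # 0x9e
```

On x86, `0x9e` is the one-byte `SAHF` instruction, which matches the opcode named for comment #2.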
Comment 12 langfang 2014-02-08 04:40:52 EST
(In reply to Radim Krčmář from comment #11)
> The suberrors usually look very similar and unhelpful. (comment #2 was one
> of the easy ones -- opcode <9e> wasn't emulated in kvm there)
> 
> The error in comment 9 is likely different than the original one, please
> create a new bug for it.
> 
> And please, always mention the whole reproducer, I figured out it is:
>  1) boot the guest with command from attachment 856939 [details]
>  2) connect with spice
>  3) send ctrl+alt+del
> 
> (Additional findings:
>   qemu system_reset does not fail;
>   less than 12 drives do not cause this issue)

Hi Radim,
  Thanks for your attention. I tried the latest version and can't hit the problem anymore. I will keep tracking it; if I hit it again, I will report a new bug.

# uname -r
3.10.0-84.el7.x86_64
# rpm -q qemu-kvm
qemu-kvm-1.5.3-45.el7.x86_64
Comment 17 Radim Krčmář 2015-09-22 13:19:26 EDT
I can't reproduce it anymore, but because iPXE doesn't get to the menu, the bug might not be fixed.  Moving to 7.3 in any case.
Comment 19 Radim Krčmář 2016-08-30 22:23:29 EDT
I think it is fixed.  Can you still reproduce?

Thanks.  (Also moving to 7.4.)
Comment 20 Radim Krčmář 2016-09-12 15:16:36 EDT
The bug in Customer Portal 01695855 is not related to bug 1013310.  The internal errors show that both jump into zeroed memory, but bug 1013310 needed two network cards and a PXE boot to trigger.  Customer Portal 01695855 happens when booting the Linux kernel (far later) and the resulting internal error is different.

There were actually two different bugs (different manifestations) in Customer Portal 01695855:
 1) internal error after a jump to invalid code in real mode
 2) double fault on rdmsr(MSR_GS_BASE) in long mode

Customer Portal 01695855 has a problem with just one host, so it would be good to know if they have other hosts with the same configuration.
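
A "jump into zeroed memory" shows up in the dumps as a `Code=` line made up entirely of `00` bytes, as in comment 0 and comment 9. A minimal Python sketch of that check follows, assuming the `Code=` line has been copied as text; `code_is_all_zero` is a hypothetical helper name.

```python
def code_is_all_zero(code_line: str) -> bool:
    """True if every byte in a QEMU 'Code=' dump line is 00."""
    _, sep, payload = code_line.partition("=")
    if not sep:
        raise ValueError("not a Code= line")
    # Drop the <..> marker around the byte at the instruction pointer.
    tokens = payload.replace("<", "").replace(">", "").split()
    return bool(tokens) and all(int(token, 16) == 0 for token in tokens)

print(code_is_all_zero("Code=00 00 00 00 <00> 00 00 00"))   # True
print(code_is_all_zero("Code=9f 83 c4 10 <9e> eb 14 b8"))   # False
```

This only flags the symptom; as noted above, the same symptom can come from unrelated bugs.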
