1261244 – F23 beta tc4 vagrant box yields crashes kvm and tcg qemu

Summary: F23 beta tc4 vagrant box yields crashes kvm and tcg qemu

Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	qemu
Version:	23
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Fedora Virtualization Maintainers
QA Contact:	Fedora Extras Quality Assurance

Reported:	2015-09-09 03:28 UTC by Dusty Mabe
Modified:	2015-09-16 16:11 UTC
CC List:	17 users
Last Closed:	2015-09-16 16:11:29 UTC
Type:	Bug

Description Dusty Mabe 2015-09-09 03:28:25 UTC

Description of problem:

I don't know where exactly this bug should go. For now I am filing against kvm and we can move it around as needed.

Several of us have seen 'emulation failure' errors in the /var/log/libvirt/qemu/*.log files when trying to run the F23 Beta TC4 vagrant box. The log file contains:

<<<<<<<<
2015-09-08 20:59:51.642+0000: starting up libvirt version: 1.2.13.1, package: 2.fc22 (Fedora Project, 2015-06-06-15:21:32, buildvm-13.phx2.fedoraproject.org), qemu version: 2.3.1 (qemu-2.3.1-1.fc22)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name f23alpha_f23alpha -S -machine pc-i440fx-2.3,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 97803cd4-16ed-4124-990c-db3e5195d971 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f23alpha_f23alpha.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/guests/virtstoragepool/f23alpha_f23alpha.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/guests/virtstoragepool/f23alpha_f23alpha-vdb.qcow2,if=none,id=drive-virtio-disk1,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:15:98:2f,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:1 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
char device redirected to /dev/pts/11 (label charserial0)
KVM internal error. Suberror: 1
emulation failure
EAX=00000020 EBX=0010e704 ECX=000000fd EDX=2ef000f8
ESI=00e09241 EDI=00116fe1 EBP=00007b0e ESP=0032afdc
EIP=d35f676c EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0028 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0020 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0028 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0028 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0000 00000000 ffffffff 00c00000
GS =0000 00000000 ffffffff 00c00000
LDT=0000 00000000 ffffffff 00c00000
TR =0008 00000580 00000067 00008b00 DPL=0 TSS32-busy
GDT=     00009400 0000002f
IDT=     00003008 000007ff
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
qemu: terminating on signal 15 from pid 1203
2015-09-08 21:00:43.842+0000: shutting down
<<<<<<<<
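When several boxes or domains are involved, a small helper can pick out which libvirt QEMU logs recorded this failure and which EIP each one reported. This is a sketch, not a libvirt tool: it assumes the default system log directory mentioned above and a `grep` that supports `-o` (GNU or BSD); the function name `find_emulation_failures` and the `LOGDIR` override are hypothetical.

```shell
#!/bin/sh
# List libvirt QEMU domain logs that contain the KVM emulation failure
# from this report, and print the last EIP value recorded in each.
# Assumption: default log directory of the system libvirtd instance.
LOGDIR=${LOGDIR:-/var/log/libvirt/qemu}

find_emulation_failures() {
    for log in "$1"/*.log; do
        [ -f "$log" ] || continue     # glob matched nothing, or not a file
        if grep -q 'KVM internal error' "$log" && grep -q 'emulation failure' "$log"; then
            eip=$(grep -o 'EIP=[0-9a-f]*' "$log" | tail -n 1)
            echo "$log: $eip"
        fi
    done
}

find_emulation_failures "$LOGDIR"
```

Run against the log above, this would report `EIP=d35f676c` for the f23alpha_f23alpha domain.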

Version-Release number of selected component (if applicable):
I am running F22, but I filed this against F23 because jbrooks reported having the problem on F23. The versions of the software on my system are:

[root@media qemu]# rpm -q qemu-kvm libvirt vagrant-libvirt
qemu-kvm-2.3.1-1.fc22.x86_64
libvirt-1.2.13.1-2.fc22.x86_64
vagrant-libvirt-0.0.26-3.fc22.noarch


How reproducible:
Looks to be always

Steps to Reproduce:
1. Follow the setup at http://fedoramagazine.org/running-vagrant-fedora-22/#comment-400126
2. vagrant box add f23 http://dl.fedoraproject.org/pub/alt/stage/23_Beta_TC4/Cloud_Images/x86_64/Images/Fedora-Cloud-Base-Vagrant-23_Beta_TC4-20150907.x86_64.vagrant-libvirt.box
3. vagrant init f23
4. vagrant up

Actual results:
KVM internal error. Suberror: 1
emulation failure

Expected results:
The virtual machine starts.

Additional info:
[root@media qemu]# cat /proc/cpuinfo | grep "model name" | head -n 1
model name      : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Comment 1 Cole Robinson 2015-09-10 12:37:44 UTC

I can reproduce this on an f23 host too... I'll poke at it.

Comment 2 Cole Robinson 2015-09-10 13:39:21 UTC

Non-vagrant reproducer:

wget https://dl.fedoraproject.org/pub/alt/stage/23_Beta_TC4/Cloud_Images/x86_64/Images/Fedora-Cloud-Base-Vagrant-23_Beta_TC4-20150907.x86_64.vagrant-libvirt.box
tar -xvf Fedora-Cloud-Base-Vagrant-23_Beta_TC4-20150907.x86_64.vagrant-libvirt.box
qemu-kvm -machine pc,accel=kvm -m 2048 -display sdl box.img

Fails almost immediately with an 'emulation failure'.

However, if I increase the memory to -m 4096, it doesn't throw an error, but the boot hangs after printing the Syslinux copyright banner.

Using accel=tcg -m 512 crashes with: qemu: fatal: Trying to execute code outside RAM or ROM at 0x000000002ef000f8
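The faulting address can be compared with the guest RAM size using plain shell arithmetic. 0x2ef000f8 is 787480824 bytes, roughly 751 MiB, so it lies past the end of a 512 MiB guest but inside a 2048 or 4096 MiB one. That would be consistent with tcg crashing at -m 512 and only hanging with more memory, though that link is an inference and is not confirmed in this report; the same value also shows up in EDX in the KVM register dump above. The sketch assumes RAM is one contiguous range starting at address 0, which ignores the legacy VGA/BIOS hole below 1 MiB and the PCI hole below 4 GiB; `ram_position` is a hypothetical helper name.

```shell
#!/bin/sh
# Rough check of whether a guest-physical address lies past the end of RAM.
# Assumption: RAM is one contiguous range starting at 0; the legacy
# VGA/BIOS hole below 1 MiB and the PCI hole below 4 GiB are ignored.
ram_position() {
    addr=$(( $1 ))                       # address with 0x prefix, e.g. 0x2ef000f8
    ram_bytes=$(( $2 * 1024 * 1024 ))    # guest RAM as given to -m, in MiB
    if [ "$addr" -ge "$ram_bytes" ]; then
        echo outside
    else
        echo inside
    fi
}

for mem in 512 2048 4096; do
    printf 'mem=%s MiB: 0x2ef000f8 is %s RAM\n' "$mem" "$(ram_position 0x2ef000f8 "$mem")"
done
```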

Both accel=tcg -m 2048 and -m 4096 seem to hang at the Syslinux banner as well.
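The accelerator and memory combinations above can be run systematically. This sketch only prints the qemu-kvm command lines (a dry run), so each can be pasted or piped to `sh` one at a time; it reuses the reproducer's options, adds kvm at 512 MiB (not reported on above) to complete the matrix, and assumes box.img has already been extracted from the .box file as shown. The `qemu_cmd` helper and the `IMAGE` variable are hypothetical.

```shell
#!/bin/sh
# Dry run: print the qemu-kvm command for each accel/memory combination,
# using the same options as the non-vagrant reproducer.
# Assumption: box.img is in the current directory; override with IMAGE=...
IMAGE=${IMAGE:-box.img}

qemu_cmd() {
    # $1 = accelerator (kvm or tcg), $2 = guest RAM in MiB
    printf 'qemu-kvm -machine pc,accel=%s -m %s -display sdl %s\n' "$1" "$2" "$IMAGE"
}

for accel in kvm tcg; do
    for mem in 512 2048 4096; do
        qemu_cmd "$accel" "$mem"
    done
done
```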

All of the above commands at least get as far as starting a kernel boot with the (different) Fedora-Cloud-Base image from here: https://dl.fedoraproject.org/pub/alt/stage/23_Beta_TC4/Cloud_Images/x86_64/Images/Fedora-Cloud-Base-23_Beta_TC4-20150907.x86_64.qcow2

Tried with F23 qemu-kvm-2.4.0-2.fc23.x86_64 and a locally compiled F21 qemu-2.1.3-10.fc21; the problem reproduces with both.

All f23 vagrant images seem to have the issue, but the f22 GA vagrant images work fine.

So my guess is that either some syslinux change is tickling a bug in qemu or maybe seabios, or the f23 images are actually bogus.

Dusty, how is the vagrant disk image different from the cloud base image? Does the cloud image use syslinux or grub?

Gerd, Paolo, any suggestions on how to debug this further?

Comment 3 Dusty Mabe 2015-09-10 14:36:35 UTC

(In reply to Cole Robinson from comment #2)

> Dusty, how is the vagrant disk image different from the cloud base image?
> Does the cloud image use syslinux or grub?
> 

The vagrant image uses the cloud kickstart as a base (so they are very similar; see the differences in the links below). Unfortunately, in this case it looks like cloud base was switched to grub for f23 while the vagrant box still uses extlinux. This could be the cause of some of the issues.

https://git.fedorahosted.org/cgit/spin-kickstarts.git/tree/fedora-cloud-base.ks?h=f23
https://git.fedorahosted.org/cgit/spin-kickstarts.git/tree/fedora-cloud-base-vagrant.ks?h=f23

Comment 4 Dusty Mabe 2015-09-10 15:30:53 UTC

I am going to get the vagrant box rebuilt with grub rather than extlinux. If that doesn't show any failures, is this issue still of concern?

In other words, if we somehow managed to throw garbage at qemu/kvm is it still considered a bug or is it invalid?

Comment 5 Cole Robinson 2015-09-11 21:15:01 UTC

(In reply to Dusty Mabe from comment #4)
> 
> In other words, if we somehow managed to throw garbage at qemu/kvm is it
> still considered a bug or is it invalid?

I'm not an expert here, but it may or may not be a qemu bug; it really depends on whether what the guest is doing makes sense. If the guest is completely legitimate, this is probably a qemu/seabios issue that needs solving at some point, but if the image is messed up or syslinux is busted, then qemu/kvm falling over like this might be considered fine. We would need someone like Gerd or Paolo who understands this stuff better to chime in.

But I'm guessing the quickest path to 'fixing' this with regard to vagrant usage is to switch to what the cloud images are doing for booting.

Comment 6 Gerd Hoffmann 2015-09-15 08:43:05 UTC

> EAX=00000020 EBX=0010e704 ECX=000000fd EDX=2ef000f8
> ESI=00e09241 EDI=00116fe1 EBP=00007b0e ESP=0032afdc
> EIP=d35f676c EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0

> ES =0028 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0020 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
> SS =0028 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0028 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]

> CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000

EIP looks like it jumped into nowhere.  Paging is not yet enabled, so there is nothing at that address.  So kvm most likely barfs on trying to emulate an invalid instruction.
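Gerd's reading of CR0=00000011 can be checked bit by bit. Per the Intel SDM, CR0 bit 0 is PE (protected mode enable), bit 4 is ET and bit 31 is PG (paging); 0x11 sets only bits 0 and 4, so the CPU is in protected mode with paging off, and with a CS base of 0 the EIP value is used directly as a physical address. `decode_cr0` is a hypothetical helper name.

```shell
#!/bin/sh
# Decode the CR0 bits that matter for the analysis above.
# Bit positions per the Intel SDM: PE = bit 0, ET = bit 4, PG = bit 31.
# Pass the value with a 0x prefix so the shell parses it as hex.
decode_cr0() {
    cr0=$(( $1 ))
    pe=$(( cr0 & 1 ))           # 1 = protected mode enabled
    et=$(( (cr0 >> 4) & 1 ))    # extension type; reads as 1 on modern CPUs
    pg=$(( (cr0 >> 31) & 1 ))   # 1 = paging enabled
    echo "PE=$pe ET=$et PG=$pg"
}

decode_cr0 0x00000011   # value from the KVM dump; prints: PE=1 ET=1 PG=0
```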

Segments look like the bootloader (i.e. extlinux) is running at that point.
It is unlikely to be something in qemu. I'd guess either extlinux does something strange (possibly due to toolchain issues), or the x86 emulation in the kernel is buggy and extlinux trips over it.
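The segment lines Gerd quotes can be decoded from their fourth column, which QEMU prints in the layout of the upper dword of an x86 segment descriptor (G in bit 23, D/B in bit 22, P in bit 15, DPL in bits 13-14, S in bit 12, type in bits 8-11). With base 0 and limit ffffffff, G=1 and D/B=1 describe flat 4 GiB, 32-bit, ring-0 code and data segments, the kind of state Gerd attributes to the bootloader. A sketch only; `decode_seg_flags` is a hypothetical helper name.

```shell
#!/bin/sh
# Decode a segment "flags" column from a QEMU register dump.
# Bit layout follows the upper dword of an x86 segment descriptor.
# Pass the value with a 0x prefix so the shell parses it as hex.
decode_seg_flags() {
    f=$(( $1 ))
    g=$(( (f >> 23) & 1 ))          # granularity: limit counted in 4 KiB pages
    db=$(( (f >> 22) & 1 ))         # default operand size: 1 = 32-bit
    p=$(( (f >> 15) & 1 ))          # segment present
    dpl=$(( (f >> 13) & 3 ))        # descriptor privilege level
    s=$(( (f >> 12) & 1 ))          # 1 = code/data, 0 = system (e.g. TSS)
    seg_type=$(( (f >> 8) & 15 ))
    if [ "$s" -eq 0 ]; then
        kind=system
    elif [ $(( seg_type & 8 )) -ne 0 ]; then
        kind=code                   # type bit 3 set: executable segment
    else
        kind=data
    fi
    echo "G=$g D/B=$db P=$p DPL=$dpl S=$s type=$seg_type ($kind)"
}

decode_seg_flags 0x00c09b00   # CS from the dump
decode_seg_flags 0x00c09300   # DS/ES/SS from the dump
```

The TR line's 00008b00 decodes as a system segment of type 11, which matches QEMU's own "TSS32-busy" label.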

Hmm, I just noticed that tcg fails too, for the same reason (trying to execute code outside RAM). That makes a buggy extlinux more likely.

Does the same extlinux version work on real hardware?

Comment 7 Dusty Mabe 2015-09-16 15:53:10 UTC

So I'm going to say this was an issue on our end. We made the cloud base image use grub and removed the 'helper code for extlinux in our %post in anaconda' but we didn't modify the vagrant image's kickstart (which inherits from cloud base) to use grub so it still used extlinux. 

This resulted in an image that was configured to use extlinux but didn't have any of the "helper code"[1] in place to make it happen. I'm going to close this bug as invalid unless anyone thinks it should be left open. 

[1] https://git.fedorahosted.org/cgit/spin-kickstarts.git/commit/?h=f23&id=bc1f075e4110c5bad913936036b335bd217f6624

Comment 8 Cole Robinson 2015-09-16 16:11:29 UTC

Nah let's close it. Thanks for following up
