Red Hat Bugzilla – Bug 1261244
F23 beta tc4 vagrant box crashes under both kvm and tcg qemu
Last modified: 2015-09-16 12:11:29 EDT
Description of problem:
I don't know where exactly this bug should go. For now I am filing against kvm and we can move it around as needed.
Several of us have experienced 'emulation failures' in the /var/log/libvirt/qemu/*.log files when trying to run the f23 beta tc4 vagrant box. The log file contains:
2015-09-08 20:59:51.642+0000: starting up libvirt version: 18.104.22.168, package: 2.fc22 (Fedora Project, 2015-06-06-15:21:32, buildvm-13.phx2.fedoraproject.org), qemu version: 2.3.1 (qemu-2.3.1-1.fc22)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name f23alpha_f23alpha -S -machine pc-i440fx-2.3,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 97803cd4-16ed-4124-990c-db3e5195d971 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f23alpha_f23alpha.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/guests/virtstoragepool/f23alpha_f23alpha.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/guests/virtstoragepool/f23alpha_f23alpha-vdb.qcow2,if=none,id=drive-virtio-disk1,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:15:98:2f,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:1 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
char device redirected to /dev/pts/11 (label charserial0)
KVM internal error. Suberror: 1
EAX=00000020 EBX=0010e704 ECX=000000fd EDX=2ef000f8
ESI=00e09241 EDI=00116fe1 EBP=00007b0e ESP=0032afdc
EIP=d35f676c EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0028 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
CS =0020 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0028 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0028 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
FS =0000 00000000 ffffffff 00c00000
GS =0000 00000000 ffffffff 00c00000
LDT=0000 00000000 ffffffff 00c00000
TR =0008 00000580 00000067 00008b00 DPL=0 TSS32-busy
GDT= 00009400 0000002f
IDT= 00003008 000007ff
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
qemu: terminating on signal 15 from pid 1203
2015-09-08 21:00:43.842+0000: shutting down
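To check whether another host hit the same failure, the libvirt QEMU logs can be searched for the error string shown above. This is a minimal sketch: the function name find_kvm_errors is made up for illustration, and the default directory is simply the path quoted in this report.

```shell
# List libvirt QEMU logs that recorded a KVM internal error.
# find_kvm_errors is a hypothetical helper name; the default directory is the
# log location quoted in this report and may differ on other setups.
find_kvm_errors() {
    log_dir="${1:-/var/log/libvirt/qemu}"
    # -l prints only the names of matching files; -F matches a fixed string.
    grep -lF "KVM internal error" "$log_dir"/*.log 2>/dev/null
}
```

Run it as root with no argument (find_kvm_errors) to scan the default directory, or pass another directory as the first argument.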
Version-Release number of selected component (if applicable):
I am running f22 but I reported this against f23 because jbrooks reported he had the problem with f23. The versions of software I have on my system are:
[root@media qemu]# rpm -q qemu-kvm libvirt vagrant-libvirt
How reproducible:
Looks to be always
Steps to Reproduce:
2. vagrant box add f23 http://dl.fedoraproject.org/pub/alt/stage/23_Beta_TC4/Cloud_Images/x86_64/Images/Fedora-Cloud-Base-Vagrant-23_Beta_TC4-20150907.x86_64.vagrant-libvirt.box
3. vagrant init f23
4. vagrant up
Actual results:
KVM internal error. Suberror: 1
Expected results:
Started virtual machine
[root@media qemu]# cat /proc/cpuinfo | grep "model name" | head -n 1
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
I can reproduce on f23 host too... I'll poke at it
tar -xvf Fedora-Cloud-Base-Vagrant-23_Beta_TC4-20150907.x86_64.vagrant-libvirt.box
qemu-kvm -machine pc,accel=kvm -m 2048 -display sdl box.img
Fails almost immediately with an 'emulation failure'.
However if I up the memory to -m 4096, it doesn't throw an error, but the boot hangs after printing the Syslinux copyright banner.
Using accel=tcg -m 512 crashes with: qemu: fatal: Trying to execute code outside RAM or ROM at 0x000000002ef000f8
Using accel=tcg with -m 2048 or -m 4096 also seems to hang at the Syslinux banner.
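The address in the tcg fatal error, 0x2ef000f8, is the same value as EDX in the KVM register dump. It sits at roughly 751 MiB, which is above 512 MiB but below 2048 MiB. That may be why -m 512 dies outright while larger sizes hang instead. A minimal sketch of the comparison, assuming RAM is one flat range starting at 0 (the real PC memory map has holes); addr_in_ram is a made-up helper name:

```shell
# Succeed if a guest-physical address lies below the end of RAM.
# Assumption: RAM is treated as a single flat range starting at address 0.
addr_in_ram() {
    # $1 = address (e.g. 0x2ef000f8), $2 = qemu -m value in MiB
    [ $(( $1 )) -lt $(( $2 * 1024 * 1024 )) ]
}

for mem in 512 2048 4096; do
    if addr_in_ram 0x2ef000f8 "$mem"; then
        echo "-m $mem: 0x2ef000f8 is inside RAM"
    else
        echo "-m $mem: 0x2ef000f8 is outside RAM"
    fi
done
```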
All the above commands work for at least starting a kernel boot of the (different) Fedora-Cloud-Base image from here: https://dl.fedoraproject.org/pub/alt/stage/23_Beta_TC4/Cloud_Images/x86_64/Images/Fedora-Cloud-Base-23_Beta_TC4-20150907.x86_64.qcow2
Tried with F23 qemu-kvm-2.4.0-2.fc23.x86_64 and the F21 qemu-2.1.3-10.fc21 compiled locally and both reproduced.
All f23 vagrant images seem to have the issue, but the f22 GA vagrant images work fine.
So my guess is that either some syslinux change is tickling a bug in qemu or maybe seabios, or the f23 images are actually bogus.
Dusty, how is the vagrant disk image different from the cloud base image? Does the cloud image use syslinux or grub?
Gerd, Paolo, any suggestions on how to debug this further?
(In reply to Cole Robinson from comment #2)
> Dusty, how is the vagrant disk image different from the cloud base image?
> Does the cloud image use syslinux or grub?
The vagrant image uses the cloud kickstart as a base (so they are very similar, but see changes in links below). Unfortunately in this case it looks like cloud base was switched to use grub for f23 while the vagrant box is still using extlinux. This could be the cause of some of the issues.
I am going to get a rebuild of the vagrant box with grub rather than extlinux. If that doesn't show any failures, is this issue still of concern?
In other words, if we somehow managed to throw garbage at qemu/kvm is it still considered a bug or is it invalid?
(In reply to Dusty Mabe from comment #4)
> In other words, if we somehow managed to throw garbage at qemu/kvm is it
> still considered a bug or is it invalid?
I'm not an expert here, but it may or may not be a qemu bug... it really depends on whether what the guest is doing makes sense. If the guest is completely legitimate then this is probably a qemu/seabios issue that needs solving at some point, but if the image is messed up or syslinux is busted then qemu/kvm falling over like this might be considered fine. We would need someone like Gerd or Paolo, who understand this stuff better, to chime in.
But I'm guessing the quickest path to 'fixing' this WRT vagrant usage is to switch to what the cloud images are doing for booting.
> EAX=00000020 EBX=0010e704 ECX=000000fd EDX=2ef000f8
> ESI=00e09241 EDI=00116fe1 EBP=00007b0e ESP=0032afdc
> EIP=d35f676c EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0028 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
> CS =0020 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
> SS =0028 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
> DS =0028 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
> CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
EIP looks like it jumped into nowhere. Paging is not yet enabled, so there is nothing at that address. So kvm most likely barfs on trying to emulate an invalid instruction.
Segments look like the bootloader (i.e. extlinux) is running at that point.
Unlikely it is something in qemu. I'd guess either extlinux does something strange (possibly due to toolchain issues), or the x86 emulation in the kernel is buggy and extlinux trips over it.
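The register reading above can be checked with a little shell arithmetic. This is a minimal sketch using values copied from the dump in the description. It assumes RAM is a single flat range from 0 up to the -m size, which ignores holes in the real PC memory map. CS has base 0, so with paging off EIP is effectively the physical address being fetched.

```shell
# Values copied from the register dump and qemu command line in this report.
cr0=0x00000011
eip=0xd35f676c
mem_mib=2048                       # from "-m 2048"

pe=$(( $cr0 & 1 ))                 # CR0 bit 0: protected mode enable
pg=$(( ($cr0 >> 31) & 1 ))         # CR0 bit 31: paging enable
ram_end=$(( mem_mib * 1024 * 1024 ))

echo "PE=$pe PG=$pg"
# Assumption: RAM is one flat range [0, ram_end); the real map has holes.
if [ $(( $eip )) -ge "$ram_end" ]; then
    echo "EIP $eip is beyond the end of ${mem_mib} MiB of RAM"
else
    echo "EIP $eip is within ${mem_mib} MiB of RAM"
fi
```

With protected mode on and paging off, an EIP above the 2 GiB of guest RAM has no memory behind it, which matches the "jumped into nowhere" reading.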
Hmm, I just saw that tcg fails too, for the same reason (trying to execute code outside RAM). That makes extlinux being buggy more likely.
Does the same extlinux version work on real hardware?
So I'm going to say this was an issue on our end. We made the cloud base image use grub and removed the 'helper code for extlinux in our %post in anaconda' but we didn't modify the vagrant image's kickstart (which inherits from cloud base) to use grub so it still used extlinux.
This resulted in an image that was configured to use extlinux but didn't have any of the "helper code" in place to make it happen. I'm going to close this bug as invalid unless anyone thinks it should be left open.
Nah, let's close it. Thanks for following up.