Bug 1175502 - qemu instances started by nova fail to boot
Summary: qemu instances started by nova fail to boot
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 6.0 (Juno)
Assignee: Virtualization Maintenance
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-12-17 21:26 UTC by Lars Kellogg-Stedman
Modified: 2016-04-18 06:59 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-16 15:31:49 UTC
Target Upstream Version:
Embargoed:



Description Lars Kellogg-Stedman 2014-12-17 21:26:52 UTC
With qemu-kvm-rhev-1.5.3-60.el7_0.10.x86_64 and "virt_type = qemu" in nova.conf, instances fail to boot.  Nova successfully starts a qemu-kvm process:

# ps -fe | grep instance-0000000c
qemu     27714     1 88 16:22 ?        00:01:59 /usr/libexec/qemu-kvm -name instance-0000000c -S -machine pc-i440fx-rhel7.0.0,accel=tcg,usb=off -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid c6a76589-2713-4298-911c-dc03bd01e992 -smbios type=1,manufacturer=Fedora Project,product=OpenStack Nova,version=2014.2.1-7.el7ost,serial=622e4ef0-ebb5-48b5-a1d2-d8567c30ea7c,uuid=c6a76589-2713-4298-911c-dc03bd01e992 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0000000c.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/c6a76589-2713-4298-911c-dc03bd01e992/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=25,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:93:bd:96,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/c6a76589-2713-4298-911c-dc03bd01e992/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

But the kernel gets stuck booting.  The last messages logged to the console are:

# nova console-log test1 | tail
[    0.184010] Freeing SMP alternatives: 24k freed
[    0.184010] ACPI: Core revision 20110623
[    0.204522] ftrace: allocating 27027 entries in 106 pages
[    0.219204] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.236014] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[    0.236014] ...trying to set up timer (IRQ0) through the 8259A ...
[    0.236014] ..... (found apic 0 pin 2) ...
[    0.252015] ....... failed.
[    0.252015] ...trying to set up timer as Virtual Wire IRQ...

After downgrading to qemu-kvm-rhev-1.5.3-60.el7_0.7.x86_64, with no other changes, everything works as expected.

Comment 2 Richard W.M. Jones 2014-12-18 14:35:41 UTC
The bug you specifically mention looks a lot like one which
you could solve by adding no_timer_check to the kernel command
line (this is the default in modern kernels, but you don't mention
what kernel version this is).
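
As a rough illustration of the check suggested above, the following sketch tests whether a guest kernel command line already carries no_timer_check and whether a console log shows the timer symptom from the description. The function names and the SYMPTOM constant are made up for this example; the command line would typically come from /proc/cmdline inside the guest, and the log text from "nova console-log".

```python
# Hypothetical helpers, not part of nova or qemu.
# The symptom string is copied from the console log in the description.
SYMPTOM = "MP-BIOS bug: 8254 timer not connected to IO-APIC"


def has_no_timer_check(cmdline: str) -> bool:
    """Return True if the kernel command line contains no_timer_check."""
    # Kernel parameters are whitespace separated, so compare whole tokens.
    return "no_timer_check" in cmdline.split()


def shows_timer_symptom(console_log: str) -> bool:
    """Return True if the console log contains the 8254 timer message."""
    return SYMPTOM in console_log
```

A guest that shows the symptom and lacks the parameter is a candidate for the workaround; this does not explain why the two qemu builds behave differently.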

Comment 3 Richard W.M. Jones 2014-12-18 14:36:42 UTC
I'd also like to pimp qemu-sanity-check:

http://people.redhat.com/~rjones/qemu-sanity-check/

It only has very minimal dependencies (just gcc, glibc-static and bash)
and can test if a kernel is compatible with a qemu.

Comment 4 Kashyap Chamarthy 2014-12-18 14:42:45 UTC
Ah, no_timer_check was discussed here:

  https://bugs.launchpad.net/cirros/+bug/1312199

Speaking of no_timer_check, Daniel Berrange once pointed
me to this commit [1] in upstream Nova:


commit 6b86a61fee15ce1237303fab2f7896f8c3bcad47
Author: Attila Fazekas <afazekas>
Date:   Wed May 28 09:19:29 2014 +0200

    Use no_timer_check with soft-qemu
    
    The Linux kernel timer check not working properly
    when the hypervisor's thread preempted by the host CPU scheduler.
    
    The timer check is automatically disabled with other types
    of hypervisors including the hardware accelerated kvm,
    but timer_check is not disabled when qemu used without hardware acceleration.
    
    This issue is frequently mischaracterized as an SSH connectivity issue and
    causes rechecks and occasional boot failures.
    
    This change adds no_timer_check kernel parameter when we are using
    uec images with qemu.
    
    Closes-Bug: #1312199
    Change-Id: I3cfdfe9048fe219fc12cdac8a399b496f237e55e


[1]  https://review.openstack.org/#/c/96090/
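
The behaviour described in that commit message can be sketched roughly as follows. This is not the actual Nova code: the function name and its parameters are hypothetical, and the only facts taken from the commit message are that no_timer_check is added when qemu runs without hardware acceleration and the image is a uec image (one booted with a separate kernel).

```python
# Illustrative sketch only; names and signature are assumptions.
def kernel_args_for(virt_type: str, base_args: str, uses_kernel_image: bool) -> str:
    """Append no_timer_check for software-emulated qemu guests.

    virt_type         -- value of virt_type from nova.conf ("qemu", "kvm", ...)
    base_args         -- kernel command line that would otherwise be used
    uses_kernel_image -- True when the guest boots a separate kernel (uec image)
    """
    args = base_args.split()
    # Only plain qemu (TCG) needs the parameter; per the commit message,
    # hardware-accelerated hypervisors disable the timer check themselves.
    if virt_type == "qemu" and uses_kernel_image and "no_timer_check" not in args:
        args.append("no_timer_check")
    return " ".join(args)
```

Because the change is limited to uec images, a guest that boots its own kernel from a disk image is unaffected by it and would need the parameter set in the image itself.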

Comment 5 Lars Kellogg-Stedman 2014-12-18 15:15:34 UTC
There are already bugs open against our guest images to add the no_timer_check parameter:

- https://bugzilla.redhat.com/show_bug.cgi?id=1144155
- https://bugzilla.redhat.com/show_bug.cgi?id=1147035

So maybe this is CLOSED NOTABUG, but there is a difference in behavior between these two qemu versions.

Comment 6 Lars Kellogg-Stedman 2014-12-18 15:20:06 UTC
Miroslav, do you know if there were any changes that might account for this? I don't see anything obvious in the package changelog.

Comment 7 Miroslav Rezanina 2014-12-18 17:32:45 UTC
The changelog contains all changes made in qemu-kvm-rhev between the -7 and -10 versions. I suspect the vmstate_xhci_event patches are the culprit, but I do not know how they could cause this.

Any idea Laszlo?

Comment 8 Laszlo Ersek 2014-12-19 11:56:36 UTC
Nothing seems relevant. I suggest trying each official build in the interval, and then bisecting the "culprit build" patch by patch.
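
The bisection step could look something like the sketch below. It is a generic binary search, assuming the patches of the culprit build are ordered as applied, that the guest boots with none of them, and that once a patch breaks booting every later state is broken too. The first_bad name and the is_bad callback (which would rebuild qemu up to a given patch and try to boot a guest) are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first candidate for which is_bad() is true.

    Assumes candidates are ordered oldest to newest, that the last one
    is bad, and that badness never reverts once it appears.
    """
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid        # the culprit is at mid or earlier
        else:
            lo = mid + 1    # the culprit is after mid
    return candidates[lo]
```

With N patches this needs about log2(N) build-and-boot cycles, compared with up to N when testing them one at a time.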

