Created attachment 879852: boot failure

Description of problem:
The F20 cloud image usually does not finish booting when launched, and I am not sure what the issue is. Occasionally, if I watch the console while it boots, the instance launches fully, and I am able to log in as well as ssh to the instance through a floating IP. Most of the time, however, it fails with this error on the console:

MP-BIOS bug: 8254 timer not connected to IO-APIC

Version-Release number of selected component (if applicable):
Latest versions of the openstack packages, installed via packstack.

How reproducible:
Set up the latest icehouse and run packstack (two-node configuration in my setup, with control+neutron on one node and compute on the other).

Steps to Reproduce:
1. Launch the F20 cloud image. It fails to launch fully.
2. Try the same thing with Cirros, and it launches.

Additional info:
Console screenshot attached.
I should mention that this is with nested virtualization: I'm running icehouse on two nodes under virt-manager on F20, and the F20 cloud image is being launched in RDO running on those two nodes, with libvirt_type=qemu in nova.conf on the compute VM.
Also, the two nodes under virt-manager are running F20 as well. If needed, I can provide the package manifests for the control and compute nodes.
This is actually in RDO; icehouse for F20 is only available through the RDO repos.
Hi Matt, Do you have any insight on why this might have occurred and whether it's still likely to be an issue? Thanks, Steve
I don't have any idea -- this is a bit too low-level for my expertise. Are the kernel/virt people aware?
See also: https://bugzilla.redhat.com/show_bug.cgi?id=1102592

You can try http://dl.fedoraproject.org/pub/alt/openstack/20/x86_64/Fedora-x86_64-20-20140618-sda.qcow2, which contains the no_timer_check option and is used by the openstack gate tests.

Openstack now passes the 'no_timer_check' option to Amazon-style images when qemu is used, but this is not possible with other images. Such an image has to be created with the no_timer_check boot option.
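To illustrate the distinction made above, here is a minimal sketch of the decision, assuming the hypervisor can only add kernel arguments when it boots the guest from a separate kernel image (the Amazon-style case). The function and parameter names are hypothetical and this is not Nova's actual code.

```python
def extra_kernel_args(virt_type, boots_separate_kernel):
    """Return kernel arguments a hypervisor driver could add for this guest.

    virt_type: "qemu" for pure emulation, "kvm" for hardware acceleration.
    boots_separate_kernel: True for Amazon-style images, where the kernel
    is supplied outside the disk image (hypothetical flag for this sketch).
    """
    # A whole-disk image keeps its kernel command line in its own
    # bootloader configuration, so nothing can be injected from outside.
    if not boots_separate_kernel:
        return []
    # The timer check only needs to be disabled for unaccelerated qemu.
    if virt_type == "qemu":
        return ["no_timer_check"]
    return []
```

For a whole-disk cloud image the function returns an empty list even under plain qemu, which is why the option has to be baked into the image itself.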
Kambiz, can you please test with this new cloud image and confirm here if it works for you?

You can examine the disk image to see the option Attila mentions:

---------------
$ guestfish --ro -i -a Fedora-x86_64-20-20140618-sda.qcow2
[. . .]
><fs> cat /etc/grub.conf
default=0
timeout=0

title Fedora (3.11.10-301.fc20.x86_64)
        root (hd0)
        kernel /boot/vmlinuz-3.11.10-301.fc20.x86_64 ro root=UUID=314b4a27-3885-49e8-9415-af098db4fd2a no_timer_check console=hvc0 LANG=en_US.UTF-8
        initrd /boot/initramfs-3.11.10-301.fc20.x86_64.img
><fs>
---------------

This is the relevant commit in Nova[1] that Attila is referring to when you use pure emulation (plain QEMU, with no hardware acceleration):

commit 6b86a61fee15ce1237303fab2f7896f8c3bcad47
Author: Attila Fazekas <afazekas>
Date:   Wed May 28 09:19:29 2014 +0200

    Use no_timer_check with soft-qemu

    The Linux kernel timer check not working properly when the
    hypervisor's thread preempted by the host CPU scheduler.
    The timer check is automatically disabled with other types of
    hypervisors including the hardware accelerated kvm, but timer_check
    is not disabled when qemu used without hardware acceleration.

    This issue is frequently mischaracterized as an SSH connectivity
    issue and causes rechecks and occasional boot failures.

    This change adds no_timer_check kernel parameter when we are using
    uec images with qemu.

    Closes-Bug: #1312199
    Change-Id: I3cfdfe9048fe219fc12cdac8a399b496f237e55e

[1] https://git.openstack.org/cgit/openstack/nova/commit/?id=6b86a61fee15ce1237303fab2f7896f8c3bcad47
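If you want to script the check rather than read the file by eye, a minimal sketch follows. It assumes you have already extracted the contents of /etc/grub.conf as text (for example from the guestfish session above) and that the file uses legacy GRUB "kernel" lines like the one shown. The function name and the GRUB_CONF sample are illustrative only.

```python
def kernel_lines_missing_arg(grub_conf_text, arg="no_timer_check"):
    """Return the legacy-GRUB 'kernel' lines that do not carry `arg`."""
    missing = []
    for line in grub_conf_text.splitlines():
        tokens = line.split()
        # Legacy GRUB entries look like: kernel <path> <arg> <arg> ...
        if len(tokens) >= 2 and tokens[0] == "kernel":
            if arg not in tokens[2:]:
                missing.append(line.strip())
    return missing


# Sample text copied from the grub.conf output quoted in this comment.
GRUB_CONF = """\
default=0
timeout=0
title Fedora (3.11.10-301.fc20.x86_64)
        root (hd0)
        kernel /boot/vmlinuz-3.11.10-301.fc20.x86_64 ro root=UUID=314b4a27-3885-49e8-9415-af098db4fd2a no_timer_check console=hvc0 LANG=en_US.UTF-8
        initrd /boot/initramfs-3.11.10-301.fc20.x86_64.img
"""
```

An empty list from kernel_lines_missing_arg(GRUB_CONF) means every kernel entry already has the option. Any line that is returned is a boot entry that would still run the timer check.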
The no_timer_check Fedora image fixes this problem for me.
Closing per comment #9