RDO tickets are now tracked in Jira https://issues.redhat.com/projects/RDO/issues/
Bug 1082030 - F20 cloud image fails to boot in icehouse running on F20
Summary: F20 cloud image fails to boot in icehouse running on F20
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: RDO
Classification: Community
Component: openstack-nova
Version: unspecified
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
: Icehouse
Assignee: RHOS Maint
QA Contact: Ami Jeain
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-03-28 14:04 UTC by Kambiz Aghaiepour
Modified: 2016-04-26 14:39 UTC
CC List: 32 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-21 09:42:19 UTC
Embargoed:


Attachments (Terms of Use)
boot failure (20.82 KB, image/png)
2014-03-28 14:04 UTC, Kambiz Aghaiepour

Description Kambiz Aghaiepour 2014-03-28 14:04:50 UTC
Created attachment 879852 [details]
boot failure

Description of problem:

When launched, the F20 cloud image never finishes booting, and I am uncertain what the issue is. Occasionally, if I watch the console while it boots, the instance fully launches and I am able to log in, as well as SSH to the instance from a floating IP. Most of the time, however, it fails with this error on the console:

    MP-BIOS bug: 8254 timer not connected to IO-APIC
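The Nova commit quoted in Comment 8 notes that this failure is frequently mischaracterized as an SSH connectivity issue, so it can help to check the console output for this exact message before debugging networking. The following is a minimal sketch. It assumes the console text has already been captured, for example from `nova console-log <server>`. The helper name `timer_check_failure` is hypothetical.

```python
import re

# The error reported on the instance console in this bug. search() is used so any
# kernel log prefix (timestamps, leading dots) before the message is ignored.
TIMER_BUG = re.compile(r"MP-BIOS bug:\s+8254 timer not connected to IO-APIC")


def timer_check_failure(console_log: str) -> bool:
    """Hypothetical helper: True if the console output contains the 8254/IO-APIC timer error."""
    return TIMER_BUG.search(console_log) is not None
```

A match points at the kernel timer check rather than at the network path to the floating IP.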


Version-Release number of selected component (if applicable):

Latest versions of the OpenStack packages, installed via packstack.

How reproducible:

Set up the latest Icehouse and run packstack (a two-node configuration in my setup, with control+Neutron on one node and compute on the other).

Steps to Reproduce:
1.  Launch the F20 cloud image. It fails to fully launch.
2.  Try the same thing with CirrOS; it launches.


Additional info:
   Console screenshot attached.

Comment 1 Kambiz Aghaiepour 2014-03-28 14:59:48 UTC
I should mention that this is with nested virtualization, i.e. I'm running Icehouse on two nodes under virt-manager on F20, and the F20 cloud image is being launched in RDO running on those two nodes with

libvirt_type=qemu

in nova.conf on the compute VM.
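`libvirt_type=qemu` means pure software emulation. That is typically the fallback in a nested setup when the compute VM does not see the host's hardware virtualization extensions. The following is a minimal sketch of that reasoning. It assumes the choice is made only from the `vmx` (Intel VT-x) or `svm` (AMD-V) flags in the guest's `/proc/cpuinfo`. The function names are hypothetical, and this is not how Nova or packstack picks the setting. A visible flag also does not guarantee that the `kvm` module is loaded or that `/dev/kvm` exists.

```python
from pathlib import Path


def has_hw_virt(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line lists vmx (Intel VT-x) or svm (AMD-V)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = value.split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


def suggested_virt_type(cpuinfo_text: str) -> str:
    """Hypothetical helper: 'kvm' if acceleration is visible, else 'qemu' (pure emulation)."""
    return "kvm" if has_hw_virt(cpuinfo_text) else "qemu"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        # Run inside the compute VM to see which mode it could use.
        print(suggested_virt_type(cpuinfo.read_text()))
```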

Comment 2 Kambiz Aghaiepour 2014-03-28 15:00:57 UTC
Also, the two nodes under virt-manager are running F20 as well. If needed, I can provide the package manifests for the control and compute nodes.

Comment 3 Xavier Queralt 2014-03-28 16:13:20 UTC
This is actually in RDO; Icehouse for F20 is only available through the RDO repos.

Comment 5 Stephen Gordon 2014-05-22 16:01:17 UTC
Hi Matt,

Do you have any insight on why this might have occurred and whether it's still likely to be an issue?

Thanks,

Steve

Comment 6 Matthew Miller 2014-06-09 12:21:09 UTC
I don't have any idea -- this is a bit too low-level for my expertise. Are the kernel/virt people aware?

Comment 7 Attila Fazekas 2014-08-25 10:42:19 UTC
See also: https://bugzilla.redhat.com/show_bug.cgi?id=1102592

You can try to use http://dl.fedoraproject.org/pub/alt/openstack/20/x86_64/Fedora-x86_64-20-20140618-sda.qcow2, which contains the no_timer_check option and is used by the OpenStack gate tests.

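OpenStack now passes the 'no_timer_check' option to Amazon-style images (images booted with a separate kernel and ramdisk, where Nova controls the kernel command line) when QEMU is used, but this is not possible with other images. Those images have to be created with the no_timer_check boot option.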

Comment 8 Kashyap Chamarthy 2014-08-25 11:04:16 UTC
Kambiz, can you please test with this new cloud image and confirm here if it works for you?


You can examine the disk image to see the option Attila mentions:
---------------
$ guestfish --ro -i -a Fedora-x86_64-20-20140618-sda.qcow2 
[. . .]
><fs> cat /etc/grub.conf 
default=0
timeout=0


title Fedora (3.11.10-301.fc20.x86_64)
        root (hd0)
        kernel /boot/vmlinuz-3.11.10-301.fc20.x86_64 ro root=UUID=314b4a27-3885-49e8-9415-af098db4fd2a no_timer_check console=hvc0 LANG=en_US.UTF-8
        initrd /boot/initramfs-3.11.10-301.fc20.x86_64.img

><fs> 
---------------
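The same inspection can be scripted. The following is a minimal sketch. It assumes the boot configuration has already been extracted as text, for example with the `guestfish ... cat /etc/grub.conf` session above, and that it uses GRUB-legacy style `kernel` lines like this image does. The function name is hypothetical.

```python
def kernel_lines_missing_param(grub_conf: str, param: str = "no_timer_check") -> list[str]:
    """Return each 'kernel' line in a GRUB-legacy config that lacks `param`.

    A parameter counts as present if it appears as a bare token (no_timer_check)
    or as the key of a key=value token.
    """
    missing = []
    for raw in grub_conf.splitlines():
        line = raw.strip()
        tokens = line.split()
        if not tokens or tokens[0] != "kernel":
            continue
        # tokens[1] is the kernel image path; the remaining tokens are boot arguments.
        names = {tok.split("=", 1)[0] for tok in tokens[2:]}
        if param not in names:
            missing.append(line)
    return missing
```

An empty result means every kernel entry already carries the option, as is the case for the grub.conf printed above.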

This is the relevant Nova commit[1] that Attila is referring to. It applies when you use pure emulation (plain QEMU, with no hardware acceleration):

commit 6b86a61fee15ce1237303fab2f7896f8c3bcad47
Author: Attila Fazekas <afazekas>
Date:   Wed May 28 09:19:29 2014 +0200

    Use no_timer_check with soft-qemu

    The Linux kernel timer check not working properly
    when the hypervisor's thread preempted by the host CPU scheduler.

    The timer check is automatically disabled with other types
    of hypervisors including the hardware accelerated kvm,
    but timer_check is not disabled when qemu used without hardware acceleration.

    This issue is frequently mischaracterized as an SSH connectivity issue and
    causes rechecks and occasional boot failures.

    This change adds no_timer_check kernel parameter when we are using
    uec images with qemu.

    Closes-Bug: #1312199
    Change-Id: I3cfdfe9048fe219fc12cdac8a399b496f237e55e


  [1] https://git.openstack.org/cgit/openstack/nova/commit/?id=6b86a61fee15ce1237303fab2f7896f8c3bcad47
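The following sketch models the decision the commit message describes in a few lines. It is a simplification, not Nova's actual code. The function name and parameters are hypothetical. It assumes only the two inputs the message mentions: the libvirt virt type, and whether the image is UEC/AMI-style (booted from a separate kernel, so that Nova supplies the command line).

```python
def build_kernel_cmdline(base_args: list[str], virt_type: str, has_separate_kernel: bool) -> list[str]:
    """Hypothetical model of the commit: add no_timer_check for UEC images under plain QEMU."""
    args = list(base_args)  # copy so the caller's list is not mutated
    # Whole-disk images boot through their own bootloader, so a command line built
    # here would never reach the guest kernel; the option must be baked into the image.
    if not has_separate_kernel:
        return args
    # Only pure emulation needs the timer check disabled; the commit notes that
    # kvm and other hypervisor types already skip it.
    if virt_type == "qemu" and "no_timer_check" not in args:
        args.append("no_timer_check")
    return args
```

This also shows why the whole-disk F20 image in this bug did not benefit from the change and needed a rebuilt image instead.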

Comment 9 Ben Nemec 2014-09-18 19:23:04 UTC
The no_timer_check Fedora image fixes this problem for me.

Comment 10 Kashyap Chamarthy 2015-01-21 09:42:19 UTC
Closing per comment #9

