Bug 1007578

Summary: Unable to launch an instance
Product: [Community] RDO Reporter: Rob Crittenden <rcritten>
Component: openstack-novaAssignee: Xavier Queralt <xqueralt>
Status: CLOSED WORKSFORME QA Contact: Ami Jeain <ajeain>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: unspecifiedCC: dallan, ndipanov, rbryant, rcritten, xqueralt, yeylon
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-03-10 13:42:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Nova logs in debug mode none

Description Rob Crittenden 2013-09-12 19:35:14 UTC
Description of problem:

Following the Test Day instructions at http://openstack.redhat.com/QuickStartLatest

Launching an instance returns an Error status.

Going to the Log tab results in:

Unable to get log for instance "<hash>".

I've been unable to find any errors logged in any of the OpenStack logs.

I've made several installation attempts on this machine, all fail in the same way. An installation of grizzly worked fine.

Version-Release number of selected component (if applicable):

Havana

Comment 1 Xavier Queralt 2013-09-19 09:11:53 UTC
If the instance goes to the error state there should be some info at least in one the nova logs about it.

If the issue still persist with the latest version of nova, could you attach all the logs under /var/log/nova to this bz so we can look for some insights on why the instance couldn't be spawned? Make sure you have debug set to True in your /etc/nova/nova.conf

Comment 2 Rob Crittenden 2013-09-19 12:42:37 UTC
Created attachment 799913 [details]
Nova logs in debug mode

Comment 3 Rob Crittenden 2013-09-19 12:43:23 UTC
The package versions are openstack-nova-*-2013.2-0.19.b3.el6.noarch

Comment 4 Xavier Queralt 2013-09-19 14:00:56 UTC
Thanks for the logs!

From the scheduler log I can see that the only compute node available (with hostname testday.greyoak.com) is ruled out from the possible hosts where to start an instance because its status hasn't been updated yet (last update was 577518.384811 seconds ago):

2013-09-19 08:41:22.384 3716 DEBUG nova.servicegroup.drivers.db [req-721f0db3-ad47-4862-9ebc-9d7e93e2ca38 3ae7358e02964914b7223baa50b51ced 004f6101d81744a8a024ef116f1ee8ed] DB_Driver.is_up last_heartbeat = 2013-09-12 20:16:04 elapsed = 577518.384811 is_up /usr/lib/python2.6/site-packages/nova/servicegroup/drivers/db.py:71
2013-09-19 08:41:22.385 3716 DEBUG nova.scheduler.filters.compute_filter [req-721f0db3-ad47-4862-9ebc-9d7e93e2ca38 3ae7358e02964914b7223baa50b51ced 004f6101d81744a8a024ef116f1ee8ed] (testday.greyoak.com, testday.greyoak.com) ram:6222 disk:33792 io_ops:0 instances:2 is disabled or has not been heard from in a while host_passes /usr/lib/python2.6/site-packages/nova/scheduler/filters/compute_filter.py:44
2013-09-19 08:41:22.385 3716 DEBUG nova.filters [req-721f0db3-ad47-4862-9ebc-9d7e93e2ca38 3ae7358e02964914b7223baa50b51ced 004f6101d81744a8a024ef116f1ee8ed] Filter ComputeFilter returned 0 host(s) get_filtered_objects /usr/lib/python2.6/site-packages/nova/filters.py:85

In compute log I see that the node joins the ServiceGroup at 2013-09-19 08:41:20.767, meaning that the first status update will happen 10 seconds after. On the other hand, the scheduler is looking for a candidate where to start an instance at 2013-09-19 08:41:22.384 but can't find any because the only available node hasn't updated its status for some time (according to the logs). That's why the instance goes to error state.

The only situation I can think about with such outcome is that your database already had an entry for the compute node testday.greyoak.com and because the new compute service hasn't updated it yet (remember that the boot happens between at 08:41:22.384 and the update is scheduled for 08:41:30.767) it thinks the node is not available and can't boot the instance anywhere.

This is what nova checks to verify if a node is up:

last_heartbeat = service_ref['updated_at'] or service_ref['created_at']
elapsed = utils.total_seconds(timeutils.utcnow() - last_heartbeat)

Besides this I can't see any other error in the logs, have you tried to boot any other instance after this one?

Comment 5 Rob Crittenden 2013-09-19 14:32:04 UTC
testday is the only machine, it is a packstack all-in-one install.

This is the third instance I've tried to start, they all fail in the same way as far as I can tell.

I guess this makes some amount of sense as 577518 seconds is just about a week, and that's when I filed the bug. I had shut down this machine after filing the bug and rebooted it to obtain the logs. So for the current instance, could it be that some service didn't start upon reboot (mysql wasn't chkconfig'd on, for example)?

Comment 6 Rob Crittenden 2013-11-20 16:39:29 UTC
Ran into a similar problem today using vftool. I blew the original instance away so I can't know for sure, but the fix today was to use i386 images.

Comment 7 Xavier Queralt 2014-03-10 13:42:29 UTC
I'm closing this as I don't see any way to debug the problem properly. From the previous logs it appears that the compute node didn't came online which prevented the scheduler from finding a suitable compute node where to start the instance.