1007578 – Unable to launch an instance

RDO tickets are now tracked in Jira https://issues.redhat.com/projects/RDO/issues/

Bug 1007578 - Unable to launch an instance

Summary: Unable to launch an instance

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	RDO
Classification:	Community
Component:	openstack-nova
Sub Component:
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	Xavier Queralt
QA Contact:	Ami Jeain
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2013-09-12 19:35 UTC by Rob Crittenden
Modified:	2015-06-04 21:53 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-03-10 13:42:29 UTC
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Nova logs in debug mode (186.50 KB, application/x-gzip) 2013-09-19 12:42 UTC, Rob Crittenden	no flags	Details
View All

Description Rob Crittenden 2013-09-12 19:35:14 UTC

Description of problem:

Following the Test Day instructions at http://openstack.redhat.com/QuickStartLatest

Launching an instance returns an Error status.

Going to the Log tab results in:

Unable to get log for instance "<hash>".

I've been unable to find any errors logged in any of the OpenStack logs.

I've made several installation attempts on this machine, all fail in the same way. An installation of grizzly worked fine.

Version-Release number of selected component (if applicable):

Havana

Comment 1 Xavier Queralt 2013-09-19 09:11:53 UTC

If the instance goes to the error state there should be some info at least in one the nova logs about it.

If the issue still persist with the latest version of nova, could you attach all the logs under /var/log/nova to this bz so we can look for some insights on why the instance couldn't be spawned? Make sure you have debug set to True in your /etc/nova/nova.conf

Comment 2 Rob Crittenden 2013-09-19 12:42:37 UTC

Created attachment 799913 [details]
Nova logs in debug mode

Comment 3 Rob Crittenden 2013-09-19 12:43:23 UTC

The package versions are openstack-nova-*-2013.2-0.19.b3.el6.noarch

Comment 4 Xavier Queralt 2013-09-19 14:00:56 UTC

Thanks for the logs!

From the scheduler log I can see that the only compute node available (with hostname testday.greyoak.com) is ruled out from the possible hosts where to start an instance because its status hasn't been updated yet (last update was 577518.384811 seconds ago):

2013-09-19 08:41:22.384 3716 DEBUG nova.servicegroup.drivers.db [req-721f0db3-ad47-4862-9ebc-9d7e93e2ca38 3ae7358e02964914b7223baa50b51ced 004f6101d81744a8a024ef116f1ee8ed] DB_Driver.is_up last_heartbeat = 2013-09-12 20:16:04 elapsed = 577518.384811 is_up /usr/lib/python2.6/site-packages/nova/servicegroup/drivers/db.py:71
2013-09-19 08:41:22.385 3716 DEBUG nova.scheduler.filters.compute_filter [req-721f0db3-ad47-4862-9ebc-9d7e93e2ca38 3ae7358e02964914b7223baa50b51ced 004f6101d81744a8a024ef116f1ee8ed] (testday.greyoak.com, testday.greyoak.com) ram:6222 disk:33792 io_ops:0 instances:2 is disabled or has not been heard from in a while host_passes /usr/lib/python2.6/site-packages/nova/scheduler/filters/compute_filter.py:44
2013-09-19 08:41:22.385 3716 DEBUG nova.filters [req-721f0db3-ad47-4862-9ebc-9d7e93e2ca38 3ae7358e02964914b7223baa50b51ced 004f6101d81744a8a024ef116f1ee8ed] Filter ComputeFilter returned 0 host(s) get_filtered_objects /usr/lib/python2.6/site-packages/nova/filters.py:85

In compute log I see that the node joins the ServiceGroup at 2013-09-19 08:41:20.767, meaning that the first status update will happen 10 seconds after. On the other hand, the scheduler is looking for a candidate where to start an instance at 2013-09-19 08:41:22.384 but can't find any because the only available node hasn't updated its status for some time (according to the logs). That's why the instance goes to error state.

The only situation I can think about with such outcome is that your database already had an entry for the compute node testday.greyoak.com and because the new compute service hasn't updated it yet (remember that the boot happens between at 08:41:22.384 and the update is scheduled for 08:41:30.767) it thinks the node is not available and can't boot the instance anywhere.

This is what nova checks to verify if a node is up:

last_heartbeat = service_ref['updated_at'] or service_ref['created_at']
elapsed = utils.total_seconds(timeutils.utcnow() - last_heartbeat)

Besides this I can't see any other error in the logs, have you tried to boot any other instance after this one?

Comment 5 Rob Crittenden 2013-09-19 14:32:04 UTC

testday is the only machine, it is a packstack all-in-one install.

This is the third instance I've tried to start, they all fail in the same way as far as I can tell.

I guess this makes some amount of sense as 577518 seconds is just about a week, and that's when I filed the bug. I had shut down this machine after filing the bug and rebooted it to obtain the logs. So for the current instance, could it be that some service didn't start upon reboot (mysql wasn't chkconfig'd on, for example)?

Comment 6 Rob Crittenden 2013-11-20 16:39:29 UTC

Ran into a similar problem today using vftool. I blew the original instance away so I can't know for sure, but the fix today was to use i386 images.

Comment 7 Xavier Queralt 2014-03-10 13:42:29 UTC

I'm closing this as I don't see any way to debug the problem properly. From the previous logs it appears that the compute node didn't came online which prevented the scheduler from finding a suitable compute node where to start the instance.

Note You need to log in before you can comment on or make changes to this bug.