Bug 1313359

Summary: osp-d deployment runs successfully, but when running scenario tests: "No valid host was found"
Product: Red Hat OpenStack Reporter: Arie Bregman <abregman>
Component: rhosp-director Assignee: Ben Nemec <bnemec>
Status: CLOSED NOTABUG QA Contact: Arik Chernetsky <achernet>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0 (Kilo) CC: abregman, berrange, bnemec, dasmith, dbecker, eglynn, fpercoco, hbrock, kchamart, mbooth, mburns, morazi, ndipanov, rhel-osp-director-maint, sbauza, sferdjao, sgordon, vromanso, yeylon
Target Milestone: --- Keywords: AutomationBlocker
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-03-13 07:17:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Arie Bregman 2016-03-01 13:04:53 UTC
Description of problem:
Running tempest scenario tests fails with: 'No valid host was found. There are not enough hosts available.'

In addition, when running 'nova service-list' on the overcloud, no compute service appears in the list.

Version-Release number of selected component (if applicable): 7 (using puddle)


How reproducible: 99%

Steps to Reproduce:
1. Deploy OSP-d 7 on an existing OpenStack cloud (OVB)
2. Run tempest tests or 'nova service-list' on the overcloud

Actual results:
'No valid host was found. There are not enough hosts available.'
Compute service is not available

Expected results:
Successful tests

Additional info:

Comment 3 Matthew Booth 2016-03-01 14:06:32 UTC
It seems that the compute host didn't come up properly because rabbit wasn't up yet. I'd fix that before looking for other causes.

Comment 4 Flavio Percoco 2016-03-01 15:11:20 UTC
Looks like the credentials on the compute node are incorrect:

2016-03-01 12:06:51.784 14588 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server 172.25.0.9:5672 closed the connection. Check login credentials: Socket closed
2016-03-01 12:07:51.864 14588 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server 172.25.0.9:5672 closed the connection. Check login credentials: Socket closed
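
To see how often these errors recur and which server they point at, the lines can be pulled apart with a small script. This is only a sketch: the pattern is written against the two log lines quoted above, and the names AMQP_CLOSED and closed_connections are made up for illustration.

```python
import re
from datetime import datetime

# Pattern written against the oslo.messaging lines quoted above (an assumption:
# other releases may format the message differently).
AMQP_CLOSED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \d+ ERROR "
    r"oslo_messaging\._drivers\.impl_rabbit \[-\] AMQP server "
    r"(?P<host>[\d.]+):(?P<port>\d+) closed the connection"
)


def closed_connections(lines):
    """Return (timestamp, host, port) for each 'closed the connection' error."""
    events = []
    for line in lines:
        match = AMQP_CLOSED.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, match.group("host"), int(match.group("port"))))
    return events


SAMPLE_LINES = [
    "2016-03-01 12:06:51.784 14588 ERROR oslo_messaging._drivers.impl_rabbit [-] "
    "AMQP server 172.25.0.9:5672 closed the connection. "
    "Check login credentials: Socket closed",
    "2016-03-01 12:07:51.864 14588 ERROR oslo_messaging._drivers.impl_rabbit [-] "
    "AMQP server 172.25.0.9:5672 closed the connection. "
    "Check login credentials: Socket closed",
]

events = closed_connections(SAMPLE_LINES)
# Seconds between consecutive errors.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
print(events[0][1], events[0][2], gaps)
```

For the two lines above, both errors name 172.25.0.9:5672 and they are about one minute apart.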

Comment 6 Arie Bregman 2016-03-01 15:48:12 UTC
This seems to be related to the installer and not nova. Switching from 'openstack-nova' to 'rhel-osp-director' component.

Comment 8 Hugh Brock 2016-03-02 08:54:41 UTC
Reassigned to Ben Nemec (who invented OVB). Ben, could you have a look here? Thanks.

Comment 9 Ben Nemec 2016-03-02 17:45:35 UTC
I think the compute node is connecting fine once rabbit is up, and it does eventually contact conductor (the "waiting for conductor" messages stop), but I'm seeing a lot of

=ERROR REPORT==== 1-Mar-2016::12:21:17 ===
closing AMQP connection <0.4617.1> (172.25.0.8:60761 -> 172.25.0.9:5672):
{inet_error,etimedout}

in the rabbit logs on the controller (note that it appears 172.25.0.8 is the compute node, .9 is the controller).  I think we need to find the cause of these connection timeouts.  Is there any way I could get access to the environment where this is being seen?
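
A quick way to check whether the timeouts all come from the same client is to count them per source address. The following is a rough sketch that assumes the rabbit log keeps the three-line shape shown above; CLOSING and timeouts_by_client are hypothetical names, not part of any RabbitMQ tooling.

```python
import re
from collections import Counter

# Matches the "closing AMQP connection" line quoted above and captures the
# client (source) and broker (destination) addresses.
CLOSING = re.compile(
    r"closing AMQP connection <[\d.]+> "
    r"\((?P<src>[\d.]+):\d+ -> (?P<dst>[\d.]+):\d+\):"
)


def timeouts_by_client(log_text):
    """Count connections closed with etimedout, keyed by client IP."""
    counts = Counter()
    lines = log_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        match = CLOSING.search(line)
        # The reason for the close is on the line after the connection line.
        if match and "etimedout" in lines[i + 1]:
            counts[match.group("src")] += 1
    return counts


SAMPLE_LOG = """=ERROR REPORT==== 1-Mar-2016::12:21:17 ===
closing AMQP connection <0.4617.1> (172.25.0.8:60761 -> 172.25.0.9:5672):
{inet_error,etimedout}
"""

print(timeouts_by_client(SAMPLE_LOG))
```

Run against the full log, a count dominated by one address would point at that node's network path rather than at rabbit itself.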

Comment 12 Arie Bregman 2016-03-13 07:17:15 UTC
Issue resolved. 
MTU should be set to 1450 on the interfaces of all nodes.
This should be part of the OVB templates.
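
To confirm the setting on a node, the interface MTUs can be read from Linux sysfs and compared against 1450. This is a minimal sketch, not part of OVB: the function names are made up, the loopback interface is skipped on the assumption that it does not matter here, and the report does not say why 1450 is the right value (a likely reason, not confirmed here, is tunnelling overhead on the host cloud's networks).

```python
from pathlib import Path

REQUIRED_MTU = 1450  # value from the resolution above


def oversized_interfaces(mtus, limit=REQUIRED_MTU, ignore=("lo",)):
    """Return sorted names of interfaces whose MTU is above the limit."""
    return sorted(
        name for name, mtu in mtus.items()
        if name not in ignore and mtu > limit
    )


def read_local_mtus(sysfs_root="/sys/class/net"):
    """Read the MTU of every local interface from Linux sysfs."""
    mtus = {}
    for iface in Path(sysfs_root).iterdir():
        mtu_file = iface / "mtu"
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus


if __name__ == "__main__":
    # Run on each overcloud node; prints nothing when all MTUs are in range.
    for name in oversized_interfaces(read_local_mtus()):
        print(f"{name}: MTU is above {REQUIRED_MTU}")
```

An interface left at the common default of 1500 would be reported by this check.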