Bug 1505424

Summary: [Splitstack] Overcloud is not functional after the deployment due
Product: Red Hat OpenStack
Reporter: Gurenko Alex <agurenko>
Component: openstack-tripleo-heat-templates
Assignee: Martin André <maandre>
Status: CLOSED ERRATA
QA Contact: Gurenko Alex <agurenko>
Severity: high
Docs Contact:
Priority: high
Version: 12.0 (Pike)
CC: agurenko, dprince, jcoufal, jjoyce, jschluet, jslagle, m.andre, mburns, ohochman, rhel-osp-director-maint, sbaker
Target Milestone: beta
Keywords: Triaged
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-7.0.3-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-13 22:18:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1501852
Bug Blocks:
Attachments:
Description Flags
overcloudrc none

Description Gurenko Alex 2017-10-23 14:36:52 UTC
Created attachment 1342220 [details]
overcloudrc

Description of problem: After a split-stack deployment of 1 compute and 1 controller that completed with CREATE_COMPLETE, overcloud commands return an error.


Version-Release number of selected component (if applicable): build 2017-10-17.2


How reproducible:


Steps to Reproduce:
1. Deploy split stack with 1 compute, 1 controller
2. source overcloudrc
3. Run: openstack catalog list

Actual results:

[stack@undercloud-0 ~]$ openstack catalog list
Failed to discover available identity versions when contacting http://192.168.25.28:5000/v2.0. Attempting to parse version from URL.
Unable to establish connection to http://192.168.25.28:5000/v2.0/tokens: HTTPConnectionPool(host='192.168.25.28', port=5000): Max retries exceeded with url: /v2.0/tokens (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x38e2890>: Failed to establish a new connection: [Errno 113] No route to host',))


Expected results:

The service catalog for the overcloud is printed.


Additional info:
overcloudrc contains the following line:

export OS_AUTH_URL=http://192.168.25.28:5000/v2.0

but the overcloud controller has an IP of 192.168.25.25 and the compute node 192.168.25.23.
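A quick way to confirm the mismatch is to extract the host from the auth URL and compare it against the node IPs. A minimal shell sketch (the URL and IPs are the ones reported above; the parsing is plain awk, nothing OpenStack-specific):

```shell
# Pull the host out of OS_AUTH_URL and compare it to the known node IPs.
auth_url="http://192.168.25.28:5000/v2.0"   # from overcloudrc
auth_host=$(echo "$auth_url" | awk -F'[/:]' '{print $4}')
echo "auth host: $auth_host"

# Neither node owns the address the auth URL points at.
for node_ip in 192.168.25.25 192.168.25.23; do
    if [ "$auth_host" = "$node_ip" ]; then
        echo "matches node $node_ip"
    fi
done
```

With no match printed, every keystone request goes to an address no node answers, which is consistent with the "No route to host" error above.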

Comment 2 James Slagle 2017-10-23 16:28:14 UTC
I see that on controller-0, the local docker daemon is not running:

Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]: "2017-10-23 14:53:24,268 WARNING: 15741 -- retrying pulling image: 192.168.24.1:8787/rhosp12/openstack-memcached-docker:20171017.1",
Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]: "2017-10-23 14:53:24,282 WARNING: 15740 -- docker pull failed: Cannot connect to the Docker daemon. Is the docker daemon running on this host?",

What is the expectation for the docker service prior to the deployment? Should it be running or not? We recommended disabling it first due to:

https://bugzilla.redhat.com/show_bug.cgi?id=1503021

Also note that the stack went to CREATE_COMPLETE even though nothing got deployed on the overcloud. It seems paunch and/or heat-config-ansible is not properly signaling a failed deployment back to Heat (probably a wrong exit code is used somewhere). The deployment definitely should have failed, since nothing got deployed.
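The suspected failure mode can be illustrated with a minimal shell sketch. This is not the actual paunch or heat-config-ansible code, just the exit-status pattern that would let a failed step surface to Heat as success:

```shell
#!/bin/bash
# Stand-in for a deployment step that fails.
run_step() { return 1; }

# A wrapper that discards the child's status reports success upward:
run_step || true
echo "masked status: $?"      # 0 -- Heat would see CREATE_COMPLETE

# Correct propagation preserves the failure:
run_step
echo "real status: $?"        # 1 -- Heat would mark the deployment FAILED
```

Any wrapper layer that swallows the non-zero status this way would produce exactly the symptom described: a green stack with nothing actually deployed.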

Comment 3 James Slagle 2017-10-23 16:29:12 UTC
> Also note that the stack went to create_complete even though nothing got
> deployed on the overcloud. It seems paunch and/or heat-config-ansible is not
> properly signaling a failed deployment back to Heat (wrong exit code getting
> used somewhere probably). The deployment definitely should have been failed
> since nothing got deployed.

Alex, can you file a new bug for this issue? I think it needs to be tracked separately. It's also for DFG:Containers.

Comment 4 James Slagle 2017-10-23 16:30:59 UTC
(In reply to James Slagle from comment #2)
> i see that on controller-0, the local docker daemon is not running:
> 
> Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]:
> "2017-10-23 14:53:24,268 WARNING: 15741 -- retrying pulling image:
> 192.168.24.1:8787/rhosp12/openstack-memcached-docker:20171017.1",
> Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]:
> "2017-10-23 14:53:24,282 WARNING: 15740 -- docker pull failed: Cannot
> connect to the Docker daemon. Is the docker daemon running on this host?",
> 
> What is the expectation of the docker service prior to the deployment?
> Should it be running or not? We recommended to disable it first due to:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1503021

Need input from DFG:Containers on what/how to bootstrap the docker service appropriately, taking this bug and bug 1503021 into consideration.

Comment 5 Gurenko Alex 2017-10-24 15:41:27 UTC
(In reply to James Slagle from comment #3)
> > Also note that the stack went to create_complete even though nothing got
> > deployed on the overcloud. It seems paunch and/or heat-config-ansible is not
> > properly signaling a failed deployment back to Heat (wrong exit code getting
> > used somewhere probably). The deployment definitely should have been failed
> > since nothing got deployed.
> 
> Alex, can you file a new bug for this issue? I think it needs to be tracked
> separately. It's also for DFG:Containers.

Here is the BZ opened for that issue, with logs attached: https://bugzilla.redhat.com/show_bug.cgi?id=1505495

Comment 6 Dan Prince 2017-10-25 20:40:36 UTC
It sounds like we could be missing a signal in the case where paunch fails to configure a service correctly. I will sync with Steve Baker and see if we have any ideas on this.

Comment 7 Steve Baker 2017-10-25 21:27:27 UTC
I've commented on bug 1505495; I think it is docker-puppet.py not handling puppet exit codes correctly.

Comment 8 Dan Prince 2017-10-27 13:53:50 UTC
Marking this as dependent on bug 1501852. I think the real issue described here is that the deployment finished when it should not have, because some of the containers (keystone in this example) were not deployed.

There is an actual issue here with the deployment, but my suspicion is that we are fixing it as part of the docker bootstrapping work for split stack. Perhaps related to bug 1503021.

Comment 9 Dan Prince 2017-10-27 13:55:28 UTC
Marking as ON_DEV, as the detailed exit codes patch has been proposed upstream:

https://review.openstack.org/#/c/511509/

Comment 10 Omri Hochman 2017-10-27 13:59:51 UTC
*** Bug 1505495 has been marked as a duplicate of this bug. ***

Comment 12 Martin André 2017-11-10 10:07:40 UTC
https://review.openstack.org/#/c/517022/ merged in stable/pike.

Comment 13 Gurenko Alex 2017-11-15 15:14:25 UTC
A 1 controller + 1 compute topology is now deployable with split stack.

Comment 17 errata-xmlrpc 2017-12-13 22:18:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462