Bug 1505424 - [Splitstack] Overcloud is not functional after the deployment due
Summary: [Splitstack] Overcloud is not functional after the deployment due
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 12.0 (Pike)
Assignee: Martin André
QA Contact: Gurenko Alex
URL:
Whiteboard:
Duplicates: 1505495
Depends On: 1501852
Blocks:
 
Reported: 2017-10-23 14:36 UTC by Gurenko Alex
Modified: 2018-02-05 19:15 UTC
CC List: 11 users

Fixed In Version: openstack-tripleo-heat-templates-7.0.3-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 22:18:18 UTC
Target Upstream Version:


Attachments
overcloudrc (905 bytes, text/plain), attached 2017-10-23 14:36 UTC by Gurenko Alex


Links
- OpenStack gerrit 511509 (MERGED): Add --detailed-exitcodes when running puppet via ansible (last updated 2021-01-08 12:21:32 UTC)
- OpenStack gerrit 517022 (MERGED): Add --detailed-exitcodes when running puppet via ansible (last updated 2021-01-08 12:22:11 UTC)
- Red Hat Product Errata RHEA-2017:3462 (SHIPPED_LIVE): Red Hat OpenStack Platform 12.0 Enhancement Advisory (last updated 2018-02-16 01:43:25 UTC)

Description Gurenko Alex 2017-10-23 14:36:52 UTC
Created attachment 1342220 [details]
overcloudrc

Description of problem: After a split-stack deployment of 1 controller and 1 compute that completed with CREATE_COMPLETE, overcloud commands return an error.


Version-Release number of selected component (if applicable): build 2017-10-17.2


How reproducible:


Steps to Reproduce:
1. Deploy split stack with 1 controller and 1 compute
2. source overcloudrc
3. Run: openstack catalog list

Actual results:

[stack@undercloud-0 ~]$ openstack catalog list
Failed to discover available identity versions when contacting http://192.168.25.28:5000/v2.0. Attempting to parse version from URL.
Unable to establish connection to http://192.168.25.28:5000/v2.0/tokens: HTTPConnectionPool(host='192.168.25.28', port=5000): Max retries exceeded with url: /v2.0/tokens (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x38e2890>: Failed to establish a new connection: [Errno 113] No route to host',))


Expected results:

The service catalog for the overcloud is printed.


Additional info:
overcloudrc contains the following line:

export OS_AUTH_URL=http://192.168.25.28:5000/v2.0

but the overcloud controller has an IP of 192.168.25.25 and the compute node 192.168.25.23
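As a quick sanity check of the mismatch described above, the host and port can be pulled out of OS_AUTH_URL and inspected before running any openstack commands. A minimal sketch; the OS_AUTH_URL value is copied from the attached overcloudrc, and in practice you would source overcloudrc first instead of hard-coding it:

```shell
#!/bin/sh
# Sketch only: OS_AUTH_URL is hard-coded here with the value from the
# attached overcloudrc; normally you would `source overcloudrc` first.
OS_AUTH_URL="http://192.168.25.28:5000/v2.0"

# Strip the scheme, then the path, to get host:port
host_port=${OS_AUTH_URL#*://}
host_port=${host_port%%/*}
echo "Keystone endpoint from overcloudrc: $host_port"
```

Probing that endpoint with something like `curl --max-time 5 "$OS_AUTH_URL"` would then surface the "No route to host" error immediately, without waiting for the openstack client's retries.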

Comment 2 James Slagle 2017-10-23 16:28:14 UTC
I see that on controller-0, the local docker daemon is not running:

Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]: "2017-10-23 14:53:24,268 WARNING: 15741 -- retrying pulling image: 192.168.24.1:8787/rhosp12/openstack-memcached-docker:20171017.1",
Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]: "2017-10-23 14:53:24,282 WARNING: 15740 -- docker pull failed: Cannot connect to the Docker daemon. Is the docker daemon running on this host?",

What is the expectation of the docker service prior to the deployment? Should it be running or not? We recommended disabling it first due to:

https://bugzilla.redhat.com/show_bug.cgi?id=1503021

Also note that the stack went to create_complete even though nothing got deployed on the overcloud. It seems paunch and/or heat-config-ansible is not properly signaling a failed deployment back to Heat (wrong exit code getting used somewhere probably). The deployment definitely should have been failed since nothing got deployed.

Comment 3 James Slagle 2017-10-23 16:29:12 UTC
> Also note that the stack went to create_complete even though nothing got
> deployed on the overcloud. It seems paunch and/or heat-config-ansible is not
> properly signaling a failed deployment back to Heat (wrong exit code getting
> used somewhere probably). The deployment definitely should have been failed
> since nothing got deployed.

Alex, can you file a new bug for this issue? I think it needs to be tracked separately. It's also for DFG:Containers.

Comment 4 James Slagle 2017-10-23 16:30:59 UTC
(In reply to James Slagle from comment #2)
> i see that on controller-0, the local docker daemon is not running:
> 
> Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]:
> "2017-10-23 14:53:24,268 WARNING: 15741 -- retrying pulling image:
> 192.168.24.1:8787/rhosp12/openstack-memcached-docker:20171017.1",
> Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]:
> "2017-10-23 14:53:24,282 WARNING: 15740 -- docker pull failed: Cannot
> connect to the Docker daemon. Is the docker daemon running on this host?",
> 
> What is the expectation of the docker service prior to the deployment?
> Should it be running or not? We recommended to disable it first due to:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1503021

Need input from DFG:Containers on what/how to bootstrap the docker service appropriately, taking into consideration this bug and bug 1503021

Comment 5 Gurenko Alex 2017-10-24 15:41:27 UTC
(In reply to James Slagle from comment #3)
> > Also note that the stack went to create_complete even though nothing got
> > deployed on the overcloud. It seems paunch and/or heat-config-ansible is not
> > properly signaling a failed deployment back to Heat (wrong exit code getting
> > used somewhere probably). The deployment definitely should have been failed
> > since nothing got deployed.
> 
> Alex, can you file a new bug for this issue? I think it needs to be tracked
> separately. It's also for DFG:Containers.

Here is BZ open for that issue with logs attached https://bugzilla.redhat.com/show_bug.cgi?id=1505495

Comment 6 Dan Prince 2017-10-25 20:40:36 UTC
It sounds like we could be missing a signal in the case where paunch fails to configure a service correctly. I will sync with Steve Baker and see if we have any ideas on this.

Comment 7 Steve Baker 2017-10-25 21:27:27 UTC
I've commented on bug 1505495; I think it is docker-puppet.py not handling puppet exit codes correctly.
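For context, with --detailed-exitcodes puppet no longer returns a plain 0/1: exit code 2 means the run succeeded with changes applied, while 1, 4, and 6 indicate failures. A caller that treats any non-zero code as failure, or that ignores the code entirely, gets this wrong. A hypothetical sketch of the check a wrapper needs (not the actual docker-puppet.py logic):

```shell
#!/bin/sh
# Hypothetical sketch: interpreting `puppet apply --detailed-exitcodes`
# return codes. Returns 0 (shell "true") when the puppet run failed.
puppet_run_failed() {
    # 0: success, no changes; 2: success, changes applied
    # 1: run aborted; 4: resource failures; 6: changes plus failures
    case "$1" in
        0|2) return 1 ;;  # not a failure
        *)   return 0 ;;  # failure
    esac
}

for rc in 0 2 1 4 6; do
    if puppet_run_failed "$rc"; then
        echo "rc=$rc: FAILED"
    else
        echo "rc=$rc: ok"
    fi
done
```

The subtlety is exit code 2: a naive `if puppet apply ...; then` treats a successful run that applied changes as an error, so any wrapper passing --detailed-exitcodes has to whitelist both 0 and 2.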

Comment 8 Dan Prince 2017-10-27 13:53:50 UTC
Marking this as depends on for bug 1501852. I think the real issue being described here is that the deployment finished but in fact should not have, because some of the containers (keystone in this example) were not deployed.

There is an actual issue here with deployment, but my suspicion is that we are already fixing it as part of the docker bootstrapping work for split stack. Perhaps related to bug 1503021.

Comment 9 Dan Prince 2017-10-27 13:55:28 UTC
Marking as ON_DEV as the --detailed-exitcodes patch has been proposed upstream:

https://review.openstack.org/#/c/511509/

Comment 10 Omri Hochman 2017-10-27 13:59:51 UTC
*** Bug 1505495 has been marked as a duplicate of this bug. ***

Comment 12 Martin André 2017-11-10 10:07:40 UTC
https://review.openstack.org/#/c/517022/ merged in stable/pike.

Comment 13 Gurenko Alex 2017-11-15 15:14:25 UTC
The 1+1 topology (1 controller, 1 compute) is now deployable with split stack.

Comment 17 errata-xmlrpc 2017-12-13 22:18:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

