Created attachment 1342220 [details]
overcloudrc

Description of problem:
After a split stack deployment of 1 compute and 1 controller that completed with CREATE_COMPLETE, overcloud commands return an error.

Version-Release number of selected component (if applicable):
build 2017-10-17.2

How reproducible:

Steps to Reproduce:
1. Deploy split stack with 1 compute, 1 controller
2. source overcloudrc
3. Run openstack catalog list

Actual results:
[stack@undercloud-0 ~]$ openstack catalog list
Failed to discover available identity versions when contacting http://192.168.25.28:5000/v2.0. Attempting to parse version from URL.
Unable to establish connection to http://192.168.25.28:5000/v2.0/tokens: HTTPConnectionPool(host='192.168.25.28', port=5000): Max retries exceeded with url: /v2.0/tokens (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x38e2890>: Failed to establish a new connection: [Errno 113] No route to host',))

Expected results:
The service catalog for the overcloud is printed.

Additional info:
overcloudrc specifies the following line:
export OS_AUTH_URL=http://192.168.25.28:5000/v2.0
but the overcloud controller has an IP of 192.168.25.25 and the compute 192.168.25.23.
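The "[Errno 113] No route to host" above can be reproduced independently of the OpenStack client with a plain TCP probe against the keystone endpoint from overcloudrc. A minimal diagnostic sketch (the `endpoint_reachable` helper is my own illustration, not part of any TripleO tooling):

```python
import socket

def endpoint_reachable(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "[Errno 113] No route to host" as well as
        # connection-refused and timeout errors.
        return False

# In the reported environment one would probe the address baked into
# overcloudrc (192.168.25.28) and the controller's real address
# (192.168.25.25); results naturally depend on the network:
#   endpoint_reachable("192.168.25.28", 5000)
#   endpoint_reachable("192.168.25.25", 5000)
```

Probing both addresses makes it immediately visible whether the problem is the wrong IP in overcloudrc, the keystone container not listening, or both.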
I see that on controller-0, the local docker daemon is not running:

Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]: "2017-10-23 14:53:24,268 WARNING: 15741 -- retrying pulling image: 192.168.24.1:8787/rhosp12/openstack-memcached-docker:20171017.1",
Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]: "2017-10-23 14:53:24,282 WARNING: 15740 -- docker pull failed: Cannot connect to the Docker daemon. Is the docker daemon running on this host?",

What is the expectation of the docker service prior to the deployment? Should it be running or not? We recommended disabling it first due to:

https://bugzilla.redhat.com/show_bug.cgi?id=1503021

Also note that the stack went to CREATE_COMPLETE even though nothing got deployed on the overcloud. It seems paunch and/or heat-config-ansible is not properly signaling a failed deployment back to Heat (a wrong exit code is probably getting used somewhere). The deployment definitely should have failed, since nothing got deployed.
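A pre-deployment validation could have caught the stopped daemon before the image pulls started failing. A hedged sketch of such a check (the `unit_state` helper and the injectable `runner` parameter are my own illustration, not existing TripleO code):

```python
import subprocess

def unit_state(unit, runner=subprocess.run):
    """Return a systemd unit's ActiveState, e.g. "active" or "inactive".

    `runner` is injectable so the parsing logic can be exercised
    on hosts without systemd.
    """
    result = runner(
        ["systemctl", "show", unit, "--property", "ActiveState"],
        capture_output=True, text=True, check=False,
    )
    # systemctl prints a single line such as "ActiveState=active"
    return result.stdout.strip().partition("=")[2]

def docker_ready(runner=subprocess.run):
    """True only when the docker daemon is actually running."""
    return unit_state("docker.service", runner) == "active"
```

Failing fast on `docker_ready() == False` (with a clear message) would surface the bootstrapping problem immediately instead of after a long retry loop in os-collect-config.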
> Also note that the stack went to create_complete even though nothing got
> deployed on the overcloud. It seems paunch and/or heat-config-ansible is not
> properly signaling a failed deployment back to Heat (wrong exit code getting
> used somewhere probably). The deployment definitely should have been failed
> since nothing got deployed.

Alex, can you file a new bug for this issue? I think it needs to be tracked separately. It's also for DFG:Containers.
(In reply to James Slagle from comment #2)
> i see that on controller-0, the local docker daemon is not running:
>
> Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]:
> "2017-10-23 14:53:24,268 WARNING: 15741 -- retrying pulling image:
> 192.168.24.1:8787/rhosp12/openstack-memcached-docker:20171017.1",
> Oct 23 14:53:52 controller-0.redhat.local os-collect-config[10452]:
> "2017-10-23 14:53:24,282 WARNING: 15740 -- docker pull failed: Cannot
> connect to the Docker daemon. Is the docker daemon running on this host?",
>
> What is the expectation of the docker service prior to the deployment?
> Should it be running or not? We recommended to disable it first due to:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1503021

Need input from DFG:Containers on what/how to bootstrap the docker service appropriately, taking into consideration this bug and bug 1503021.
(In reply to James Slagle from comment #3)
> > Also note that the stack went to create_complete even though nothing got
> > deployed on the overcloud. It seems paunch and/or heat-config-ansible is not
> > properly signaling a failed deployment back to Heat (wrong exit code getting
> > used somewhere probably). The deployment definitely should have been failed
> > since nothing got deployed.
>
> Alex, can you file a new bug for this issue? I think it needs to be tracked
> separately. It's also for DFG:Containers.

Here is the BZ opened for that issue, with logs attached:
https://bugzilla.redhat.com/show_bug.cgi?id=1505495
It sounds like we could be missing a signal in the case where paunch fails to configure a service correctly. I will sync with Steve Baker and see if we have any ideas on this.
I've commented on bug 1505495; I think it is docker-puppet.py not handling puppet exit codes correctly.
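For context, puppet's documented --detailed-exitcodes semantics make several nonzero codes mean success, which is an easy thing for a wrapper like docker-puppet.py to get wrong. A minimal sketch of the documented interpretation (my own illustration, not the actual docker-puppet.py code; `deployment_exit_code` is a hypothetical helper):

```python
# puppet apply --detailed-exitcodes (per the puppet man pages):
#   0: run succeeded, no changes or failures
#   2: run succeeded, and some resources were changed
#   4: there were failures during the transaction
#   6: there were both changes and failures
# Any other nonzero code (e.g. 1) signals a failed run.
PUPPET_OK = {0, 2}

def puppet_run_succeeded(rc):
    """Treat only 0 and 2 as success; 4 and 6 carry resource failures."""
    return rc in PUPPET_OK

def deployment_exit_code(puppet_rcs):
    """Collapse per-step puppet exit codes into one pass/fail code,
    suitable for signaling the deployment status back to Heat."""
    return 0 if all(puppet_run_succeeded(rc) for rc in puppet_rcs) else 1
```

If a wrapper instead checks `rc == 0` it reports spurious failures on changed runs, and if it checks `rc != 1` it silently swallows codes 4 and 6, which matches the symptom here: CREATE_COMPLETE with nothing actually deployed.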
Marking this as depends-on for bug 1501852. I think the real issue described here is that the deployment finished when it should not have, because some of the containers (keystone, in this example) were not deployed. There is a genuine deployment issue here, but my suspicion is that it is being fixed as part of the docker bootstrapping work for split stack. Perhaps related to bug 1503021.
Marking as ON_DEV, as the --detailed-exitcodes patch has been proposed upstream: https://review.openstack.org/#/c/511509/
*** Bug 1505495 has been marked as a duplicate of this bug. ***
https://review.openstack.org/#/c/517022/ merged in stable/pike.
A 1 controller + 1 compute topology is now deployable with split stack.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462