Hide Forgot
Description of problem: When trying to deploy the overcloud with basic configuration (1 controller and 1 compute). the deployment process is hang. Version-Release number of selected component (if applicable): Latest OSP8 release. How reproducible: Always! Steps to Reproduce: 1. install the undercloud 2. edit the deployment templates 3. run openstack deploy command (openstack overcloud deploy). Actual results: [stack@director ~]$ ./deploy.sh Deploying templates in the directory /usr/share/openstack-tripleo-heat-templates ERROR: Authentication failed. Please try again with option --include-password or export HEAT_INCLUDE_PASSWORD=1 Authentication required [stack@director ~]$ cat deploy.sh openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/templates/network-environment.yaml --control-flavor control --compute-flavor compute --neutron-public-interface ens9f0 --ntp-server 10.254.67.22 --neutron-network-type gre --neutron-tunnel-types gre [stack@director ~]$ heat resource-list overcloud |grep -i pro | Compute | 28f50806-afc2-4403-96e0-d10f222f46dc | OS::Heat::ResourceGroup | CREATE_IN_PROGRESS | 2016-02-22T02:16:49 | | Controller | 4692dce3-22f4-4797-8108-ee7e320e2b01 | OS::Heat::ResourceGroup | CREATE_IN_PROGRESS | 2016-02-22T02:16:49 | Expected results: Overcloud deployment is successful. Additional info: As you see, the deployment is still in progress, but we have see some authentication error. And also it says that we need a parameter like --include-password, but from the openstack overcloud deploy help, there isn't such parameter. I know to investigate this issue, you need more logs and files to check, I am glad to provide any requested files.
*** Bug 1310958 has been marked as a duplicate of this bug. ***
Finally it's failed. [stack@director ~]$ heat resource-list overcloud |grep FAIL | ComputeNodesPostDeployment | 8d89fa12-5a27-4950-a734-e9980b5dca20 | OS::TripleO::ComputePostDeployment | CREATE_FAILED | 2016-02-23T02:41:20 | | ControllerNodesPostDeployment | 25b95bd6-4877-4202-b93a-54be7636c130 | OS::TripleO::ControllerPostDeployment | CREATE_FAILED | 2016-02-23T02:41:20 | | ComputeAllNodesValidationDeployment | bcf7ebf9-ea4a-42ad-b9eb-1b6cd187be52 | OS::Heat::StructuredDeployments | CREATE_FAILED | 2016-02-23T02:41:21 | We have already gone through the page http://docs.openstack.org/developer/tripleo-docs/troubleshooting/troubleshooting-overcloud.html, but cannot get more details for such errors.
While checked the controller node log. we found that following errors. Feb 22 23:06:26 overcloud-controller-0.localdomain os-collect-config[10347]: [2016-02-22 23:06:26,639] (os-refresh-config) [INFO] Completed phase post-configure Feb 22 23:06:28 overcloud-controller-0.localdomain os-collect-config[10347]: 2016-02-22 23:06:28.294 10347 WARNING os-collect-config [-] Source [request] Unavailable. Feb 22 23:06:28 overcloud-controller-0.localdomain os-collect-config[10347]: 2016-02-22 23:06:28.295 10347 WARNING os_collect_config.local [-] /var/lib/os-collect-config/local-data not found. Skipping Feb 22 23:06:28 overcloud-controller-0.localdomain os-collect-config[10347]: 2016-02-22 23:06:28.295 10347 WARNING os_collect_config.local [-] No local metadata found (['/var/lib/os-collect-config/local-data']) Feb 22 23:06:28 overcloud-controller-0.localdomain os-collect-config[10347]: 2016-02-22 23:06:28.295 10347 WARNING os_collect_config.zaqar [-] No auth_url configured.
Hi, We retried to deploy again today with the latest Beta 7 code. But still cannot complete the deployment. [stack@director ~]$ heat resource-list overcloud |grep FAIL | ComputeAllNodesValidationDeployment | 4d5942e7-5a8e-46a7-b99c-5ac0373f0c00 | OS::Heat::StructuredDeployments | CREATE_FAILED | 2016-03-01T21:47:50 | | ComputeNodesPostDeployment | 21ed5f8e-0447-43dc-9acc-df0cbcc1c142 | OS::TripleO::ComputePostDeployment | CREATE_FAILED | 2016-03-01T21:47:50 | | ControllerNodesPostDeployment | 2dfdb0cf-71d9-4d07-90f8-ba6cbbd9f119 | OS::TripleO::ControllerPostDeployment | CREATE_FAILED | 2016-03-01T21:47:50 | I attached the logs for review. 1. the openstack deploy log 2. the os log from the conroller node.
Created attachment 1132147 [details] controller log
Created attachment 1132148 [details] openstack overcloud deploy log
this looks like a networking issue given the failure of ComputeAllNodesValidationDeployment, which validates that compute nodes can ping the controller IP addresses. mysql then appears to fail to start on the controllers likely because it could not bind to the configured IP address. can you login to the controllers and see if they have an interface called ens9f0? does the physical networking configuration of the boxes support how you are attempting to configure it with your network-environment.yaml file? can you attach the output from the following bash code run from the overcloud to get the stdout/stderr from the failed deployments? source stackrc for failed_deployment in $(heat resource-list --nested-depth 5 overcloud | grep FAILED | grep -E 'OS::Heat::SoftwareDeployment |OS::Heat::StructuredDeployment ' | cut -d '|' -f 3); do echo $failed_deployment; heat deployment-show $failed_deployment; done if this doesn't help you figure out what the issue might be, can you attach a tarball of /home/stack/templates? and a sosreport from each node (assuming you can ssh into them at all)?
(In reply to James Slagle from comment #8) > this looks like a networking issue given the failure of > ComputeAllNodesValidationDeployment, which validates that compute nodes can > ping the controller IP addresses. > mysql then appears to fail to start on the controllers likely because it > could not bind to the configured IP address. > > can you login to the controllers and see if they have an interface called > ens9f0? > > does the physical networking configuration of the boxes support how you are > attempting to configure it with your network-environment.yaml file? > > can you attach the output from the following bash code run from the > overcloud to get the stdout/stderr from the failed deployments? sorry, that ^^ should say "run from the undercloud" > > source stackrc > for failed_deployment in $(heat resource-list --nested-depth 5 overcloud | > grep FAILED | grep -E 'OS::Heat::SoftwareDeployment > |OS::Heat::StructuredDeployment ' | cut -d '|' -f 3); do > echo $failed_deployment; > heat deployment-show $failed_deployment; > done > > if this doesn't help you figure out what the issue might be, can you attach > a tarball of /home/stack/templates? and a sosreport from each node (assuming > you can ssh into them at all)?
Hi James, Appreciate your professional checking for our logs. The issue has been fixed now. Root causes are below. 1. Just as you see, some subnetwork which used by mysql is disconnected among nodes. 2. No external network connection between the undercloud and overcloud which caused the endpoints creation failed. Thanks again for your help. This case can be closed with NoBUG conclusion. BR Kaihua