| Summary: | overcloud deploy needs babysitting to complete. | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Steve Reichard <sreichar> |
| Component: | rhosp-director | Assignee: | Angus Thomas <athomas> |
| Status: | CLOSED WONTFIX | QA Contact: | Arik Chernetsky <achernet> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 8.0 (Liberty) | CC: | achernet, athomas, augol, bnemec, dbecker, emacchi, gdrapeau, johfulto, kholden, mburns, mcornea, morazi, rhel-osp-director-maint, skinjo |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-10-19 22:30:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Steve Reichard
2016-04-21 21:38:57 UTC
I had ComputeAllNodesValidationDeployment fail for what seem like the same network reasons. I see that the validation test only does a `ping -c 1 $IP` [0]. Would a `ping -c 4 $IP` give the network more time to come up?

More details:

During an OSP8 deploy [1] on 6 Dell servers my Heat Stack create failed because one of my compute nodes could not ping an address on my API network [2]. I got a list of all the IPs for each box and verified every one of them was pingable within 5 minutes [3] of this issue. I then simply ran the same deploy command again and the Stack UPDATE completed successfully [4].

The same deployment command on the same hardware/network worked repeatedly with OSP7 a few hours earlier; I see the order was changed [5] so perhaps this test happens earlier in the process. I will rebuild OSP8 a few times again the same way to see if this is easily reproducible and update this BZ with my results.

[0] https://review.openstack.org/gitweb?p=openstack/tripleo-heat-templates.git;a=blob;f=validation-scripts/all-nodes.sh;h=38a5a55e10b26337aaed4bd7f916ff684d56db9d;hb=a6861730bd3eee0cd419c959048cac9a48ee8482#l18

[1]

    time openstack overcloud deploy --templates ~/templates/ \
      -e ~/templates/clean_osd.yaml \
      -e ~/templates/environments/puppet-pacemaker.yaml \
      -e ~/templates/advanced-networking.yaml \
      -e ~/templates/environments/puppet-ceph-external.yaml \
      -e ~/templates/extraconfig/pre_deploy/rhel-registration/environment-rhel-registration.yaml \
      -e ~/templates/extraconfig/pre_deploy/rhel-registration/rhel-registration-resource-registry.yaml \
      --control-flavor control --control-scale 3 \
      --compute-flavor compute --compute-scale 3 \
      --log-file overcloud_deployment.log \
      --ntp-server 10.5.26.10 --timeout 90 \
      --neutron-bridge-mappings datacentre:br-ex,tenant:br-tenant \
      --neutron-network-type vlan \
      --neutron-network-vlan-ranges tenant:4051:4060 \
      --neutron-disable-tunneling

[2]

    2016-05-04 02:46:19 [0]: SIGNAL_IN_PROGRESS Signal: deployment failed (1)
    2016-05-04 02:46:19 [0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
    2016-05-04 02:46:20 [0]: SIGNAL_COMPLETE Unknown
    2016-05-04 02:46:20 [overcloud-ComputeAllNodesValidationDeployment-uojcvmz4kfyo]: UPDATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
    2016-05-04 02:46:21 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
    2016-05-04 02:46:21 [NovaComputeDeployment]: SIGNAL_COMPLETE Unknown
    2016-05-04 02:46:21 [0]: SIGNAL_COMPLETE Unknown
    Stack overcloud CREATE_FAILED
    Heat Stack create failed.

    real 32m33.021s
    user 0m25.289s
    sys 0m2.416s
    [stack@hci-director ~]$
    [stack@hci-director ~]$ heat stack-show 8409d34e-7c18-4b75-954a-fd4318d58189
    | parameters          | {                                                                                |
    |                     |   "OS::project_id": "4a075850f7c2405fbd662676845724bf",                          |
    |                     |   "OS::stack_id": "8409d34e-7c18-4b75-954a-fd4318d58189",                        |
    |                     |   "OS::stack_name": "overcloud-ComputeAllNodesValidationDeployment-uojcvmz4kfyo" |
    |                     | }                                                                                |
    | parent              | f25ba434-00fa-496d-b911-1b1f3484a38c                                             |
    | stack_status        | UPDATE_FAILED                                                                    |
    | stack_status_reason | Error: resources[0]: Deployment to server failed:                                |
    |                     | deploy_status_code : Deployment exited with non-zero                             |
    |                     | status code: 1                                                                   |
    | updated_time        | 2016-05-04T02:41:08                                                              |
    [stack@hci-director ~]$ heat deployment-show 1dd25c86-3349-4530-a714-8d3737880086
    {
      "status": "FAILED",
      "server_id": "1b9a8cd1-b0f0-4de8-a960-7ecafd77acc9",
      "config_id": "60cb9cb3-c18e-42e2-be94-0e55d1b019bf",
      "output_values": {
        "deploy_stdout": "Trying to ping 172.16.1.14 for local network 172.16.1.0/24...SUCCESS\nTrying to ping 172.16.2.14 for local network 172.16.2.0/24...SUCCESS\nTrying to ping 192.168.2.15 for local network 192.168.2.0/24...FAILURE\n",
        "deploy_stderr": "192.168.2.15 is not pingable. Local Network: 192.168.2.0/24\n",
        "deploy_status_code": 1
      },
      "creation_time": "2016-05-04T02:41:11",
      "updated_time": "2016-05-04T02:46:19",
      "input_values": {},
      "action": "CREATE",
      "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1",
      "id": "1dd25c86-3349-4530-a714-8d3737880086"
    }
    [stack@hci-director ~]$

[3] Get list of IPs and generate cross-product ping test command

    ansible all -b -m shell -a "ip a | egrep '192.168|172.16'"

    ansible all -b -m shell -a "ping -c 2 192.168.3.15"
    ansible all -b -m shell -a "ping -c 2 172.16.2.16"
    ansible all -b -m shell -a "ping -c 2 172.16.1.16"
    ansible all -b -m shell -a "ping -c 2 192.168.2.17"
    ansible all -b -m shell -a "ping -c 2 192.168.3.11"
    ansible all -b -m shell -a "ping -c 2 172.16.2.12"
    ansible all -b -m shell -a "ping -c 2 172.16.1.12"
    ansible all -b -m shell -a "ping -c 2 192.168.2.13"
    ansible all -b -m shell -a "ping -c 2 172.16.2.14"
    ansible all -b -m shell -a "ping -c 2 172.16.1.14"
    ansible all -b -m shell -a "ping -c 2 192.168.2.15"
    ansible all -b -m shell -a "ping -c 2 192.168.3.10"
    ansible all -b -m shell -a "ping -c 2 172.16.2.11"
    ansible all -b -m shell -a "ping -c 2 172.16.1.11"
    ansible all -b -m shell -a "ping -c 2 192.168.2.12"
    ansible all -b -m shell -a "ping -c 2 192.168.3.12"
    ansible all -b -m shell -a "ping -c 2 172.16.2.13"
    ansible all -b -m shell -a "ping -c 2 172.16.1.13"
    ansible all -b -m shell -a "ping -c 2 192.168.2.14"
    ansible all -b -m shell -a "ping -c 2 192.168.1.38"
    ansible all -b -m shell -a "ping -c 2 192.168.3.14"
    ansible all -b -m shell -a "ping -c 2 172.16.2.15"
    ansible all -b -m shell -a "ping -c 2 172.16.1.15"
    ansible all -b -m shell -a "ping -c 2 192.168.2.16"

[4]

    2016-05-04 04:19:47 [overcloud]: UPDATE_COMPLETE Stack UPDATE completed successfully
    Stack overcloud UPDATE_COMPLETE
    Overcloud Endpoint: http://10.19.139.37:5000/v2.0
    Overcloud Deployed

    real 15m6.304s
    user 0m16.803s
    sys 0m1.629s
    [stack@hci-director ~]$

[5] https://bugs.launchpad.net/tripleo/+bug/1553243

closed, no need for needinfo.
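To illustrate the question raised in the description (whether the validation should give the network more time before failing), here is a minimal sketch of a ping check that retries with a delay instead of probing once. It is not the actual `all-nodes.sh` from tripleo-heat-templates; the names `PING_CMD`, `PING_RETRIES`, `PING_DELAY`, and `ping_with_retry` are hypothetical, and the retry count and delay are arbitrary assumptions rather than tested values.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: retry a reachability probe before declaring failure.
# PING_CMD, PING_RETRIES and PING_DELAY are illustrative names only; they are
# not options of the real all-nodes.sh validation script.

PING_CMD=${PING_CMD:-"ping -c 1 -w 2"}   # probe command; the IP is appended
PING_RETRIES=${PING_RETRIES:-10}         # number of attempts before giving up
PING_DELAY=${PING_DELAY:-3}              # seconds to wait between attempts

# ping_with_retry IP: return 0 as soon as one probe succeeds, 1 if all fail.
ping_with_retry() {
    local ip=$1
    local attempt=1
    while [ "$attempt" -le "$PING_RETRIES" ]; do
        # PING_CMD is intentionally unquoted so its options are word-split.
        if $PING_CMD "$ip" >/dev/null 2>&1; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$attempt" -lt "$PING_RETRIES" ]; then
            sleep "$PING_DELAY"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Check every IP passed on the command line; stop at the first failure.
for ip in "$@"; do
    printf 'Trying to ping %s...' "$ip"
    if ping_with_retry "$ip"; then
        echo "SUCCESS"
    else
        echo "FAILURE"
        echo "$ip is not pingable." >&2
        exit 1
    fi
done
```

Saved under a hypothetical name such as `ping-retry.sh`, it could be run as `bash ping-retry.sh 192.168.2.15 172.16.1.14`. With the defaults above, an unreachable address is reported as a failure only after roughly a minute of attempts, which would cover the case where the network comes up shortly after the first probe.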