Description of problem:
Deployment of clustered ODL (3 controllers + 3 ODLs on the same node) with 1 compute node succeeded, but scaling to 23 compute nodes then failed in step 4 with the following error from the overcloud deploy command:

overcloud.AllNodesDeploySteps.1029pComputeDeployment_Step4.8:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 8110622e-2921-43ec-93be-d06110059ad4
  status: CREATE_FAILED
  status_reason: |
    Error: resources[8]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    "Error: curl -k -o /dev/null --fail --silent --head -u admin:3q3BD7yp42KyZ3gY8C8BUxJEd http://172.16.0.10:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]",
    "Error: /Stage[main]/Neutron::Plugins::Ovs::Opendaylight/Exec[Wait for NetVirt OVSDB to come up]/returns: change from notrun to 0 failed: curl -k -o /dev/null --fail --silent --head -u admin:3q3BD7yp42KyZ3gY8C8BUxJEd http://172.16.0.10:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]",
    "Warning: /Stage[main]/Neutron::Plugins::Ovs::Opendaylight/Exec[Set OVS Manager to OpenDaylight]: Skipping because of failed dependencies"
    ] }

to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/f3e4f578-b89d-419e-a283-67d51968346a_playbook.retry

Version-Release number of selected component (if applicable):
OSP13
opendaylight-8.3.0-1.el7ost.noarch

How reproducible:
Intermittent

Steps to Reproduce:
1. Deploy overcloud with ODL
2. Update deployment with more compute nodes
3.

Actual results:
Update fails

Expected results:
Update should succeed

Additional info:
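For anyone triaging: curl's exit code 22 means that with --fail the server answered with an HTTP error (>= 400), i.e. the netvirt:1 topology was not yet present in ODL's operational datastore when the Puppet Exec timed out. A minimal sketch of the same probe, runnable by hand against a controller (the VIP, port, and password below are placeholders taken from the log, not from a live environment):

```shell
# Sketch of the readiness probe Puppet runs ("Wait for NetVirt OVSDB to come up").
# Substitute your own ODL VIP/port and admin password before use.
check_netvirt_ready() {
  url="$1"
  curl -k -o /dev/null --fail --silent --head -u "admin:${ODL_PASSWORD:-changeme}" "$url"
  rc=$?
  case $rc in
    0)  echo "ready" ;;
    22) echo "not ready: HTTP error, netvirt:1 missing from the operational store" ;;
    *)  echo "transport error: curl exit $rc" ;;
  esac
}

# Example (placeholder VIP/port from the log above):
# ODL_PASSWORD=... check_netvirt_ready \
#   "http://172.16.0.10:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1"
```

Running this in a loop during the scale update shows whether ODL ever serves the topology or stays at an HTTP error until the deployment step times out.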
Created attachment 1472896 [details] controller-0
Created attachment 1472897 [details] controller-1
Created attachment 1472898 [details] controller-2
Please provide logs of the update process and ODL container up time before and after scale update.
Janki, by overcloud deployment logs do you mean the output from the overcloud deploy command, or something else?
Output of scale update command. Is that the same deploy command?
Sai, Please try with this patch https://review.openstack.org/#/c/612663/. BZ https://bugzilla.redhat.com/show_bug.cgi?id=1623123 has the same root cause.
Queens/OSP13 patches are https://review.openstack.org/#/c/614578/ https://review.openstack.org/#/c/615122/
I do not have a setup with the scale described in this bug description to verify.
Scaling from 1 to 2 computes should be fine too; it is the scaling process that needs to be verified, not the number of nodes scaled to.
Still seeing the failure with non-zero status code: 2 on the scale update:

2018-11-15 18:12:11Z [overcloud-AllNodesDeploySteps-4qyog3wzcbqu-ComputeDeployment_Step4-5zujtp4ilxv4]: UPDATE_FAILED Resource UPDATE failed: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2018-11-15 18:12:11Z [overcloud-AllNodesDeploySteps-4qyog3wzcbqu.ComputeDeployment_Step4]: UPDATE_FAILED Error: resources.ComputeDeployment_Step4.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-11-15 18:12:11Z [overcloud-AllNodesDeploySteps-4qyog3wzcbqu]: UPDATE_FAILED Resource UPDATE failed: Error: resources.ComputeDeployment_Step4.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-11-15 18:12:11Z [AllNodesDeploySteps]: UPDATE_FAILED Error: resources.AllNodesDeploySteps.resources.ComputeDeployment_Step4.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-11-15 18:12:11Z [overcloud]: UPDATE_FAILED Resource UPDATE failed: Error: resources.AllNodesDeploySteps.resources.ComputeDeployment_Step4.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2

Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.ComputeDeployment_Step4.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: eea502b5-3c54-4c9b-b2b4-157edc94cddb
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    "Error: curl -k -o /dev/null --fail --silent --head -u admin:YANBwTyEKhEdKXyebDNvcgWHQ http://172.16.0.30:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]",
    "Error: /Stage[main]/Neutron::Plugins::Ovs::Opendaylight/Exec[Wait for NetVirt OVSDB to come up]/returns: change from notrun to 0 failed: curl -k -o /dev/null --fail --silent --head -u admin:YANBwTyEKhEdKXyebDNvcgWHQ http://172.16.0.30:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]",
    "Warning: /Stage[main]/Neutron::Plugins::Ovs::Opendaylight/Exec[Set OVS Manager to OpenDaylight]: Skipping because of failed dependencies"
    ] }

to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/feba5525-9931-47c2-9ab8-ccba81e84dac_playbook.retry

This is with puddle 2018-10-30.1. I'm guessing the puddle doesn't have the required changes?
Isn't https://bugzilla.redhat.com/show_bug.cgi?id=1623123 a duplicate of this bug?
Tried the Queens patches manually, update still failing.
Moving to ON_DEV as the update was failing. Tested locally with this patch https://review.openstack.org/#/c/620053/ and successfully scaled out from 1 compute to 2 computes.
To verify:
1) Delete the stack: openstack stack delete overcloud --wait --yes
2) In ~/virt/nodes_data.yaml, change: ComputeCount: 1
3) Deploy: ./overcloud_deploy.sh
4) Once it is deployed, change in ~/virt/nodes_data.yaml: ComputeCount: 2
5) Redeploy: ./overcloud_deploy.sh
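The ComputeCount bump in step 4 can be scripted for repeated verification runs. A minimal sketch, assuming nodes_data.yaml holds a plain "ComputeCount: N" line (the file path and key layout are assumptions taken from the steps above, not a guaranteed format):

```shell
# Bump ComputeCount in nodes_data.yaml before re-running ./overcloud_deploy.sh.
set_compute_count() {
  file="$1"
  count="$2"
  # Rewrite the "ComputeCount: <old>" line in place, preserving indentation.
  sed -i "s/^\([[:space:]]*ComputeCount:\).*/\1 ${count}/" "$file"
}

# Usage sketch: set_compute_count ~/virt/nodes_data.yaml 2 && ./overcloud_deploy.sh
```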
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.0.7-21.el7ost. This build is available now.
As per the deprecation notice [1], closing this bug. Please reopen if relevant for RHOSP13, as this is the only version shipping ODL. [1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/html-single/release_notes/index#deprecated_functionality