Bug 1611960 - [Deployment] [Scale] Stack Update fails on scaling out in a Clustered ODL setup
Summary: [Deployment] [Scale] Stack Update fails on scaling out in a Clustered ODL setup
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: z5
Target Release: 13.0 (Queens)
Assignee: Janki
QA Contact: Sai Sindhur Malleni
URL:
Whiteboard: Deployment
Depends On:
Blocks: 1652446
Reported: 2018-08-03 07:25 UTC by Sai Sindhur Malleni
Modified: 2021-12-10 16:59 UTC
CC List: 8 users

Fixed In Version: openstack-tripleo-heat-templates-8.0.7-12.el7ost
Doc Type: Release Note
Doc Text:
With this update, Compute nodes in a Red Hat OpenStack Platform environment that uses OpenDaylight as a back end can be scaled successfully.
Clone Of:
Clones: 1652446
Environment:
Last Closed: 2019-03-06 16:16:20 UTC
Target Upstream Version:
Embargoed:


Attachments
controller-0 (6.99 MB, application/x-gzip), 2018-08-03 07:34 UTC, Sai Sindhur Malleni
controller-1 (7.30 MB, application/x-gzip), 2018-08-03 07:35 UTC, Sai Sindhur Malleni
controller-2 (6.97 MB, application/x-gzip), 2018-08-03 07:37 UTC, Sai Sindhur Malleni


Links
OpenStack gerrit 612663 (MERGED): Delete empty karaf directory on host (updated 2020-07-10 10:17:56 UTC)
OpenStack gerrit 613881 (MERGED): Delete empty karaf directory on host (updated 2020-07-10 10:17:56 UTC)
OpenStack gerrit 614578 (MERGED): Delete empty karaf directory on host (updated 2020-07-10 10:17:57 UTC)
OpenStack gerrit 615122 (MERGED): [queens-only] Correct misindentation (updated 2020-07-10 10:17:55 UTC)
OpenStack gerrit 620053 (MERGED): Don't mount data folder (updated 2020-07-10 10:17:56 UTC)
Red Hat Issue Tracker ODL-49 (updated 2021-12-10 16:59:45 UTC)
Red Hat Issue Tracker OSP-11518 (updated 2021-12-10 16:59:47 UTC)

Description Sai Sindhur Malleni 2018-08-03 07:25:38 UTC
Description of problem:
Deployment of clustered ODL (3 Controllers, with the 3 ODL instances running on the same nodes) with 1 Compute node succeeded. However, scaling out to 23 Compute nodes then failed in step 4, and the overcloud deploy command reported the following error:

overcloud.AllNodesDeploySteps.1029pComputeDeployment_Step4.8:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 8110622e-2921-43ec-93be-d06110059ad4
  status: CREATE_FAILED
  status_reason: |
    Error: resources[8]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "Error: curl -k -o /dev/null --fail --silent --head -u admin:3q3BD7yp42KyZ3gY8C8BUxJEd http://172.16.0.10:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]",
            "Error: /Stage[main]/Neutron::Plugins::Ovs::Opendaylight/Exec[Wait for NetVirt OVSDB to come up]/returns: change from notrun to 0 failed: curl -k -o /dev/null --fail --silent --head -u admin:3q3BD7yp42KyZ3gY8C8BUxJEd http://172.16.0.10:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]",
            "Warning: /Stage[main]/Neutron::Plugins::Ovs::Opendaylight/Exec[Set OVS Manager to OpenDaylight]: Skipping because of failed dependencies"
        ]
    }
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/f3e4f578-b89d-419e-a283-67d51968346a_playbook.retry
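The failing resource is the Puppet `Exec[Wait for NetVirt OVSDB to come up]` from `Neutron::Plugins::Ovs::Opendaylight`. It runs `curl --fail --silent --head` against ODL's operational NetVirt topology (`.../topology/netvirt:1`). With `--fail`, curl exits with code 22 when the server answers with an HTTP status of 400 or higher. Exit code 22 therefore means that the endpoint at 172.16.0.10:8081 responded but did not return the topology. It does not mean the endpoint was unreachable, because a refused connection gives curl exit code 7. `Exec[Set OVS Manager to OpenDaylight]` depends on this check, so it was skipped.

The Python sketch below reproduces that readiness check for manual debugging from a node that can reach the ODL API. The function names, retry count, and delay are illustrative assumptions. They are not the values used by puppet-neutron.

```python
import base64
import time
import urllib.request
from typing import Callable


def netvirt_topology_url(host: str, port: int = 8081) -> str:
    """Build the operational NetVirt topology URL polled by the failing step."""
    return (f"http://{host}:{port}/restconf/operational/"
            "network-topology:network-topology/topology/netvirt:1")


def head_ok(url: str, user: str, password: str, timeout: float = 5.0) -> bool:
    """Send a HEAD request with basic auth; True only for a non-error status.

    Roughly mirrors `curl --fail --head -u user:password URL`: an HTTP status
    >= 400 (curl exit 22) or a connection failure both count as "not ready".
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, method="HEAD", headers={"Authorization": f"Basic {token}"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # urllib.error.HTTPError (raised for status >= 400) and URLError
        # (e.g. connection refused) are both subclasses of OSError.
        return False


def wait_until(check: Callable[[], bool], tries: int = 30, delay: float = 10.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() up to `tries` times, sleeping `delay` seconds in between."""
    for attempt in range(tries):
        if check():
            return True
        if attempt < tries - 1:
            sleep(delay)
    return False
```

For example, `wait_until(lambda: head_ok(netvirt_topology_url("172.16.0.10"), "admin", password))` polls the same URL as the failed step. Here `password` is a placeholder for the deployment's ODL admin password. The `sleep` parameter is injectable so that the retry logic can be tested without waiting.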

Version-Release number of selected component (if applicable):
OSP13
opendaylight-8.3.0-1.el7ost.noarch

How reproducible:
Intermittent

Steps to Reproduce:
1. Deploy overcloud with ODL
2. Update deployment with more compute nodes

Actual results:
Update fails

Expected results:
Update should succeed

Additional info:

Comment 1 Sai Sindhur Malleni 2018-08-03 07:34:27 UTC
Created attachment 1472896
controller-0

Comment 2 Sai Sindhur Malleni 2018-08-03 07:35:49 UTC
Created attachment 1472897
controller-1

Comment 3 Sai Sindhur Malleni 2018-08-03 07:37:28 UTC
Created attachment 1472898
controller-2

Comment 5 Janki 2018-08-08 05:12:48 UTC
Please provide logs of the update process, along with the ODL container uptime before and after the scale update.

Comment 6 Sai Sindhur Malleni 2018-08-27 13:56:12 UTC
Janki,

By overcloud deployment logs, do you mean the output of the overcloud deploy command, or something else?

Comment 7 Janki 2018-08-28 08:59:30 UTC
The output of the scale update command. Is it the same deploy command?

Comment 8 Janki 2018-10-23 13:09:37 UTC
Sai, please try this patch: https://review.openstack.org/#/c/612663/

BZ https://bugzilla.redhat.com/show_bug.cgi?id=1623123 has the same root cause.

Comment 10 Sai Sindhur Malleni 2018-11-07 16:47:45 UTC
I do not have a setup with the scale described in this bug description to verify.

Comment 11 Janki 2018-11-09 05:19:50 UTC
Scaling from 1 to 2 Compute nodes is also fine. What needs to be verified is the scale-out process, not the number of nodes you scale to.

Comment 12 Sai Sindhur Malleni 2018-11-15 19:01:13 UTC
Still seeing the failure:
ith non-zero status code: 2
2018-11-15 18:12:11Z [overcloud-AllNodesDeploySteps-4qyog3wzcbqu-ComputeDeployment_Step4-5zujtp4ilxv4]: UPDATE_FAILED  Resource UPDATE failed: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2018-11-15 18:12:11Z [overcloud-AllNodesDeploySteps-4qyog3wzcbqu.ComputeDeployment_Step4]: UPDATE_FAILED  Error: resources.ComputeDeployment_Step4.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-11-15 18:12:11Z [overcloud-AllNodesDeploySteps-4qyog3wzcbqu]: UPDATE_FAILED  Resource UPDATE failed: Error: resources.ComputeDeployment_Step4.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-11-15 18:12:11Z [AllNodesDeploySteps]: UPDATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ComputeDeployment_Step4.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-11-15 18:12:11Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: Error: resources.AllNodesDeploySteps.resources.ComputeDeployment_Step4.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2

 Stack overcloud UPDATE_FAILED 

overcloud.AllNodesDeploySteps.ComputeDeployment_Step4.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: eea502b5-3c54-4c9b-b2b4-157edc94cddb
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "Error: curl -k -o /dev/null --fail --silent --head -u admin:YANBwTyEKhEdKXyebDNvcgWHQ http://172.16.0.30:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]",
            "Error: /Stage[main]/Neutron::Plugins::Ovs::Opendaylight/Exec[Wait for NetVirt OVSDB to come up]/returns: change from notrun to 0 failed: curl -k -o /dev/null --fail --silent --head -u admin:YANBwTyEKhEdKXyebDNvcgWHQ http://172.16.0.30:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]",
            "Warning: /Stage[main]/Neutron::Plugins::Ovs::Opendaylight/Exec[Set OVS Manager to OpenDaylight]: Skipping because of failed dependencies"
        ]
    }
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/feba5525-9931-47c2-9ab8-ccba81e84dac_playbook.retry
    
This is on puddle 2018-10-30.1. I'm guessing the puddle doesn't have the required changes?

Comment 13 Sai Sindhur Malleni 2018-11-15 19:16:34 UTC
Isn't https://bugzilla.redhat.com/show_bug.cgi?id=1623123 a duplicate of this bug?

Comment 14 Sai Sindhur Malleni 2018-11-16 14:12:32 UTC
I tried the Queens patches manually; the update is still failing.

Comment 15 Janki 2018-11-29 15:20:49 UTC
Moving to ON_DEV as the update was failing.

Testing locally with this patch

https://review.openstack.org/#/c/620053/

and I successfully scaled out from 1 Compute node to 2 Compute nodes.

Comment 18 Noam Manos 2018-12-17 14:21:08 UTC
To verify:

1) Delete the stack: openstack stack delete overcloud --wait --yes

2) In ~/virt/nodes_data.yaml, set ComputeCount: 1

3) Deploy: ./overcloud_deploy.sh

4) Once it is deployed, set ComputeCount: 2 in ~/virt/nodes_data.yaml

5) Redeploy: ./overcloud_deploy.sh
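Between the two runs of the same deploy script, steps 2 and 4 change only the `ComputeCount` parameter. The helper below is a hypothetical sketch and is not part of TripleO or the test tooling. It makes that edit with a regular expression and assumes that `~/virt/nodes_data.yaml` contains exactly one plain `ComputeCount: <integer>` line. Custom roles, such as the `1029pCompute` role in the original report, use a different `<RoleName>Count` key, which this sketch does not handle.

```python
import re
from pathlib import Path

# Matches an (optionally indented) "ComputeCount: <int>" line. Assumes the
# role is named "Compute"; custom roles use "<RoleName>Count" instead.
_COMPUTE_COUNT = re.compile(r"^([ \t]*ComputeCount:[ \t]*)\d+\b", re.MULTILINE)


def set_compute_count(yaml_text: str, count: int) -> str:
    """Return yaml_text with its single ComputeCount value replaced by count."""
    if count < 0:
        raise ValueError("count must be non-negative")
    new_text, matches = _COMPUTE_COUNT.subn(
        lambda m: f"{m.group(1)}{count}", yaml_text)
    if matches != 1:
        raise ValueError(f"expected exactly one ComputeCount line, found {matches}")
    return new_text


def update_nodes_data(path: Path, count: int) -> None:
    """Rewrite the environment file in place, e.g. ~/virt/nodes_data.yaml."""
    path.write_text(set_compute_count(path.read_text(), count))
```

For example, `update_nodes_data(Path.home() / "virt" / "nodes_data.yaml", 2)` performs step 4 before you rerun `./overcloud_deploy.sh`. The sketch edits the text instead of loading and re-dumping YAML, so comments and key order in the environment file are preserved.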

Comment 20 Lon Hohberger 2019-01-17 11:33:57 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.0.7-21.el7ost.  This build is available now.

Comment 21 Franck Baudin 2019-03-06 16:16:20 UTC
As per the deprecation notice [1], closing this bug. Please reopen if relevant for RHOSP13, as this is the only version shipping ODL.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/html-single/release_notes/index#deprecated_functionality
