Bug 1714584 - [OSP15] Use of skip-deploy-identifier flag causes compute scale-out failure
Summary: [OSP15] Use of skip-deploy-identifier flag causes compute scale-out failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 15.0 (Stein)
Assignee: Martin Schuppert
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On:
Blocks: 1714590 1752900
Reported: 2019-05-28 11:28 UTC by Victor Voronkov
Modified: 2019-09-26 10:51 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-10.6.1-0.20190711090428.245f17c.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1714590
Environment:
Last Closed: 2019-09-21 11:22:34 UTC
Target Upstream Version:
Embargoed:
vvoronko: needinfo+


Attachments
Tempest results (112.69 KB, application/xhtml+xml)
2019-07-16 06:13 UTC, Victor Voronkov


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1831711 0 None None None 2019-06-05 08:19:33 UTC
OpenStack gerrit 669802 0 None MERGED Move nova cell v2 discovery to deploy_steps_tasks 2020-09-30 19:58:51 UTC
Red Hat Product Errata RHEA-2019:2811 0 None None None 2019-09-21 11:22:55 UTC

Description Victor Voronkov 2019-05-28 11:28:37 UTC
Description of problem:
Recent changes for edge scenarios intentionally moved cell discovery from the controller to the bootstrap compute node, so this task is now triggered by the deploy identifier [1]. As a result, when the --skip-deploy-identifier flag is used, discovery is not triggered at all, which causes failures in previously supported scenarios [2].

[1] - https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/nova/nova-compute-container-puppet.yaml#L667

[2] - http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/df/view/rfe/job/DFG-df-rfe-15-virsh-3cont_2comp_3ceph-skip-deploy-identifier-scaleup-poc/1/testReport/

Comment 1 Martin Schuppert 2019-05-28 12:12:56 UTC
Related to BZ1693563, where host discovery was moved to the computes, specifically the bootstrap node, to avoid a race when the discovery command is triggered on multiple computes at the same time.

To make sure the step runs on every overcloud deploy/scale/etc., the deploy identifier is passed to the step. If we skip the identifier, the step does not run.

A workaround is to manually run the nova-manage discovery after the deploy run when --skip-deploy-identifier was used.
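
The behaviour described above can be pictured with a small shell model. This is only an illustrative sketch, not the tripleo-heat-templates implementation: the function name `run_discovery_step` and the state file are hypothetical, and the assumption is simply that a step re-runs only when the identifier passed to it differs from the one seen on the previous run.

```shell
#!/usr/bin/env bash
# Illustrative model only (assumption): a step is re-run only when the
# identifier passed to it changed since the last run. This is NOT the
# actual tripleo-heat-templates code.

# Hypothetical file remembering the identifier seen on the previous run.
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/discovery_step.identifier}"

# run_discovery_step IDENTIFIER
# Prints "ran" if the step would execute, "skipped" if its input is unchanged.
run_discovery_step() {
    local identifier="$1"
    if [ -f "$STATE_FILE" ] && [ "$(cat "$STATE_FILE")" = "$identifier" ]; then
        echo "skipped"
        return 0
    fi
    # Remember the identifier so an identical follow-up run is skipped.
    printf '%s' "$identifier" > "$STATE_FILE"
    echo "ran"
}
```

In this model, two consecutive runs with the same (for example empty) identifier execute the step only once, which mirrors why a scale-out with --skip-deploy-identifier does not trigger discovery again.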

Comment 3 Martin Schuppert 2019-06-03 12:10:37 UTC
(In reply to Martin Schuppert from comment #1)

Just to expand on the workaround mentioned above:

1) ssh to an overcloud node, e.g. one of the controllers running nova_api

2) enter the container and run the cell v2 discovery to map the scaled-out compute to the cell
[root@overcloud-controller-0 /]# docker exec -it -u root nova_api sh
()[root@overcloud-controller-0 /]$ nova-manage cell_v2 discover_hosts --by-service --verbose
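
The two steps above could also be run non-interactively. The following is a minimal sketch under stated assumptions: the helper names `detect_runtime` and `run_discovery` are hypothetical, the container is assumed to be named nova_api as in the commands above, and the container CLI is assumed to be either podman or docker depending on the release.

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround as reusable functions (hypothetical helpers).
# Assumption: the API container is named nova_api, as in the comment above.

# detect_runtime
# Prints the first container CLI found on PATH (podman preferred), or fails.
detect_runtime() {
    local candidate
    for candidate in podman docker; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no container runtime found" >&2
    return 1
}

# run_discovery RUNTIME
# Runs the cell v2 host discovery inside nova_api using the given CLI.
run_discovery() {
    local runtime="$1"
    "$runtime" exec -u root nova_api \
        nova-manage cell_v2 discover_hosts --by-service --verbose
}
```

On a controller this might be invoked as `run_discovery "$(detect_runtime)"` after a deploy that used --skip-deploy-identifier.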

Comment 4 Victor Voronkov 2019-06-10 05:30:45 UTC
The proposed workaround was tested. I ssh-ed to controller-0 and ran:

# podman exec -it -u root nova_api sh
nova-manage cell_v2 discover_hosts --by-service --verbose

It found and mapped compute-2; tempest smoke no longer fails with "host compute-2 is not mapped to any cell".
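
One way to confirm the mapping without running tempest is to inspect the output of `nova-manage cell_v2 list_hosts`. The sketch below is an assumption-laden helper, not part of the fix: the function name `host_is_mapped` is hypothetical, and it assumes the command prints an ASCII table whose last column is the hostname.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks whether a host appears in a table read from
# stdin. Assumption: rows look like "| cell name | cell uuid | hostname |".

# host_is_mapped HOSTNAME
# Exit status 0 if HOSTNAME is the last column of some row, 1 otherwise.
host_is_mapped() {
    local host="$1"
    awk -F'|' -v host="$host" '
        NF >= 4 {
            name = $(NF - 1)                    # last column before trailing "|"
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding whitespace
            if (name == host) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, inside the nova_api container: `nova-manage cell_v2 list_hosts | host_is_mapped compute-2 && echo mapped`. The match is exact, so the hostname must be given in the same form the table shows it (which may include a domain suffix).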

Comment 7 Victor Voronkov 2019-07-16 06:13:06 UTC
Verified on puddle RHOS_TRUNK-15.0-RHEL-8-20190714.n.0

All tempest tests passed

Comment 8 Victor Voronkov 2019-07-16 06:13:56 UTC
Created attachment 1590973
Tempest results

Comment 10 errata-xmlrpc 2019-09-21 11:22:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

