Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1714584

Summary: [OSP15] Use of skip-deploy-identifier flag causes compute scale-out failure
Product: Red Hat OpenStack
Reporter: Victor Voronkov <vvoronko>
Component: openstack-tripleo-heat-templates
Assignee: Martin Schuppert <mschuppe>
Status: CLOSED ERRATA
QA Contact: Victor Voronkov <vvoronko>
Severity: high
Docs Contact:
Priority: unspecified
Version: 15.0 (Stein)
CC: lyarwood, mbooth, mburns, mcornea, mschuppe, ravsingh, slinaber
Target Milestone: rc
Keywords: Patch, Triaged
Target Release: 15.0 (Stein)
Flags: vvoronko: needinfo+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-10.6.1-0.20190711090428.245f17c.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1714590
Environment:
Last Closed: 2019-09-21 11:22:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1714590, 1752900
Attachments:
Description Flags
Tempest results none

Description Victor Voronkov 2019-05-28 11:28:37 UTC
Description of problem:
Recent changes for edge scenarios intentionally moved host discovery from the controller to the bootstrap compute node. The discovery task is now triggered by the deploy identifier [1]. As a result, when the --skip-deploy-identifier flag is used, discovery is not triggered at all, which causes failures in previously supported scenarios [2].

[1] - https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/nova/nova-compute-container-puppet.yaml#L667

[2] - http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/df/view/rfe/job/DFG-df-rfe-15-virsh-3cont_2comp_3ceph-skip-deploy-identifier-scaleup-poc/1/testReport/

Comment 1 Martin Schuppert 2019-05-28 12:12:56 UTC
This is related to BZ1693563, where host discovery was moved to the computes, specifically to the bootstrap node, to avoid a race when the discovery command is triggered on multiple computes at the same time.

To make sure discovery runs on every overcloud deploy, scale-out, and so on, the deploy identifier is passed to the step. If the identifier is skipped, the step does not run.

A workaround is to run the nova-manage discovery manually after a deploy run that used --skip-deploy-identifier.

Comment 3 Martin Schuppert 2019-06-03 12:10:37 UTC
(In reply to Martin Schuppert from comment #1)

To expand on the workaround mentioned above:

1) ssh to a node of the overcloud, e.g. one of the controllers running nova_api 

2) enter the container and run the cell v2 discovery to map the scaled out compute to the cell
[root@overcloud-controller-0 /]# docker exec -it -u root nova_api sh
()[root@overcloud-controller-0 /]$ nova-manage cell_v2 discover_hosts --by-service --verbose
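
The two steps above can also be run non-interactively in a single call. The following is a minimal sketch, not part of the fix for this bug: the function name run_discovery and the DRY_RUN variable are hypothetical helpers introduced here for illustration, and dropping the -it flags assumes no interactive shell is needed. The container runtime is passed as an argument because it depends on what is installed on the controller.

```shell
#!/usr/bin/env bash
# Sketch: run cell v2 host discovery inside the nova_api container
# without opening an interactive shell first.
# Assumptions: the container is named nova_api (as in the steps above)
# and the runtime given as $1 is installed on this controller.

run_discovery() {
    # Container runtime to use, e.g. "docker" or "podman".
    local runtime="${1:-podman}"

    # With DRY_RUN set to a non-empty value, the command is printed
    # instead of executed, so it can be reviewed first.
    ${DRY_RUN:+echo} "$runtime" exec -u root nova_api \
        nova-manage cell_v2 discover_hosts --by-service --verbose
}
```

After sourcing the file on a controller, "DRY_RUN=1 run_discovery docker" prints the command, and "run_discovery docker" executes it.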

Comment 4 Victor Voronkov 2019-06-10 05:30:45 UTC
The proposed workaround was tested. I connected over SSH to controller-0 and ran:

# podman exec -it -u root nova_api sh
nova-manage cell_v2 discover_hosts --by-service --verbose

It found and mapped compute-2. The tempest smoke tests no longer fail with "host compute-2 is not mapped to any cell".

Comment 7 Victor Voronkov 2019-07-16 06:13:06 UTC
Verified on puddle RHOS_TRUNK-15.0-RHEL-8-20190714.n.0

All tempest tests passed

Comment 8 Victor Voronkov 2019-07-16 06:13:56 UTC
Created attachment 1590973
Tempest results

Comment 10 errata-xmlrpc 2019-09-21 11:22:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811