Bug 1752900

Summary: [OSP13] Use of the --skip-deploy-identifier flag causes compute scale-out failure
Product: Red Hat OpenStack
Reporter: Martin Schuppert <mschuppe>
Component: openstack-tripleo-heat-templates
Assignee: Martin Schuppert <mschuppe>
Status: CLOSED ERRATA
QA Contact: James Parker <jparker>
Severity: high
Docs Contact:
Priority: high
Version: 14.0 (Rocky)
CC: apetrich, jhakimra, jparker, mbooth, mburns, mcornea, mschuppe, owalsh, ravsingh, sasha, slinaber
Target Milestone: z11
Keywords: Patch, Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-common-8.7.1-5.el7ost openstack-tripleo-heat-templates-8.4.1-17.el7ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1714590
Environment:
Last Closed: 2020-03-10 11:22:02 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1714584, 1714590
Bug Blocks: 1726483

Description Martin Schuppert 2019-09-17 14:08:52 UTC
+++ This bug was initially created as a clone of Bug #1714590 +++

+++ This bug was initially created as a clone of Bug #1714584 +++

Description of problem:
Recent changes for edge scenarios intentionally moved host discovery from the controller to the bootstrap compute node. This change was also backported to RHOS 14, so the task is now triggered by the deploy identifier [1]. As a result, when the --skip-deploy-identifier flag is used, discovery is not triggered at all, which causes failures in previously supported scenarios [2].

[1] - https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/nova/nova-compute-container-puppet.yaml#L667

[2] - https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-df-rfe-14-virsh-3cont_2comp_3ceph-skip-deploy-identifier-scaleup/24//artifact/tempest-results/tempest-results-smoke.1.html

--- Additional comment from Victor Voronkov on 2019-05-28 11:49:29 UTC ---

Documentation here https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/server_blacklist.html#skip-deploy-identifier

--- Additional comment from Martin Schuppert on 2019-05-28 12:11:34 UTC ---

Related to BZ1693563, where host discovery was moved to the computes, specifically to the bootstrap node, so that we do not run into a race when the discovery command is triggered on multiple computes at the same time.

To make sure the step runs on every overcloud deploy, scale-out, etc., the deploy identifier is passed to the step. If we skip the identifier, the step does not run.
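
As a rough illustration of the behaviour described above, here is a minimal Python model, not TripleO code. It assumes a step re-runs only when its input changes between deploys, and that skipping the identifier leaves that input as an unchanging empty string. The names new_deploy_identifier and DiscoveryStep are hypothetical, and a counter stands in for the per-deploy unique value.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Hypothetical stand-in for the unique value generated on each deploy.
_deploy_counter = count(1)


def new_deploy_identifier(skip: bool) -> str:
    """Return a fresh identifier per deploy, or '' when the identifier is skipped."""
    return "" if skip else f"deploy-{next(_deploy_counter)}"


@dataclass
class DiscoveryStep:
    """Toy model of a step that only re-runs when its input changes."""
    last_input: Optional[str] = None
    runs: int = 0

    def apply(self, deploy_identifier: str) -> bool:
        # An unchanged input means the step is treated as already applied.
        if deploy_identifier == self.last_input:
            return False
        self.last_input = deploy_identifier
        self.runs += 1
        return True


if __name__ == "__main__":
    step = DiscoveryStep()
    # Initial deploy and a later scale-out, both with the identifier skipped.
    for label in ("initial deploy", "scale-out"):
        ran = step.apply(new_deploy_identifier(skip=True))
        print(f"{label}: discovery ran = {ran}")
```

In this model the scale-out run does not trigger discovery, because the input is identical to the one recorded at the initial deploy.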

A workaround is to run the nova-manage discovery manually after the deploy run whenever --skip-deploy-identifier was used.
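
A hedged sketch of that manual workaround as a small Python helper. It assumes a containerized overcloud where nova-manage is reachable through a container named nova_api using the docker CLI on a controller. The container name, the runtime, and where to run it are assumptions that are not stated in this bug; adjust them for your deployment. The helper defaults to a dry run and only returns the command.

```python
import subprocess
from typing import List


def build_discovery_command(container: str = "nova_api",
                            runtime: str = "docker") -> List[str]:
    """Build the host discovery command; container and runtime are assumptions."""
    return [runtime, "exec", container,
            "nova-manage", "cell_v2", "discover_hosts", "--verbose"]


def run_discovery(dry_run: bool = True) -> List[str]:
    """Return the command, executing it only when dry_run is False."""
    cmd = build_discovery_command()
    if not dry_run:
        # check=True raises CalledProcessError if discovery exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(" ".join(run_discovery(dry_run=True)))
```

Run it after the scale-out has completed, so the new compute services are already registered when discovery maps them to a cell.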


================================

This BZ tracks the feasibility of fixing this in Queens. The fix in Rocky+ can't be backported
because Queens has no resource which runs deploy_steps_task when config-download is not used. Therefore
host discovery does not work when config-download is not used, which is the default in Queens.

Comment 6 errata-xmlrpc 2020-03-10 11:22:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760