Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1714584

Summary:

[OSP15] Use of skip-deploy-identifier flag causes compute scale-out failure

Product:

Red Hat OpenStack

Reporter:

Victor Voronkov <vvoronko>

Component:

openstack-tripleo-heat-templates

Assignee:

Martin Schuppert <mschuppe>

Status:

CLOSED ERRATA

QA Contact:

Victor Voronkov <vvoronko>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

15.0 (Stein)

CC:

lyarwood, mbooth, mburns, mcornea, mschuppe, ravsingh, slinaber

Target Milestone:

Keywords:

Patch, Triaged

Target Release:

15.0 (Stein)

Flags:

vvoronko: needinfo+

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

openstack-tripleo-heat-templates-10.6.1-0.20190711090428.245f17c.el8ost

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Clones:

1714590 (view as bug list)

Environment:

Last Closed:

2019-09-21 11:22:34 UTC

Type:

Bug

Bug Depends On:

Bug Blocks:

1714590, 1752900

Attachments:

Description	Flags
Tempest results	none

Description Victor Voronkov 2019-05-28 11:28:37 UTC

Description of problem:
Recent changes for edge scenarios intentionally moved host discovery from the controller to the bootstrap compute node, so this task is now triggered by the deploy identifier [1]. As a result, when the --skip-deploy-identifier flag is used, discovery is not triggered at all, which causes failures in previously supported scenarios [2].

[1] - https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/nova/nova-compute-container-puppet.yaml#L667

[2] - http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/df/view/rfe/job/DFG-df-rfe-15-virsh-3cont_2comp_3ceph-skip-deploy-identifier-scaleup-poc/1/testReport/

Comment 1 Martin Schuppert 2019-05-28 12:12:56 UTC

Related to BZ1693563, where host discovery was moved to the computes, specifically to the bootstrap node, to avoid a race when the discovery command is triggered on multiple computes at the same time.

To make sure the step runs on every overcloud deploy, scale, etc., the deploy identifier is passed to the step. If the identifier is skipped, the step does not run.

A workaround is to run the nova-manage discovery manually after the deploy run when --skip-deploy-identifier was used.
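
The dependency on the deploy identifier can be pictured with a small shell sketch. This is a toy model, not the tripleo-heat-templates logic: the function name step_would_rerun and the plain string comparison are assumptions made for illustration. It also assumes that a normal deploy passes a fresh value on every run, while --skip-deploy-identifier leaves the value unchanged (empty) between runs.

```sh
#!/bin/sh
# Toy model (assumption): a step that takes the deploy identifier as an
# input is re-run only when that input changes between two deploy runs.
step_would_rerun() {
    previous_id=$1
    current_id=$2
    if [ "$previous_id" != "$current_id" ]; then
        echo "rerun"
    else
        echo "skipped"
    fi
}

# Normal deploy: a new identifier on every run (the values are made up).
step_would_rerun "1559040000" "1559043600"

# --skip-deploy-identifier on both runs: the input does not change.
step_would_rerun "" ""
```

In this model the second call reports that the step is skipped, which matches the behaviour described above: without a changing identifier, nothing tells the discovery step to run again after a scale-out.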

Comment 3 Martin Schuppert 2019-06-03 12:10:37 UTC

(In reply to Martin Schuppert from comment #1)
> related to BZ1693563 where host discovery was moved to computes,
> specifically the bootstrap node to not run into a race when the discovery
> command gets triggered on multiple computes at the same time.
> 
> To make sure we run at every overcloud deploy/scale/.. the deploy identifier
> gets passed to the step. if we skip the identifier the step does not run.
> 
> A workaround would be to manually run the nova-manage discovery after the
> deploy run when --skip-deploy-identifier was used.

To expand on the workaround mentioned above:

1) SSH to an overcloud node, e.g. one of the controllers running nova_api.

2) Enter the container and run the cell v2 discovery to map the scaled-out compute to the cell:
[root@overcloud-controller-0 /]# docker exec -it -u root nova_api sh
()[root@overcloud-controller-0 /]$ nova-manage cell_v2 discover_hosts --by-service --verbose
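
The two steps above can also be combined into one non-interactive command, which may be easier to script after a deploy run. The following is a minimal sketch: the function name discover_hosts, the DRY_RUN switch, and passing the container CLI as an argument are assumptions for illustration and not part of the product. By default it only prints the command; set DRY_RUN=0 on a controller to actually run it.

```sh
#!/bin/sh
# Sketch of the manual workaround as a single non-interactive command.
# Assumptions: discover_hosts and DRY_RUN are illustrative names only.
discover_hosts() {
    runtime=${1:-podman}   # pass "docker" where docker is the container CLI
    # Rebuild the positional parameters as the full command line.
    set -- "$runtime" exec -u root nova_api \
        nova-manage cell_v2 discover_hosts --by-service --verbose
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"          # dry run: only print the command
    else
        "$@"               # run the discovery inside the nova_api container
    fi
}

# Print the command that would be run on a controller.
discover_hosts podman
```

Keeping dry run as the default means the sketch can be checked on any machine before it is used on an overcloud node.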

Comment 4 Victor Voronkov 2019-06-10 05:30:45 UTC

The proposed workaround was tested. I SSHed to controller-0 and ran:

# podman exec -it -u root nova_api sh
nova-manage cell_v2 discover_hosts --by-service --verbose

It found and mapped compute-2, and the tempest smoke tests no longer fail with "host compute-2 is not mapped to any cell".

Comment 7 Victor Voronkov 2019-07-16 06:13:06 UTC

Verified on puddle RHOS_TRUNK-15.0-RHEL-8-20190714.n.0

All tempest tests passed

Comment 8 Victor Voronkov 2019-07-16 06:13:56 UTC

Created attachment 1590973 [details]
Tempest results

Comment 10 errata-xmlrpc 2019-09-21 11:22:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811