Description of problem:
Recent changes for edge scenarios intentionally moved host discovery from the controllers to the bootstrap compute node. The discovery task is now triggered by the deploy identifier [1], which means that when the --skip-deploy-identifier flag is used, discovery is not triggered at all, causing failures in previously supported scenarios [2].

[1] - https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/nova/nova-compute-container-puppet.yaml#L667
[2] - http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/df/view/rfe/job/DFG-df-rfe-15-virsh-3cont_2comp_3ceph-skip-deploy-identifier-scaleup-poc/1/testReport/

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Related to BZ1693563, where host discovery was moved to the computes, specifically the bootstrap node, to avoid a race when the discovery command gets triggered on multiple computes at the same time.

To make sure discovery runs on every overcloud deploy/scale/etc., the deploy identifier gets passed to the step. If the identifier is skipped, the step does not run.

A workaround is to manually run the nova-manage discovery after the deploy run when --skip-deploy-identifier was used.
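The manual workaround mentioned above can be sketched as a short command sequence. This is an illustration only: the node name and the heat-admin user are assumptions from a default TripleO deployment, and the container runtime is docker or podman depending on the release.

```shell
# From the undercloud, ssh to a controller that runs the nova_api container
# (heat-admin and the node name are deployment-specific assumptions)
ssh heat-admin@overcloud-controller-0

# On the controller, run the cell v2 host discovery inside the nova_api
# container; use 'docker' instead of 'podman' on older releases
sudo podman exec -it -u root nova_api \
    nova-manage cell_v2 discover_hosts --by-service --verbose
```

This performs the same cell mapping that the skipped deploy step would have run, so newly scaled-out computes get mapped to a cell.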
(In reply to Martin Schuppert from comment #1)
> related to BZ1693563 where host discovery was moved to computes,
> specifically the bootstrap node to not run into a race when the discovery
> command gets triggered on multiple computes at the same time.
>
> To make sure we run at every overcloud deploy/scale/.. the deploy identifier
> gets passed to the step. if we skip the identifier the step does not run.
>
> A workaround would be to manually run the nova-manage discovery after the
> deploy run when --skip-deploy-identifier was used.

Just to extend the information on the mentioned workaround:

1) ssh to a node of the overcloud, e.g. one of the controllers running nova_api
2) enter the container and run the cell v2 discovery to map the scaled-out compute to the cell

[root@overcloud-controller-0 /]# docker exec -it -u root nova_api sh
()[root@overcloud-controller-0 /]$ nova-manage cell_v2 discover_hosts --by-service --verbose
The proposed workaround was tested. I ssh-ed to controller-0 and ran:

# podman exec -it -u root nova_api sh
nova-manage cell_v2 discover_hosts --by-service --verbose

It found and mapped compute-2, and the tempest smoke test no longer fails with "host compute-2 is not mapped to any cell".
Verified on puddle RHOS_TRUNK-15.0-RHEL-8-20190714.n.0.
All tempest tests passed.
Created attachment 1590973 [details] Tempest results
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2811