Description of problem:
The overcloud deploy command with pre-provisioned resources does not trigger config-download after the Heat stack deployment completes.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.0.1 (Train)
python3-tripleo-common-11.3.3-0.20200403044648.56c0fd5.el8ost.noarch
openstack-tripleo-common-11.3.3-0.20200403044648.56c0fd5.el8ost.noarch
ansible-role-tripleo-modify-image-1.1.1-0.20200302230738.bb6f78d.el8ost.noarch
openstack-tripleo-validations-11.3.2-0.20200318124452.3fd14c9.el8ost.noarch
puppet-tripleo-11.4.1-0.20200402130301.b4678ba.el8ost.noarch
tripleo-ansible-0.4.2-0.20200404124614.67005aa.el8ost.noarch
python3-tripleoclient-heat-installer-12.3.2-0.20200405044622.fdce01f.el8ost.noarch
ansible-tripleo-ipsec-9.2.1-0.20200302220300.0c8693c.el8ost.noarch
openstack-tripleo-image-elements-10.6.2-0.20200314025720.8c91b46.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.2-0.20200302235857.a6fef08.el8ost.noarch
python3-tripleoclient-12.3.2-0.20200405044622.fdce01f.el8ost.noarch
openstack-tripleo-common-containers-11.3.3-0.20200403044648.56c0fd5.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-0.20200405044622.ec9970c.el8ost.noarch

How reproducible:
100% reproducible in the scale lab environment.

Steps to Reproduce:
Reference guide: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html-single/director_installation_and_usage/index#configuring-a-basic-overcloud-with-pre-provisioned-nodes

1. Successfully executed sections 9.1, 9.2, 9.3 and 9.4:
   https://gist.github.com/pradiptapks/59cf762918dd08e61e64f41c7104ddc1
2. Updated the THT resources in the templates per sections 9.5 and 9.7:
   https://gist.githubusercontent.com/pradiptapks/28dfd6b196af519509020f62f7531464/raw/afc5c98cda03b4866a29635e0101d2e6dda54693/ctlplane-assignments.yaml
   https://gist.githubusercontent.com/pradiptapks/28dfd6b196af519509020f62f7531464/raw/afc5c98cda03b4866a29635e0101d2e6dda54693/hostname-map.yaml
   https://gist.githubusercontent.com/pradiptapks/28dfd6b196af519509020f62f7531464/raw/afc5c98cda03b4866a29635e0101d2e6dda54693/swift-data-map.yaml
3. Skipped section 9.8, "Ceph Storage for pre-provisioned nodes", because this deployment does not include Ceph nodes.
4. Successfully deployed the Heat stack with the script below:
   https://gist.githubusercontent.com/pradiptapks/28dfd6b196af519509020f62f7531464/raw/afc5c98cda03b4866a29635e0101d2e6dda54693/deploy.sh
~~~
time openstack overcloud deploy \
--timeout 240 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock1.rdu2.redhat.com \
--disable-validations \
-e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/ctlplane-assignments.yaml \
-e /home/stack/virt/hostname-map.yaml \
-e /home/stack/virt/swift-data-map.yaml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e /home/stack/virt/docker-images.yaml \
--overcloud-ssh-key ~/.ssh/id_rsa --log-file overcloud_install.log &> /home/stack/overcloud_install.log
~~~
5. Heat reports the stack as CREATE_COMPLETE, but the plan's deployment status is DEPLOY_FAILED:

$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| ded41a83-f3c0-484d-be80-aad4fefd2d85 | overcloud  | ab9a2a109d4842c395fe0569df6fc6dc | CREATE_COMPLETE | 2020-04-17T13:17:43Z | None         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+

$ openstack overcloud status
+-----------+-------------------+
| Plan Name | Deployment Status |
+-----------+-------------------+
| overcloud | DEPLOY_FAILED     |
+-----------+-------------------+

6. It seems that after the Heat stack operation completed, the deploy command exited without triggering config-download:
~~~
2020-04-17 13:22:31Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_COMPLETE state changed
2020-04-17 13:22:31Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE Stack CREATE completed successfully
2020-04-17 13:22:31Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE state changed
2020-04-17 13:22:31Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully

Stack overcloud/ded41a83-f3c0-484d-be80-aad4fefd2d85 CREATE_COMPLETE

Deploying overcloud configuration
~~~
7. Validation of the deployed-server Heat resources:

$ openstack stack resource list -n 5 overcloud | grep deployed-server
| deployed-server | 96130122-0861-4f0b-8dd3-f074300e0cfb | OS::Heat::DeployedServer | CREATE_COMPLETE | 2020-04-17T13:20:04Z | overcloud-Compute-x5ijvin4wl2f-0-qbrusz3qbzhr-NovaCompute-dt24cay4q5vl |
| deployed-server | 40061e14-fa1c-46ba-a27a-fc7092c44535 | OS::Heat::DeployedServer | CREATE_COMPLETE | 2020-04-17T13:22:08Z | overcloud-Controller-jgwapbtofkbk-0-yd5mrqcnrsqg-Controller-xyrbvwvi2tg4 |

8. The metadata server was also updated for each deployed server:

$ openstack stack resource metadata --fit-width overcloud-Controller-jgwapbtofkbk-0-yd5mrqcnrsqg-Controller-xyrbvwvi2tg4 deployed-server | jq -r '.["os-collect-config"].request.metadata_url'
https://192.168.24.2:13808/v1/AUTH_ab9a2a109d4842c395fe0569df6fc6dc/ov-ntroller-xyrbvwvi2tg4-deployed-server-nox2draamhqu/cb4d2a63-b2c1-4ba5-8580-55467c3fd1d1?temp_url_sig=c69fd50fbaa3c3f4fe63bbbb3f6af52528e99485&temp_url_expires=2147483586

$ openstack stack resource metadata --fit-width overcloud-Compute-x5ijvin4wl2f-0-qbrusz3qbzhr-NovaCompute-dt24cay4q5vl deployed-server | jq -r '.["os-collect-config"].request.metadata_url'
https://192.168.24.2:13808/v1/AUTH_ab9a2a109d4842c395fe0569df6fc6dc/ov-aCompute-dt24cay4q5vl-deployed-server-6cyimglbgajc/fed580d3-d10e-4f5f-ace6-5f22fa8008d9?temp_url_sig=6d930c472f3b2ff9300048a89a237ceb3e705d54&temp_url_expires=2147483586

9. The upstream documentation [1] suggests using os-collect-config, which I assume does not apply to OSP 16 because deployment has moved to config-download.

Actual results:
The overcloud deployment fails without triggering the Mistral config-download workflow.

Additional info:
[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_server.html#deployment-command
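The `jq` filter in step 8 reads `os-collect-config.request.metadata_url` from each deployed-server resource's metadata. A minimal Python sketch of the same lookup, assuming the metadata JSON has already been captured from `openstack stack resource metadata` (for example redirected to a file); it also splits the Swift temp URL query string so the signature and expiry can be checked. The helper names `metadata_url` and `temp_url_params` are hypothetical, and the sample below is trimmed to the one key the filter uses.

```python
import json
from urllib.parse import parse_qs, urlsplit


def metadata_url(metadata_json: str) -> str:
    """Mirror the jq filter '.["os-collect-config"].request.metadata_url'."""
    metadata = json.loads(metadata_json)
    return metadata["os-collect-config"]["request"]["metadata_url"]


def temp_url_params(url: str) -> dict:
    """Split a Swift temp URL query string into single-valued parameters."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


if __name__ == "__main__":
    # Trimmed sample shaped like the Controller output in step 8.
    sample = json.dumps({
        "os-collect-config": {
            "request": {
                "metadata_url": (
                    "https://192.168.24.2:13808/v1/AUTH_ab9a2a109d4842c395fe0569df6fc6dc/"
                    "ov-ntroller-xyrbvwvi2tg4-deployed-server-nox2draamhqu/"
                    "cb4d2a63-b2c1-4ba5-8580-55467c3fd1d1"
                    "?temp_url_sig=c69fd50fbaa3c3f4fe63bbbb3f6af52528e99485"
                    "&temp_url_expires=2147483586"
                )
            }
        }
    })
    url = metadata_url(sample)
    print(urlsplit(url).netloc)                    # 192.168.24.2:13808
    print(temp_url_params(url)["temp_url_expires"])  # 2147483586
```

Running it per node only confirms that Heat published a metadata URL; as the thread below shows, it says nothing about whether config-download can build an inventory.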
Hi Luke,

It seems Ansible failed to generate the inventory because there is no ctlplane network mapping. Per the upstream guide [2], I believe the environment variables below should help. Since these variables are not included in our downstream guide [1], we would like to know your thoughts before proceeding.
~~~
export OVERCLOUD_ROLES="ControllerDeployedServer ComputeDeployedServer"
export ControllerDeployedServer_hosts="192.168.25.1 192.168.25.2 192.168.25.3"
export ComputeDeployedServer_hosts="192.168.25.4"
~~~

Reproduction steps:
-------------------
1. Copied the existing deployment script and added the config-download parameters to force generation of the Ansible inventory.
~~~
$ diff test-deploy.sh test-deploy-config-download.sh
18c18,19
< --overcloud-ssh-key ~/.ssh/id_rsa --log-file overcloud_install.log &> /home/stack/overcloud_install.log
---
> --config-download-only --config-download-timeout 300 \
> --overcloud-ssh-key ~/.ssh/id_rsa --log-file overcloud_install_config.log &> /home/stack/overcloud_install_config.log
~~~
2. The overcloud failed to configure the Ansible inventory.
~~~
$ tail -f overcloud_install_config.log
Waiting for messages on queue 'tripleo' with no timeout.
2020-04-20 14:06:49.748 372067 WARNING tripleoclient.plugin [ admin] Waiting for messages on queue 'tripleo' with no timeout.
2020-04-20 14:07:27.419 372067 ERROR openstack [ admin] Overcloud configuration failed.
or Controller role on ctlplane network', action_cls='<class 'mistral.actions.action_factory.AnsibleGenerateInventoryAction'>', attributes='{}', params='{'ansible_ssh_user': 'tripleo-admin', 'ansible_python_interpreter': None, 'work_dir': '/var/lib/mistral/overcloud', 'plan_name': 'overcloud', 'ssh_network': 'ctlplane', 'undercloud_key_file': '/var/lib/mistral/.ssh/tripleo-admin-rsa'}']
~~~

$ openstack overcloud failures --stack overcloud
Ansible errors file not found at /var/lib/mistral/overcloud/ansible-errors.json

$ openstack action execution list --fit-width | grep -v SUCCESS
+--------------------------------------+--------------------------------------------+-------------------------------------------------------+--------------------+------------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| ID                                   | Name                                       | Workflow name                                         | Workflow namespace | Task name                    | Task ID                              | State   | Accepted | Created at          | Updated at          |
+--------------------------------------+--------------------------------------------+-------------------------------------------------------+--------------------+------------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
| b23cd350-78ed-449c-8b2a-3769d90b8d1a | tripleo.ansible-generate-inventory         | tripleo.deployment.v1.config_download_deploy          |                    | generate_inventory           | baea976e-6be3-4496-a30b-3d5133afaaaa | ERROR   | True     | 2020-04-20 14:07:25 | 2020-04-20 14:07:26 |
+--------------------------------------+--------------------------------------------+-------------------------------------------------------+--------------------+------------------------------+--------------------------------------+---------+----------+---------------------+---------------------+

$ openstack action execution show b23cd350-78ed-449c-8b2a-3769d90b8d1a --fit-width
+--------------------+----------------------------------------------+
| Field              | Value                                        |
+--------------------+----------------------------------------------+
| ID                 | b23cd350-78ed-449c-8b2a-3769d90b8d1a         |
| Name               | tripleo.ansible-generate-inventory           |
| Workflow name      | tripleo.deployment.v1.config_download_deploy |
| Workflow namespace |                                              |
| Task name          | generate_inventory                           |
| Task ID            | baea976e-6be3-4496-a30b-3d5133afaaaa         |
| State              | ERROR                                        |
| State info         | None                                         |
| Accepted           | True                                         |
| Created at         | 2020-04-20 14:07:25                          |
| Updated at         | 2020-04-20 14:07:26                          |
+--------------------+----------------------------------------------+

$ openstack action execution output show b23cd350-78ed-449c-8b2a-3769d90b8d1a
{
    "result": "The action raised an exception [action_ex_id=b23cd350-78ed-449c-8b2a-3769d90b8d1a, msg='No IPs found for Controller role on ctlplane network', action_cls='<class 'mistral.actions.action_factory.AnsibleGenerateInventoryAction'>', attributes='{}', params='{'ansible_ssh_user': 'tripleo-admin', 'ansible_python_interpreter': None, 'work_dir': '/var/lib/mistral/overcloud', 'plan_name': 'overcloud', 'ssh_network': 'ctlplane', 'undercloud_key_file': '/var/lib/mistral/.ssh/tripleo-admin-rsa'}']"
}

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html-single/director_installation_and_usage/index#configuring-a-basic-overcloud-with-pre-provisioned-nodes
[2] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_server.html#deployment-command

BR,
Pradipta
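The failing Mistral action reports `No IPs found for Controller role on ctlplane network`: the inventory generator found no control-plane address for that role. The sketch below is a simplified, hypothetical model of that condition, not the tripleo-common implementation. It reads the upstream-style `OVERCLOUD_ROLES` and `<Role>_hosts` variables quoted above into a role-to-IP mapping and raises an error with the same wording when a role has no addresses. The functions `hosts_from_env` and `check_ctlplane_ips` are illustrative names only.

```python
from typing import Dict, List, Mapping


def hosts_from_env(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Build {role: [ip, ...]} from OVERCLOUD_ROLES and <Role>_hosts,
    following the variable layout quoted from the upstream guide."""
    roles = env.get("OVERCLOUD_ROLES", "").split()
    return {role: env.get(f"{role}_hosts", "").split() for role in roles}


def check_ctlplane_ips(role_ips: Mapping[str, List[str]], network: str = "ctlplane") -> None:
    """Raise for the first role without an address, echoing the Mistral
    error text. Simplified model only; it does not query Heat or Neutron."""
    for role, ips in role_ips.items():
        if not ips:
            raise ValueError(f"No IPs found for {role} role on {network} network")


if __name__ == "__main__":
    env = {
        "OVERCLOUD_ROLES": "ControllerDeployedServer ComputeDeployedServer",
        "ControllerDeployedServer_hosts": "192.168.25.1 192.168.25.2 192.168.25.3",
        # ComputeDeployedServer_hosts is deliberately left unset.
    }
    role_ips = hosts_from_env(env)
    print(role_ips["ControllerDeployedServer"])  # ['192.168.25.1', '192.168.25.2', '192.168.25.3']
    try:
        check_ctlplane_ips(role_ips)
    except ValueError as exc:
        print(exc)  # No IPs found for ComputeDeployedServer role on ctlplane network
```

Passing `os.environ` instead of the literal dict would check the variables exported in the current shell.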
The problems outlined here have been addressed outside of Bugzilla. To recap:

1. The undercloud does not provide DHCP or IP addressing to pre-provisioned servers; those nodes are expected to be managed by an external source.
2. NIC configurations can be modified with a custom os-net-config configuration via a "net-config-*.yaml" template.
3. We were able to get the deployment working both with and without network isolation.

All of the issues came down to a misunderstanding of what is expected of pre-provisioned nodes, together with incorrect custom THT settings.
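Because the undercloud does not allocate addresses to pre-provisioned nodes, their control-plane IPs have to be supplied by the operator, which the referenced guide does through `ctlplane-assignments.yaml`. A minimal Python sketch, assuming the `DeployedServerPortMap` parameter with one `<hostname>-ctlplane` entry holding `fixed_ips` and `subnets` as in the upstream deployed-server documentation (the guide for your release may require further keys, so treat this layout as an assumption). It builds that structure from a hostname-to-IP table and rejects any address outside the ctlplane subnet. The function name, hostnames, addresses and subnet are placeholders, not values from this environment.

```python
import ipaddress
from typing import Dict


def deployed_server_port_map(hosts: Dict[str, str], ctlplane_cidr: str) -> dict:
    """Build a DeployedServerPortMap-style environment for pre-provisioned nodes.

    Assumed layout: '<hostname>-ctlplane' -> fixed_ips + subnets. Verify it
    against the guide for your release before using it.
    """
    subnet = ipaddress.ip_network(ctlplane_cidr)
    port_map = {}
    for hostname, ip in hosts.items():
        # Nothing on the undercloud validates these externally managed addresses.
        if ipaddress.ip_address(ip) not in subnet:
            raise ValueError(f"{hostname}: {ip} is outside ctlplane subnet {subnet}")
        port_map[f"{hostname}-ctlplane"] = {
            "fixed_ips": [{"ip_address": ip}],
            "subnets": [{"cidr": str(subnet)}],
        }
    return {"parameter_defaults": {"DeployedServerPortMap": port_map}}


if __name__ == "__main__":
    # Placeholder hostnames and addresses.
    env = deployed_server_port_map(
        {"controller-0": "192.168.24.10", "compute-0": "192.168.24.20"},
        "192.168.24.0/24",
    )
    print(sorted(env["parameter_defaults"]["DeployedServerPortMap"]))
    # ['compute-0-ctlplane', 'controller-0-ctlplane']
```

Serializing the returned dict with a YAML library such as PyYAML would produce an environment file for `-e`; that step is omitted to keep the sketch dependency-free.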