Bug 1825315 - [OSP16] Overcloud stack deployment with pre-provisioned nodes does not trigger Mistral config-download
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Luke Short
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On: 1827695
Blocks:
Reported: 2020-04-17 16:37 UTC by Pradipta Kumar Sahoo
Modified: 2020-05-04 17:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 17:35:54 UTC
Target Upstream Version:
Embargoed:



Description Pradipta Kumar Sahoo 2020-04-17 16:37:06 UTC
Description of problem:
The overcloud deploy command with pre-provisioned nodes did not trigger config-download after the heat stack update.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.0.1 (Train)
python3-tripleo-common-11.3.3-0.20200403044648.56c0fd5.el8ost.noarch
openstack-tripleo-common-11.3.3-0.20200403044648.56c0fd5.el8ost.noarch
ansible-role-tripleo-modify-image-1.1.1-0.20200302230738.bb6f78d.el8ost.noarch
openstack-tripleo-validations-11.3.2-0.20200318124452.3fd14c9.el8ost.noarch
puppet-tripleo-11.4.1-0.20200402130301.b4678ba.el8ost.noarch
tripleo-ansible-0.4.2-0.20200404124614.67005aa.el8ost.noarch
python3-tripleoclient-heat-installer-12.3.2-0.20200405044622.fdce01f.el8ost.noarch
ansible-tripleo-ipsec-9.2.1-0.20200302220300.0c8693c.el8ost.noarch
openstack-tripleo-image-elements-10.6.2-0.20200314025720.8c91b46.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.2-0.20200302235857.a6fef08.el8ost.noarch
python3-tripleoclient-12.3.2-0.20200405044622.fdce01f.el8ost.noarch
openstack-tripleo-common-containers-11.3.3-0.20200403044648.56c0fd5.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-0.20200405044622.ec9970c.el8ost.noarch

How reproducible: 100% reproduced in the scale lab environment.


Steps to Reproduce:
Reference guide: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html-single/director_installation_and_usage/index#configuring-a-basic-overcloud-with-pre-provisioned-nodes

1. Successfully executed sections 9.1, 9.2, 9.3, and 9.4:
	https://gist.github.com/pradiptapks/59cf762918dd08e61e64f41c7104ddc1

2. Updated the THT resources in the templates (sections 9.5 and 9.7):
	https://gist.githubusercontent.com/pradiptapks/28dfd6b196af519509020f62f7531464/raw/afc5c98cda03b4866a29635e0101d2e6dda54693/ctlplane-assignments.yaml
	https://gist.githubusercontent.com/pradiptapks/28dfd6b196af519509020f62f7531464/raw/afc5c98cda03b4866a29635e0101d2e6dda54693/hostname-map.yaml
	https://gist.githubusercontent.com/pradiptapks/28dfd6b196af519509020f62f7531464/raw/afc5c98cda03b4866a29635e0101d2e6dda54693/swift-data-map.yaml
3. Skipped section 9.8 "Ceph Storage for pre-provisioned nodes" because we did not include Ceph nodes.
4. Successfully deployed the heat stack with the script below.
	https://gist.githubusercontent.com/pradiptapks/28dfd6b196af519509020f62f7531464/raw/afc5c98cda03b4866a29635e0101d2e6dda54693/deploy.sh
	~~~
	time openstack overcloud deploy \
	--timeout 240 \
	--templates /usr/share/openstack-tripleo-heat-templates \
	--stack overcloud \
	--libvirt-type kvm \
	--ntp-server clock1.rdu2.redhat.com \
	--disable-validations \
	-e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-environment.yaml \
	-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
	-e /home/stack/virt/ctlplane-assignments.yaml \
	-e /home/stack/virt/hostname-map.yaml \
	-e /home/stack/virt/swift-data-map.yaml \
	-e /home/stack/virt/debug.yaml \
	-e /home/stack/containers-prepare-parameter.yaml \
	-e /home/stack/virt/docker-images.yaml \
	--overcloud-ssh-key ~/.ssh/id_rsa --log-file overcloud_install.log &> /home/stack/overcloud_install.log
	~~~

5. Checked the stack and deployment status:

$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+
| ded41a83-f3c0-484d-be80-aad4fefd2d85 | overcloud  | ab9a2a109d4842c395fe0569df6fc6dc | CREATE_COMPLETE | 2020-04-17T13:17:43Z | None         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+

$ openstack overcloud status
+-----------+-------------------+
| Plan Name | Deployment Status |
+-----------+-------------------+
| overcloud |   DEPLOY_FAILED   |
+-----------+-------------------+

6. It seems that after the heat stack update, the deploy command ended without triggering config-download.
	~~~
	2020-04-17 13:22:31Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_COMPLETE  state changed
	2020-04-17 13:22:31Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE  Stack CREATE completed successfully
	2020-04-17 13:22:31Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE  state changed
	2020-04-17 13:22:31Z [overcloud]: CREATE_COMPLETE  Stack CREATE completed successfully

	 Stack overcloud/ded41a83-f3c0-484d-be80-aad4fefd2d85 CREATE_COMPLETE 

	Deploying overcloud configuration
	~~~

7. Validation of the deployed-server heat resources:
$ openstack stack resource list -n 5 overcloud | grep deployed-server
| deployed-server | 96130122-0861-4f0b-8dd3-f074300e0cfb | OS::Heat::DeployedServer | CREATE_COMPLETE | 2020-04-17T13:20:04Z | overcloud-Compute-x5ijvin4wl2f-0-qbrusz3qbzhr-NovaCompute-dt24cay4q5vl |
| deployed-server | 40061e14-fa1c-46ba-a27a-fc7092c44535 | OS::Heat::DeployedServer | CREATE_COMPLETE | 2020-04-17T13:22:08Z | overcloud-Controller-jgwapbtofkbk-0-yd5mrqcnrsqg-Controller-xyrbvwvi2tg4 |


8. Also, the metadata URL was updated for the deployed-server resources:

$ openstack stack resource metadata --fit-width overcloud-Controller-jgwapbtofkbk-0-yd5mrqcnrsqg-Controller-xyrbvwvi2tg4 deployed-server | jq -r '.["os-collect-config"].request.metadata_url'
https://192.168.24.2:13808/v1/AUTH_ab9a2a109d4842c395fe0569df6fc6dc/ov-ntroller-xyrbvwvi2tg4-deployed-server-nox2draamhqu/cb4d2a63-b2c1-4ba5-8580-55467c3fd1d1?temp_url_sig=c69fd50fbaa3c3f4fe63bbbb3f6af52528e99485&temp_url_expires=2147483586


$ openstack stack resource metadata --fit-width overcloud-Compute-x5ijvin4wl2f-0-qbrusz3qbzhr-NovaCompute-dt24cay4q5vl deployed-server | jq -r '.["os-collect-config"].request.metadata_url'
https://192.168.24.2:13808/v1/AUTH_ab9a2a109d4842c395fe0569df6fc6dc/ov-aCompute-dt24cay4q5vl-deployed-server-6cyimglbgajc/fed580d3-d10e-4f5f-ace6-5f22fa8008d9?temp_url_sig=6d930c472f3b2ff9300048a89a237ceb3e705d54&temp_url_expires=2147483586

9. The upstream doc [1] suggests using os-collect-config, which I assume is not applicable in OSP16 because we moved to config-download.
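
The combination seen in step 5, a CREATE_COMPLETE stack together with a DEPLOY_FAILED plan, points at the config-download phase rather than at Heat. A minimal shell sketch of that triage logic follows. The function name `classify_deploy` and its output labels are hypothetical, and `DEPLOY_SUCCESS` is assumed to be the value that `openstack overcloud status` reports on success.

```shell
#!/bin/sh
# Hypothetical helper: classify a deployment from two status strings.
#   $1 = Heat stack status (e.g. from "openstack stack list")
#   $2 = deployment status (e.g. from "openstack overcloud status")
classify_deploy() {
    case "$1" in
        CREATE_COMPLETE|UPDATE_COMPLETE) ;;
        *) echo "heat-not-complete"; return 0 ;;
    esac
    # Assumption: DEPLOY_SUCCESS is the only "good" deployment status.
    if [ "$2" = "DEPLOY_SUCCESS" ]; then
        echo "deployed"
    else
        echo "heat-ok-config-download-not-successful"
    fi
}

# Values taken from the output in step 5.
classify_deploy CREATE_COMPLETE DEPLOY_FAILED
```

With the values from this report, the helper reports that Heat finished but config-download did not succeed, which matches the log in step 6.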

Actual results:
Overcloud deployment failed without triggering Mistral config-download.

Additional info:
[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_server.html#deployment-command

Comment 1 Pradipta Kumar Sahoo 2020-04-20 15:30:12 UTC
Hi Luke,

It seems Ansible failed to generate the inventory because there is no ctlplane network mapping.
As per the upstream guide [2], I guess the environment variables below should help. Since these values are not included in our downstream guide [1], we would like to know your thoughts before proceeding.

~~~
export OVERCLOUD_ROLES="ControllerDeployedServer ComputeDeployedServer"
export ControllerDeployedServer_hosts="192.168.25.1 192.168.25.2 192.168.25.3"
export ComputeDeployedServer_hosts="192.168.25.4"
~~~

Reproduced steps:
----------------
1. Copied the existing deployment script and added the config-download parameters to force the Ansible inventory generation.

	~~~
	$ diff test-deploy.sh test-deploy-config-download.sh
	18c18,19
	< --overcloud-ssh-key ~/.ssh/id_rsa --log-file overcloud_install.log &> /home/stack/overcloud_install.log
	---
	> --config-download-only --config-download-timeout 300 \
	> --overcloud-ssh-key ~/.ssh/id_rsa --log-file overcloud_install_config.log &> /home/stack/overcloud_install_config.log
	~~~

2. The overcloud deployment failed to generate the Ansible inventory.
	~~~
	$ tail -f overcloud_install_config.log
	Waiting for messages on queue 'tripleo' with no timeout.
	2020-04-20 14:06:49.748 372067 WARNING tripleoclient.plugin [  admin] Waiting for messages on queue 'tripleo' with no timeout.                                                                                                               
	2020-04-20 14:07:27.419 372067 ERROR openstack [  admin] Overcloud configuration failed.
	or Controller role on ctlplane network', action_cls='<class 'mistral.actions.action_factory.AnsibleGenerateInventoryAction'>', attributes='{}', params='{'ansible_ssh_user': 'tripleo-admin', 'ansible_python_interpreter': None, 'work_dir':
	'/var/lib/mistral/overcloud', 'plan_name': 'overcloud', 'ssh_network': 'ctlplane', 'undercloud_key_file': '/var/lib/mistral/.ssh/tripleo-admin-rsa'}']                                                                                       
	~~~

	$ openstack overcloud failures --stack overcloud
	Ansible errors file not found at /var/lib/mistral/overcloud/ansible-errors.json



	$ openstack action execution list --fit-width|grep -v SUCCESS
	+--------------------------------------+--------------------------------------------+-------------------------------------------------------+--------------------+------------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
	| ID                                   | Name                                       | Workflow name                                         | Workflow namespace | Task name                    | Task ID                              | State   | Accepted | Created at          | Updated at          |
	+--------------------------------------+--------------------------------------------+-------------------------------------------------------+--------------------+------------------------------+--------------------------------------+---------+----------+---------------------+---------------------+
	| b23cd350-78ed-449c-8b2a-3769d90b8d1a | tripleo.ansible-generate-inventory         | tripleo.deployment.v1.config_download_deploy          |                    | generate_inventory           | baea976e-6be3-4496-a30b-3d5133afaaaa | ERROR   | True     | 2020-04-20 14:07:25 | 2020-04-20 14:07:26 |
	+--------------------------------------+--------------------------------------------+-------------------------------------------------------+--------------------+------------------------------+--------------------------------------+---------+----------+---------------------+---------------------+

	$ openstack action execution show b23cd350-78ed-449c-8b2a-3769d90b8d1a --fit-width
	+--------------------+----------------------------------------------+
	| Field              | Value                                        |
	+--------------------+----------------------------------------------+
	| ID                 | b23cd350-78ed-449c-8b2a-3769d90b8d1a         |
	| Name               | tripleo.ansible-generate-inventory           |
	| Workflow name      | tripleo.deployment.v1.config_download_deploy |
	| Workflow namespace |                                              |
	| Task name          | generate_inventory                           |
	| Task ID            | baea976e-6be3-4496-a30b-3d5133afaaaa         |
	| State              | ERROR                                        |
	| State info         | None                                         |
	| Accepted           | True                                         |
	| Created at         | 2020-04-20 14:07:25                          |
	| Updated at         | 2020-04-20 14:07:26                          |
	+--------------------+----------------------------------------------+

	$ openstack action execution output show b23cd350-78ed-449c-8b2a-3769d90b8d1a 
	{
	    "result": "The action raised an exception [action_ex_id=b23cd350-78ed-449c-8b2a-3769d90b8d1a, msg='No IPs found for Controller role on ctlplane network', action_cls='<class 'mistral.actions.action_factory.AnsibleGenerateInventoryAction'>', attributes='{}', params='{'ansible_ssh_user': 'tripleo-admin', 'ansible_python_interpreter': None, 'work_dir': '/var/lib/mistral/overcloud', 'plan_name': 'overcloud', 'ssh_network': 'ctlplane', 'undercloud_key_file': '/var/lib/mistral/.ssh/tripleo-admin-rsa'}']"
	}
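
The result string above is long, and the useful part is the `msg='...'` field. As a rough sketch, it can be pulled out with `sed`; the function name `extract_action_msg` is hypothetical, and the sample string is an abbreviated copy of the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: print only the msg='...' value from a Mistral
# action result read on stdin. Lines without a msg field print nothing.
extract_action_msg() {
    sed -n "s/.*msg='\([^']*\)'.*/\1/p"
}

# Abbreviated sample of the action execution output shown above.
sample="The action raised an exception [action_ex_id=b23cd350-78ed-449c-8b2a-3769d90b8d1a, msg='No IPs found for Controller role on ctlplane network', attributes='{}']"

printf '%s\n' "$sample" | extract_action_msg
```

On the undercloud, the output of `openstack action execution output show <ID>` could be piped into the same function instead of the sample string.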


[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html-single/director_installation_and_usage/index#configuring-a-basic-overcloud-with-pre-provisioned-nodes
[2] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_server.html#deployment-command

BR,
Pradipta

Comment 3 Luke Short 2020-05-04 17:35:54 UTC
The problems outlined here have been addressed outside of Bugzilla. To recap everything so far:

1. The Undercloud does not provide DHCP / IP addressing to pre-provisioned servers. It is expected that those nodes are managed by an external source.
2. NIC configurations can be modified using a custom os-net-config configuration via a "net-config-*.yaml" template.
3. We were able to get the deployment to work with and without network isolation.

All the issues have come down to a misunderstanding of the expectations of pre-provisioned nodes and issues with incorrect custom THT settings.

