The default timeout value is defined in the workbook.
https://opendev.org/openstack/tripleo-common/src/branch/stable/train/workbooks/deployment.yaml#L124

This is what gets passed into the deployment workflow.
https://opendev.org/openstack/tripleo-common/src/branch/stable/train/workbooks/deployment.yaml#L211

However, if you specify --config-download-timeout, this should be used as the value for the deployment timeout.
https://opendev.org/openstack/python-tripleoclient/src/branch/stable/train/tripleoclient/v1/overcloud_deploy.py#L1076

If you don't specify --config-download-timeout, the remainder of the time specified by the --timeout value should be used.
https://opendev.org/openstack/python-tripleoclient/src/branch/stable/train/tripleoclient/v1/overcloud_deploy.py#L1098

The config_download_timeout is specified here:
https://opendev.org/openstack/python-tripleoclient/src/branch/stable/train/tripleoclient/workflows/deployment.py#L355

The default is 14400 (240 minutes):
https://opendev.org/openstack/tripleo-common/src/branch/stable/train/workbooks/deployment.yaml#L376

This should be used when invoking ansible-playbook:
https://opendev.org/openstack/tripleo-common/src/branch/stable/train/workbooks/deployment.yaml#L526

In theory, --timeout 480 and --config-download-timeout 28800 should extend the overall deployment and config download timeouts.
Sorry, --config-download-timeout 480 should be enough, because the client already converts the value from minutes to seconds: timeout = parsed_args.config_download_timeout * 60
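To make the arithmetic concrete, here is a minimal sketch of the selection logic described in the two comments above. It is not the tripleoclient code: the function name, its parameters, and the clamping of a negative remainder to zero are assumptions made for illustration only.

```python
from typing import Optional


def config_download_timeout_seconds(
    timeout_minutes: int,
    elapsed_seconds: float,
    config_download_timeout_minutes: Optional[int] = None,
) -> int:
    """Return the config-download timeout in seconds (illustrative only)."""
    if config_download_timeout_minutes is not None:
        # An explicit --config-download-timeout is given in minutes,
        # so it is multiplied by 60 to get seconds.
        return config_download_timeout_minutes * 60
    # Otherwise use whatever is left of the overall --timeout budget.
    # Clamping at zero is an assumption of this sketch.
    remaining = timeout_minutes * 60 - elapsed_seconds
    return max(int(remaining), 0)


# --config-download-timeout 480 already yields 28800 seconds.
print(config_download_timeout_seconds(480, 0, 480))  # 28800
# No explicit value: one hour already spent out of --timeout 480.
print(config_download_timeout_seconds(480, 3600))  # 25200
```

This is why passing 28800 would be interpreted as 28800 minutes rather than 28800 seconds.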
I tracked down where the timeout issue is actually happening. While the timeouts are configurable, the overall deployment process is still at the mercy of the keystone auth token expiration. The deployment workflow in mistral keeps running and continually posts its output to zaqar so that the client can follow along. The problem comes when mistral fails to post a message to zaqar (failed: Error response from Zaqar. Code: 401): the client quits with an error while the ansible execution may still be running.

environments/undercloud.yaml:
  TokenExpiration: 14400

So the default token lifetime is 240 minutes. TokenExpiration needs to be larger than the longest deployment time. Providing an update via an environment file configured in undercloud.conf and re-running the undercloud installation should increase this.
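The constraint can be sketched as a simple check. This is only an illustration: the helper name, the safety margin, and the 5-hour example duration are hypothetical, and only the 14400-second default comes from the comment above.

```python
from datetime import timedelta

# Default from environments/undercloud.yaml, in seconds (240 minutes).
DEFAULT_TOKEN_EXPIRATION = 14400


def token_covers_deployment(
    token_expiration_seconds: int,
    deployment: timedelta,
    margin_seconds: int = 0,
) -> bool:
    """True if the token outlives the deployment plus an optional margin."""
    needed = deployment.total_seconds() + margin_seconds
    return token_expiration_seconds >= needed


# Hypothetical 5-hour deployment against the default token lifetime.
long_deploy = timedelta(hours=5)
print(token_covers_deployment(DEFAULT_TOKEN_EXPIRATION, long_deploy))  # False
print(token_covers_deployment(6 * 3600, long_deploy))  # True
```

Any deployment expected to run longer than 4 hours would fail this check with the default value, which matches the 401 from zaqar described above.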
Set qe_test_coverage to - because large deployments aren't appropriate for automation in CI. Also, it looks like the fix may be to change existing configuration parameters.
Latest comment in case from customer:

So the latest advice from engineering we got was to run /var/lib/mistral/overcloud/ansible-playbook-command.sh directly and reduce the number of forks to 25 from 480. This method somehow works however it is extremely slow. We're hitting playbook errors as we go which we clear along the way. However last error happened after more than 16hrs of the playbook run, which means that we have to rerun and wait at least another 16hrs for playbook to finish.

Sunday 07 February 2021 03:21:57 +0000 (0:00:08.434) 16:55:46.307 ******
===============================================================================
tripleo-hosts-entries : Render out the hosts entries ------------- 504.35s
Render all_nodes data as group_vars for overcloud ---------------- 465.34s
redhat-subscription : Manage Red Hat subscription ---------------- 341.54s
redhat-subscription : Configure repository subscriptions --------- 250.51s
redhat-subscription : Manage Red Hat subscription ---------------- 162.71s
redhat-subscription : Manage Red Hat subscription ---------------- 133.45s
redhat-subscription : Manage Red Hat subscription ---------------- 132.44s
tripleo-hieradata : Render hieradata from template --------------- 123.87s
include_tasks ---------------------------------------------------- 120.23s
Ensure ansible_managed hieradata file exists --------------------- 118.00s
Hieradata from vars ---------------------------------------------- 117.82s
include_role : tripleo-ssh-known-hosts --------------------------- 116.98s
Hiera config ----------------------------------------------------- 116.80s
redhat-subscription : Configure repository subscriptions --------- 115.70s
Configure Hosts Entries ------------------------------------------ 114.73s
include_role : tripleo-bootstrap --------------------------------- 113.16s
redhat-subscription : Configure repository subscriptions --------- 103.10s
include_role : tuned --------------------------------------------- 101.40s
redhat-subscription : Configure repository subscriptions ---------  98.88s
Install, Configure and Run Chrony --------------------------------  98.54s

I'm going to include /var/lib/mistral/overcloud directory which apart from the other stuff contains ansible.log + ansible.cfg.
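Because the same task name appears several times in the timing summary above, it can help to total the time per task. The following is a rough sketch, assuming each summary line has the shape "task name, a run of dashes, seconds with an s suffix"; the regular expression and function name are illustrative and not part of ansible.

```python
import re
from collections import defaultdict

# Task name, whitespace, a run of dashes, whitespace, then e.g. "504.35s".
LINE_RE = re.compile(r"^(?P<task>.+?)\s-+\s+(?P<seconds>\d+(?:\.\d+)?)s$")


def total_seconds_by_task(summary: str) -> dict:
    """Sum the reported seconds for each distinct task name."""
    totals = defaultdict(float)
    for line in summary.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # header and separator lines do not match and are skipped
            totals[match.group("task")] += float(match.group("seconds"))
    return dict(totals)


# A few lines copied from the summary, with shortened dash runs.
SAMPLE = """\
tripleo-hosts-entries : Render out the hosts entries ----- 504.35s
redhat-subscription : Manage Red Hat subscription ----- 341.54s
redhat-subscription : Manage Red Hat subscription ----- 162.71s
include_tasks ----- 120.23s
"""

ranked = sorted(total_seconds_by_task(SAMPLE).items(), key=lambda kv: -kv[1])
for task, seconds in ranked:
    print(f"{seconds:8.2f}s  {task}")
```

Run against the full summary, this would show how much of the listed time goes to the repeated redhat-subscription tasks.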
The upstream patch allows a user to specify the keystone token expiration in undercloud.conf, since it needs to be easy to configure for deployments at scale. Efforts to address ansible issues in the OSP deployment process are being tracked in Bug 1911891.
We'll use this bug to track the ability to configure the keystone token lifetime via undercloud.conf. The additional issues with execution time will be tracked via Bug 1911891.
Used a 1cont_1comp_3ceph topology:
With auth_token_lifetime = 500 in undercloud.conf, the overcloud deploy times out.
With auth_token_lifetime = 1000 in undercloud.conf, the overcloud deploy is successful.
That means auth_token_lifetime can be specified in undercloud.conf and the specified value is used.
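For anyone repeating this verification, the configured value can be read back with a few lines of Python. This is a sketch under assumptions: it assumes the option lives in the [DEFAULT] section of undercloud.conf and that an unset option corresponds to the 14400 TokenExpiration default mentioned earlier in this bug. The helper name is hypothetical.

```python
import configparser

# Assumed fallback, taken from the TokenExpiration default quoted earlier.
ASSUMED_DEFAULT_LIFETIME = 14400


def read_auth_token_lifetime(conf_text: str) -> int:
    """Return auth_token_lifetime from undercloud.conf-style text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.getint(
        "DEFAULT", "auth_token_lifetime", fallback=ASSUMED_DEFAULT_LIFETIME
    )


# Inline sample instead of the real file, so the sketch is self-contained.
SAMPLE_CONF = """
[DEFAULT]
auth_token_lifetime = 1000
"""

print(read_auth_token_lifetime(SAMPLE_CONF))  # 1000
```

To check a real undercloud, the file contents could be passed in instead, for example the text read from the undercloud.conf used for the installation.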
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2097