Description of problem:
The Heat deployment 'create_admin' times out when integrating external Ceph.

On one of the controllers:

Oct 17 03:27:10 controller02 os-collect-config: PLAY [localhost] ***************************************************************
Oct 17 03:27:10 controller02 os-collect-config: TASK [Gathering Facts] *********************************************************
Oct 17 03:27:10 controller02 os-collect-config: ok: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: TASK [create user tripleo-admin] ***********************************************
Oct 17 03:27:10 controller02 os-collect-config: changed: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: TASK [grant admin rights to user tripleo-admin] ********************************
Oct 17 03:27:10 controller02 os-collect-config: changed: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: TASK [ensure .ssh dir exists for user tripleo-admin] ***************************
Oct 17 03:27:10 controller02 os-collect-config: changed: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: TASK [ensure authorized_keys file exists for user tripleo-admin] ***************
Oct 17 03:27:10 controller02 os-collect-config: changed: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: TASK [authorize TripleO Mistral key for user tripleo-admin] ********************
Oct 17 03:27:10 controller02 os-collect-config: changed: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: PLAY RECAP *********************************************************************
Oct 17 03:27:10 controller02 os-collect-config: localhost : ok=6 changed=5 unreachable=0 failed=0

The tripleo-admin user was created successfully, and the signal was sent to the undercloud successfully:
Oct 17 03:27:11 controller02 os-collect-config: [2019-10-17 03:27:11,126] (heat-config) [DEBUG] [2019-10-17 03:27:11,070] (heat-config-notify) [DEBUG] Signaling to http://172.100.65.1:8080/v1/AUTH_bd0f9000bbcd4961b5841ad73a5c1b85/create_admin-0f9a524d-6336-4c09-ba98-5c0e690797a4/234345de-2edf-4bd3-9d95-cc33a8d2a209?temp_url_sig=9e1abd850d85e84475adecdca292f85643e3f00f&temp_url_expires=1571300790 via PUT

No SSL errors were found on the controllers, but we do see errors related to this in mistral/engine.log.

Version-Release number of selected component (if applicable):
OSP13

How reproducible:
100% at the customer's site

Steps to Reproduce:
1. Re-run the deploy command

Actual results:

Expected results:

Additional info:
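When reading the signaling log line above, it can help to decode the temp_url_expires query parameter, which Swift's TempURL middleware treats as a Unix epoch timestamp, to see when the signal URL stops being valid. The following is a minimal sketch; the helper name temp_url_expiry is hypothetical, and it assumes the integer epoch form of the parameter.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs


def temp_url_expiry(url: str) -> datetime:
    """Return the expiry of a Swift TempURL as a timezone-aware UTC datetime.

    Hypothetical helper; assumes temp_url_expires is an integer Unix epoch.
    """
    query = parse_qs(urlparse(url).query)          # {"temp_url_expires": ["..."], ...}
    expires = int(query["temp_url_expires"][0])
    return datetime.fromtimestamp(expires, tz=timezone.utc)


# Signal URL copied from the heat-config-notify log line.
signal_url = (
    "http://172.100.65.1:8080/v1/AUTH_bd0f9000bbcd4961b5841ad73a5c1b85/"
    "create_admin-0f9a524d-6336-4c09-ba98-5c0e690797a4/"
    "234345de-2edf-4bd3-9d95-cc33a8d2a209"
    "?temp_url_sig=9e1abd850d85e84475adecdca292f85643e3f00f"
    "&temp_url_expires=1571300790"
)

print(temp_url_expiry(signal_url).isoformat())  # 2019-10-17T08:26:30+00:00
```

Comparing this UTC value with the 03:27:11 syslog timestamp requires knowing the controller's local timezone, which this report does not state.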
Because this bug is no longer occurring, I'm going to close it. If the issue recurs, then feel free to re-open.
NOTABUG: nothing to automate or test, per the close loop.
We know the following:
- the undercloud (uc) ran the access workbook [1]
- uc asked the compute node to create the tripleo-admin user and paused to wait for confirmation that it was done
- the compute node created the tripleo-admin user
- the compute node sent an HTTP PUT to Swift on the undercloud and received a 201 from the undercloud
- uc Swift logs show the 201 was received
- ???
- the uc access workbook timed out while waiting for confirmation that the user was created
- after the workbook timed out, the overcloud deployment stopped itself because it was unable to continue

We don't know what happened at '???', but we need to know that in order to find the bug.

[1] https://github.com/openstack/tripleo-common/blob/stable/queens/workbooks/access.yaml
Ah! You're missing https://code.engineering.redhat.com/gerrit/#/c/195663/ which is in openstack-tripleo-common-8.7.1-17.el7ost. I don't think it has made it into a z-stream yet. Maybe you can use a hotfix.

As you can see, the deployment started at 17:16:43:

19:16:43,698] (heat-config) [DEBUG] Running /usr/libexec/heat-config/hooks/ansible < /var/lib/heat-config/deployed/d1c4f248-fb74-416d-8d10-c3c620a892d2.json

However, Ansible takes about 11 minutes to gather facts (possibly because of timeouts), hence the delay:

Apr 8 19:16:44 overcloud-compute-0 ansible-setup: Invoked with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Apr 8 19:27:48 overcloud-compute-0 ansible-user: Invoked with comment=None ssh_key_bits=0 update_password=always non_unique=False force=False ssh_key_type=rsa create_home=True password_lock=None ssh_key_passphrase=NOT_LOGGING_PARAMETER uid=None home=None append=False skeleton=None ssh_key_comment=ansible-generated on overcloud-compute-0 group=None system=False state=present hidden=None local=None shell=None expires=None ssh_key_file=None groups=None move_home=False password=NOT_LOGGING_PARAMETER name=tripleo-admin seuser=None remove=False login_class=None generate_ssh_key=None
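The fact-gathering delay can be measured directly from syslog: the gap between the ansible-setup invocation and the next module invocation (ansible-user, from the "create user tripleo-admin" task) approximates how long fact gathering took. A minimal sketch, assuming classic "Mon D HH:MM:SS" syslog timestamps; the helper syslog_time and its placeholder year are hypothetical.

```python
from datetime import datetime


def syslog_time(line: str, year: int = 2000) -> datetime:
    """Parse the leading 'Mon D HH:MM:SS' timestamp of a classic syslog line.

    Such lines carry no year, so a placeholder year (hypothetical default) is
    supplied; it only matters if the two lines straddle New Year.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


# Prefixes of the two log lines quoted above; only the timestamps are parsed.
setup_line = "Apr 8 19:16:44 overcloud-compute-0 ansible-setup: Invoked with filter=* gather_subset=['all']"
user_line = "Apr 8 19:27:48 overcloud-compute-0 ansible-user: Invoked with comment=None"

gap = syslog_time(user_line) - syslog_time(setup_line)
print(gap, gap.total_seconds())  # 0:11:04 664.0
```

Per the Ansible setup module documentation, gather_timeout=10 applies to individual fact collectors rather than to the whole task, so the total can run far longer than 10 seconds.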
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 13.0 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4388
*** Bug 1904588 has been marked as a duplicate of this bug. ***