Bug 1763672 - Timeout for heat deployment 'create_admin' when integrating external ceph
Summary: Timeout for heat deployment 'create_admin' when integrating external ceph
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Rabi Mishra
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-21 10:23 UTC by Chen
Modified: 2024-03-25 15:28 UTC (History)

Fixed In Version: openstack-tripleo-common-8.7.1-21.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-28 18:23:41 UTC
Target Upstream Version:
Embargoed:
tshefi: automate_bug-



Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 723313 0 None MERGED [stable-only] Pass timeout in mins to create_temp_url() 2021-01-30 12:04:22 UTC
Red Hat Issue Tracker OSP-28282 0 None None None 2023-09-07 20:53:39 UTC
Red Hat Product Errata RHBA-2020:4388 0 None None None 2020-10-28 18:23:57 UTC

Description Chen 2019-10-21 10:23:40 UTC
Description of problem:

 Timeout for heat deployment 'create_admin' when integrating external ceph

On one of the controllers

Oct 17 03:27:10 controller02 os-collect-config: PLAY [localhost] ***************************************************************
Oct 17 03:27:10 controller02 os-collect-config: TASK [Gathering Facts] *********************************************************
Oct 17 03:27:10 controller02 os-collect-config: ok: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: TASK [create user tripleo-admin] ***********************************************
Oct 17 03:27:10 controller02 os-collect-config: changed: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: TASK [grant admin rights to user tripleo-admin] ********************************
Oct 17 03:27:10 controller02 os-collect-config: changed: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: TASK [ensure .ssh dir exists for user tripleo-admin] ***************************
Oct 17 03:27:10 controller02 os-collect-config: changed: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: TASK [ensure authorized_keys file exists for user tripleo-admin] ***************
Oct 17 03:27:10 controller02 os-collect-config: changed: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: TASK [authorize TripleO Mistral key for user tripleo-admin] ********************
Oct 17 03:27:10 controller02 os-collect-config: changed: [localhost]
Oct 17 03:27:10 controller02 os-collect-config: PLAY RECAP *********************************************************************
Oct 17 03:27:10 controller02 os-collect-config: localhost                  : ok=6    changed=5    unreachable=0    failed=0

The tripleo-admin user has been successfully created and the signal has been sent to the undercloud successfully.

Oct 17 03:27:11 controller02 os-collect-config: [2019-10-17 03:27:11,126] (heat-config) [DEBUG] [2019-10-17 03:27:11,070] (heat-config-notify) [DEBUG] Signaling to http://172.100.65.1:8080/v1/AUTH_bd0f9000bbcd4961b5841ad73a5c1b85/create_admin-0f9a524d-6336-4c09-ba98-5c0e690797a4/234345de-2edf-4bd3-9d95-cc33a8d2a209?temp_url_sig=9e1abd850d85e84475adecdca292f85643e3f00f&temp_url_expires=1571300790 via PUT
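
For reference, the temp_url_expires value in the signal URL above is a Unix timestamp. The following is a minimal sketch (standard library only; the helper name temp_url_expiry is hypothetical) that decodes it to UTC, which may help when comparing the URL's lifetime against the deployment timeline. Note that the syslog timestamps are in the controller's local time, and the time zone is not stated in this report.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

# Signal URL copied from the heat-config-notify log line above.
SIGNAL_URL = (
    "http://172.100.65.1:8080/v1/AUTH_bd0f9000bbcd4961b5841ad73a5c1b85/"
    "create_admin-0f9a524d-6336-4c09-ba98-5c0e690797a4/"
    "234345de-2edf-4bd3-9d95-cc33a8d2a209"
    "?temp_url_sig=9e1abd850d85e84475adecdca292f85643e3f00f"
    "&temp_url_expires=1571300790"
)


def temp_url_expiry(url):
    """Return the temp_url_expires query parameter as an aware UTC datetime."""
    query = parse_qs(urlparse(url).query)
    expires = int(query["temp_url_expires"][0])
    return datetime.fromtimestamp(expires, tz=timezone.utc)


expiry = temp_url_expiry(SIGNAL_URL)
print(expiry.isoformat())  # 2019-10-17T08:26:30+00:00
```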

No SSL errors were found on the controllers, but the error from the subject does appear in mistral/engine.log.

Version-Release number of selected component (if applicable):

OSP13

How reproducible:

100% at the customer's site

Steps to Reproduce:
1. Re-run the deploy command
2.
3.

Actual results:


Expected results:


Additional info:

Comment 4 John Fulton 2019-10-30 13:21:27 UTC
Because this bug is no longer occurring, I'm going to close it. If the issue recurs, then feel free to re-open.

Comment 5 Tzach Shefi 2019-11-17 13:44:43 UTC
NOTABUG nothing to automate/test per close loop.

Comment 9 John Fulton 2020-04-04 02:28:47 UTC
We know the following:

- uc ran access workbook [1]
- uc asked compute to create access workbook and paused to wait for confirmation it was done
- compute created tripleo-admin user
- compute sent HTTP PUT to swift on undercloud and received 201 from undercloud
- uc swift logs show 201 received
???
- uc access workbook timed out while waiting for confirmation that user was created
- after the workbook timed out, the overcloud deployment stopped because it was unable to continue

We don't know what happened at '???' but need to know that in order to find the bug.

[1] https://github.com/openstack/tripleo-common/blob/stable/queens/workbooks/access.yaml

Comment 21 Rabi Mishra 2020-04-14 17:35:08 UTC
Ah! You're missing https://code.engineering.redhat.com/gerrit/#/c/195663/ which is in openstack-tripleo-common-8.7.1-17.el7ost. I don't think it has made it to a zstream yet. Maybe you can use a hotfix.


As you can see, the deployment started at 17:16:43

19:16:43,698] (heat-config) [DEBUG] Running /usr/libexec/heat-config/hooks/ansible < /var/lib/heat-config/deployed/d1c4f248-fb74-416d-8d10-c3c620a892d2.json


However, Ansible takes about 11 minutes to gather facts (possibly because of timeouts), hence the delay.

Apr  8 19:16:44 overcloud-compute-0 ansible-setup: Invoked with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10


Apr  8 19:27:48 overcloud-compute-0 ansible-user: Invoked with comment=None ssh_key_bits=0 update_password=always non_unique=False force=False ssh_key_type=rsa create_home=True password_lock=None ssh_key_passphrase=NOT_LOGGING_PARAMETER uid=None home=None append=False skeleton=None ssh_key_comment=ansible-generated on overcloud-compute-0 group=None system=False state=present hidden=None local=None shell=None expires=None ssh_key_file=None groups=None move_home=False password=NOT_LOGGING_PARAMETER name=tripleo-admin seuser=None remove=False login_class=None generate_ssh_key=None
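
The gap can be read straight off the two syslog timestamps above (ansible-setup and ansible-user). A small sketch, assuming the year 2020 from the comment date, since syslog lines omit the year; the helper name syslog_time is hypothetical:

```python
from datetime import datetime


def syslog_time(stamp, year=2020):
    """Parse a syslog-style 'Mon D HH:MM:SS' stamp; the year is an assumption."""
    # Collapse the double space syslog uses to pad single-digit days.
    normalized = " ".join(stamp.split())
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")


setup_started = syslog_time("Apr  8 19:16:44")      # ansible-setup (fact gathering) invoked
user_task_started = syslog_time("Apr  8 19:27:48")  # ansible-user (next task) invoked

gap = user_task_started - setup_started
print(gap)                  # 0:11:04
print(gap.total_seconds())  # 664.0
```

So fact gathering accounts for roughly 11 minutes of the run on this node before the first user task starts.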

Comment 42 errata-xmlrpc 2020-10-28 18:23:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 13.0 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4388

Comment 43 Carlos Camacho 2020-12-14 14:19:24 UTC
*** Bug 1904588 has been marked as a duplicate of this bug. ***

