Bug 1828191 - OPS16.1 – Overcloud deployment fails on: "ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud [  admin] Could not import keys to one of"
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: ---
Assignee: Rabi Mishra
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Duplicates: 1828381 1829186
Depends On:
Blocks: 1666684 1829186
Reported: 2020-04-27 09:35 UTC by Arkady Shtempler
Modified: 2020-06-03 12:34 UTC (History)
CC List: 10 users

Fixed In Version: python-tripleoclient-12.3.2-0.20200430023422.8557bf9.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-03 12:34:54 UTC
Target Upstream Version:
Embargoed:


Attachments
OC deployment logs (587.42 KB, application/zip)
2020-04-27 09:35 UTC, Arkady Shtempler


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 723343 0 None MERGED [stable-only] Raise error for temp_ssh_key import failure 2021-02-10 10:47:06 UTC
OpenStack gerrit 723824 0 None MERGED [stable-only] Add retry for inserting temp_ssh_key 2021-02-10 10:47:06 UTC

Description Arkady Shtempler 2020-04-27 09:35:47 UTC
Created attachment 1682089
OC deployment logs

### Scenario ###
Deploy the Overcloud using the proper overcloud_deploy.sh script.

### Actual Result ###
The Overcloud deployment fails, and errors appear in the logs:


Errors logged:
------------------------------ LogPath: /home/stack/overcloud_deployme
2082-2020-04-26 09:57:26.476 118777 DEBUG tripleoclient.plugin [  admin] {'ttl': 3600, 'body': {'type': 'tripleo.deployment.v1.set_
2083-2020-04-26 09:57:27.321 118777 INFO tripleoclient.v1.overcloud_
2084:2020-04-26 09:58:43.335 118777 ERROR tripleoclient.v1.overcloud_Couldn't not import keys to one of ['192.168.24....<--LogTool-
2085-
2086-2020-04-26 09:58:43.335 118777 INFO tripleoclient.v1.overcloud_
2087-2020-04-26 09:58:43.336 118777 DEBUG keystoneauth.session [  admin] REQ: curl -g -i -X GET https://192.168.24.2: -H "User-Agen...<--LogTool-LINE IS TOO LONG!
2088-2020-04-26 09:58:43.337 118777 DEBUG urllib3.connectionpool [  admin] Resetting dropped connection: 192.168.24.2
2089-2020-04-26 09:58:43.511 118777 DEBUG urllib3.connectionpool [  admin] https://192.168.24.2: "GET /v2/executions HTTP/1.1" 200 208780
2090-2020-04-26 09:58:43.513 118777 DEBUG keystoneauth.session [  admin] RESP: [200] Content-Length: 208780 Content-Type: application/json Date: Sun, ...<--LogTool-LINE IS TOO LONG!
2091-2020-04-26 09:58:43.519 118777 DEBUG keystoneauth.session [  admin] RESP BODY: {"executions": [{"id": "75af3d66-98cd-4433-9fb4-
LogTool --> POTENTIAL BLOCK'S ISSUES:
2084:2020-04-26 09:58:43.335 118777 ERROR tripleoclient.v1.overcloud_
exists\"}, \"on-success\": \"send_message\", \"on-error\": \"create_container\", \"type\": \"direct\", \"name\": \"verify_container_doesnt_
 \"<% task().result.where($.finished = true and $.error != null).select($.uuid) + task().result.where($.finished = false).select($.uuid) %>\"}, \"publish-on-error\": {\"intros...
iled introspection. Attempt {1} of {2} '.format($.failed_
------------------------------ LogPath: /var/lib/mistral/overcloud/
321-2020-04-26 10:01:19,660 p=731 u=mistral |  <192.168.24.20> SSH: EXEC ssh -vvv -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Contr...<--LogTool-LINE IS TOO LONG!
322-2020-04-26 10:01:19,670 p=731 u=mistral |  <192.168.24.7> (255, b'', b'OpenSSH_8.0p1, OpenSSL 1.1.1c FIPS  28 May 2019\r\ndebug1: Reading configur...<--LogTool-LINE IS TOO LONG!
323:2020-04-26 10:01:19,673 p=731 u=mistral |  [WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
324-cephstorage-0: Failed to connect to the host via ssh: OpenSSH_8.0p1, OpenSSL
325-1.1.1c FIPS  28 May 2019  debug1: Reading configuration data
326-/etc/ssh/ssh_config  debug3: /etc/ssh/ssh_config line 51: Including file
327-/etc/ssh/ssh_config.d/05-
328-data /etc/ssh/ssh_config.d/05-
329-all' host 192.168.24.7 originally 192.168.24.7  debug3:
330-/etc/ssh/ssh_config.d/05-

------------------------------ LogPath: /var/lib/mistral/overcloud/
1090-2020-04-26 10:03:16,437 p=731 u=mistral |  <192.168.24.7> SSH: EXEC ssh -vvv -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Contr...<--LogTool-LINE IS TOO LONG!
1091-2020-04-26 10:03:16,496 p=731 u=mistral |  <192.168.24.20> SSH: EXEC ssh -vvv -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Cont...<--LogTool-LINE IS TOO LONG!
1092:2020-04-26 10:03:16,510 p=731 u=mistral |  fatal: [overcloud-novacompute-0]: UNREACHABLE! => {
1093-    "changed": false,
1094-    "msg": "Data could not be sent to remote host \"192.168.24.6\". Make sure this host can be reached over ssh: OpenSSH_8.0p1, OpenSSL 1.1.1c FI...<--LogTool-LINE IS TOO LONG!
1095-    "unreachable": true
1096-}
1097:2020-04-26 10:03:16,561 p=731 u=mistral |  fatal: [overcloud-cephstorage-0]: UNREACHABLE! => {
1098-    "changed": false,
1099-    "msg": "Data could not be sent to remote host \"192.168.24.7\". Make sure this host can be reached over ssh: OpenSSH_8.0p1, OpenSSL 1.1.1c FI...<--LogTool-LINE IS TOO LONG!
...
---< BLOCK IS TOO LONG >---
...
1115-    "changed": false,
1116-    "msg": "Data could not be sent to remote host \"192.168.24.24\". Make sure this host can be reached over ssh: OpenSSH_8.0p1, OpenSSL 1.1.1c F...<--LogTool-LINE IS TOO LONG!
1117-    "unreachable": true
1118-}
1119-2020-04-26 10:03:16,714 p=731 u=mistral |  NO MORE HOSTS LEFT ******************************
1120-2020-04-26 10:03:16,715 p=731 u=mistral |  PLAY RECAP ******************************
1121-2020-04-26 10:03:16,715 p=731 u=mistral |  overcloud-cephstorage-0    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0 ...<--LogTool-LINE IS TOO LONG!
LogTool --> POTENTIAL BLOCK'S ISSUES:
rage-0    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

You can find log files attached.


### Expected Results ###
The Overcloud deployment passes.

### Note ###
Findings from Rabi Mishra:
I had a look at Arkady's environment, and redeployment after the failure worked.

1. For some reason the temp keys could not be pushed to the nodes using heat-admin after wait_for_ssh_port() succeeded (i.e. immediately after the OpenSSH server daemon started on the nodes [1]).

Though I'm not sure why we get 255 after the port is open, adding a sleep or retrying on a 255 return code would possibly be a workaround.

2. The other issue is that the code ignores the above error [2] and skips the 'admin enablement' workflow, which then fails at a later stage when the tripleo-admin user is used. This should be fixed.
 
[1]
Apr 26 05:58:43 overcloud-controller-0 systemd[1]: Starting OpenSSH server daemon...
Apr 26 05:58:43 overcloud-controller-0 systemd[1]: Started OpenSSH server daemon.


[2] https://github.com/openstack/python-tripleoclient/blob/stable/train/tripleoclient/workflows/deployment.py#L202-L203
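
A minimal sketch of the two changes suggested in this note: retry while the command exits with 255, and raise an error instead of silently skipping ahead when the key import never succeeds. This is not the merged tripleoclient code. The names `push_key_with_retry` and `KeyImportError`, the attempt count, and the delay are hypothetical and chosen only for illustration. The one fact assumed from outside the report is that ssh itself exits with 255 when the connection or login fails.

```python
import subprocess
import time

# ssh exits with 255 when the connection or login itself fails,
# as opposed to the exit status of the remote command.
SSH_CONNECTION_ERROR = 255


class KeyImportError(RuntimeError):
    """Hypothetical error: the temporary key could not be pushed to a host."""


def push_key_with_retry(cmd, attempts=5, delay=10.0,
                        run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying while it exits with 255.

    Returns the attempt number that succeeded. Raises KeyImportError
    if no attempt succeeds, so the caller cannot ignore the failure.
    `run` and `sleep` are injectable to keep the sketch testable.
    """
    last_rc = None
    for attempt in range(1, attempts + 1):
        last_rc = run(cmd).returncode
        if last_rc == 0:
            return attempt
        if last_rc != SSH_CONNECTION_ERROR:
            break  # a different failure; retrying is unlikely to help
        if attempt < attempts:
            sleep(delay)  # give the node time to finish booting
    raise KeyImportError(
        f"command {cmd!r} failed with return code {last_rc}")
```

The caller would pass the same ssh command it already uses to copy the temp key, and let KeyImportError abort the deployment early rather than failing later as tripleo-admin.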

Comment 1 Cédric Jeanneret 2020-04-27 11:15:06 UTC
"why we get 255 after the port is open, adding some sleep or retrying in case of 255 return code would possibly be a workaround"
This is a "feature": pam/ssh refuses non-root logins during boot, even if the sshd service is up and running. One must wait for boot to complete before connecting as a non-root user such as heat-admin.

We probably want to request a blocker on that one...

Comment 2 Eduardo Olivares 2020-04-27 15:54:52 UTC
*** Bug 1828381 has been marked as a duplicate of this bug. ***

Comment 3 Cédric Jeanneret 2020-04-30 08:59:09 UTC
*** Bug 1829186 has been marked as a duplicate of this bug. ***

Comment 4 Emilien Macchi 2020-05-06 18:38:23 UTC
*** Bug 1829186 has been marked as a duplicate of this bug. ***

