Description Arkady Shtempler
2020-04-27 09:35:47 UTC
Created attachment 1682089
OC deployment logs
### Scenario ###
Deploy the Overcloud using the proper overcloud_deploy.sh script.
### Actual Result ###
Overcloud deployment fails, with ERRORs in the logs.
Errors logged:
------------------------------ LogPath: /home/stack/overcloud_deployme
2082-2020-04-26 09:57:26.476 118777 DEBUG tripleoclient.plugin [ admin] {'ttl': 3600, 'body': {'type': 'tripleo.deployment.v1.set_
2083-2020-04-26 09:57:27.321 118777 INFO tripleoclient.v1.overcloud_
2084:2020-04-26 09:58:43.335 118777 ERROR tripleoclient.v1.overcloud_Couldn't not import keys to one of ['192.168.24....<--LogTool-
2085-
2086-2020-04-26 09:58:43.335 118777 INFO tripleoclient.v1.overcloud_
2087-2020-04-26 09:58:43.336 118777 DEBUG keystoneauth.session [ admin] REQ: curl -g -i -X GET https://192.168.24.2: -H "User-Agen...<--LogTool-LINE IS TOO LONG!
2088-2020-04-26 09:58:43.337 118777 DEBUG urllib3.connectionpool [ admin] Resetting dropped connection: 192.168.24.2
2089-2020-04-26 09:58:43.511 118777 DEBUG urllib3.connectionpool [ admin] https://192.168.24.2: "GET /v2/executions HTTP/1.1" 200 208780
2090-2020-04-26 09:58:43.513 118777 DEBUG keystoneauth.session [ admin] RESP: [200] Content-Length: 208780 Content-Type: application/json Date: Sun, ...<--LogTool-LINE IS TOO LONG!
2091-2020-04-26 09:58:43.519 118777 DEBUG keystoneauth.session [ admin] RESP BODY: {"executions": [{"id": "75af3d66-98cd-4433-9fb4-
LogTool --> POTENTIAL BLOCK'S ISSUES:
2084:2020-04-26 09:58:43.335 118777 ERROR tripleoclient.v1.overcloud_
exists\"}, \"on-success\": \"send_message\", \"on-error\": \"create_container\", \"type\": \"direct\", \"name\": \"verify_container_doesnt_
\"<% task().result.where($.finished = true and $.error != null).select($.uuid) + task().result.where($.finished = false).select($.uuid) %>\"}, \"publish-on-error\": {\"intros...
iled introspection. Attempt {1} of {2} '.format($.failed_
------------------------------ LogPath: /var/lib/mistral/overcloud/
321-2020-04-26 10:01:19,660 p=731 u=mistral | <192.168.24.20> SSH: EXEC ssh -vvv -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Contr...<--LogTool-LINE IS TOO LONG!
322-2020-04-26 10:01:19,670 p=731 u=mistral | <192.168.24.7> (255, b'', b'OpenSSH_8.0p1, OpenSSL 1.1.1c FIPS 28 May 2019\r\ndebug1: Reading configur...<--LogTool-LINE IS TOO LONG!
323:2020-04-26 10:01:19,673 p=731 u=mistral | [WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
324-cephstorage-0: Failed to connect to the host via ssh: OpenSSH_8.0p1, OpenSSL
325-1.1.1c FIPS 28 May 2019 debug1: Reading configuration data
326-/etc/ssh/ssh_config debug3: /etc/ssh/ssh_config line 51: Including file
327-/etc/ssh/ssh_config.d/05-
328-data /etc/ssh/ssh_config.d/05-
329-all' host 192.168.24.7 originally 192.168.24.7 debug3:
330-/etc/ssh/ssh_config.d/05-
------------------------------ LogPath: /var/lib/mistral/overcloud/
1090-2020-04-26 10:03:16,437 p=731 u=mistral | <192.168.24.7> SSH: EXEC ssh -vvv -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Contr...<--LogTool-LINE IS TOO LONG!
1091-2020-04-26 10:03:16,496 p=731 u=mistral | <192.168.24.20> SSH: EXEC ssh -vvv -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Cont...<--LogTool-LINE IS TOO LONG!
1092:2020-04-26 10:03:16,510 p=731 u=mistral | fatal: [overcloud-novacompute-0]: UNREACHABLE! => {
1093- "changed": false,
1094- "msg": "Data could not be sent to remote host \"192.168.24.6\". Make sure this host can be reached over ssh: OpenSSH_8.0p1, OpenSSL 1.1.1c FI...<--LogTool-LINE IS TOO LONG!
1095- "unreachable": true
1096-}
1097:2020-04-26 10:03:16,561 p=731 u=mistral | fatal: [overcloud-cephstorage-0]: UNREACHABLE! => {
1098- "changed": false,
1099- "msg": "Data could not be sent to remote host \"192.168.24.7\". Make sure this host can be reached over ssh: OpenSSH_8.0p1, OpenSSL 1.1.1c FI...<--LogTool-LINE IS TOO LONG!
...
---< BLOCK IS TOO LONG >---
...
1115- "changed": false,
1116- "msg": "Data could not be sent to remote host \"192.168.24.24\". Make sure this host can be reached over ssh: OpenSSH_8.0p1, OpenSSL 1.1.1c F...<--LogTool-LINE IS TOO LONG!
1117- "unreachable": true
1118-}
1119-2020-04-26 10:03:16,714 p=731 u=mistral | NO MORE HOSTS LEFT ******************************
1120-2020-04-26 10:03:16,715 p=731 u=mistral | PLAY RECAP ******************************
1121-2020-04-26 10:03:16,715 p=731 u=mistral | overcloud-cephstorage-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ...<--LogTool-LINE IS TOO LONG!
LogTool --> POTENTIAL BLOCK'S ISSUES:
rage-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
You can find log files attached.
### Expected Results ###
Overcloud deployment passes.
### Note ###
Findings from Rabi Mishra:
I had a look at Arkady's environment, and redeployment after the failure worked.
1. For some reason the temp keys could not be pushed to the nodes as heat-admin after wait_for_ssh_port() succeeded (i.e. immediately after the OpenSSH server daemon started on the nodes [1]).
Though I'm not sure why we get 255 after the port is open, adding some sleep or retrying on a 255 return code would possibly be a workaround.
2. The other issue is that the code ignores the above error [2] and skips the 'admin enablement' workflow, so the deployment fails at a later stage when using the tripleo-admin user. This should be fixed.
[1]
Apr 26 05:58:43 overcloud-controller-0 systemd[1]: Starting OpenSSH server daemon...
Apr 26 05:58:43 overcloud-controller-0 systemd[1]: Started OpenSSH server daemon.
[2] https://github.com/openstack/python-tripleoclient/blob/stable/train/tripleoclient/workflows/deployment.py#L202-L203
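The workaround suggested in item 1 (retry on a 255 return code) together with the fix asked for in item 2 (do not silently ignore the failure) could look roughly like the sketch below. This is only an illustration, not the tripleoclient code: run_with_retry, KeyPushError, and the attempts/delay defaults are hypothetical names and values chosen for the example. The only outside fact relied on is that ssh itself exits with 255 when the connection or login fails.

```python
import subprocess
import time

# ssh exits with 255 when the connection/login itself fails
# (as opposed to the remote command failing).
SSH_CONNECTION_ERROR = 255


class KeyPushError(RuntimeError):
    """Hypothetical error raised instead of silently ignoring the failure."""


def run_with_retry(cmd, attempts=5, delay=5.0,
                   run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying only while it exits with return code 255.

    attempts and delay are illustrative defaults, not tuned values.
    run and sleep are injectable so the helper can be tested offline.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if result.returncode == 0:
            return result
        if result.returncode != SSH_CONNECTION_ERROR:
            # Any other failure is not the boot-time race; do not retry.
            break
        if attempt < attempts:
            sleep(delay)
    # Surface the failure so later steps are not run on a broken node.
    raise KeyPushError(
        "command {!r} failed with return code {} after {} attempt(s)".format(
            cmd, result.returncode, attempt))
```

A call such as run_with_retry(["ssh", "heat-admin@192.168.24.7", "true"]) would then keep probing the node until a non-root login is accepted, and raise if it never is.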
"why we get 255 after the port is open, adding some sleep or retrying in case of 255 return code would possibly be a workaround"
This is a "feature": pam/ssh refuses non-root logins during boot-up, even if the sshd service is up and running. One must wait for boot to complete before connecting as a non-root user such as heat-admin.
We probably want to request a blocker on that one.