Created attachment 1682089
OC deployment logs

### Scenario ###
Deploy Overcloud using proper overcloud_deploy.sh script

### Actual Result ###
Overcloud Deployment fails + ERRORs in logs:

Errors logged:
------------------------------
LogPath: /home/stack/overcloud_deployme
2082-2020-04-26 09:57:26.476 118777 DEBUG tripleoclient.plugin [ admin] {'ttl': 3600, 'body': {'type': 'tripleo.deployment.v1.set_
2083-2020-04-26 09:57:27.321 118777 INFO tripleoclient.v1.overcloud_
2084:2020-04-26 09:58:43.335 118777 ERROR tripleoclient.v1.overcloud_Couldn't not import keys to one of ['192.168.24....<--LogTool-
2085-
2086-2020-04-26 09:58:43.335 118777 INFO tripleoclient.v1.overcloud_
2087-2020-04-26 09:58:43.336 118777 DEBUG keystoneauth.session [ admin] REQ: curl -g -i -X GET https://192.168.24.2: -H "User-Agen...<--LogTool-LINE IS TOO LONG!
2088-2020-04-26 09:58:43.337 118777 DEBUG urllib3.connectionpool [ admin] Resetting dropped connection: 192.168.24.2
2089-2020-04-26 09:58:43.511 118777 DEBUG urllib3.connectionpool [ admin] https://192.168.24.2: "GET /v2/executions HTTP/1.1" 200 208780
2090-2020-04-26 09:58:43.513 118777 DEBUG keystoneauth.session [ admin] RESP: [200] Content-Length: 208780 Content-Type: application/json Date: Sun, ...<--LogTool-LINE IS TOO LONG!
2091-2020-04-26 09:58:43.519 118777 DEBUG keystoneauth.session [ admin] RESP BODY: {"executions": [{"id": "75af3d66-98cd-4433-9fb4-
LogTool --> POTENTIAL BLOCK'S ISSUES:
2084:2020-04-26 09:58:43.335 118777 ERROR tripleoclient.v1.overcloud_
exists\"}, \"on-success\": \"send_message\", \"on-error\": \"create_container\", \"type\": \"direct\", \"name\": \"verify_container_doesnt_
\"<% task().result.where($.finished = true and $.error != null).select($.uuid) + task().result.where($.finished = false).select($.uuid) %>\"}, \"publish-on-error\": {\"intros...
iled introspection. Attempt {1} of {2} '.format($.failed_
------------------------------
LogPath: /var/lib/mistral/overcloud/
321-2020-04-26 10:01:19,660 p=731 u=mistral | <192.168.24.20> SSH: EXEC ssh -vvv -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Contr...<--LogTool-LINE IS TOO LONG!
322-2020-04-26 10:01:19,670 p=731 u=mistral | <192.168.24.7> (255, b'', b'OpenSSH_8.0p1, OpenSSL 1.1.1c FIPS 28 May 2019\r\ndebug1: Reading configur...<--LogTool-LINE IS TOO LONG!
323:2020-04-26 10:01:19,673 p=731 u=mistral | [WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
324-cephstorage-0: Failed to connect to the host via ssh: OpenSSH_8.0p1, OpenSSL
325-1.1.1c FIPS 28 May 2019 debug1: Reading configuration data
326-/etc/ssh/ssh_config debug3: /etc/ssh/ssh_config line 51: Including file
327-/etc/ssh/ssh_config.d/05-
328-data /etc/ssh/ssh_config.d/05-
329-all' host 192.168.24.7 originally 192.168.24.7 debug3:
330-/etc/ssh/ssh_config.d/05-
------------------------------
LogPath: /var/lib/mistral/overcloud/
1090-2020-04-26 10:03:16,437 p=731 u=mistral | <192.168.24.7> SSH: EXEC ssh -vvv -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Contr...<--LogTool-LINE IS TOO LONG!
1091-2020-04-26 10:03:16,496 p=731 u=mistral | <192.168.24.20> SSH: EXEC ssh -vvv -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Cont...<--LogTool-LINE IS TOO LONG!
1092:2020-04-26 10:03:16,510 p=731 u=mistral | fatal: [overcloud-novacompute-0]: UNREACHABLE! => {
1093- "changed": false,
1094- "msg": "Data could not be sent to remote host \"192.168.24.6\". Make sure this host can be reached over ssh: OpenSSH_8.0p1, OpenSSL 1.1.1c FI...<--LogTool-LINE IS TOO LONG!
1095- "unreachable": true
1096-}
1097:2020-04-26 10:03:16,561 p=731 u=mistral | fatal: [overcloud-cephstorage-0]: UNREACHABLE! => {
1098- "changed": false,
1099- "msg": "Data could not be sent to remote host \"192.168.24.7\". Make sure this host can be reached over ssh: OpenSSH_8.0p1, OpenSSL 1.1.1c FI...<--LogTool-LINE IS TOO LONG!
...
---< BLOCK IS TOO LONG >---
...
1115- "changed": false,
1116- "msg": "Data could not be sent to remote host \"192.168.24.24\". Make sure this host can be reached over ssh: OpenSSH_8.0p1, OpenSSL 1.1.1c F...<--LogTool-LINE IS TOO LONG!
1117- "unreachable": true
1118-}
1119-2020-04-26 10:03:16,714 p=731 u=mistral | NO MORE HOSTS LEFT ******************************
1120-2020-04-26 10:03:16,715 p=731 u=mistral | PLAY RECAP ******************************
1121-2020-04-26 10:03:16,715 p=731 u=mistral | overcloud-cephstorage-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ...<--LogTool-LINE IS TOO LONG!
LogTool --> POTENTIAL BLOCK'S ISSUES:
rage-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0

You can find log files attached.

### Expected Results ###
Overcloud deployment PASS

### Note ###
Findings from Rabi Mishra:

I had a look at Akardy's environment and redeployment after the failure worked.

1. It seems for some reason the temp keys could not be pushed to nodes using heat-admin after the wait_for_ssh_port() succeeds (i.e. immediately after the OpenSSH server daemon started on the nodes[1]). Though I'm not sure why we get 255 after the port is open, adding some sleep or retrying in case of 255 return code would possibly be a workaround.

2. The other issue is that the code ignores the above error[2] and skips the 'admin enablement' workflow, which leads to a failure at a later stage when using the tripleo-admin user; this should be fixed.

[1]
Apr 26 05:58:43 overcloud-controller-0 systemd[1]: Starting OpenSSH server daemon...
Apr 26 05:58:43 overcloud-controller-0 systemd[1]: Started OpenSSH server daemon.

[2] https://github.com/openstack/python-tripleoclient/blob/stable/train/tripleoclient/workflows/deployment.py#L202-L203
"why we get 255 after the port is open, adding some sleep or retrying in case of 255 return code would possibly be a workaround"

This is a "feature": PAM/ssh refuses non-root logins during boot-up, even if the sshd service is already up and running. One must wait for boot to complete before connecting as a non-root user such as heat-admin.

We probably want to request a blocker on that one.
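
The workaround discussed above (retry when the SSH step returns 255, and stop ignoring a final failure) could look roughly like the sketch below. This is only an illustration under assumptions, not the actual python-tripleoclient code: `run_with_retry`, `require_success`, and the `run_once` callable are hypothetical names, and the retry count and delay are arbitrary. `run_once` stands in for whatever pushes the temp keys as heat-admin and returns the exit code.

```python
import time
from typing import Callable, Iterable


def run_with_retry(run_once: Callable[[], int],
                   retries: int = 5,
                   delay: float = 2.0,
                   retry_codes: Iterable[int] = (255,),
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Call run_once() and retry while it returns a retryable exit code.

    run_once is a hypothetical stand-in for the SSH key push; ssh itself
    exits with 255 when the connection or login fails.
    """
    codes = set(retry_codes)
    rc = run_once()
    attempt = 0
    while rc in codes and attempt < retries:
        attempt += 1
        sleep(delay)  # give the node time to finish booting
        rc = run_once()
    return rc


def require_success(rc: int) -> None:
    """Raise instead of silently continuing when the final attempt failed."""
    if rc != 0:
        raise RuntimeError(f"key push failed with exit code {rc}")


if __name__ == "__main__":
    # Simulated node: refuses the first two logins, then accepts.
    results = iter([255, 255, 0])
    rc = run_with_retry(lambda: next(results), delay=0.1)
    require_success(rc)
    print("exit code:", rc)
```

Raising on the final failure addresses the second point in the note: the later 'admin enablement' step is not skipped silently, so the problem surfaces where it occurs rather than as UNREACHABLE hosts later on.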
*** Bug 1828381 has been marked as a duplicate of this bug. ***
*** Bug 1829186 has been marked as a duplicate of this bug. ***