Bug 1835828
| Summary: | Overcloud deployment times out | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Filip Hubík <fhubik> |
| Component: | python-paunch | Assignee: | Steve Baker <sbaker> |
| Status: | CLOSED ERRATA | QA Contact: | nlevinki <nlevinki> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 13.0 (Queens) | CC: | apevec, aschultz, bcafarel, bdobreli, chrisw, drosenfe, rhos-maint, sbaker, wznoinsk |
| Target Milestone: | --- | Keywords: | Regression, Triaged, ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | python-paunch-2.5.3-6.el7ost | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-06-24 11:34:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Filip Hubík
2020-05-14 14:39:28 UTC
Quick note: the netdev_tc_offloads error logs are erroneous and harmless (this is bug #1737982) and are most probably not what is causing the timeout/deployment failure.
Correction to the above: the OC nodes are provisioned, but the initial stage of their OC deployment seems to fail. In detail:
I see the OC deployment stuck indefinitely:
$ openstack software deployment list
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+
| id | config_id | server_id | action | status |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+
| 73ddef0e-7da9-4a6c-aa47-f106d5fd44a4 | eea0d2f1-5380-4515-8c94-4140e5cf24ea | 164614c4-7571-4d02-9f1a-ee1c15102b95 | CREATE | IN_PROGRESS |
| 7514db86-9b9d-400d-aa6d-19cd2eca2d07 | fa29c3ef-62ca-4340-b81c-3624803183c1 | da891512-cc02-4ef4-8971-f0d05cd8bd46 | CREATE | IN_PROGRESS |
| 17df9b92-d3eb-46ba-892f-07b63837438f | b13d46c4-0b8e-4ea1-82f6-c39aac55c22e | 1cab8560-146b-49a2-9aeb-b4cf98eea0fc | CREATE | IN_PROGRESS |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+
$ openstack software deployment show 73ddef0e-7da9-4a6c-aa47-f106d5fd44a4
+---------------+--------------------------------------------------------+
| Field | Value |
+---------------+--------------------------------------------------------+
| id | 73ddef0e-7da9-4a6c-aa47-f106d5fd44a4 |
| server_id | 164614c4-7571-4d02-9f1a-ee1c15102b95 |
| config_id | eea0d2f1-5380-4515-8c94-4140e5cf24ea |
| creation_time | 2020-05-14T15:28:22Z |
| updated_time | |
| status | IN_PROGRESS |
| status_reason | Deploy data available |
| input_values | {u'interface_name': u'nic1', u'bridge_name': u'br-ex'} |
| action | CREATE |
+---------------+--------------------------------------------------------+
$ openstack software config show xyz # shows relation to network configuration
On the OC nodes I see no br-ex bridge (checked with ovs-vsctl).
Also, /var/log/messages on the OC nodes reports a docker-related failure:
May 15 10:09:41 compute-0 os-collect-config: dib-run-parts Fri May 15 10:09:41 EDT 2020 Running /usr/libexec/os-refresh-config/configure.d/50-heat-config-docker-cmd
May 15 10:09:41 compute-0 os-collect-config: Traceback (most recent call last):
May 15 10:09:41 compute-0 os-collect-config: File "/usr/libexec/os-refresh-config/configure.d/50-heat-config-docker-cmd", line 62, in <module>
May 15 10:09:41 compute-0 os-collect-config: sys.exit(main(sys.argv))
May 15 10:09:41 compute-0 os-collect-config: File "/usr/libexec/os-refresh-config/configure.d/50-heat-config-docker-cmd", line 57, in main
May 15 10:09:41 compute-0 os-collect-config: docker_cmd=DOCKER_CMD
May 15 10:09:41 compute-0 os-collect-config: File "/usr/lib/python2.7/site-packages/paunch/__init__.py", line 78, in cleanup
May 15 10:09:41 compute-0 os-collect-config: r.rename_containers()
May 15 10:09:41 compute-0 os-collect-config: File "/usr/lib/python2.7/site-packages/paunch/runner.py", line 114, in rename_containers
May 15 10:09:41 compute-0 os-collect-config: for entry in self.container_names():
May 15 10:09:41 compute-0 os-collect-config: TypeError: 'NoneType' object is not iterable
May 15 10:09:41 compute-0 os-collect-config: [2020-05-15 10:09:41,882] (os-refresh-config) [ERROR] during configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/configure.d']' returned non-zero exit status 1]
May 15 10:09:41 compute-0 os-collect-config: [2020-05-15 10:09:41,883] (os-refresh-config) [ERROR] Aborting...
The failure occurs while iterating over the container names in the "rename_containers" function (/usr/lib/python2.7/site-packages/paunch/runner.py):
    def rename_containers(self):
        current_containers = []
        need_renaming = {}
        renamed = False
        for entry in self.container_names():
            ...
^- The above happens periodically, which can explain the timeout.
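To illustrate the failure mode in the traceback, here is a minimal sketch of what happens when a method like container_names() returns None instead of a list, and one possible defensive guard. SketchRunner and both rename_* helpers are hypothetical stand-ins written for this example; they are not the real paunch code, and the guard shown is not necessarily what the upstream fix does.

```python
class SketchRunner(object):
    """Hypothetical stand-in for the runner; not the real paunch class."""

    def __init__(self, names):
        # `names` simulates whatever the container listing produced.
        # None models the failure case seen in the traceback.
        self._names = names

    def container_names(self):
        return self._names

    def rename_containers_unguarded(self):
        # Same loop shape as in the traceback: raises TypeError
        # when container_names() returns None.
        return [entry for entry in self.container_names()]

    def rename_containers_guarded(self):
        # Assumption: treating "no result" as "no containers" is acceptable,
        # so the loop simply becomes a no-op.
        return [entry for entry in (self.container_names() or [])]


def raises_type_error(func):
    """Return True if calling func() raises TypeError."""
    try:
        func()
    except TypeError:
        return True
    return False


broken = SketchRunner(None)
unguarded_fails = raises_type_error(broken.rename_containers_unguarded)
guarded_result = broken.rename_containers_guarded()
```

Because os-refresh-config aborts on the non-zero exit status and os-collect-config retries periodically, an unguarded loop like this would fail on every run, which is consistent with the deployment staying IN_PROGRESS until it times out.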
This doesn't seem to be related to networking; maybe the DF DFG folks can help find the root cause.
Regression caused by https://review.opendev.org/#/c/711432/
I can confirm that with the https://review.opendev.org/#/c/728477/ change pulled manually into overcloud-full.qcow2 right before the OC deployment, the OC deployment of OSP13 (2020-05-11.2) passed.
Build is successful now: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/ReleaseDelivery/view/OSP13/job/phase1-13_director-rhel-7.8-virthost-1cont_1comp_1ceph-ipv4-vxlan-ceph-containers/29/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2718