Bug 1691049
Summary: | openstack overcloud node provide hangs due to connectivity issues between containers | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Yuri Obshansky <yobshans> |
Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
Status: | CLOSED DUPLICATE | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | | |
Version: | 15.0 (Stein) | CC: | dasmith, eglynn, jhakimra, jslagle, kchamart, mbayer, michele, sasha, sbauza, sgordon, vromanso |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Last Closed: | 2019-03-22 14:19:44 UTC | Type: | Bug |
Description Yuri Obshansky 2019-03-20 17:48:15 UTC
Had a look today at an environment provided by Yuri. The symptoms I'm seeing are slightly different from the ones originally reported, but the consequence is the same: "openstack overcloud node provide --all-manageable" is stuck and never finishes. A couple of remarks:

1. The DB disconnection logs reported above are most probably a red herring. They remind me of the errors you get when configuring too many workers for a service [1]. What changed in the meantime is that with MariaDB 10.3 the error is now also reported server-side, which probably explains these entries in /var/log/containers/mysql/mariadb.log:

```
2019-03-19 19:07:18 17 [Warning] Aborted connection 17 to db: 'keystone' user: 'keystone' host: 'site-undercloud-0.localdomain' (Got an error reading communication packets)
[...]
```

2. When connected to the env, I can see that both the mysql and rabbitmq containers are still running and apparently responding fine:

```
$ ironic node-list
+--------------------------------------+-------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name              | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------------+---------------+-------------+--------------------+-------------+
| 1a535c0c-d26e-4ce3-a5c7-4d08e62a09a4 | dcn1-compute-0    | None          | power off   | available          | False       |
| f783df0c-f123-4b90-adc5-ecfc3a93d5be | dcn2-compute-0    | None          | power off   | available          | False       |
| f237f47a-f0b6-41d8-ba1a-89f61785a318 | site-compute-0    | None          | power off   | available          | False       |
| b6fc6beb-dda9-44e5-ae65-605168bc5224 | site-controller-0 | None          | power off   | available          | False       |
| cdd1003b-b784-47d7-be2b-f2623f8b5b0d | site-controller-1 | None          | power off   | available          | False       |
| d8af97a4-94be-466b-bf5c-561835b7192a | site-controller-2 | None          | power off   | available          | False       |
+--------------------------------------+-------------------+---------------+-------------+--------------------+-------------+
```

So I wonder if the stalled behaviour of "openstack overcloud node provide --all-manageable" isn't due to mistral (openstack overcloud commands run through mistral).
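As a back-of-envelope illustration of the worker arithmetic behind remark 1 above: each API/RPC worker keeps its own SQLAlchemy connection pool, so the aggregate number of DB connections can quietly exceed MariaDB's max_connections, producing exactly the "Aborted connection" warnings quoted there. This is a minimal sketch of the failure mode described in [1], with hypothetical worker counts and limits, not values read from this undercloud:

```python
# Sketch of remark 1's "too many workers" arithmetic. All numbers are
# hypothetical; real values come from each service's config files and
# from MariaDB's max_connections setting.
workers = {              # hypothetical per-service worker counts
    "keystone": 8,
    "nova-api": 8,
    "nova-conductor": 8,
    "ironic-api": 4,
    "mistral-engine": 4,
}
pool_size, max_overflow = 5, 10   # SQLAlchemy defaults; oslo.db can override

# Each worker may open up to pool_size + max_overflow connections.
worst_case = sum(workers.values()) * (pool_size + max_overflow)
print(f"worst-case DB connections: {worst_case}")

max_connections = 214             # hypothetical MariaDB limit
if worst_case > max_connections:
    # When the pools collectively outgrow the server limit, connections get
    # dropped, and MariaDB 10.3 also logs them server-side ("Aborted
    # connection ... Got an error reading communication packets").
    print("pools can exceed max_connections; disconnect warnings likely")
```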
3. Looking at the mistral logs, I effectively see that some workflow errored out:

```
2019-03-22 12:23:40.801 1 WARNING mistral.actions.openstack.base [req-d5012f25-6d22-44ac-bd1f-6907353af620 96b0314a8bbc43f487377f2b8fb7e260 2a61b3502c83486ca907c87d758964e0 - default default] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/mistral/actions/openstack/base.py", line 117, in run
    result = method(**self._kwargs_for_run)
  File "/usr/lib/python3.6/site-packages/novaclient/base.py", line 418, in find
    raise exceptions.NotFound(404, msg)
novaclient.exceptions.NotFound: No Hypervisor matching {'hypervisor_hostname': 'cdd1003b-b784-47d7-be2b-f2623f8b5b0d'}. (HTTP 404)
: novaclient.exceptions.NotFound: No Hypervisor matching {'hypervisor_hostname': 'cdd1003b-b784-47d7-be2b-f2623f8b5b0d'}. (HTTP 404)
2019-03-22 12:23:40.801 1 WARNING mistral.executors.default_executor [req-d5012f25-6d22-44ac-bd1f-6907353af620 96b0314a8bbc43f487377f2b8fb7e260 2a61b3502c83486ca907c87d758964e0 - default default] The action raised an exception [action_ex_id=cfa5959b-2741-41fd-8ef6-67d8d13f88e9, action_cls='<class 'mistral.actions.action_factory.NovaAction'>', attributes='{'client_method_name': 'hypervisors.find'}', params='{'hypervisor_hostname': 'cdd1003b-b784-47d7-be2b-f2623f8b5b0d'}']
NovaAction.hypervisors.find failed: No Hypervisor matching {'hypervisor_hostname': 'cdd1003b-b784-47d7-be2b-f2623f8b5b0d'}. (HTTP 404): mistral.exceptions.ActionException: NovaAction.hypervisors.find failed: No Hypervisor matching {'hypervisor_hostname': 'cdd1003b-b784-47d7-be2b-f2623f8b5b0d'}. (HTTP 404)
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor Traceback (most recent call last):
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor   File "/usr/lib/python3.6/site-packages/mistral/executors/default_executor.py", line 114, in run_action
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor     result = action.run(action_ctx)
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor   File "/usr/lib/python3.6/site-packages/mistral/actions/openstack/base.py", line 117, in run
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor     result = method(**self._kwargs_for_run)
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor   File "/usr/lib/python3.6/site-packages/novaclient/base.py", line 418, in find
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor     raise exceptions.NotFound(404, msg)
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor novaclient.exceptions.NotFound: No Hypervisor matching {'hypervisor_hostname': 'cdd1003b-b784-47d7-be2b-f2623f8b5b0d'}. (HTTP 404)
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor During handling of the above exception, another exception occurred:
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor Traceback (most recent call last):
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor   File "/usr/lib/python3.6/site-packages/mistral/executors/default_executor.py", line 114, in run_action
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor     result = action.run(action_ctx)
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor   File "/usr/lib/python3.6/site-packages/mistral/actions/openstack/base.py", line 130, in run
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor     (self.__class__.__name__, self.client_method_name, str(e))
2019-03-22 12:23:40.801 1 ERROR mistral.executors.default_executor mistral.exceptions.ActionException: NovaAction.hypervisors.find failed: No Hypervisor matching {'hypervisor_hostname': 'cdd1003b-b784-47d7-be2b-f2623f8b5b0d'}. (HTTP 404)
```

So it looks like a nova API call returned an error.

4. When looking at the nova logs, I can see:

```
2019-03-22 12:23:29.938 1 ERROR nova.virt.ironic.driver [req-688eac33-96a1-4ef3-8002-6fb1ee7d6425 - - - - -] An unknown error has occurred when trying to get the list of nodes from the Ironic inventory. Error: maximum recursion depth exceeded while calling a Python object: RecursionError: maximum recursion depth exceeded while calling a Python object
2019-03-22 12:23:29.938 1 WARNING nova.compute.manager [req-688eac33-96a1-4ef3-8002-6fb1ee7d6425 - - - - -] Virt driver is not ready.: nova.exception.VirtDriverNotReady: Virt driver is not ready.
```
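The failure in remark 3 can be reproduced by hand. The sketch below makes the same call the NovaAction wrapper makes, nova.hypervisors.find(), which fetches the hypervisor list and filters it client-side; the auth URL and credentials are placeholders (take the real ones from stackrc), assuming an undercloud admin account. Because the nova-compute ironic driver never registered the node (remark 4), the lookup 404s:

```python
# Minimal reproduction of the failing mistral call, with placeholder
# undercloud credentials (the real values live in stackrc).
from keystoneauth1.identity import v3
from keystoneauth1 import session
from novaclient import client

auth = v3.Password(
    auth_url="http://192.168.24.2:5000/v3",   # hypothetical undercloud Keystone
    username="admin",
    password="REDACTED",
    project_name="admin",
    user_domain_name="Default",
    project_domain_name="Default",
)
nova = client.Client("2.1", session=session.Session(auth=auth))

# If nova-compute's ironic driver is stuck (remark 4), the node was never
# registered as a hypervisor, so this raises novaclient.exceptions.NotFound
# (HTTP 404), which mistral wraps in an ActionException and the workflow
# errors out instead of completing.
nova.hypervisors.find(
    hypervisor_hostname="cdd1003b-b784-47d7-be2b-f2623f8b5b0d")
```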
So, bottom line, it looks like the DB or rabbit are not the root cause of the failure, but something in Nova got misconfigured in the first place?

[1] http://lists.openstack.org/pipermail/openstack-dev/2015-December/082717.html

*** This bug has been marked as a duplicate of bug 1686817 ***