Bug 1630069
| Field | Value |
|---|---|
| Summary | Instance gets stuck in BUILD during launch. ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host. |
| Product | Red Hat OpenStack |
| Component | openstack-nova |
| Version | 14.0 (Rocky) |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Target Milestone | rc |
| Target Release | 14.0 (Rocky) |
| Reporter | Alexander Chuzhoy <sasha> |
| Assignee | Lee Yarwood <lyarwood> |
| QA Contact | OSP DFG:Compute <osp-dfg-compute> |
| CC | berrange, codeology.lab, dasmith, egallen, eglynn, fhubik, jhakimra, kchamart, lyarwood, mbooth, ohochman, sasha, sbauza, sgordon, smooney, vromanso, wznoinsk, yprokule |
| Keywords | Triaged |
| Fixed In Version | openstack-nova-18.0.3-0.20181011032838.d1243fe.el7ost |
| Doc Type | No Doc Update |
| Clones | 1636399 (view as bug list) |
| Bug Blocks | 1653807 |
| Type | Bug |
| Last Closed | 2019-01-11 11:53:11 UTC |
Description (Alexander Chuzhoy, 2018-09-17 22:32:48 UTC)
Probably a regression from the latest puddle: after deployment, we cannot launch an instance.

After adding the exec perms on every nova_placement container with:

```
sudo docker exec -u root nova_placement chmod +x /var/www/cgi-bin/nova/nova-placement-api
```

the issue got resolved.

(In reply to Alexander Chuzhoy from comment #6)
> After adding the exec perms on every nova_placement container with:
>
> "sudo docker exec -u root nova_placement chmod +x
> /var/www/cgi-bin/nova/nova-placement-api"
>
> The issue got resolved.

FWIW I don't think that's the underlying issue. Rather, the Nova API DB wasn't fully sync'd/created by the nova_api container(s) when the nova_placement container started on controller-1, leading to an ERROR. Unfortunately we don't handle the subsequent reloads of the WSGI app well and end up repeatedly failing, as we have already registered config options. I'll look into this with our deployment squad to confirm, and to check whether we can work around this by calling `nova-manage api_db sync` from the nova_placement container at start-up.

mschuppert reminded me how the deploy and start orders work [1], and I can see now in THT that nova-api calls `nova-manage api_db sync` [2] at the same time as we start nova-placement [3] during deployment step 3 on the controllers. We could have nova-placement also call api_db sync before starting the app, or have it fail in a cleaner way, allowing the app to restart later once the DB is actually ready (see the sketch below). I'll look into this now.

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/README.rst#docker-steps
[2] https://github.com/openstack/tripleo-heat-templates/blob/cbdb2356f9bea5a927c73f1671bb09e930153e41/docker/services/nova-api.yaml#L213-L227
[3] https://github.com/openstack/tripleo-heat-templates/blob/cbdb2356f9bea5a927c73f1671bb09e930153e41/docker/services/nova-placement.yaml#L122-L150

*** Bug 1623613 has been marked as a duplicate of this bug. ***
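As a hypothetical illustration of the "wait until the DB is actually ready" option above (not the actual THT change): a startup guard that polls the Nova API database until the schema placement needs exists before the WSGI app initialises. The table name, poll interval, and timeout are assumptions for illustration.

```python
# Hypothetical startup guard (not the actual TripleO fix): block the
# placement app's start until `nova-manage api_db sync`, run by the
# nova_api container, has created the schema placement needs.
import time

import sqlalchemy
from sqlalchemy import exc


def wait_for_api_db(db_url, table='resource_providers', timeout=300):
    """Poll until `table` exists in the API DB, or raise after `timeout`."""
    engine = sqlalchemy.create_engine(db_url)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if table in sqlalchemy.inspect(engine).get_table_names():
                return
        except exc.OperationalError:
            pass  # DB not reachable yet; keep polling
        time.sleep(5)
    raise RuntimeError('API DB not ready after %d seconds' % timeout)
```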
Environment:
openstack-nova-common-18.0.1-0.20180922200406.03188f7.el7ost.noarch
openstack-nova-migration-18.0.1-0.20180922200406.03188f7.el7ost.noarch
openstack-nova-compute-18.0.1-0.20180922200406.03188f7.el7ost.noarch

If after OC deployment we restart nova_placement on all controllers, then the instance is launched successfully.

Without restarting nova_placement, the instance gets status ERROR:

```
(overcloud) [stack@undercloud ~]$ openstack server list
+--------------------------------------+--------------+--------+----------+--------+---------+
| ID                                   | Name         | Status | Networks | Image  | Flavor  |
+--------------------------------------+--------------+--------+----------+--------+---------+
| 5522d013-9f89-4f88-a529-6ebb78265e6c | after_deploy | ERROR  |          | cirros | m1.tiny |
+--------------------------------------+--------------+--------+----------+--------+---------+
```

Looking in nova-compute.log on the compute node:

```
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager [req-b7e4491f-0666-47e2-9abc-c7f1963077ad - - - - -] Error updating resources for node overcloud-cmp-leaf0-0.localdomain.: ResourceProviderCreationFailed: Failed to create resource provider overcloud-cmp-leaf0-0.localdomain
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager Traceback (most recent call last):
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7736, in _update_available_resource_for_node
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 724, in update_available_resource
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     self._update_available_resource(context, resources)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     return f(*args, **kwargs)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 747, in _update_available_resource
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     self._init_compute_node(context, resources)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 564, in _init_compute_node
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     self._update(context, cn)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     return Retrying(*dargs, **dkw).call(f, *args, **kw)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 223, in call
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     return attempt.get(self._wrap_exception)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     six.reraise(self.value[0], self.value[1], self.value[2])
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 959, in _update
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     self._update_to_placement(context, compute_node)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 915, in _update_to_placement
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     context, compute_node.uuid, name=compute_node.hypervisor_hostname)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     return getattr(self.instance, __name)(*args, **kwargs)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 1006, in get_provider_tree_and_ensure_root
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     parent_provider_uuid=parent_provider_uuid)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 672, in _ensure_resource_provider
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     parent_provider_uuid=parent_provider_uuid)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 74, in wrapper
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     return f(self, *a, **k)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 606, in _create_resource_provider
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     raise exception.ResourceProviderCreationFailed(name=name)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager ResourceProviderCreationFailed: Failed to create resource provider overcloud-cmp-leaf0-0.localdomain
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager
```

(In reply to Alexander Chuzhoy from comment #20)
> Environment:
> openstack-nova-common-18.0.1-0.20180922200406.03188f7.el7ost.noarch
> [...]
> If after OC deployment we restart nova_placement on all controllers, then
> the instance is launched successfully.
> [...]
> ResourceProviderCreationFailed: Failed to create resource provider
> overcloud-cmp-leaf0-0.localdomain

Can you also provide the placement logs?

Created attachment 1492604 [details]
placement_wsgi_error logs
Thanks for the logs. So we've worked past one issue and are now hitting another:

```
[Wed Oct 10 15:10:49.433284 2018] [:error] [pid 14] [remote 172.25.0.10:208] mod_wsgi (pid=14): Target WSGI script '/var/www/cgi-bin/nova/nova-placement-api' cannot be loaded as Python module.
[Wed Oct 10 15:10:49.433305 2018] [:error] [pid 14] [remote 172.25.0.10:208] mod_wsgi (pid=14): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api'.
[Wed Oct 10 15:10:49.433320 2018] [:error] [pid 14] [remote 172.25.0.10:208] Traceback (most recent call last):
[Wed Oct 10 15:10:49.433333 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module>
[Wed Oct 10 15:10:49.433354 2018] [:error] [pid 14] [remote 172.25.0.10:208]     application = init_application()
[Wed Oct 10 15:10:49.433361 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 108, in init_application
[Wed Oct 10 15:10:49.433386 2018] [:error] [pid 14] [remote 172.25.0.10:208]     db_api.configure(conf.CONF)
[Wed Oct 10 15:10:49.433392 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/db_api.py", line 35, in configure
[Wed Oct 10 15:10:49.433403 2018] [:error] [pid 14] [remote 172.25.0.10:208]     **_get_db_conf(conf.placement_database))
[Wed Oct 10 15:10:49.433408 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 788, in configure
[Wed Oct 10 15:10:49.433420 2018] [:error] [pid 14] [remote 172.25.0.10:208]     self._factory.configure(**kw)
[Wed Oct 10 15:10:49.433425 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/debtcollector/renames.py", line 43, in decorator
[Wed Oct 10 15:10:49.433435 2018] [:error] [pid 14] [remote 172.25.0.10:208]     return wrapped(*args, **kwargs)
[Wed Oct 10 15:10:49.433440 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 312, in configure
[Wed Oct 10 15:10:49.433449 2018] [:error] [pid 14] [remote 172.25.0.10:208]     self._configure(False, kw)
[Wed Oct 10 15:10:49.433453 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 317, in _configure
[Wed Oct 10 15:10:49.433462 2018] [:error] [pid 14] [remote 172.25.0.10:208]     raise TypeError("this TransactionFactory is already started")
[Wed Oct 10 15:10:49.433473 2018] [:error] [pid 14] [remote 172.25.0.10:208] TypeError: this TransactionFactory is already started
```

I face exactly the same issue with Queens, too.

Environment:
openstack-nova-common-17.0.7-0.20181005101004.a48c112.el7.noarch
openstack-nova-migration-17.0.7-0.20181005101004.a48c112.el7.noarch
openstack-nova-compute-17.0.7-0.20181005101004.a48c112.el7.noarch
python2-novaclient-10.1.0-0.20180605113302.ed73aad.el7.noarch
python-nova-17.0.7-0.20181005101004.a48c112.el7.noarch
puppet-nova-12.4.1-0.20181004113806.5d25320.el7.noarch
openstack-tripleo-heat-templates-8.0.8-0.20181007022806.28100e8.el7.noarch
instack-undercloud-8.4.7-0.20181004133804.e68034f.el7.noarch

Logs: nova-compute.log and placement_wsgi_error.log are the same as above.

Additional info: my cluster used 3 controllers and was deployed with controller HA and fencing enabled.
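The TypeError comes from oslo.db's enginefacade: configure() may only be called on a TransactionFactory before it has started, and the factory starts lazily on first database use. When mod_wsgi reloads the placement script after a failed start, module state appears to survive, so init_application() reconfigures the already-started module-global factory. Below is a minimal standalone sketch of that failure and of the fix direction suggested in a later comment (building a fresh transaction context manager on each configure call); the function names and SQLite URL are illustrative, not nova's.

```python
# Standalone sketch of the TypeError above and of the suggested fix
# direction. Uses an in-memory SQLite URL as a stand-in for the real
# placement DB; function names here are illustrative, not nova's.
from oslo_db.sqlalchemy import enginefacade


@enginefacade.transaction_context_provider
class _Context(object):
    """Minimal context object enginefacade can attach sessions to."""


# Module-global factory, analogous to the one the placement WSGI app
# reconfigures on every (re)load of the script.
_context_manager = enginefacade.transaction_context()


def init_app(db_url):
    # Fine on the first load; raises once the factory has started:
    #   TypeError: this TransactionFactory is already started
    _context_manager.configure(connection=db_url)


def init_app_fixed(db_url):
    # Fix direction: rebuild the context manager on each (re)init so a
    # WSGI reload never reconfigures an already-started factory.
    global _context_manager
    _context_manager = enginefacade.transaction_context()
    _context_manager.configure(connection=db_url)


if __name__ == '__main__':
    init_app('sqlite://')
    with _context_manager.writer.using(_Context()):
        pass  # first DB use starts the factory
    try:
        init_app('sqlite://')  # simulated mod_wsgi reload
    except TypeError as exc:
        print('reload failed: %s' % exc)
    init_app_fixed('sqlite://')  # succeeds with a fresh factory
```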
VM launch failures would continue until the nova_placement container became unhealthy and was given a restart.

*** Bug 1635665 has been marked as a duplicate of this bug. ***

I think placement needs to create itself a new transaction context manager when calling configure. I'll propose something upstream.

I have a patch that is related to this: https://review.openstack.org/#/c/610034/ . I believe it will correct the TypeError, but I don't know if it's sufficient to resolve the issue entirely.

*** Bug 1641877 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045