Bug 1630069
| Field | Value |
|---|---|
| Summary | Instance gets stuck in BUILD during launch. ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host. |
| Product | Red Hat OpenStack |
| Component | openstack-nova |
| Version | 14.0 (Rocky) |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Target Milestone | rc |
| Target Release | 14.0 (Rocky) |
| Reporter | Alexander Chuzhoy <sasha> |
| Assignee | Lee Yarwood <lyarwood> |
| QA Contact | OSP DFG:Compute <osp-dfg-compute> |
| CC | berrange, codeology.lab, dasmith, egallen, eglynn, fhubik, jhakimra, kchamart, lyarwood, mbooth, ohochman, sasha, sbauza, sgordon, smooney, vromanso, wznoinsk, yprokule |
| Keywords | Triaged |
| Fixed In Version | openstack-nova-18.0.3-0.20181011032838.d1243fe.el7ost |
| Doc Type | No Doc Update |
| Clones | 1636399 (view as bug list) |
| Bug Blocks | 1653807 |
| Type | Bug |
| Last Closed | 2019-01-11 11:53:11 UTC |
Description (Alexander Chuzhoy, 2018-09-17 22:32:48 UTC)
Probably a regression from the latest puddle: after deployment, we cannot launch an instance.

After adding the exec perms on every nova_placement container with:

```
sudo docker exec -u root nova_placement chmod +x /var/www/cgi-bin/nova/nova-placement-api
```

the issue got resolved.

(In reply to Alexander Chuzhoy from comment #6)
> After adding the exec perms on every nova_placement container with:
>
> "sudo docker exec -u root nova_placement chmod +x
> /var/www/cgi-bin/nova/nova-placement-api"
>
> The issue got resolved.

FWIW I don't think that's the underlying issue. Rather, the Nova API DB wasn't fully sync'd/created by the nova_api container(s) when the nova_placement container started on controller-1, leading to an ERROR. Unfortunately we don't handle the subsequent reloads of the WSGI app well and end up repeatedly failing, as we have already registered config options. I'll look into this with our deployment squad to confirm, and to check whether we can work around this by calling `nova-manage api_db sync` from the nova_placement container at start-up.

mschuppert reminded me how the deploy and start orders work [1], and I can see now in THT that nova-api calls `nova-manage api_db sync` [2] at the same time as we start nova-placement [3] during deployment step 3 on the controllers. We could have nova-placement also call api_db sync before starting the app, or have it fail in a cleaner way, allowing the app to restart later once the DB is actually ready (see the sketch below). I'll look into this now.

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/README.rst#docker-steps
[2] https://github.com/openstack/tripleo-heat-templates/blob/cbdb2356f9bea5a927c73f1671bb09e930153e41/docker/services/nova-api.yaml#L213-L227
[3] https://github.com/openstack/tripleo-heat-templates/blob/cbdb2356f9bea5a927c73f1671bb09e930153e41/docker/services/nova-placement.yaml#L122-L150

*** Bug 1623613 has been marked as a duplicate of this bug. ***
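As a hypothetical illustration of the "wait until the DB is actually ready" option above (not the actual THT change): a startup guard that polls the Nova API database until the schema placement needs exists before the WSGI app initialises. The table name, poll interval, and timeout are assumptions for illustration.

```python
# Hypothetical startup guard (not the actual TripleO fix): block the
# placement app's start until `nova-manage api_db sync`, run by the
# nova_api container, has created the schema placement needs.
import time

import sqlalchemy
from sqlalchemy import exc


def wait_for_api_db(db_url, table='resource_providers', timeout=300):
    """Poll until `table` exists in the API DB, or raise after `timeout`."""
    engine = sqlalchemy.create_engine(db_url)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if table in sqlalchemy.inspect(engine).get_table_names():
                return
        except exc.OperationalError:
            pass  # DB not reachable yet; keep polling
        time.sleep(5)
    raise RuntimeError('API DB not ready after %d seconds' % timeout)
```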
Environment:
openstack-nova-common-18.0.1-0.20180922200406.03188f7.el7ost.noarch
openstack-nova-migration-18.0.1-0.20180922200406.03188f7.el7ost.noarch
openstack-nova-compute-18.0.1-0.20180922200406.03188f7.el7ost.noarch

If after OC deployment we restart nova_placement on all controllers, then the instance is launched successfully.

Without restarting nova_placement, the instance gets status ERROR:

```
(overcloud) [stack@undercloud ~]$ openstack server list
+--------------------------------------+--------------+--------+----------+--------+---------+
| ID                                   | Name         | Status | Networks | Image  | Flavor  |
+--------------------------------------+--------------+--------+----------+--------+---------+
| 5522d013-9f89-4f88-a529-6ebb78265e6c | after_deploy | ERROR  |          | cirros | m1.tiny |
+--------------------------------------+--------------+--------+----------+--------+---------+
```

Looking in nova-compute.log on the compute node:

```
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager [req-b7e4491f-0666-47e2-9abc-c7f1963077ad - - - - -] Error updating resources for node overcloud-cmp-leaf0-0.localdomain.: ResourceProviderCreationFailed: Failed to create resource provider overcloud-cmp-leaf0-0.localdomain
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager Traceback (most recent call last):
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7736, in _update_available_resource_for_node
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 724, in update_available_resource
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     self._update_available_resource(context, resources)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     return f(*args, **kwargs)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 747, in _update_available_resource
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     self._init_compute_node(context, resources)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 564, in _init_compute_node
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     self._update(context, cn)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     return Retrying(*dargs, **dkw).call(f, *args, **kw)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 223, in call
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     return attempt.get(self._wrap_exception)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     six.reraise(self.value[0], self.value[1], self.value[2])
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 959, in _update
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     self._update_to_placement(context, compute_node)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 915, in _update_to_placement
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     context, compute_node.uuid, name=compute_node.hypervisor_hostname)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     return getattr(self.instance, __name)(*args, **kwargs)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 1006, in get_provider_tree_and_ensure_root
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     parent_provider_uuid=parent_provider_uuid)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 672, in _ensure_resource_provider
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     parent_provider_uuid=parent_provider_uuid)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 74, in wrapper
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     return f(self, *a, **k)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 606, in _create_resource_provider
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager     raise exception.ResourceProviderCreationFailed(name=name)
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager ResourceProviderCreationFailed: Failed to create resource provider overcloud-cmp-leaf0-0.localdomain
2018-10-09 22:30:46.712 1 ERROR nova.compute.manager
```

(In reply to Alexander Chuzhoy from comment #20)
> Environment:
> openstack-nova-common-18.0.1-0.20180922200406.03188f7.el7ost.noarch
> [...]
> If after OC deployment we restart nova_placement on all controllers, then
> the instance is launched successfully.
> [...]
> ResourceProviderCreationFailed: Failed to create resource provider
> overcloud-cmp-leaf0-0.localdomain

Can you also provide the placement logs?

Created attachment 1492604 [details]
placement_wsgi_error logs
Thanks for the logs. So we've worked past one issue and are now hitting another:

```
[Wed Oct 10 15:10:49.433284 2018] [:error] [pid 14] [remote 172.25.0.10:208] mod_wsgi (pid=14): Target WSGI script '/var/www/cgi-bin/nova/nova-placement-api' cannot be loaded as Python module.
[Wed Oct 10 15:10:49.433305 2018] [:error] [pid 14] [remote 172.25.0.10:208] mod_wsgi (pid=14): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api'.
[Wed Oct 10 15:10:49.433320 2018] [:error] [pid 14] [remote 172.25.0.10:208] Traceback (most recent call last):
[Wed Oct 10 15:10:49.433333 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module>
[Wed Oct 10 15:10:49.433354 2018] [:error] [pid 14] [remote 172.25.0.10:208]     application = init_application()
[Wed Oct 10 15:10:49.433361 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 108, in init_application
[Wed Oct 10 15:10:49.433386 2018] [:error] [pid 14] [remote 172.25.0.10:208]     db_api.configure(conf.CONF)
[Wed Oct 10 15:10:49.433392 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/db_api.py", line 35, in configure
[Wed Oct 10 15:10:49.433403 2018] [:error] [pid 14] [remote 172.25.0.10:208]     **_get_db_conf(conf.placement_database))
[Wed Oct 10 15:10:49.433408 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 788, in configure
[Wed Oct 10 15:10:49.433420 2018] [:error] [pid 14] [remote 172.25.0.10:208]     self._factory.configure(**kw)
[Wed Oct 10 15:10:49.433425 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/debtcollector/renames.py", line 43, in decorator
[Wed Oct 10 15:10:49.433435 2018] [:error] [pid 14] [remote 172.25.0.10:208]     return wrapped(*args, **kwargs)
[Wed Oct 10 15:10:49.433440 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 312, in configure
[Wed Oct 10 15:10:49.433449 2018] [:error] [pid 14] [remote 172.25.0.10:208]     self._configure(False, kw)
[Wed Oct 10 15:10:49.433453 2018] [:error] [pid 14] [remote 172.25.0.10:208]   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 317, in _configure
[Wed Oct 10 15:10:49.433462 2018] [:error] [pid 14] [remote 172.25.0.10:208]     raise TypeError("this TransactionFactory is already started")
[Wed Oct 10 15:10:49.433473 2018] [:error] [pid 14] [remote 172.25.0.10:208] TypeError: this TransactionFactory is already started
```

I face exactly the same issue with Queens, too.

Environment:
openstack-nova-common-17.0.7-0.20181005101004.a48c112.el7.noarch
openstack-nova-migration-17.0.7-0.20181005101004.a48c112.el7.noarch
openstack-nova-compute-17.0.7-0.20181005101004.a48c112.el7.noarch
python2-novaclient-10.1.0-0.20180605113302.ed73aad.el7.noarch
python-nova-17.0.7-0.20181005101004.a48c112.el7.noarch
puppet-nova-12.4.1-0.20181004113806.5d25320.el7.noarch
openstack-tripleo-heat-templates-8.0.8-0.20181007022806.28100e8.el7.noarch
instack-undercloud-8.4.7-0.20181004133804.e68034f.el7.noarch

Logs: nova-compute.log and placement_wsgi_error.log are the same as above.

Additional info: my cluster used 3 controllers and was deployed with controller HA and fencing enabled.
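The TypeError comes from oslo.db's enginefacade: configure() may only be called on a TransactionFactory before it has started, and the factory starts lazily on first database use. When mod_wsgi reloads the placement script after a failed start, module state appears to survive, so init_application() reconfigures the already-started module-global factory. Below is a minimal standalone sketch of that failure and of the fix direction suggested in a later comment (building a fresh transaction context manager on each configure call); the function names and SQLite URL are illustrative, not nova's.

```python
# Standalone sketch of the TypeError above and of the suggested fix
# direction. Uses an in-memory SQLite URL as a stand-in for the real
# placement DB; function names here are illustrative, not nova's.
from oslo_db.sqlalchemy import enginefacade


@enginefacade.transaction_context_provider
class _Context(object):
    """Minimal context object enginefacade can attach sessions to."""


# Module-global factory, analogous to the one the placement WSGI app
# reconfigures on every (re)load of the script.
_context_manager = enginefacade.transaction_context()


def init_app(db_url):
    # Fine on the first load; raises once the factory has started:
    #   TypeError: this TransactionFactory is already started
    _context_manager.configure(connection=db_url)


def init_app_fixed(db_url):
    # Fix direction: rebuild the context manager on each (re)init so a
    # WSGI reload never reconfigures an already-started factory.
    global _context_manager
    _context_manager = enginefacade.transaction_context()
    _context_manager.configure(connection=db_url)


if __name__ == '__main__':
    init_app('sqlite://')
    with _context_manager.writer.using(_Context()):
        pass  # first DB use starts the factory
    try:
        init_app('sqlite://')  # simulated mod_wsgi reload
    except TypeError as exc:
        print('reload failed: %s' % exc)
    init_app_fixed('sqlite://')  # succeeds with a fresh factory
```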
VM launch failures would continue until the nova_placement container became unhealthy and was given a restart.

*** Bug 1635665 has been marked as a duplicate of this bug. ***

I think placement needs to create itself a new transaction context manager when calling configure. I'll propose something upstream.

I have a patch that is related to this: https://review.openstack.org/#/c/610034/ . I believe it will correct the TypeError, but I don't know if it's sufficient to resolve the issue entirely.

*** Bug 1641877 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045