Bug 1651475

Summary: [OSP14] track revert downstream-only - placement API fail to start if nova-manage api_db sync not finished
Product: Red Hat OpenStack
Reporter: Martin Schuppert <mschuppe>
Component: openstack-tripleo-heat-templates
Assignee: Martin Schuppert <mschuppe>
Status: CLOSED DUPLICATE
QA Contact: Gurenko Alex <agurenko>
Severity: medium
Docs Contact:
Priority: medium
Version: 14.0 (Rocky)
CC: agurenko, berrange, dasmith, egallen, eglynn, fhubik, jhakimra, kchamart, lyarwood, mbooth, mburns, nova-maint, ohochman, rscarazz, sasha, sbauza, sgordon, vromanso
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1636399
Environment:
Last Closed: 2018-11-27 07:18:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Martin Schuppert 2018-11-20 08:01:09 UTC
+++ This bug was initially created as a clone of Bug #1636399 +++

Using this bug to track the deployment race aspect of this issue.


+++ This bug was initially created as a clone of Bug #1630069 +++

Unable to launch an instance after deployment; the instance gets stuck in BUILD.
nova-compute shows:
ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.

Environment:
openstack-nova-common-18.0.1-0.20180830232802.43f6f74.el7ost.noarch
python2-novaclient-11.0.0-0.20180809174649.f1005ce.el7ost.noarch
openstack-nova-migration-18.0.1-0.20180830232802.43f6f74.el7ost.noarch
puppet-nova-13.3.1-0.20180831195236.ce0efbe.el7ost.noarch
python-nova-18.0.1-0.20180830232802.43f6f74.el7ost.noarch
openstack-nova-compute-18.0.1-0.20180830232802.43f6f74.el7ost.noarch
openstack-tripleo-heat-templates-9.0.0-0.20180831204457.17bb71e.0rc1.el7ost
instack-undercloud-9.3.1-0.20180831000258.e464799.el7ost.noarch

Steps to reproduce:

Deploy the overcloud (OC) on a bare-metal (BM) setup.
Try to launch an instance, e.g. as sketched below.
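
For illustration only, a minimal reproduction sketch using python-novaclient (listed in the environment above). The auth URL, credentials and project are placeholders, and the image/flavor names are taken from the server listing in the Result section below; the reporter used the openstack CLI, not necessarily this script.

#!/usr/bin/env python
# Hypothetical reproduction of the "launch an instance" step.
from keystoneauth1 import loading, session
from novaclient import client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='http://<overcloud-vip>:5000/v3',   # placeholder endpoint
    username='admin', password='<password>',     # placeholder credentials
    project_name='admin',
    user_domain_name='Default', project_domain_name='Default')
nova = client.Client('2.37', session=session.Session(auth=auth))

server = nova.servers.create(
    name='after_deploy',
    image=nova.glance.find_image('cirros'),
    flavor=nova.flavors.find(name='m1.tiny'),
    nics='none')            # no network attached, matching the empty Networks column
print(server.status)        # on the affected deployment this stays in BUILD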

Result:
The instance gets stuck:
(overcloud) [stack@undercloud ~]$ openstack server list
+--------------------------------------+--------------+--------+----------+--------+---------+
| ID                                   | Name         | Status | Networks | Image  | Flavor  |
+--------------------------------------+--------------+--------+----------+--------+---------+
| ecf1d7c4-8b03-4528-96b5-ddc3a21b2558 | after_deploy | BUILD  |          | cirros | m1.tiny |
+--------------------------------------+--------------+--------+----------+--------+---------+



The placement API responds with an Apache "Internal Server Error" page (excerpt):

<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at 
 [no address given] to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager [req-268c54e7-9f4f-459f-84f1-5c1f3ab41746 - - - - -] Error updating resources for node overcloud-compute-0.localdomain.: ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager Traceback (most recent call last):
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7722, in _update_available_resource_for_node
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 703, in update_available_resource
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     self._update_available_resource(context, resources)
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     return f(*args, **kwargs)
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 726, in _update_available_resource
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     self._init_compute_node(context, resources)
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 593, in _init_compute_node
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     self._update(context, cn)
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     return Retrying(*dargs, **dkw).call(f, *args, **kw)
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 223, in call
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     return attempt.get(self._wrap_exception)
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     six.reraise(self.value[0], self.value[1], self.value[2])
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 938, in _update
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     self._update_to_placement(context, compute_node)
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 907, in _update_to_placement
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     reportclient.update_from_provider_tree(context, prov_tree)
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     return getattr(self.instance, __name)(*args, **kwargs)
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 1535, in update_from_provider_tree
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager     raise exception.ResourceProviderSyncFailed()
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.
2018-09-17 19:47:18.271 1 ERROR nova.compute.manager 



Repeating error in /var/log/containers/httpd/nova-placement/placement_wsgi_error.log on controllers:
[Mon Sep 17 19:37:25.276009 2018] [:error] [pid 16] [remote 10.19.94.28:80] mod_wsgi (pid=16): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api'.
[Mon Sep 17 19:37:25.276033 2018] [:error] [pid 16] [remote 10.19.94.28:80] Traceback (most recent call last):
[Mon Sep 17 19:37:25.276055 2018] [:error] [pid 16] [remote 10.19.94.28:80]   File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module>
[Mon Sep 17 19:37:25.276092 2018] [:error] [pid 16] [remote 10.19.94.28:80]     application = init_application()
[Mon Sep 17 19:37:25.276105 2018] [:error] [pid 16] [remote 10.19.94.28:80]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 102, in init_application
[Mon Sep 17 19:37:25.276132 2018] [:error] [pid 16] [remote 10.19.94.28:80]     _parse_args([], default_config_files=[conffile])
[Mon Sep 17 19:37:25.276144 2018] [:error] [pid 16] [remote 10.19.94.28:80]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 62, in _parse_args
[Mon Sep 17 19:37:25.276164 2018] [:error] [pid 16] [remote 10.19.94.28:80]     logging.register_options(conf.CONF)
[Mon Sep 17 19:37:25.276174 2018] [:error] [pid 16] [remote 10.19.94.28:80]   File "/usr/lib/python2.7/site-packages/oslo_log/log.py", line 250, in register_options
[Mon Sep 17 19:37:25.276195 2018] [:error] [pid 16] [remote 10.19.94.28:80]     conf.register_cli_opts(_options.common_cli_opts)
[Mon Sep 17 19:37:25.276209 2018] [:error] [pid 16] [remote 10.19.94.28:80]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2501, in __inner
[Mon Sep 17 19:37:25.276927 2018] [:error] [pid 16] [remote 10.19.94.28:80]     result = f(self, *args, **kwargs)
[Mon Sep 17 19:37:25.276949 2018] [:error] [pid 16] [remote 10.19.94.28:80]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2783, in register_cli_opts
[Mon Sep 17 19:37:25.276974 2018] [:error] [pid 16] [remote 10.19.94.28:80]     self.register_cli_opt(opt, group, clear_cache=False)
[Mon Sep 17 19:37:25.276985 2018] [:error] [pid 16] [remote 10.19.94.28:80]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2505, in __inner
[Mon Sep 17 19:37:25.277004 2018] [:error] [pid 16] [remote 10.19.94.28:80]     return f(self, *args, **kwargs)
[Mon Sep 17 19:37:25.277014 2018] [:error] [pid 16] [remote 10.19.94.28:80]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2775, in register_cli_opt
[Mon Sep 17 19:37:25.277032 2018] [:error] [pid 16] [remote 10.19.94.28:80]     raise ArgsAlreadyParsedError("cannot register CLI option")
[Mon Sep 17 19:37:25.277057 2018] [:error] [pid 16] [remote 10.19.94.28:80] ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option
[Mon Sep 17 19:38:55.686870 2018] [:error] [pid 19] [remote 10.19.94.28:88] mod_wsgi (pid=19): Target WSGI script '/var/www/cgi-bin/nova/nova-placement-api' cannot be loaded as Python module.


--- Additional comment from Martin Schuppert on 2018-10-25 14:23:22 EDT ---

Since there are ongoing upstream RDO CI issues and we could not land [1] yet, we added a temporary downstream-only patch to unblock the situation: during deployment the controllers wait until the Nova API DB sync has finished before proceeding (a minimal sketch of the wait logic follows the log excerpts below).

During overcloud deploy we could see something like the following on controller-1 and controller-2, where the db_sync is not running:

[root@controller-1 ~]# grep  INFO:nova_wait_for_db_sync /var/log/messages                                                                                                                                                                                                                 
Oct 25 13:32:48 controller-1 journal: INFO:nova_wait_for_db_sync:Nova API DB sync not yet finished. Migrationnumber DB/files (0/61)
Oct 25 13:32:58 controller-1 journal: INFO:nova_wait_for_db_sync:Nova API DB sync not yet finished. Migrationnumber DB/files (13/61)
Oct 25 13:33:08 controller-1 journal: INFO:nova_wait_for_db_sync:Nova API DB sync finished. Migration number 61

[root@controller-2 ~]# grep  INFO:nova_wait_for_db_sync /var/log/messages
Oct 25 13:32:50 controller-2 journal: INFO:nova_wait_for_db_sync:Nova API DB sync not yet finished. Migrationnumber DB/files (0/61)
Oct 25 13:33:00 controller-2 journal: INFO:nova_wait_for_db_sync:Nova API DB sync not yet finished. Migrationnumber DB/files (35/61)
Oct 25 13:33:10 controller-2 journal: INFO:nova_wait_for_db_sync:Nova API DB sync finished. Migration number 61
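

For reference, a minimal sketch of what such a wait loop could look like, reconstructed from the nova_wait_for_db_sync log messages above. The real downstream script in tripleo-heat-templates may differ; in particular, the use of "nova-manage api_db version" and the migration-scripts path are assumptions.

#!/usr/bin/env python
# Sketch only: compare the migration number applied to the nova_api DB with
# the number of API DB migration scripts shipped by the nova package, and
# loop until they match (cf. "Migrationnumber DB/files (0/61)" above).
import glob
import logging
import subprocess
import time

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger('nova_wait_for_db_sync')

# Assumed location of the API DB migration scripts in the Rocky nova package.
MIGRATIONS_GLOB = ('/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/'
                   'api_migrations/migrate_repo/versions/[0-9]*_*.py')


def db_version():
    # "nova-manage api_db version" prints the migration number currently
    # applied to the nova_api database.
    out = subprocess.check_output(['nova-manage', 'api_db', 'version'])
    return int(out.strip())


def files_version():
    # Highest migration number we expect, i.e. the number of migration files.
    return len(glob.glob(MIGRATIONS_GLOB))


target = files_version()
while True:
    current = db_version()
    if current >= target:
        LOG.info('Nova API DB sync finished. Migration number %d', current)
        break
    LOG.info('Nova API DB sync not yet finished. Migrationnumber DB/files '
             '(%d/%d)', current, target)
    time.sleep(10)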


This BZ is to track the revert of that downstream-only patch.