After reboot, unable to start a nova instance:

ERROR nova.compute.manager ResourceProviderRetrievalFailed: Failed to get resource provider with UUID <uuid>

Environment:
instack-undercloud-9.2.1-0.20180803181448.be5fa97.el7ost.noarch
python2-ironic-inspector-client-3.3.0-0.20180731012520.53bf4e8.el7ost.noarch
puppet-ironic-13.2.0-0.20180730212733.387fffa.el7ost.noarch
python2-ironicclient-2.5.0-0.20180731010529.fb94fb8.el7ost.noarch
puppet-nova-13.2.0-0.20180802212242.dd0cc66.el7ost.noarch
python2-novaclient-11.0.0-0.20180731010157.f1005ce.el7ost.noarch

Steps to reproduce:
1. Deploy the overcloud with ironic enabled.
2. Add nodes to the overcloud.
3. Move the nodes to available through cleaning.
4. Launch instances.
5. Remove the instances.
6. Reboot the setup.
7. Attempt to launch an instance.

Result:

(overcloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+-----------+--------+----------+----------------+-----------+
| ID                                   | Name      | Status | Networks | Image          | Flavor    |
+--------------------------------------+-----------+--------+----------+----------------+-----------+
| 3bbe2cb1-bd0b-4fcb-afb2-22ed00f1bd4e | instance1 | ERROR  |          | overcloud-full | baremetal |
+--------------------------------------+-----------+--------+----------+----------------+-----------+

The nova-compute log shows:

2018-08-29 18:03:30.391 1 ERROR nova.scheduler.client.report [req-c838d765-d7ea-408b-a90a-88c6ec26691a - - - - -] [None] Failed to retrieve resource provider tree from placement API for UUID 5a0dd0a2-0d57-4828-b070-f65da86d922d. Got 500: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager [req-c838d765-d7ea-408b-a90a-88c6ec26691a - - - - -] Error updating resources for node 5a0dd0a2-0d57-4828-b070-f65da86d922d.: ResourceProviderRetrievalFailed: Failed to get resource provider with UUID 5a0dd0a2-0d57-4828-b070-f65da86d922d
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager Traceback (most recent call last):
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7717, in _update_available_resource_for_node
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 703, in update_available_resource
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     self._update_available_resource(context, resources)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     return f(*args, **kwargs)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 726, in _update_available_resource
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     self._init_compute_node(context, resources)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 564, in _init_compute_node
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     self._update(context, cn)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     return Retrying(*dargs, **dkw).call(f, *args, **kw)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 223, in call
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     return attempt.get(self._wrap_exception)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     six.reraise(self.value[0], self.value[1], self.value[2])
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 938, in _update
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     self._update_to_placement(context, compute_node)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 894, in _update_to_placement
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     context, compute_node.uuid, name=compute_node.hypervisor_hostname)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     return getattr(self.instance, __name)(*args, **kwargs)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 1006, in get_provider_tree_and_ensure_root
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     parent_provider_uuid=parent_provider_uuid)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 668, in _ensure_resource_provider
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     rps_to_refresh = self._get_providers_in_tree(context, uuid)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 74, in wrapper
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     return f(self, *a, **k)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 535, in _get_providers_in_tree
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager     raise exception.ResourceProviderRetrievalFailed(uuid=uuid)
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager ResourceProviderRetrievalFailed: Failed to get resource provider with UUID 5a0dd0a2-0d57-4828-b070-f65da86d922d
2018-08-29 18:03:30.392 1 ERROR nova.compute.manager

The nova-scheduler log shows:

2018-08-29 17:25:01.826 1 ERROR nova Traceback (most recent call last):
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/bin/nova-scheduler", line 10, in <module>
2018-08-29 17:25:01.826 1 ERROR nova     sys.exit(main())
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/nova/cmd/scheduler.py", line 46, in main
2018-08-29 17:25:01.826 1 ERROR nova     topic=scheduler_rpcapi.RPC_TOPIC)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/nova/service.py", line 259, in create
2018-08-29 17:25:01.826 1 ERROR nova     periodic_interval_max=periodic_interval_max)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/nova/service.py", line 128, in __init__
2018-08-29 17:25:01.826 1 ERROR nova     self.manager = manager_class(host=self.host, *args, **kwargs)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 63, in __init__
2018-08-29 17:25:01.826 1 ERROR nova     invoke_on_load=True).driver
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/stevedore/driver.py", line 61, in __init__
2018-08-29 17:25:01.826 1 ERROR nova     warn_on_missing_entrypoint=warn_on_missing_entrypoint
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 81, in __init__
2018-08-29 17:25:01.826 1 ERROR nova     verify_requirements)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 203, in _load_plugins
2018-08-29 17:25:01.826 1 ERROR nova     self._on_load_failure_callback(self, ep, err)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 195, in _load_plugins
2018-08-29 17:25:01.826 1 ERROR nova     verify_requirements,
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 158, in _load_one_plugin
2018-08-29 17:25:01.826 1 ERROR nova     verify_requirements,
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 227, in _load_one_plugin
2018-08-29 17:25:01.826 1 ERROR nova     obj = plugin(*invoke_args, **invoke_kwds)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 43, in __init__
2018-08-29 17:25:01.826 1 ERROR nova     super(FilterScheduler, self).__init__(*args, **kwargs)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/nova/scheduler/driver.py", line 42, in __init__
2018-08-29 17:25:01.826 1 ERROR nova     self.host_manager = host_manager.HostManager()
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 345, in __init__
2018-08-29 17:25:01.826 1 ERROR nova     self.refresh_cells_caches()
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py", line 646, in refresh_cells_caches
2018-08-29 17:25:01.826 1 ERROR nova     temp_cells = objects.CellMappingList.get_all(context)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper
2018-08-29 17:25:01.826 1 ERROR nova     result = fn(cls, context, *args, **kwargs)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/nova/objects/cell_mapping.py", line 255, in get_all
2018-08-29 17:25:01.826 1 ERROR nova     db_mappings = cls._get_all_from_db(context)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 992, in wrapper
2018-08-29 17:25:01.826 1 ERROR nova     with self._transaction_scope(context):
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2018-08-29 17:25:01.826 1 ERROR nova     return self.gen.next()
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1042, in _transaction_scope
2018-08-29 17:25:01.826 1 ERROR nova     context=context) as resource:
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2018-08-29 17:25:01.826 1 ERROR nova     return self.gen.next()
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 645, in _session
2018-08-29 17:25:01.826 1 ERROR nova     bind=self.connection, mode=self.mode)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 409, in _create_session
2018-08-29 17:25:01.826 1 ERROR nova     self._start()
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 496, in _start
2018-08-29 17:25:01.826 1 ERROR nova     engine_args, maker_args)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 520, in _setup_for_connection
2018-08-29 17:25:01.826 1 ERROR nova     sql_connection=sql_connection, **engine_kwargs)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/debtcollector/renames.py", line 43, in decorator
2018-08-29 17:25:01.826 1 ERROR nova     return wrapped(*args, **kwargs)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py", line 202, in create_engine
2018-08-29 17:25:01.826 1 ERROR nova     test_conn = _test_connection(engine, max_retries, retry_interval)
2018-08-29 17:25:01.826 1 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py", line 388, in _test_connection
2018-08-29 17:25:01.826 1 ERROR nova     six.reraise(type(de_ref), de_ref)
2018-08-29 17:25:01.826 1 ERROR nova   File "<string>", line 2, in reraise
2018-08-29 17:25:01.826 1 ERROR nova DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-29 17:25:01.826 1 ERROR nova
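To narrow down where the 500 in the compute log comes from, it can help to hit the placement endpoint for the failing provider directly. A minimal sketch using keystoneauth1 (the auth URL and credentials below are placeholders to be replaced with the values from overcloudrc; the UUID is the one from the compute log):

# Query placement directly for the resource provider that nova-compute
# cannot retrieve. Auth URL and credentials are placeholders; take the
# real values from overcloudrc. Requires python-keystoneauth1.
from keystoneauth1 import session
from keystoneauth1.identity import v3

auth = v3.Password(auth_url='http://192.0.2.1:5000/v3',      # placeholder
                   username='admin', password='REPLACE_ME',  # placeholder
                   project_name='admin',
                   user_domain_name='Default',
                   project_domain_name='Default')
sess = session.Session(auth=auth)

uuid = '5a0dd0a2-0d57-4828-b070-f65da86d922d'  # UUID from the compute log
resp = sess.get('/resource_providers/' + uuid,
                endpoint_filter={'service_type': 'placement'},
                raise_exc=False)
# 200 means placement knows the provider; a 500 points at the placement
# service or its database rather than at nova-compute.
print(resp.status_code)
print(resp.text[:300])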
Reproduced the failure to launch an instance post reboot. This time there was another error in the nova-compute log:

2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [req-31a2c0cd-27ff-459d-8260-6696eedbd674 8680faae02f14a7388e8f73be7a6a0b4 08904f9f318e44e9800f328c03437553 - default default] [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d] Setting instance vm_state to ERROR: ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d] Traceback (most recent call last):
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2660, in do_terminate_instance
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     self._delete_instance(context, instance, bdms)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/nova/hooks.py", line 154, in inner
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     rv = f(*args, **kwargs)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2633, in _delete_instance
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     bdms)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 756, in _complete_deletion
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     self._update_resource_tracker(context, instance)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 564, in _update_resource_tracker
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     rt.update_usage(context, instance, instance.node)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     return f(*args, **kwargs)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 492, in update_usage
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     self._update(context.elevated(), self.compute_nodes[nodename])
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     return Retrying(*dargs, **dkw).call(f, *args, **kw)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/retrying.py", line 223, in call
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     return attempt.get(self._wrap_exception)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     six.reraise(self.value[0], self.value[1], self.value[2])
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 938, in _update
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     self._update_to_placement(context, compute_node)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 907, in _update_to_placement
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     reportclient.update_from_provider_tree(context, prov_tree)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     return getattr(self.instance, __name)(*args, **kwargs)
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 1531, in update_from_provider_tree
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]     raise exception.ResourceProviderSyncFailed()
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d] ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.
2018-08-29 21:55:58.944 1 ERROR nova.compute.manager [instance: b88aecd9-a91f-4750-a450-0ba1fa71688d]
Grepping for errors in placement.log:

[root@controller-0 ~]# grep -i error /var/log/containers/nova/nova-placement-api.log
2018-08-30 16:44:22.026 14 DEBUG nova.api.openstack.placement.wsgi [-] logging_exception_prefix = %(asctime)s.%(msecs)03d %(process)d ERROR %(name)s %(instance)s log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3023
2018-08-30 16:44:22.035 14 DEBUG nova.api.openstack.placement.wsgi [-] publish_errors = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3023
2018-08-30 16:44:22.055 14 DEBUG nova.api.openstack.placement.wsgi [-] oslo_versionedobjects.fatal_exception_format_errors = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3031
2018-08-30 16:44:22.208 14 DEBUG nova.api.openstack.placement.wsgi [-] profiler.filter_error_trace = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3031
2018-08-30 16:44:22.269 14 DEBUG oslo_db.sqlalchemy.engines [-] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:308
2018-08-30 16:44:53.819 15 DEBUG nova.api.openstack.placement.wsgi [-] logging_exception_prefix = %(asctime)s.%(msecs)03d %(process)d ERROR %(name)s %(instance)s log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3023
2018-08-30 16:44:53.828 15 DEBUG nova.api.openstack.placement.wsgi [-] publish_errors = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3023
2018-08-30 16:44:53.847 15 DEBUG nova.api.openstack.placement.wsgi [-] oslo_versionedobjects.fatal_exception_format_errors = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3031
2018-08-30 16:44:53.984 15 DEBUG nova.api.openstack.placement.wsgi [-] profiler.filter_error_trace = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3031
2018-08-30 16:44:54.039 15 DEBUG oslo_db.sqlalchemy.engines [-] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:308
2018-08-30 17:02:33.055 15 DEBUG nova.api.openstack.placement.wsgi_wrapper [req-cec08d9d-efc4-4aa1-a480-a6b61465cac4 cb722cd9f95d49f5bfe8d1dace4e7be7 28dc3c7b75cc400287eaa9fac0a3bb08 - default default] Placement API returning an error response: Unable to update inventory for resource provider b73f4434-e4b8-4f33-a8f0-0e522065d5f3: Invalid inventory for 'VCPU' on resource provider 'b73f4434-e4b8-4f33-a8f0-0e522065d5f3'. The reserved value is greater than or equal to total. call_func /usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi_wrapper.py:31
2018-08-30 17:02:33.236 15 DEBUG nova.api.openstack.placement.wsgi_wrapper [req-f9139d9c-f767-4268-adb3-a2325832f542 cb722cd9f95d49f5bfe8d1dace4e7be7 28dc3c7b75cc400287eaa9fac0a3bb08 - default default] Placement API returning an error response: Unable to update inventory for resource provider b1423ed9-eecc-4853-86e9-2dcc77ab21fd: Invalid inventory for 'VCPU' on resource provider 'b1423ed9-eecc-4853-86e9-2dcc77ab21fd'. The reserved value is greater than or equal to total. call_func /usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi_wrapper.py:31
2018-08-30 17:14:45.335 15 DEBUG nova.api.openstack.placement.wsgi_wrapper [req-d8c27eec-2907-4acb-a6dc-83a6f4befce5 cb722cd9f95d49f5bfe8d1dace4e7be7 28dc3c7b75cc400287eaa9fac0a3bb08 - default default] Placement API returning an error response: Unable to update inventory for resource provider b1423ed9-eecc-4853-86e9-2dcc77ab21fd: Invalid inventory for 'VCPU' on resource provider 'b1423ed9-eecc-4853-86e9-2dcc77ab21fd'. The reserved value is greater than or equal to total. call_func /usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi_wrapper.py:31
2018-08-30 17:15:45.157 14 DEBUG nova.api.openstack.placement.wsgi_wrapper [req-04f23fa8-db83-4bf9-925c-cb722903251a cb722cd9f95d49f5bfe8d1dace4e7be7 28dc3c7b75cc400287eaa9fac0a3bb08 - default default] Placement API returning an error response: Unable to update inventory for resource provider b1423ed9-eecc-4853-86e9-2dcc77ab21fd: Invalid inventory for 'VCPU' on resource provider 'b1423ed9-eecc-4853-86e9-2dcc77ab21fd'. The reserved value is greater than or equal to total. call_func /usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi_wrapper.py:31
2018-08-30 17:25:40.669 15 DEBUG nova.api.openstack.placement.wsgi [-] logging_exception_prefix = %(asctime)s.%(msecs)03d %(process)d ERROR %(name)s %(instance)s log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3023
2018-08-30 17:25:40.680 15 DEBUG nova.api.openstack.placement.wsgi [-] publish_errors = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3023
2018-08-30 17:25:40.703 15 DEBUG nova.api.openstack.placement.wsgi [-] oslo_versionedobjects.fatal_exception_format_errors = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3031
2018-08-30 17:25:40.887 15 DEBUG nova.api.openstack.placement.wsgi [-] profiler.filter_error_trace = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3031
2018-08-30 17:25:41.771 15 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 10 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 113] No route to host)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:25:54.780 15 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 9 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 113] No route to host)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:26:04.786 15 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 8 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:26:14.797 15 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 7 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:26:21.043 14 DEBUG nova.api.openstack.placement.wsgi [-] logging_exception_prefix = %(asctime)s.%(msecs)03d %(process)d ERROR %(name)s %(instance)s log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3023
2018-08-30 17:26:21.061 14 DEBUG nova.api.openstack.placement.wsgi [-] publish_errors = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3023
2018-08-30 17:26:21.081 14 DEBUG nova.api.openstack.placement.wsgi [-] oslo_versionedobjects.fatal_exception_format_errors = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3031
2018-08-30 17:26:21.258 14 DEBUG nova.api.openstack.placement.wsgi [-] profiler.filter_error_trace = False log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3031
2018-08-30 17:26:21.306 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 10 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:26:24.809 15 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 6 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:26:31.326 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 9 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:26:34.821 15 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 5 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:26:41.344 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 8 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:26:44.832 15 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 4 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:26:51.356 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 7 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:26:54.843 15 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 3 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:27:01.367 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 6 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:27:04.853 15 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 2 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:27:11.378 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 5 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:27:14.860 15 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 1 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:27:21.390 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 4 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:27:31.402 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 3 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:27:41.410 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 2 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-30 17:27:51.421 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 1 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
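On the "reserved value is greater than or equal to total" rejections above: placement at this API version refuses to store an inventory record whose reserved value is >= total, so one way to see what is being rejected is to read the provider's current inventory back. A sketch along the same lines as the earlier one (same placeholder credentials; the provider UUID is taken from the log):

# Read back the inventory of one of the providers that placement is
# rejecting. Placement refuses any inventory where reserved >= total,
# which is the exact error logged above.
from keystoneauth1 import session
from keystoneauth1.identity import v3

auth = v3.Password(auth_url='http://192.0.2.1:5000/v3',      # placeholder
                   username='admin', password='REPLACE_ME',  # placeholder
                   project_name='admin',
                   user_domain_name='Default',
                   project_domain_name='Default')
sess = session.Session(auth=auth)

uuid = 'b1423ed9-eecc-4853-86e9-2dcc77ab21fd'  # provider from the log
resp = sess.get('/resource_providers/%s/inventories' % uuid,
                endpoint_filter={'service_type': 'placement'},
                raise_exc=False)
for rc, inv in resp.json().get('inventories', {}).items():
    bad = ' <-- reserved >= total' if inv['reserved'] >= inv['total'] else ''
    print('%s: total=%d reserved=%d%s' % (rc, inv['total'], inv['reserved'], bad))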
Looking at the logs, this is not a bug but rather a deployment issue where the DB wasn't running. Please check your deployment and test the API DB directly; that's not something Nova needs to be modified to fix. That's why I'm closing this bug now.
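For "test the API DB directly", a minimal connectivity check from a controller can reuse the same driver the services are failing with (a sketch, assuming pymysql is installed; host, user, password, and database are placeholders to be taken from the 'connection =' line of the relevant nova.conf section):

# Directly test the MySQL/galera connectivity that the services are
# retrying against. Credentials are placeholders; the host is the
# galera VIP seen in the placement log.
import pymysql

try:
    conn = pymysql.connect(host='172.17.1.12',            # VIP from the log
                           user='nova_api',               # placeholder
                           password='REPLACE_ME',         # placeholder
                           database='nova_api',           # placeholder
                           connect_timeout=5)
    with conn.cursor() as cur:
        cur.execute('SELECT 1')
        print('DB reachable:', cur.fetchone())
    conn.close()
except pymysql.err.OperationalError as exc:
    # Errno 113/111 here would match the 'No route to host' /
    # 'Connection refused' retries seen in the placement log.
    print('DB unreachable:', exc)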
I observed a similar problem. It seems that nova-placement is not working correctly after a control-plane reset (resetting all the nodes) and coming back online. I have not had more time to debug due to other issues, but I did notice a traceback being thrown on all controllers in nova-placement/placement_wsgi_error.log:

[Mon Sep 03 14:50:34.144509 2018] [:error] [pid 15] [remote 172.17.1.17:144] mod_wsgi (pid=15): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api'.
[Mon Sep 03 14:50:34.144529 2018] [:error] [pid 15] [remote 172.17.1.17:144] Traceback (most recent call last):
[Mon Sep 03 14:50:34.144560 2018] [:error] [pid 15] [remote 172.17.1.17:144]   File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module>
[Mon Sep 03 14:50:34.144597 2018] [:error] [pid 15] [remote 172.17.1.17:144]     application = init_application()
[Mon Sep 03 14:50:34.144607 2018] [:error] [pid 15] [remote 172.17.1.17:144]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 97, in init_application
[Mon Sep 03 14:50:34.144666 2018] [:error] [pid 15] [remote 172.17.1.17:144]     _parse_args([], default_config_files=[conffile])
[Mon Sep 03 14:50:34.144678 2018] [:error] [pid 15] [remote 172.17.1.17:144]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 61, in _parse_args
[Mon Sep 03 14:50:34.144695 2018] [:error] [pid 15] [remote 172.17.1.17:144]     logging.register_options(conf.CONF)
[Mon Sep 03 14:50:34.144703 2018] [:error] [pid 15] [remote 172.17.1.17:144]   File "/usr/lib/python2.7/site-packages/oslo_log/log.py", line 250, in register_options
[Mon Sep 03 14:50:34.144719 2018] [:error] [pid 15] [remote 172.17.1.17:144]     conf.register_cli_opts(_options.common_cli_opts)
[Mon Sep 03 14:50:34.144727 2018] [:error] [pid 15] [remote 172.17.1.17:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2501, in __inner
[Mon Sep 03 14:50:34.144741 2018] [:error] [pid 15] [remote 172.17.1.17:144]     result = f(self, *args, **kwargs)
[Mon Sep 03 14:50:34.144748 2018] [:error] [pid 15] [remote 172.17.1.17:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2783, in register_cli_opts
[Mon Sep 03 14:50:34.144761 2018] [:error] [pid 15] [remote 172.17.1.17:144]     self.register_cli_opt(opt, group, clear_cache=False)
[Mon Sep 03 14:50:34.144768 2018] [:error] [pid 15] [remote 172.17.1.17:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2505, in __inner
[Mon Sep 03 14:50:34.144779 2018] [:error] [pid 15] [remote 172.17.1.17:144]     return f(self, *args, **kwargs)
[Mon Sep 03 14:50:34.144786 2018] [:error] [pid 15] [remote 172.17.1.17:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2775, in register_cli_opt
[Mon Sep 03 14:50:34.144802 2018] [:error] [pid 15] [remote 172.17.1.17:144]     raise ArgsAlreadyParsedError("cannot register CLI option")
[Mon Sep 03 14:50:34.144833 2018] [:error] [pid 15] [remote 172.17.1.17:144] ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option
I think the closing of this bug came out of comment#4, which had a bunch of maybe misleading references to the DB being unreachable. The problem here is not about erroring out when the DB is not there; it is about being able to recover automatically after a control-plane/DB outage.

How to reproduce: simply take down the control plane by powering off all three controllers, wait a couple of minutes, and power them back on. (Or, even more simply, take down the galera DB with 'pcs resource disable galera-bundle', make sure all nodes are down, wait a few minutes for nova_placement to detect the missing DB connection, and then re-enable galera with 'pcs resource enable galera-bundle'.) You will observe that all services eventually recover. As a matter of fact, all services are mandated to recover and to retry connecting to the control plane since OSP10 (see https://github.com/openstack/tripleo-specs/blob/master/specs/newton/pacemaker-next-generation-architecture.rst for more information).

One of the observed sequences in the logs is the following:

1) The control plane is gone (reboot/reset/DB maintenance/whatever) and we correctly observe the following message. So far this is all expected and correct:

[Fri Aug 31 15:19:05.987530 2018] [:error] [pid 22] [remote 172.17.1.16:0] mod_wsgi (pid=22): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api'.
[Fri Aug 31 15:19:05.987577 2018] [:error] [pid 22] [remote 172.17.1.16:0] Traceback (most recent call last):
[Fri Aug 31 15:19:05.987605 2018] [:error] [pid 22] [remote 172.17.1.16:0]   File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module>
...
[Fri Aug 31 15:19:05.990710 2018] [:error] [pid 22] [remote 172.17.1.16:0] DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)

2) The problem comes from the fact that once the control plane is fully functional again, all OpenStack services manage to come back and reconnect to the control plane (db/rabbit), whereas nova-placement seems to be stuck in a wsgi loop forever:

[Fri Aug 31 15:21:57.561423 2018] [:error] [pid 20] [remote 172.17.1.16:176] mod_wsgi (pid=20): Target WSGI script '/var/www/cgi-bin/nova/nova-placement-api' cannot be loaded as Python module.
[Fri Aug 31 15:21:57.561532 2018] [:error] [pid 20] [remote 172.17.1.16:176] mod_wsgi (pid=20): Exception occurred processing WSGI script '/var/www/cgi-bin/nova/nova-placement-api'.
[Fri Aug 31 15:21:57.561570 2018] [:error] [pid 20] [remote 172.17.1.16:176] Traceback (most recent call last):
[Fri Aug 31 15:21:57.561601 2018] [:error] [pid 20] [remote 172.17.1.16:176]   File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in <module>
[Fri Aug 31 15:21:57.561656 2018] [:error] [pid 20] [remote 172.17.1.16:176]     application = init_application()
[Fri Aug 31 15:21:57.561682 2018] [:error] [pid 20] [remote 172.17.1.16:176]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 97, in init_application
[Fri Aug 31 15:21:57.561732 2018] [:error] [pid 20] [remote 172.17.1.16:176]     _parse_args([], default_config_files=[conffile])
[Fri Aug 31 15:21:57.561751 2018] [:error] [pid 20] [remote 172.17.1.16:176]   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 61, in _parse_args
[Fri Aug 31 15:21:57.561786 2018] [:error] [pid 20] [remote 172.17.1.16:176]     logging.register_options(conf.CONF)
[Fri Aug 31 15:21:57.561814 2018] [:error] [pid 20] [remote 172.17.1.16:176]   File "/usr/lib/python2.7/site-packages/oslo_log/log.py", line 250, in register_options
[Fri Aug 31 15:21:57.561857 2018] [:error] [pid 20] [remote 172.17.1.16:176]     conf.register_cli_opts(_options.common_cli_opts)
[Fri Aug 31 15:21:57.561891 2018] [:error] [pid 20] [remote 172.17.1.16:176]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2501, in __inner
[Fri Aug 31 15:21:57.563173 2018] [:error] [pid 20] [remote 172.17.1.16:176]     result = f(self, *args, **kwargs)
[Fri Aug 31 15:21:57.563206 2018] [:error] [pid 20] [remote 172.17.1.16:176]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2783, in register_cli_opts
[Fri Aug 31 15:21:57.563249 2018] [:error] [pid 20] [remote 172.17.1.16:176]     self.register_cli_opt(opt, group, clear_cache=False)
[Fri Aug 31 15:21:57.563267 2018] [:error] [pid 20] [remote 172.17.1.16:176]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2505, in __inner
[Fri Aug 31 15:21:57.563297 2018] [:error] [pid 20] [remote 172.17.1.16:176]     return f(self, *args, **kwargs)
[Fri Aug 31 15:21:57.563313 2018] [:error] [pid 20] [remote 172.17.1.16:176]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2775, in register_cli_opt
[Fri Aug 31 15:21:57.563341 2018] [:error] [pid 20] [remote 172.17.1.16:176]     raise ArgsAlreadyParsedError("cannot register CLI option")
[Fri Aug 31 15:21:57.563377 2018] [:error] [pid 20] [remote 172.17.1.16:176] ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option

Restarting the nova_placement container fixes it, but this requires manual intervention, which is (a) a regression from the previous OSP version and (b) unwanted/unneeded. The nova part of the problem has also been captured in https://bugs.launchpad.net/tripleo/+bug/1784155, but I am under the impression that the simple control-plane-restart condition has not yet been captured there.

A small, likely unclean and unacceptable fix (which Marian and I tested just to understand the issue) would be to just ignore the ArgsAlreadyParsedError exception in the _parse_args() function. Ideally we'd split the config-reading code out of any wsgi loop, though.
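For reference, what we tested amounts to something like the following in placement's _parse_args() (a sketch of the workaround only, not a proposed fix; it uses cfg.CONF directly where the real code goes through nova's conf module):

# Sketch of the tested workaround (not a proposed fix): tolerate the CLI
# options already having been registered/parsed by an earlier, failed
# init_application() run in the same wsgi interpreter, instead of letting
# ArgsAlreadyParsedError abort every subsequent script load.
from oslo_config import cfg
from oslo_log import log as logging

def _parse_args(argv, default_config_files):
    try:
        logging.register_options(cfg.CONF)
    except cfg.ArgsAlreadyParsedError:
        # CONF already carries parsed args from the previous load
        # attempt; re-registering the logging CLI opts is what
        # currently wedges the wsgi loop, so skip it.
        pass
    cfg.CONF(argv[1:], project='nova',
             default_config_files=default_config_files)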
As discussed with Lee, closing this one as a duplicate of 1630069. We'll track the mentioned deployment race in a new BZ to avoid confusion. *** This bug has been marked as a duplicate of bug 1630069 ***