Red Hat Bugzilla – Bug 1455793
[UPDATES] Traceback in conductor.log "failed to perform operation on queue 'versioned_notifications.info' in vhost '/' due to timeout"
Last modified: 2018-09-23 04:10:19 EDT
Description of problem:
-----------------------
After a minor update of RHOS-11, an attempt to boot a VM fails (hangs). The following traceback is present in nova-conductor.log:

2017-05-26 06:47:20.966 3630 DEBUG oslo_messaging._drivers.amqpdriver [-] received message msg_id: 038b7af56b7b48fda85fcef42c4e903f reply to reply_0204a24bf9e7406c9b7a04c790d83b08 __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:213
2017-05-26 06:47:20.975 3630 DEBUG oslo_messaging._drivers.amqpdriver [req-8d17eae8-2d48-4ab3-bb62-9141e5358101 - - - - -] sending reply msg_id: 038b7af56b7b48fda85fcef42c4e903f reply queue: reply_0204a24bf9e7406c9b7a04c790d83b08 time elapsed: 0.00857478899991s _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:90
2017-05-26 06:47:25.219 3631 ERROR oslo.messaging._drivers.impl_rabbit [req-84448c81-d466-4733-8829-d1c81b3f2252 11092399453c4d0f990d61193072fa1e e2fbda293276426cbf439feb26a5bfac - - -] Failed to publish message to topic 'nova': Queue.declare: (404) NOT_FOUND - failed to perform operation on queue 'versioned_notifications.info' in vhost '/' due to timeout
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging [req-84448c81-d466-4733-8829-d1c81b3f2252 11092399453c4d0f990d61193072fa1e e2fbda293276426cbf439feb26a5bfac - - -] Could not send notification to versioned_notifications. Payload={'_context_domain': None, '_context_roles': [u'admin'], '_context_quota_class': None, 'event_type': u'instance.update', '_context_request_id': u'req-84448c81-d466-4733-8829-d1c81b3f2252', '_context_service_catalog': [{u'endpoints': [{u'adminURL': u'http://172.17.1.14:8778/placement', u'region': u'regionOne', u'internalURL': u'http://172.17.1.14:8778/placement', u'publicURL': u'https://10.0.0.101:13778/placement'}], u'type': u'placement', u'name': u'placement'}, {u'endpoints': [{u'adminURL': u'http://172.17.1.14:8776/v2/e2fbda293276426cbf439feb26a5bfac', u'region': u'regionOne', u'internalURL': u'http://172.17.1.14:8776/v2/e2fbda293276426cbf439feb26a5bfac', u'publicURL': u'https://10.0.0.101:13776/v2/e2fbda293276426cbf439feb26a5bfac'}], u'type': u'volumev2', u'name': u'cinderv2'}, {u'endpoints': [{u'adminURL': u'http://172.17.1.14:8776/v3/e2fbda293276426cbf439feb26a5bfac', u'region': u'regionOne', u'internalURL': u'http://172.17.1.14:8776/v3/e2fbda293276426cbf439feb26a5bfac', u'publicURL': u'https://10.0.0.101:13776/v3/e2fbda293276426cbf439feb26a5bfac'}], u'type': u'volumev3', u'name': u'cinderv3'}, {u'endpoints': [{u'adminURL': u'http://172.17.1.14:8776/v1/e2fbda293276426cbf439feb26a5bfac', u'region': u'regionOne', u'internalURL': u'http://172.17.1.14:8776/v1/e2fbda293276426cbf439feb26a5bfac', u'publicURL': u'https://10.0.0.101:13776/v1/e2fbda293276426cbf439feb26a5bfac'}], u'type': u'volume', u'name': u'cinder'}], 'timestamp': u'2017-05-26 06:46:22.902750', '_context_user': u'11092399453c4d0f990d61193072fa1e', '_unique_id': '89020716da9147f9a634cdd848a505dd', '_context_resource_uuid': None, '_context_instance_lock_checked': False, '_context_is_admin_project': True, '_context_user_id': u'11092399453c4d0f990d61193072fa1e', 'payload': {'nova_object.version': '1.1', 'nova_object.name': 'InstanceUpdatePayload', 'nova_object.namespace': 'nova', 'nova_object.data': {'availability_zone': None, 'terminated_at': None, 'ip_addresses': [], 'ramdisk_id': u'', 'state_update': {'nova_object.version': '1.0', 'nova_object.name': 'InstanceStateUpdatePayload', 'nova_object.namespace': 'nova', 'nova_object.data': {'old_state': u'building', 'state': u'building', 'old_task_state': None, 'new_task_state': None}}, 'bandwidth': [], 'old_display_name': None, 'image_uuid': u'1d7e3c51-d5f5-49a0-a40b-ad5569f9cff6', 'flavor': {'nova_object.version': '1.2', 'nova_object.name': 'FlavorPayload', 'nova_object.namespace': 'nova', 'nova_object.data': {'memory_mb': 64, 'root_gb': 1, 'name': u'm1.ultra', 'ephemeral_gb': 0, 'disabled': False, 'vcpus': 1, 'extra_specs': {}, 'swap': 0, 'rxtx_factor': 1.0, 'is_public': True, 'flavorid': u'd35fa292-412b-4a72-811c-584b3b4717bb', 'vcpu_weight': 0}}, 'deleted_at': None, 'reservation_id': u'r-8d7beit9', 'user_id': u'11092399453c4d0f990d61193072fa1e', 'uuid': '263e76c0-1d9c-45ba-9950-1199645ac9a4', 'display_description': None, 'state': u'building', 'power_state': u'pending', 'host_name': u'post-vm-3', 'progress': 0, 'launched_at': None, 'metadata': {}, 'node': None, 'kernel_id': u'', 'host': None, 'display_name': u'post-vm-3', 'task_state': None, 'locked': False, 'tenant_id': u'e2fbda293276426cbf439feb26a5bfac', 'created_at': '2017-05-26T06:44:19Z', 'architecture': None, 'audit_period': {'nova_object.version': '1.0', 'nova_object.name': 'AuditPeriodPayload', 'nova_object.namespace': 'nova', 'nova_object.data': {'audit_period_ending': '2017-05-26T06:46:22Z', 'audit_period_beginning': '2017-05-01T00:00:00Z'}}, 'os_type': None}}, '_context_project_name': u'admin', '_context_read_deleted': u'no', '_context_user_identity': u'11092399453c4d0f990d61193072fa1e e2fbda293276426cbf439feb26a5bfac - - -', '_context_auth_token': u'64038897bee1459a903bb3a528a8fe21', '_context_show_deleted': False, '_context_tenant': u'e2fbda293276426cbf439feb26a5bfac', 'priority': 'INFO', '_context_read_only': False, '_context_is_admin': True, '_context_project_id': u'e2fbda293276426cbf439feb26a5bfac', '_context_project_domain': None, '_context_timestamp': '2017-05-26T06:44:14.326492', '_context_user_domain': None, '_context_user_name': u'admin', 'publisher_id': u'nova-compute:controller-1.localdomain', 'message_id': u'a0678f5f-e346-4254-93d3-046e55f7cdf2', '_context_remote_address': u'172.17.1.30'}
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging Traceback (most recent call last):
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/notify/messaging.py", line 70, in notify
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     retry=retry)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 104, in _send_notification
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     retry=retry)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 509, in send_notification
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     envelope=(version == 2.0), notify=True, retry=retry)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 474, in _send
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     conn.notify_send(exchange, target.topic, msg, retry=retry)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1306, in notify_send
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     exchange, msg, routing_key=topic, retry=retry)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1170, in _ensure_publishing
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     self.ensure(method, retry=retry, error_callback=_error_callback)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 817, in ensure
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     ret, channel = autoretry_method()
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 436, in _ensured
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     return fun(*args, **kwargs)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 508, in __call__
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     return fun(*args, channel=channels[0], **kwargs), channels[0]
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 793, in execute_method
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     method()
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1234, in _publish_and_creates_default_queue
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     queue.declare()
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 522, in declare
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     self.queue_declare(nowait, passive=False)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 548, in queue_declare
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     nowait=nowait)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1258, in queue_declare
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     (50, 11),  # Channel.queue_declare_ok
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 69, in wait
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     return self.dispatch_method(method_sig, args, content)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 87, in dispatch_method
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     return amqp_method(self, args)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 241, in _close
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     reply_code, reply_text, (class_id, method_id), ChannelError,
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging NotFound: Queue.declare: (404) NOT_FOUND - failed to perform operation on queue 'versioned_notifications.info' in vhost '/' due to timeout
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging
2017-05-26 06:47:25.227 3631 DEBUG oslo_messaging._drivers.amqpdriver [req-84448c81-d466-4733-8829-d1c81b3f2252 11092399453c4d0f990d61193072fa1e e2fbda293276426cbf439feb26a5bfac - - -] sending reply msg_id: e0d9b7cd388f4f3fba2bc26342d77541 reply queue: reply_b8b62150f8d6493bbff9bf5b1c2cba1f time elapsed: 62.424665247s _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:90
2017-05-26 06:47:29.602 3631 DEBUG oslo_messaging._drivers.amqpdriver [-] received message msg_id: f0069336289643bc95d9ea9393cc684f reply to reply_b8b62150f8d6493bbff9bf5b1c2cba1f __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:213

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-nova-novncproxy-15.0.3-3.el7ost.noarch
python-nova-15.0.3-3.el7ost.noarch
openstack-nova-cert-15.0.3-3.el7ost.noarch
openstack-nova-migration-15.0.3-3.el7ost.noarch
openstack-nova-api-15.0.3-3.el7ost.noarch
python-novaclient-7.1.0-1.el7ost.noarch
puppet-nova-10.4.0-5.el7ost.noarch
openstack-nova-compute-15.0.3-3.el7ost.noarch
openstack-nova-scheduler-15.0.3-3.el7ost.noarch
openstack-nova-console-15.0.3-3.el7ost.noarch
openstack-nova-common-15.0.3-3.el7ost.noarch
openstack-nova-conductor-15.0.3-3.el7ost.noarch
openstack-nova-placement-api-15.0.3-3.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-12.el7ost.noarch
openstack-tripleo-ui-3.1.0-9.el7ost.noarch
puppet-tripleo-6.3.0-12.el7ost.noarch
openstack-tripleo-image-elements-6.0.0-2.el7ost.noarch
openstack-tripleo-common-6.0.0-8.el7ost.noarch
openstack-tripleo-puppet-elements-6.0.0-3.el7ost.noarch
openstack-tripleo-validations-5.5.0-1.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-12.el7ost.noarch
python-tripleoclient-6.1.0-6.el7ost.noarch

Steps to Reproduce:
-------------------
1. Install RHOS-11
2. Install latest repos
3. Update UC (undercloud)
4. Update deploy plan
5. Update OC (overcloud)
6. Try to boot a VM

Additional info:
----------------
Virtual setup: controller: 3, galera: 3, messaging: 3, networker: 2, compute: 2, ceph: 3
After rebooting the controller and messaging nodes the issue persists - a VM still cannot be booted.
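For anyone triaging a similar failure: the 404 above means the broker could not complete the declare of the 'versioned_notifications.info' queue in the '/' vhost. A quick broker-side check from one of the messaging/controller nodes would be something like the sketch below (it assumes rabbitmqctl is available on those nodes and that the default '/' vhost is in use, as the error text indicates):

  # rabbitmqctl list_queues -p / name messages consumers | grep versioned_notifications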
I've just briefly inspected the RabbitMQ logs from all 3 nodes and didn't find anything - no unhandled exceptions, no failures, nothing. From the RabbitMQ perspective it is doing well, properly handling some client-side failures (client terminating the connection, timeouts, etc.). I'm going to look more thoroughly.
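For anyone repeating the broker-side inspection, checks along these lines should be sufficient (a sketch only; the log path assumes the default RabbitMQ location, and the exact cluster_status output format depends on the RabbitMQ version shipped with OSP 11):

  # rabbitmqctl cluster_status                              (expect all three rabbit nodes under running_nodes and an empty partitions list)
  # grep -iE 'error|crash|partition' /var/log/rabbitmq/*.log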
Adding a blocker(?) because even though the update itself is not failing, the overcloud breaks in a bad way.
(In reply to Amit Ugol from comment #3)
> Adding a blocker(?) because even though the update itself is not failing,
> the overcloud breaks in a bad way.

As far as I know, the Blocker flag does not mean anything for our z-stream releases, only for .0 releases.
Is the environment where this happened still around? While I dig into the logs, I'd like to narrow down which specific component of the deployment is failing. Specifically, I'd like to try other actions that cause notifications to be sent out.

For components other than Nova (like Neutron or Cinder) I'm not familiar with which actions trigger notifications, but I assume resource creation would do it. Therefore, I'd like to know whether 'neutron net-create' or 'cinder create' works. In Nova itself, the only non-instance actions that trigger notifications are flavor operations and service operations [1], so I'd like to know whether 'nova flavor-create' or 'nova service-disable' works. Cheers!

[1] https://docs.openstack.org/developer/nova/notifications.html#existing-versioned-notifications
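For clarity, the kind of commands I have in mind are roughly the following (a sketch only; the names, size and host below are placeholders, and exact client flags can differ slightly between releases):

  $ source overcloudrc                          # assumes admin credentials for the overcloud
  $ neutron net-create test-notif-net
  $ cinder create 1 --name test-notif-vol      # older clients use --display-name instead of --name
  $ nova flavor-create test-notif auto 64 1 1
  $ nova service-disable <compute-host> nova-compute
  $ nova service-enable <compute-host> nova-compute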
@Artom, I've reproduced the issue; replies to your questions:
- managed to create a new network/subnet
- managed to create a new cinder volume
- managed to create a nova flavor and to disable/enable a service

The attempt to boot a VM still failed. Another note: the nova commands took about 1 minute each (except 'nova boot', which seems to hang).
With Ihar's help (thanks Ihar!) I was able to get Neutron to send notifications, so the problem definitely lies with Nova. Also, while flavor creation works, its notifications fail as well, so the problem is general to Nova notifications.

I can no longer reproduce the bug - notifications have started going out, and I haven't changed anything significant in the environment. I turned on various forms of debugging, restarted Rabbit and the Nova API, and pinged one machine from another, but I didn't expect any of those actions to affect anything, especially since you mentioned that rebooting the nodes does not resolve the bug.

I'd like to continue, this time being much more careful about the changes I make. Could you re-provision the environment from scratch, and I'll start again? Thanks!
This looks like the same problem as [1]. The real solution is to wait for a substantial piece of upstream work. In the meantime, completely restarting the RabbitMQ cluster is the workaround. Can the customer try that and report whether it works?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1488499#c3
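For the record, on a Pacemaker-managed director deployment a full RabbitMQ cluster restart would look roughly like the sketch below (the rabbitmq-clone resource name is an assumption based on a default deployment; on a non-HA setup the equivalent would be restarting rabbitmq-server on each messaging node):

  # pcs resource restart rabbitmq-clone
  # pcs status | grep -i rabbitmq               # confirm all copies are Started again before retrying the boot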