Bug 1455793 - [UPDATES] Traceback in conductor.log "failed to perform operation on queue 'versioned_notifications.info' in vhost '/' due to timeout"
Status: CLOSED NOTABUG
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 11.0 (Ocata)
Assigned To: Artom Lifshitz
QA Contact: Joe H. Rahme
Keywords: Triaged, ZStream
Depends On:
Blocks:
Reported: 2017-05-26 03:19 EDT by Yurii Prokulevych
Modified: 2018-09-23 04:10 EDT (History)
23 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
OpenStack Compute (nova) provides both versioned and unversioned notifications in RabbitMQ. However, due to the lack of consumers for versioned notifications, the versioned notifications queue grows quickly and causes RabbitMQ failures. This can hinder Compute operations such as instance creation and flavor creation. Red Hat is currently implementing fixes for RabbitMQ and director:
https://bugzilla.redhat.com/show_bug.cgi?id=1478274
https://bugzilla.redhat.com/show_bug.cgi?id=1488499
The following article provides a workaround until Red Hat releases patches for this issue: https://access.redhat.com/solutions/3139721
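In practice, the mitigation in such workarounds amounts to stopping nova from publishing versioned notifications at all, so the consumer-less queue stops growing. A minimal sketch of the relevant nova.conf fragment follows; the section and option name (`[notifications]` / `notification_format`) are an assumption here, and the linked article remains the authoritative procedure:

```
# nova.conf -- sketch only, not the authoritative workaround
# (see the linked Red Hat solution article). Assumes the
# notification_format option is available in this nova release.
[notifications]
# Emit only unversioned notifications, so the consumer-less
# 'versioned_notifications.*' queues stop accumulating messages.
notification_format = unversioned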
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-28 10:42:10 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Yurii Prokulevych 2017-05-26 03:19:30 EDT
Description of problem:
-----------------------
After a minor update of RHOS-11, an attempt to boot a VM fails (hangs). The following traceback appears in nova-conductor.log:

2017-05-26 06:47:20.966 3630 DEBUG oslo_messaging._drivers.amqpdriver [-] received message msg_id: 038b7af56b7b48fda85fcef42c4e903f reply to reply_0204a24bf9e7406c9b7a04c790d83b08 __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:213
2017-05-26 06:47:20.975 3630 DEBUG oslo_messaging._drivers.amqpdriver [req-8d17eae8-2d48-4ab3-bb62-9141e5358101 - - - - -] sending reply msg_id: 038b7af56b7b48fda85fcef42c4e903f reply queue: reply_0204a24bf9e7406c9b7a04c790d83b08 time elapsed: 0.00857478899991s _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:90
2017-05-26 06:47:25.219 3631 ERROR oslo.messaging._drivers.impl_rabbit [req-84448c81-d466-4733-8829-d1c81b3f2252 11092399453c4d0f990d61193072fa1e e2fbda293276426cbf439feb26a5bfac - - -] Failed to publish message to topic 'nova': Queue.declare: (404) NOT_FOUND - failed to perform operation on queue 'versioned_notifications.info' in vhost '/' due to timeout
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging [req-84448c81-d466-4733-8829-d1c81b3f2252 11092399453c4d0f990d61193072fa1e e2fbda293276426cbf439feb26a5bfac - - -] Could not send notification to versioned_notifications. Payload={'_context_domain': None, '_context_roles': [u'admin'], '_context_quota_class': None, 'event_type': u'instance.update', '_context_request_id': u'req-84448c81-d466-4733-8829-d1c81b3f2252', '_context_service_catalog': [{u'endpoints': [{u'adminURL': u'http://172.17.1.14:8778/placement', u'region': u'regionOne', u'internalURL': u'http://172.17.1.14:8778/placement', u'publicURL': u'https://10.0.0.101:13778/placement'}], u'type': u'placement', u'name': u'placement'}, {u'endpoints': [{u'adminURL': u'http://172.17.1.14:8776/v2/e2fbda293276426cbf439feb26a5bfac', u'region': u'regionOne', u'internalURL': u'http://172.17.1.14:8776/v2/e2fbda293276426cbf439feb26a5bfac', u'publicURL': u'https://10.0.0.101:13776/v2/e2fbda293276426cbf439feb26a5bfac'}], u'type': u'volumev2', u'name': u'cinderv2'}, {u'endpoints': [{u'adminURL': u'http://172.17.1.14:8776/v3/e2fbda293276426cbf439feb26a5bfac', u'region': u'regionOne', u'internalURL': u'http://172.17.1.14:8776/v3/e2fbda293276426cbf439feb26a5bfac', u'publicURL': u'https://10.0.0.101:13776/v3/e2fbda293276426cbf439feb26a5bfac'}], u'type': u'volumev3', u'name': u'cinderv3'}, {u'endpoints': [{u'adminURL': u'http://172.17.1.14:8776/v1/e2fbda293276426cbf439feb26a5bfac', u'region': u'regionOne', u'internalURL': u'http://172.17.1.14:8776/v1/e2fbda293276426cbf439feb26a5bfac', u'publicURL': u'https://10.0.0.101:13776/v1/e2fbda293276426cbf439feb26a5bfac'}], u'type': u'volume', u'name': u'cinder'}], 'timestamp': u'2017-05-26 06:46:22.902750', '_context_user': u'11092399453c4d0f990d61193072fa1e', '_unique_id': '89020716da9147f9a634cdd848a505dd', '_context_resource_uuid': None, '_context_instance_lock_checked': False, '_context_is_admin_project': True, '_context_user_id': u'11092399453c4d0f990d61193072fa1e', 'payload': {'nova_object.version': '1.1', 'nova_object.name': 'InstanceUpdatePayload', 'nova_object.namespace': 'nova', 'nova_object.data': {'availability_zone': None, 'terminated_at': None, 'ip_addresses': [], 'ramdisk_id': u'', 'state_update': {'nova_object.version': '1.0', 'nova_object.name': 'InstanceStateUpdatePayload', 'nova_object.namespace': 'nova', 'nova_object.data': {'old_state': u'building', 'state': u'building', 'old_task_state': None, 'new_task_state': None}}, 'bandwidth': [], 'old_display_name': None, 'image_uuid': u'1d7e3c51-d5f5-49a0-a40b-ad5569f9cff6', 'flavor': {'nova_object.version': '1.2', 'nova_object.name': 'FlavorPayload', 'nova_object.namespace': 'nova', 'nova_object.data': {'memory_mb': 64, 'root_gb': 1, 'name': u'm1.ultra', 'ephemeral_gb': 0, 'disabled': False, 'vcpus': 1, 'extra_specs': {}, 'swap': 0, 'rxtx_factor': 1.0, 'is_public': True, 'flavorid': u'd35fa292-412b-4a72-811c-584b3b4717bb', 'vcpu_weight': 0}}, 'deleted_at': None, 'reservation_id': u'r-8d7beit9', 'user_id': u'11092399453c4d0f990d61193072fa1e', 'uuid': '263e76c0-1d9c-45ba-9950-1199645ac9a4', 'display_description': None, 'state': u'building', 'power_state': u'pending', 'host_name': u'post-vm-3', 'progress': 0, 'launched_at': None, 'metadata': {}, 'node': None, 'kernel_id': u'', 'host': None, 'display_name': u'post-vm-3', 'task_state': None, 'locked': False, 'tenant_id': u'e2fbda293276426cbf439feb26a5bfac', 'created_at': '2017-05-26T06:44:19Z', 'architecture': None, 'audit_period': {'nova_object.version': '1.0', 'nova_object.name': 'AuditPeriodPayload', 'nova_object.namespace': 'nova', 'nova_object.data': {'audit_period_ending': '2017-05-26T06:46:22Z', 'audit_period_beginning': '2017-05-01T00:00:00Z'}}, 'os_type': None}}, '_context_project_name': u'admin', '_context_read_deleted': u'no', '_context_user_identity': u'11092399453c4d0f990d61193072fa1e e2fbda293276426cbf439feb26a5bfac - - -', '_context_auth_token': u'64038897bee1459a903bb3a528a8fe21', '_context_show_deleted': False, '_context_tenant': u'e2fbda293276426cbf439feb26a5bfac', 'priority': 'INFO', '_context_read_only': False, '_context_is_admin': True, '_context_project_id': u'e2fbda293276426cbf439feb26a5bfac', '_context_project_domain': None, '_context_timestamp': '2017-05-26T06:44:14.326492', '_context_user_domain': None, '_context_user_name': u'admin', 'publisher_id': u'nova-compute:controller-1.localdomain', 'message_id': u'a0678f5f-e346-4254-93d3-046e55f7cdf2', '_context_remote_address': u'172.17.1.30'}
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging Traceback (most recent call last):
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/notify/messaging.py", line 70, in notify
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     retry=retry)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 104, in _send_notification
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     retry=retry)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 509, in send_notification
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     envelope=(version == 2.0), notify=True, retry=retry)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 474, in _send
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     conn.notify_send(exchange, target.topic, msg, retry=retry)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1306, in notify_send
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     exchange, msg, routing_key=topic, retry=retry)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1170, in _ensure_publishing
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     self.ensure(method, retry=retry, error_callback=_error_callback)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 817, in ensure
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     ret, channel = autoretry_method()
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 436, in _ensured
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     return fun(*args, **kwargs)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 508, in __call__
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     return fun(*args, channel=channels[0], **kwargs), channels[0]
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 793, in execute_method
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     method()
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1234, in _publish_and_creates_default_queue
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     queue.declare()
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 522, in declare
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     self.queue_declare(nowait, passive=False)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 548, in queue_declare
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     nowait=nowait)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1258, in queue_declare
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     (50, 11),  # Channel.queue_declare_ok
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 69, in wait
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     return self.dispatch_method(method_sig, args, content)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 87, in dispatch_method
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     return amqp_method(self, args)
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 241, in _close
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging     reply_code, reply_text, (class_id, method_id), ChannelError,
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging NotFound: Queue.declare: (404) NOT_FOUND - failed to perform operation on queue 'versioned_notifications.info' in vhost '/' due to timeout
2017-05-26 06:47:25.220 3631 ERROR oslo_messaging.notify.messaging 
2017-05-26 06:47:25.227 3631 DEBUG oslo_messaging._drivers.amqpdriver [req-84448c81-d466-4733-8829-d1c81b3f2252 11092399453c4d0f990d61193072fa1e e2fbda293276426cbf439feb26a5bfac - - -] sending reply msg_id: e0d9b7cd388f4f3fba2bc26342d77541 reply queue: reply_b8b62150f8d6493bbff9bf5b1c2cba1f time elapsed: 62.424665247s _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:90
2017-05-26 06:47:29.602 3631 DEBUG oslo_messaging._drivers.amqpdriver [-] received message msg_id: f0069336289643bc95d9ea9393cc684f reply to reply_b8b62150f8d6493bbff9bf5b1c2cba1f __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:213
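
When grepping large conductor logs for this failure, the key signature is the `Queue.declare: (404) NOT_FOUND` error on the `versioned_notifications.info` queue shown in the traceback above. A small sketch (plain Python; the function name and module layout are illustrative, not part of any OpenStack tooling) that flags the relevant lines:

```python
# Sketch: flag nova-conductor.log lines that show the versioned-
# notifications queue failure described in this bug.
import re

# Matches the 404 NOT_FOUND queue error from the traceback above.
PATTERN = re.compile(
    r"Queue\.declare: \(404\) NOT_FOUND - failed to perform operation "
    r"on queue 'versioned_notifications\.info'"
)

def find_queue_failures(log_lines):
    """Return (line_number, line) pairs matching the failure signature."""
    return [(n, line) for n, line in enumerate(log_lines, start=1)
            if PATTERN.search(line)]
```

Feeding it an open file object works as-is, e.g. `find_queue_failures(open('/var/log/nova/nova-conductor.log'))`.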


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-nova-novncproxy-15.0.3-3.el7ost.noarch
python-nova-15.0.3-3.el7ost.noarch
openstack-nova-cert-15.0.3-3.el7ost.noarch
openstack-nova-migration-15.0.3-3.el7ost.noarch
openstack-nova-api-15.0.3-3.el7ost.noarch
python-novaclient-7.1.0-1.el7ost.noarch
puppet-nova-10.4.0-5.el7ost.noarch
openstack-nova-compute-15.0.3-3.el7ost.noarch
openstack-nova-scheduler-15.0.3-3.el7ost.noarch
openstack-nova-console-15.0.3-3.el7ost.noarch
openstack-nova-common-15.0.3-3.el7ost.noarch
openstack-nova-conductor-15.0.3-3.el7ost.noarch
openstack-nova-placement-api-15.0.3-3.el7ost.noarch

openstack-tripleo-heat-templates-6.0.0-12.el7ost.noarch

openstack-tripleo-ui-3.1.0-9.el7ost.noarch
puppet-tripleo-6.3.0-12.el7ost.noarch
openstack-tripleo-image-elements-6.0.0-2.el7ost.noarch
openstack-tripleo-common-6.0.0-8.el7ost.noarch
openstack-tripleo-puppet-elements-6.0.0-3.el7ost.noarch
openstack-tripleo-validations-5.5.0-1.el7ost.noarch
python-tripleoclient-6.1.0-6.el7ost.noarch

Steps to Reproduce:
-------------------
1. Install RHOS-11
2. Install the latest repos
3. Update the undercloud
4. Update the deploy plan
5. Update the overcloud
6. Try to boot a VM

Additional info:
----------------
Virtual setup: controller:3,galera:3,messaging:3,networker:2,compute:2,ceph:3
After rebooting the controller and messaging nodes the issue persists: VMs still cannot be booted.
Comment 2 Peter Lemenkov 2017-05-26 05:26:59 EDT
I've just briefly inspected the rabbitmq logs from all 3 nodes, and didn't find anything - no unhandled exceptions, no failures, nothing. From the RabbitMQ perspective it is behaving well, properly handling some client-side failures (clients terminating connections, timeouts, etc.).

I'm going to look more thoroughly.
Comment 3 Amit Ugol 2017-05-29 09:49:22 EDT
Adding a blocker(?) because even though update is not failing, overcloud breaks in a bad way.
Comment 4 Stephen Gordon 2017-05-29 09:53:55 EDT
(In reply to Amit Ugol from comment #3)
> Adding a blocker(?) because even though update is not failing, overcloud
> breaks in a bad way.

Blocker does not as far as I know mean anything for our z-stream releases, only .0 releases.
Comment 6 Artom Lifshitz 2017-05-29 13:06:16 EDT
Is the environment where this happened still around? While I dig in the logs, I'd like to try to narrow down what specific component of the deployment is failing. Specifically, I'd like to try other actions that cause notifications to be sent out.

In components other than Nova (like Neutron or Cinder) I'm not familiar with what actions trigger notifications, but I assume resource creation would do it. Therefore, I'd like to know if 'neutron net-create' or 'cinder create' work?

In Nova itself, the only non-instance actions that trigger notifications are flavor operations and service operations [1], so I'd like to know if 'nova flavor-create', or 'nova service-disable' work?

Cheers!

[1] https://docs.openstack.org/developer/nova/notifications.html#existing-versioned-notifications
Comment 7 Yurii Prokulevych 2017-05-30 10:22:27 EDT
@Artom, I've reproduced the issue; here are the replies to your questions:
- managed to create new network/subnet
- managed to create new cinder volume
- managed to create nova flavor/disable/enable service

The attempt to boot a VM failed. Another note: nova commands took about 1 minute to complete (except 'nova boot', which seems to hang).
Comment 8 Artom Lifshitz 2017-06-01 12:22:11 EDT
With Ihar's help (thanks Ihar!) I was able to get Neutron to send notifications, so the problem definitely lies with Nova. Also, while flavor creation works, its notifications fail as well, so the problem is general to Nova notifications.

I can no longer reproduce the bug - notifications have started going out, and I haven't changed anything significant in the environment. I've turned on various forms of debugging, restarted Rabbit and Nova API, and pinged one machine from another, but I never thought those actions would affect anything, especially since you mentioned rebooting the nodes does not resolve the bug.

I'd like to continue, this time being much more careful with the changes I make. Could you re-provision the environment from scratch, and I'll start again? Thanks!
Comment 25 Artom Lifshitz 2017-10-04 13:30:13 EDT
This looks like the same problem as [1]. The real solution is to wait for a substantial piece of upstream work. In the meantime, completely restarting the rabbit cluster is the workaround. Can the customer try that and report whether it works?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1488499#c3
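
Before or after restarting the cluster, it helps to confirm that the versioned-notifications queues are the ones ballooning. A minimal sketch that parses `rabbitmqctl list_queues name messages` output (tab-separated name/depth pairs; the threshold and function name are illustrative assumptions, not part of rabbitmqctl):

```python
# Sketch: pick out oversized versioned-notification queues from
# `rabbitmqctl list_queues name messages` output.
def oversized_queues(listing, threshold=10000):
    """Return {queue_name: depth} for versioned queues above threshold."""
    result = {}
    for line in listing.splitlines():
        parts = line.split("\t")
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip the header/footer lines rabbitmqctl prints
        name, depth = parts[0], int(parts[1])
        if name.startswith("versioned_notifications") and depth > threshold:
            result[name] = depth
    return result
```

Typical use: `oversized_queues(subprocess.check_output(["rabbitmqctl", "list_queues", "name", "messages"], text=True))` on a controller node.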
