Description of problem:
Events are not properly being picked up from RHOS.

Version-Release number of selected component (if applicable):
5.4.0.2

How reproducible:
Very

Steps to Reproduce:
1. Add a RHOS provider
2. Start/Restart an instance
3. Try to find events

Actual results:
Events are not collected

Expected results:
Events should be collected

Additional info:
This has been tested against RHOS4 and RHOS6 GA versions. Both RHOS environments had extra ports opened up to allow connections to AMQP/RabbitMQ. Connections to port 5672 were verified, and even the filters for the scheduler were removed, but events were still not captured.
Some additional questions. Have you tried to subscribe to the notification channel from another client, to see if messages are sent there but not consumed by CFME? Or are they not sent there at all? If they are not sent there, we had the same issue with the configuration of each service. E.g. /etc/nova/nova.conf must be configured to send messages to the notification channel. Do you see this in your conf?

/etc/nova/nova.conf:
notification_driver = messaging
notification_topics = notifications

If it stops receiving events after some time, it might be a connection issue; we experienced problems with the Bunny gem, which lost its connection when we restarted the server with AMQP and was not able to get it back. (We have a pretty old Bunny gem; maybe an update could fix this.) mcornea could provide more info about this issue.
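One way to run that check from a separate client is a minimal subscriber with the Bunny gem; the host and credentials below are placeholders, and the server-named exclusive queue means this does not disturb any queue CFME itself may be consuming from:

require 'bunny'

conn = Bunny.new(host: "rhos.example.com", port: 5672,
                 user: "guest", password: "guest")
conn.start
ch = conn.create_channel

# Bind a throwaway, server-named queue to nova's notification topics.
queue = ch.queue("", exclusive: true)
queue.bind(ch.topic("nova", durable: false), routing_key: "notifications.*")

queue.subscribe(block: true) do |_delivery_info, _properties, payload|
  puts payload  # if nothing prints when an instance is restarted,
                # nova is not publishing notifications at all
end

If this prints events while CFME sees none, the problem is on the consuming side; if it prints nothing, the service configuration above is the first thing to check.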
Note: with proper configuration we are collecting events for the OpenstackInfra provider (it is the same implementation as for the Openstack provider) with RHOS6. We are testing RHOS7, which has had some issues.
I see notification_topics = notifications but not notification_driver = messaging. Also, can I confirm: are we connecting to the notifications.* queue, or are we creating our own queue and binding to a fanout exchange?
From my limited knowledge of AMQP and from observation, I think we are connecting to notifications.*. When I start the events worker, the number of connections on notifications.* rises by 6, which corresponds to these 6 lines here: https://github.com/ManageIQ/manageiq/blob/a1ed085996b42ace3e9498bdf2fe001de517b040/vmdb/config/vmdb.tmpl.yml#L301
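From memory, that section of vmdb.tmpl.yml declares one notification topic per OpenStack service, roughly like the following; treat this as an assumed shape, the exact service list and keys are in the linked file:

:event_handling:
  :amqp:
    :topics:
      :nova: notifications.*
      :cinder: notifications.*
      :glance: notifications.*
      :heat: notifications.*
      :keystone: notifications.*
      :neutron: notifications.*

One binding per service exchange would account for the 6 connections observed.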
Discovery work has been carried out, and the issue is with multiple appliances connecting to the same RHOS instance. Each appliance connects to the same queues, so they consume messages in a round-robin fashion, meaning that messages can be lost (see the sketch below). Though it is less likely for a customer to be running multiple appliances against the same RHOS instance, this fix is highly needed for QE to be able to test RHOS functionality in CFME. We already have a solution to the problem: each appliance will use a randomized queue name, much like we currently do for Qpid. Greg B has developed the fix and it has already proved successful in early testing.
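A minimal demonstration of the failure mode (host and credentials are placeholders): when two consumers subscribe to the same named queue, RabbitMQ delivers each message to only one of them, so each appliance sees only part of the event stream.

require 'bunny'

conn = Bunny.new(host: "rhos.example.com", user: "guest", password: "guest")
conn.start

# Two "appliances" declaring the SAME queue name (as CFME did, e.g. "nova")
# become competing consumers on one queue rather than each getting a full
# copy of the events.
2.times do |i|
  ch = conn.create_channel
  q  = ch.queue("shared-binding-queue", auto_delete: true)
  q.bind(ch.topic("nova", durable: false), routing_key: "notifications.*")
  q.subscribe { |_delivery, _props, _payload| puts "consumer #{i}: got an event" }
end

sleep  # keep the process alive to watch events alternate between consumers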
https://github.com/ManageIQ/manageiq/pull/2995
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/051cac62ebe60f720dd1844ab0d64b6880c42f98

commit 051cac62ebe60f720dd1844ab0d64b6880c42f98
Author:     Greg Blomquist <gblomqui>
AuthorDate: Tue May 26 12:38:30 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Tue May 26 12:45:57 2015 -0400

    Create unique binding queue names for RabbitMQ

    When CFME connects to RabbitMQ to collect OpenStack events, it creates
    queues to bind to the OpenStack services' exchanges. The queues were
    named after the services to which they were bound. For example, binding
    to the "nova" service would result in a binding queue called "nova".

    If more than one appliance attempted to connect to RabbitMQ to collect
    OpenStack events, only the first appliance to create the binding queue
    would receive any events.

    Now, the binding queue is named after the appliance connecting to the
    RabbitMQ service. The new binding queue name will look like
    "miq-<host|ip>-<exchange>"
    * e.g.: "miq-10.10.10.10-nova"

    This allows for two things: individual appliances will get their own
    binding queue per service, and administrators will be able to tell
    which binding queues belong to which appliances.

    https://bugzilla.redhat.com/show_bug.cgi?id=1223976

 lib/openstack/amqp/openstack_rabbit_event_monitor.rb | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
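For illustration, the naming scheme the commit describes looks like this as a minimal Bunny sketch (not the actual ManageIQ code; connection details and the appliance address are placeholders):

require 'bunny'

appliance_host = "10.10.10.10"  # the connecting appliance's host/IP
exchange_name  = "nova"         # the OpenStack service exchange

conn = Bunny.new(host: "rhos.example.com", user: "guest", password: "guest")
conn.start
ch = conn.create_channel

# e.g. "miq-10.10.10.10-nova" -- unique per appliance, so every appliance
# gets its own full copy of the events, and an admin can tell at a glance
# which binding queues belong to which appliance.
queue = ch.queue("miq-#{appliance_host}-#{exchange_name}", auto_delete: true)
queue.bind(ch.topic(exchange_name, durable: false), routing_key: "notifications.*")
queue.subscribe { |_delivery, _props, payload| puts payload }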
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/28b13d6ca787f7379882a39866ae8e4a39356d6a

commit 28b13d6ca787f7379882a39866ae8e4a39356d6a
Author:     Greg Blomquist <gblomqui>
AuthorDate: Wed Jun 3 16:16:05 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Wed Jun 3 16:45:04 2015 -0400

    Fix custom naming for AMQP binding queues

    The original fixes for the bugs linked below used the OpenStack
    server's IP address as the hostname information in the binding queue
    name. This meant that any two appliances that attempted to connect to a
    single OpenStack env would create the same named binding queues.

    This fix uses the appliance's IP address in the binding queue name,
    making the name of the binding queue unique per appliance.

    https://bugzilla.redhat.com/show_bug.cgi?id=1224389
    https://bugzilla.redhat.com/show_bug.cgi?id=1223976

 lib/openstack/amqp/openstack_qpid_event_monitor.rb             | 10 ++++++----
 lib/openstack/amqp/openstack_qpid_receiver.rb                  |  5 +++--
 lib/openstack/amqp/openstack_rabbit_event_monitor.rb           |  6 ++++--
 lib/spec/openstack/amqp/openstack_qpid_event_monitor_spec.rb   |  2 +-
 lib/spec/openstack/amqp/openstack_qpid_receiver_spec.rb        |  2 +-
 lib/spec/openstack/amqp/openstack_rabbit_event_monitor_spec.rb |  2 +-
 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb       |  4 +++-
 7 files changed, 19 insertions(+), 12 deletions(-)
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/3649eb07a5e8b1b9a5f56dab11eb205e66758ef5

commit 3649eb07a5e8b1b9a5f56dab11eb205e66758ef5
Author:     Greg Blomquist <gblomqui>
AuthorDate: Tue Jun 23 13:30:07 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Tue Jun 23 16:00:12 2015 -0400

    Include miq_server when retrieving worker

    To try to make the way the OpenStack event catcher creates binding
    queues work a little better, the appliance's IP address was looked up
    and used as part of the binding queue's name.

    However, there were a couple of things working against this fix.
    First, the appliance's IP address was not readily available to the
    worker process. Second, ManageIQ has a DB connection pool with only one
    connection. And, threads (i.e., where event catcher workers do all
    their work) that attempt to run queries are opening a new DB
    connection.

    The original fix never actually tried opening a new connection.
    Instead, it was perfectly happy to get back a nil value for the
    appliance and try to look up Nil#ipaddress.

    This fix gets around this problem by throwing the appliance record
    (miq_server, actually) into an ivar and making that available to the
    thread. This keeps the thread from having to query for the miq_server,
    while still giving it access to the MiqServer#ipaddress.

    Original PR: https://github.com/ManageIQ/manageiq/pull/3050
    Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1232484
    References:
    https://bugzilla.redhat.com/show_bug.cgi?id=1224389
    https://bugzilla.redhat.com/show_bug.cgi?id=1223976

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb |  2 +-
 vmdb/lib/workers/worker_base.rb                          | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)
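The ivar pattern the commit describes, as a hypothetical sketch (class and method names are invented for illustration; the record lookup is assumed to be MiqServer.my_server):

class EventCatcherWorkerBase
  def prepare
    # Runs on the main thread, which holds the single DB connection,
    # so the query succeeds here.
    @miq_server = MiqServer.my_server
  end

  def start_event_monitor
    Thread.new do
      # No DB query inside the thread -- the record was captured before
      # the thread started, so MiqServer#ipaddress is just a memory read.
      queue_prefix = "miq-#{@miq_server.ipaddress}"
      monitor_events(queue_prefix)  # hypothetical helper
    end
  end
end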
Verified in 5.5.0.3
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2551