Description of problem: After a global network outage, it appears like rabbitmq didn't recover properly . We can see that rabbitmq was stopped and restarted globally around 1:14AM on Apr 28 so we should've properly recovered from that situation but when we look in the rabbitmq logs, we can see the following: =ERROR REPORT==== 30-Apr-2018::15:33:22 === Channel error on connection <0.2394.14> (192.168.1.1:54680 -> 192.168.1.1:5672, vhost: '/', user: 'guest'), channel 1: operation queue.declare caused a channel exception not_found: "failed to perform operation on queue 'versioned_notifications.info' in vhost '/' due to timeout" =ERROR REPORT==== 30-Apr-2018::15:33:26 === Channel error on connection <0.2412.14> (192.168.1.1:43120 -> 192.168.1.1:5672, vhost: '/', user: 'guest'), channel 1: operation queue.declare caused a channel exception not_found: "failed to perform operation on queue 'versioned_notifications.info' in vhost '/' due to timeout" =ERROR REPORT==== 30-Apr-2018::15:33:26 === closing AMQP connection <0.3018.12> (192.168.1.1:33562 -> 192.168.1.1:5672): missed heartbeats from client, timeout: 60s Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. Global network outage 2. Flaky LACP bond on one of the controllers 3. Rabbitmq flaky Actual results: Many openstack API calls were failing and restarting rabbitmq on the controllers solved the issue Expected results: Shouldn't it have recovered automatically? Additional info:
Hi David, can you report if the config change addressed the issue? Eck: Do we need to consider releasing the change?
Hey Andrew, Let me poke the customer and see if they tried it ... Thanks, Dave