Description of problem:
The customer noticed that the neutron dhcp-agent did not start after a package update and a reboot of the controllers. The problem turned out to be with rabbitmq. The question is why rabbitmq was failing and causing other services not to work.

Version-Release number of selected component (if applicable):
rabbitmq-server-3.6.3-6.el7ost.noarch

How reproducible:
unknown

Steps to Reproduce:
1. unknown
2.
3.

Actual results:
rabbitmq db corruption causes dnsmasq not to work.

Expected results:
rabbitmq works properly

Additional info:
Solution was to do pcs resource restart rabbitmq-clone.
Invalid
@Michael - could you explain your comment "Invalid" please?
I believe we've found what's going on. Short story: please upgrade to rabbitmq-server-3.6.3-7.el7ost. Long story: we tried to speed up various rabbitmqctl operations with an experimental out-of-tree patch, which worked rather well in previous RHOS versions. Unfortunately, it started causing issues on versions higher than RHOS10 due to changed iptables rules (improved iptables rules, to be fair). The same applies to intermittent network outages, during which rabbitmqctl can no longer work properly. We reverted that patch in this version, so everything should work with rabbitmq-server-3.6.3-7.el7ost.
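For anyone checking several controllers, here is a minimal Python sketch that tells whether an installed build predates the fixed build named above. It is only an illustration: the helper names `parse_nvr` and `needs_upgrade` are hypothetical, and the comparison is a simplification (numeric version fields plus the leading integer of the release), not rpm's full version-comparison algorithm. A result of True means only that the build is older than the fixed one, not that it necessarily carries the patch.

```python
import re

# Fixed build named in this comment.
FIXED = "rabbitmq-server-3.6.3-7.el7ost"


def parse_nvr(nvr):
    """Split 'name-version-release[.arch]' into (name, version tuple, release int).

    Simplified on purpose: assumes a purely numeric dotted version and a
    release that starts with an integer, as in '3.6.3-6.el7ost'.
    """
    name, version, release = nvr.rsplit("-", 2)
    # Drop a trailing architecture suffix such as '.noarch' if present.
    release = re.sub(r"\.(noarch|x86_64)$", "", release)
    version_key = tuple(int(part) for part in version.split("."))
    # Leading integer of the release, e.g. 6 from '6.el7ost'.
    release_key = int(re.match(r"\d+", release).group())
    return name, version_key, release_key


def needs_upgrade(installed, fixed=FIXED):
    """Return True if `installed` is older than `fixed` (same package name)."""
    name_i, ver_i, rel_i = parse_nvr(installed)
    name_f, ver_f, rel_f = parse_nvr(fixed)
    if name_i != name_f:
        raise ValueError("different packages: %s vs %s" % (name_i, name_f))
    return (ver_i, rel_i) < (ver_f, rel_f)


if __name__ == "__main__":
    # Build from the original report.
    print(needs_upgrade("rabbitmq-server-3.6.3-6.el7ost.noarch"))
```

The installed string can be taken from the output of `rpm -q rabbitmq-server` on each controller.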
*** This bug has been marked as a duplicate of bug 1434593 ***
The issue reported here is why dnsmasq didn't start when RabbitMQ was down. The explanation is that dnsmasq is started by neutron-dhcp-agent, which won't start if RabbitMQ is down; this is normal behaviour. In this case, RabbitMQ had stuck processes and was misbehaving due to networking issues (port flapping).