Description of problem: rabbitmq-server fails to start on boot with the following: Sep 18 08:22:00 director.uc.example.com systemd[1]: Starting RabbitMQ broker... Sep 18 08:22:11 director.uc.example.com systemd[1]: rabbitmq-server.service: Got notification message from PID 1236, but reception only permitted for PID 1145 Sep 18 08:23:30 director.uc.example.com systemd[1]: rabbitmq-server.service operation timed out. Terminating. Sep 18 08:23:30 director.uc.example.com systemd[1]: Failed to start RabbitMQ broker. Sep 18 08:23:30 director.uc.example.com systemd[1]: Unit rabbitmq-server.service entered failed state. Version-Release number of selected component (if applicable): # rpm -q erlang-sd_notify erlang-sd_notify-0.1-2.el7ost.x86_64 # rpm -q systemd systemd-208-20.el7_1.6.x86_64 [root@director ~]# rpm -q rabbitmq-server rabbitmq-server-3.3.5-4.el7ost.noarch How reproducible: Every time Steps to Reproduce: 1. Install director per https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/chap-Installing_the_Undercloud.html 2. Reboot server Additional info: * restarting the rabbitmq-server.service unit after system boot successfully starts the service. This isn't really a good workaround since other RHEL-OSP7 services depend upon rabbitmq on the director and they are left in a failed state. * adding the NotifyAccess=all to a custom rabbitmq-server.service unit successfully works around the issue: [root@director ~]# cat /etc/systemd/system/rabbitmq-server.service [Unit] Description=RabbitMQ broker After=syslog.target network.target [Service] Type=notify User=rabbitmq Group=rabbitmq WorkingDirectory=/var/lib/rabbitmq ExecStart=/usr/lib/rabbitmq/bin/rabbitmq-server ExecStop=/usr/lib/rabbitmq/bin/rabbitmqctl stop NotifyAccess=all [Install] WantedBy=multi-user.target
Eck, any thoughts?
I believe a slow system start (OVS bridge) is the cause of the problem. I noticed there were several other notify services that were timing out. So, I modified /etc/systemd/system.conf [root@director ~]# grep ^DefaultTimeout /etc/systemd/system.conf DefaultTimeoutStartSec=180s After a reboot I found that only 1 service failed to start on boot - openstack-ironic-discoverd which was a Type=simple. This service was failing to start with: Sep 21 15:12:55 director.uc.example.com ironic-discoverd[549]: File "/usr/lib/python2.7/site-packages/keystoneclient/v2_0/client.py", line 196, in get_raw_token_from_identity_service Sep 21 15:12:55 director.uc.example.com ironic-discoverd[549]: _("Authorization Failed: %s") % e) Sep 21 15:12:55 director.uc.example.com ironic-discoverd[549]: keystoneclient.openstack.common.apiclient.exceptions.AuthorizationFailure: Authorization Failed: Unable to establish connection to http://10.200.0.1:5000/v2.0/tokens To deal with openstack-ironic-discoverd I added the network-online.target dependency (since it doesn't make sense to try to connect to the API endpoint before networking is online: # mkdir -p /etc/systemd/system/openstack-ironic-discoverd.service.d # cd /etc/systemd/system/openstack-ironic-discoverd.service.d # echo -e '[Unit]\nAfter=network-online.target\nWants=network-online.target' > require-networking.conf # systemctl daemon-reload
Can you attach the rabbitmq log (/var/log/rabbitmq/rabbit@<node>.log) as well as the journal (journalctl -u rabbitmq-server)?
Sure. I'll attach an sosreport. Here is the output from the journalctl command: [root@director ~]# journalctl -u rabbitmq-server -- Logs begin at Tue 2015-09-22 13:16:12 EDT, end at Tue 2015-09-22 13:29:50 EDT. -- Sep 22 13:17:13 director.uc.example.com systemd[1]: Starting RabbitMQ broker... Sep 22 13:17:25 director.uc.example.com systemd[1]: rabbitmq-server.service: Got notification message from PID 1248, but reception only permitted for PID 1161 Sep 22 13:18:43 director.uc.example.com systemd[1]: rabbitmq-server.service operation timed out. Terminating. Sep 22 13:18:44 director.uc.example.com systemd[1]: Failed to start RabbitMQ broker. Sep 22 13:18:44 director.uc.example.com systemd[1]: Unit rabbitmq-server.service entered failed state. [root@director ~]# Thanks for taking a look ! -- Andrew
Created attachment 1075944 [details] sosreport from RHEL-OSP7 director after rabbitmq-server timed out
Created attachment 1075946 [details] rabbit log
After a brief glance, I'm not sure what the deal is. There's nothing during the bootup in the rabbit log, and the journal normally shows the rabbitmq "splash screen" which is one of the first things the server does when it starts booting, but the splash is absent as well. So either the process is not really being started at all by systemd, or it's hanging for some reason *very* early in the startup process. I'll see if I can reproduce this in my dev environment. Honestly I rarely if ever have rabbitmq-server set to start during bootup, instead relying on it to be managed by pacemaker once the system is up.
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Just fyi - this was fixed in RabbitMQ service file since rabbitmq-server-3.6.1-2.el7ost (available in RHOS9). So no further override in /etc/systemd/system is required. Could we just drop generating RabbitMQ service override in Director?
Hi, sorry, I don't know. Mike, John?
*** This bug has been marked as a duplicate of bug 1348700 ***