Currently a RabbitMQ cluster uses a predefined port, 35672, for clustering. This port belongs to the so-called ephemeral port range. Ephemeral ports are the ports the kernel assigns to an application that doesn't specify which port to open. So there is a small chance that an application started before RabbitMQ grabs this port. Unfortunately, we've just seen this in the wild.

Side note: if you see "Protocol: ~tp: register/listen error: ~tp~n",["inet_tcp",eaddrinuse]} in /var/log/rabbitmq/startup_err, check whether some other application has opened port 35672. If so, stop that application, start RabbitMQ, and then start the application again.

If we need a static predefined port, we'd better use port 25672. It doesn't belong to the ephemeral port range, so chances are low that anything opens it by mistake.

This change will likely require a change in our firewall rules, might require a change in SELinux rules, and certainly requires a change in Director.
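For anyone who wants to check a host for this condition, here is a minimal Python sketch. It assumes a Linux host (it reads the ephemeral range from /proc/sys/net/ipv4/ip_local_port_range), and the helper names are made up for illustration; none of this is part of RabbitMQ. The bind test is only an approximation of what the Erlang distribution listener does.

```python
import socket

# Linux-specific location of the ephemeral (local) port range.
RANGE_FILE = "/proc/sys/net/ipv4/ip_local_port_range"


def parse_port_range(text):
    """Parse the two whitespace-separated integers found in RANGE_FILE."""
    low, high = text.split()
    return int(low), int(high)


def read_ephemeral_range(path=RANGE_FILE):
    """Return (low, high) as currently configured on this host."""
    with open(path) as f:
        return parse_port_range(f.read())


def is_ephemeral(port, port_range):
    """True if the kernel may hand out this port on its own."""
    low, high = port_range
    return low <= port <= high


def port_is_free(port, host="0.0.0.0"):
    """Try to bind the port; False means something already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


def report(ports=(35672, 25672)):
    """Print, for each port, whether it is ephemeral and currently free."""
    port_range = read_ephemeral_range()
    for port in ports:
        print(port,
              "ephemeral:", is_ephemeral(port, port_range),
              "free:", port_is_free(port))
```

Calling report() on a controller before starting RabbitMQ prints one line per port. If 35672 shows up as not free, that matches the eaddrinuse error described above.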
So in tripleo we set this stuff in puppet/hieradata/controller.yaml:

    rabbitmq_kernel_variables:
      inet_dist_listen_min: '35672'
      inet_dist_listen_max: '35672'

    tripleo::firewall::firewall_rules:
      ...
      '109 rabbitmq':
        dport:
          - 4369
          - 5672
          - 35672
      ...

Peter, are there other settings somewhere (rpm, default config, etc.) that might affect this port or is the above all there is?
(In reply to Michele Baldessari from comment #2)
> So in tripleo we set this stuff in puppet/hieradata/controller.yaml
>
> rabbitmq_kernel_variables:
>   inet_dist_listen_min: '35672'
>   inet_dist_listen_max: '35672'
>
>
> tripleo::firewall::firewall_rules:
>   ...
>   '109 rabbitmq':
>     dport:
>       - 4369
>       - 5672
>       - 35672
>   ...
>
> Peter, are there other settings somewhere (rpm, default config, etc.) that
> might affect this port or is the above all there is?

Nope. I'm not aware of any other places containing this.
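Assuming those two hieradata entries really are the only places that carry the port, the Director side of the change comes down to swapping 35672 for 25672 in both. Below is a rough Python sketch of that substitution. The function name and the trimmed SAMPLE text are illustrative only, and it does not cover the SELinux or any other firewall changes mentioned in the description.

```python
import re

OLD_PORT = "35672"  # current distribution port, inside the ephemeral range
NEW_PORT = "25672"  # proposed port, outside the ephemeral range


def move_dist_port(hieradata_text, old=OLD_PORT, new=NEW_PORT):
    """Replace the old port wherever it appears as a whole number.

    The word boundaries keep 5672 and any longer number that merely
    contains the same digits untouched.
    """
    return re.sub(r"\b%s\b" % re.escape(old), new, hieradata_text)


# Trimmed copy of the snippet quoted above (the elided parts are left out).
SAMPLE = """\
rabbitmq_kernel_variables:
  inet_dist_listen_min: '35672'
  inet_dist_listen_max: '35672'
tripleo::firewall::firewall_rules:
  '109 rabbitmq':
    dport:
      - 4369
      - 5672
      - 35672
"""

if __name__ == "__main__":
    print(move_dist_port(SAMPLE))
```

Both inet_dist_listen_min and inet_dist_listen_max have to move together with the firewall dport entry. Otherwise the nodes would listen on a port that the firewall rule does not open.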
Lowering priority/severity since this issue is not very likely to occur.
Should we bump this to OSP10?
If we can get traction for the SELinux part of this bug, then OSP10 is doable; otherwise it will be OSP11, I am afraid.
As Marian mentioned by mail, this is hitting us harder than before due to the HA-NG architecture: services will start before RabbitMQ, so the chances of the ephemeral port being taken are higher.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html