Description of problem:

Seeing frequent connection timeouts to message queues, which is causing general system instability. nova service-list and neutron agent-list are reporting all services as healthy.

Version-Release number of selected component (if applicable):

rabbitmq-server-3.1.5-6.3.el7ost.noarch
python-kombu-2.5.16-3.el7ost.noarch
openstack-nova-compute-2014.1.1-4.el7ost.noarch
openstack-neutron-2014.1.1-4.el7ost.noarch
openstack-cinder-2014.1.1-1.el7ost.noarch

######### Observing Frequent Neutron L3 Agent Queue Connection Errors:

2014-09-17 01:21:22.156 25367 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: [Errno 110] Connection timed out
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     return method(*args, **kwargs)
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 659, in _consume
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     return self.connection.drain_events(timeout=timeout)
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     return connection.drain_events(**kwargs)
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     chanmap, None, timeout=timeout,
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     channel, method_sig, args, content = read_timeout(timeout)
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     return self.method_reader.read_method()
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     raise m
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common error: [Errno 110] Connection timed out
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common

########## Observing Frequent Nova Compute Queue Connection Errors

2014-09-14 05:25:06.985 10800 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to consume message from queue: [Errno 110] Connection timed out
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 702, in _consume
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     return self.connection.drain_events(timeout=timeout)
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     return self.transport.drain_events(self.connection, **kwargs)
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     return connection.drain_events(**kwargs)
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     chanmap, None, timeout=timeout,
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     channel, method_sig, args, content = read_timeout(timeout)
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     return self.method_reader.read_method()
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     raise m
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 110] Connection timed out
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit
2014-09-14 05:25:06.985 10800 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on svl6-csl-b-rabbitmq-002:5672
2014-09-14 05:25:06.986 10800 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2014-09-14 05:25:08.006 10800 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on svl6-csl-b-rabbitmq-002:5672

######### Cinder also logging similar errors

2014-09-17 03:14:52.935 8477 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'cinder-scheduler': [Errno 104] Connection reset by peer
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     publisher = cls(self.conf, self.channel, topic, **kwargs)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 394, in __init__
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     None, type='fanout', **options)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self.reconnect(channel)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     routing_key=self.routing_key)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 83, in __init__
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self.revive(self._channel)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 214, in revive
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self.declare()
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 100, in declare
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self.exchange.declare()
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 163, in declare
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     nowait=nowait, passive=passive,
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 595, in exchange_declare
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self._send_method((40, 10), args)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 58, in _send_method
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, method_sig, args, content,
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 224, in write_method
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     write_frame(1, channel, payload)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 160, in write_frame
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     pack('>BHI%dsB' % size, frame_type, channel, size, payload, 0xce),
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 309, in sendall
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     tail = self.send(data, flags)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 295, in send
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     total_sent += fd.send(data[total_sent:], flags)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit

#### Nova Conductor queue publishing Errors

2014-09-17 03:21:31.834 8965 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'reply_0694324562bd4bb1bde9bd334b636e16': [Errno 104] Connection reset by peer
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     publisher = cls(self.conf, self.channel, topic, **kwargs)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 360, in __init__
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     type='direct', **options)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self.reconnect(channel)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     routing_key=self.routing_key)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 83, in __init__
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self.revive(self._channel)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 214, in revive
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self.declare()
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 100, in declare
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self.exchange.declare()
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 163, in declare
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     nowait=nowait, passive=passive,
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 595, in exchange_declare
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self._send_method((40, 10), args)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 58, in _send_method
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, method_sig, args, content,
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 224, in write_method
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     write_frame(1, channel, payload)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 160, in write_frame
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     pack('>BHI%dsB' % size, frame_type, channel, size, payload, 0xce),
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 309, in sendall
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     tail = self.send(data, flags)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 295, in send
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     total_sent += fd.send(data[total_sent:], flags)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit
2014-09-17 03:21:31.836 8965 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on svl6-csl-b-rabbitmq-002:5672
2014-09-17 03:21:31.837 8965 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2014-09-17 03:21:32.873 8965 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on svl6-csl-b-rabbitmq-002:5
Are there any corresponding messages in the rabbitmq server logs when this happens?
Assuming this is with a HA deployment, it's a dupe of bug 1123296. If this is happening with a non-HA deployment, please let me know and we can reopen this one. *** This bug has been marked as a duplicate of bug 1123296 ***
Yes thanks. This is an HA deployment.
However, after looking at the dupe log, we are not running Rabbit behind HAProxy, so I think this case needs to remain open.
Ken, can you check the rabbitmq log? Also, do you know if you've changed the ulimit on your rabbitmq servers? If you're running under load here, it could be hitting the default limit on open file descriptors. It's kind of just a guess right now ... I think the rabbitmq log would provide a better hint if that was the case.
So before I opened the case today, I had increased the system ulimit last night as there were file handle errors in the rabbit log. Those errors are now gone but the connection timeout behavior persists. The closest thing to an error in the rabbit log now is warnings:

=WARNING REPORT==== 17-Sep-2014::05:26:21 ===
closing AMQP connection <0.1776.0> (10.114.194.204:38795 -> 10.114.197.142:5672):
connection_closed_abruptly
(In reply to Ken Schroeder from comment #7)
> So before I opened the case today, i had increased system ulimit last night
> as there were File Handle errors in the rabbit log. Those errors are now
> gone but connection timeout behavior persists. Close thing to an error in
> rabbit log now is warnings

OK, thanks for clarifying.

> =WARNING REPORT==== 17-Sep-2014::05:26:21 ===
> closing AMQP connection <0.1776.0> (10.114.194.204:38795 ->
> 10.114.197.142:5672):
> connection_closed_abruptly

A quick search seems to indicate that this error is related to an unclean connection teardown by the client (TCP connection closed without a proper AMQP connection close). If there's just one of those and several of the errors you're seeing on the OpenStack side, they're probably unrelated.
Doing event correlation between the rabbit logs and the clients showing up in the warnings, I cannot tie them together based on timestamps.
Can you elaborate a bit more on your HA setup, minus haproxy? Are you load balancing the rabbit servers behind a VIP via some other means, e.g. hardware load balancer? Are you doing it client side with kombu, e.g. setting rabbit_hosts=host1,host2,host3? Something else?
The HA architecture we are using is multiple rabbit nodes with HA Queues and Kombu on the client side. There is no VIP or load balancer in the messaging architecture.
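For reference, a client-side setup of this sort is configured per service. A minimal sketch (option names are from the Icehouse-era oslo.messaging rabbit driver; the hostnames are hypothetical, not the reporter's):

```shell
# Illustrative only: point each OpenStack service at all rabbit nodes so
# kombu can fail over client-side; enable mirrored (HA) queues.
# The same options go in neutron.conf, cinder.conf, etc.
cat >> /etc/nova/nova.conf <<'EOF'
[DEFAULT]
rabbit_hosts = rabbit-001:5672,rabbit-002:5672,rabbit-003:5672
rabbit_ha_queues = True
EOF
```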
Is there possibly a stateful firewall in between the clients and the rabbitmq nodes? The more I look at this, the more it looks like something in the middle is resetting the connection, especially since you are seeing the resets on both the server and client sides. I know you said you couldn't find any correlation between the client and server logs. Because rabbitmq does not do heartbeats, the client often will not notice the reset until long after it actually occurs. The server, however, will notice earlier, since it tries to push messages to the consumers and fails. This is a known issue and is being tracked in bug 1129242. If there is a firewall in the middle that tears down idle connections, it would explain the behavior you are seeing.
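One quick check for the middlebox theory, assuming any Linux device in the path uses netfilter connection tracking, is the established-flow timeout (the key only exists where the nf_conntrack module is loaded):

```shell
# Kernel default is 432000 s (5 days); aggressive middleboxes often use
# minutes, which would match abrupt resets of idle AMQP connections.
sysctl net.netfilter.nf_conntrack_tcp_timeout_established 2>/dev/null \
  || echo "nf_conntrack not loaded on this host"
```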
The rabbit instances are running on a Nova KVM hypervisor with OVS and a Neutron provider network. The only firewall in the data path is OVS security groups, which are open and functioning.
What is the proper method to tune the file handle limit for the rabbitmq-server process? It seems adding to /etc/security/limits.conf does not actually change the limits for the rabbitmq-server process, and modifying the startup script is not really something that can be managed cleanly by puppet.

[root@svl6-csl-b-rabbitmq-001 ~]# ps -ef|grep rabbit
rabbitmq 1062 1 11 02:26 ? 01:25:19 /usr/lib64/erlang/erts-5.10.4/bin/beam.smp -W w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.3.5/sbin/../ebin -noshell -noinput -s rabbit boot -sname rabbit@svl6-csl-b-rabbitmq-001 -boot start_sasl -config /etc/rabbitmq/rabbitmq -kernel inet_default_connect_options [{nodelay,true}] -rabbit tcp_listeners [{"auto",5672}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit"} -rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/usr/lib/rabbitmq/lib/rabbitmq_server-3.3.5/sbin/../plugins" -rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/rabbit@svl6-csl-b-rabbitmq-001-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@svl6-csl-b-rabbitmq-001" -kernel inet_dist_listen_min 25672 -kernel inet_dist_listen_max 25672

[root@svl6-csl-b-rabbitmq-001 ~]# cat /proc/1062/limits |grep file
Max file size             unlimited            unlimited            bytes
Max core file size        0                    unlimited            bytes
Max open files            1024                 4096                 files

[root@svl6-csl-b-rabbitmq-001 ~]# su rabbitmq -s /bin/sh -c 'ulimit -n'
100000
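The /proc comparison above can be wrapped in a small helper for rechecking after each change; a sketch (the PID discovery assumes a single beam.smp instance on the host):

```shell
# Print the soft and hard "Max open files" limits actually applied to a
# running process, as enforced by the kernel at process start.
show_fd_limit() {
  local pid=$1
  awk '/Max open files/ {print $4, $5}' "/proc/${pid}/limits"
}
# For the rabbit VM process:
#   show_fd_limit "$(pgrep -f 'beam.smp.*rabbit' | head -n1)"
# rabbitmq's own view of its limit, for cross-checking:
#   rabbitmqctl status | grep -A3 file_descriptors
```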
We have also modified /etc/default/rabbitmq-server, but that does not seem to be having any impact either.

[root@svl6-csl-b-rabbitmq-001 ~]# cat /etc/default/rabbitmq-server
# This file is sourced by /etc/init.d/rabbitmq-server. Its primary
# reason for existing is to allow adjustment of system limits for the
# rabbitmq-server process.
#
# Maximum number of open file handles. This will need to be increased
# to handle many simultaneous connections. Refer to the system
# documentation for ulimit (in man bash) for more information.
#
ulimit -n 102400
[root@svl6-csl-b-
Under systemd, you must configure the file handle limit in the service unit. This will increase it to 102400:

cp /usr/lib/systemd/system/rabbitmq-server.service /etc/systemd/system/
sed -i '/^ExecStopPost.*/a LimitNOFILE=102400' /etc/systemd/system/rabbitmq-server.service
systemctl daemon-reload
service rabbitmq-server restart
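An alternative that avoids copying and sed-editing the whole unit file, and is easier to manage from puppet, is a systemd drop-in override (this sketch assumes a systemd version with .d/ directory support):

```shell
# Override only LimitNOFILE; everything else stays in the vendor unit.
mkdir -p /etc/systemd/system/rabbitmq-server.service.d
cat > /etc/systemd/system/rabbitmq-server.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=102400
EOF
systemctl daemon-reload
systemctl restart rabbitmq-server
```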
See also bug 1148063 which is to modify the default rabbitmq file handle limits.
It would appear that the problems we were having have been resolved by enabling the rabbitmq keepalive function; additionally, we tuned the ipv4 keepalive settings in /etc/sysctl.conf on the rabbitmq servers to much lower values than the defaults. Since performing those updates, the nova and other service connections have stabilized. I believe this can be closed.
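For anyone landing here later, the two changes described above can be sketched roughly as follows. The specific values are illustrative assumptions, not the reporter's actual settings:

```shell
# 1) RabbitMQ side: enable TCP keepalive on the AMQP listen sockets
#    (/etc/rabbitmq/rabbitmq.config, Erlang term syntax):
#      [{rabbit, [{tcp_listen_options, [{keepalive, true}]}]}].
# 2) Kernel side: probe idle connections well before any middlebox
#    timeout (Linux defaults: 7200 s idle, 75 s interval, 9 probes).
cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 5
EOF
sysctl -p
```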
*** Bug 1174929 has been marked as a duplicate of this bug. ***