Bug 1142915 - Frequent connection timeouts
Summary: Frequent connection timeouts
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rabbitmq-server
Version: 5.0 (RHEL 7)
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 5.0 (RHEL 7)
Assignee: John Eckersberg
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Duplicates: 1174929
Depends On:
Blocks: 1154145 1174033
 
Reported: 2014-09-17 15:23 UTC by Ken Schroeder
Modified: 2019-09-09 16:56 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-11 20:07:15 UTC
Target Upstream Version:
Embargoed:



Description Ken Schroeder 2014-09-17 15:23:10 UTC
Description of problem:
Seeing frequent connection timeouts to the message queues, which is causing general system instability. Nova service-list and neutron agent-list report all services as healthy.

Version-Release number of selected component (if applicable):
rabbitmq-server-3.1.5-6.3.el7ost.noarch
python-kombu-2.5.16-3.el7ost.noarch
openstack-nova-compute-2014.1.1-4.el7ost.noarch
openstack-neutron-2014.1.1-4.el7ost.noarch
openstack-cinder-2014.1.1-1.el7ost.noarch


######### Observing Frequent Neutron L3 Agent Queue Connection Errors:
2014-09-17 01:21:22.156 25367 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: [Errno 110] Connection timed out
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     return method(*args, **kwargs)
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 659, in _consume
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     return self.connection.drain_events(timeout=timeout)
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     return connection.drain_events(**kwargs)
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     chanmap, None, timeout=timeout,
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     channel, method_sig, args, content = read_timeout(timeout)
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     return self.method_reader.read_method()
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common     raise m
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common error: [Errno 110] Connection timed out
2014-09-17 01:21:22.156 25367 TRACE neutron.openstack.common.rpc.common 

########## Observing Frequent Nova Compute Queue Connection Errors
2014-09-14 05:25:06.985 10800 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to consume message from queue: [Errno 110] Connection timed out
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 702, in _consume
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     return self.connection.drain_events(timeout=timeout)
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     return self.transport.drain_events(self.connection, **kwargs)
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     return connection.drain_events(**kwargs)
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     chanmap, None, timeout=timeout,
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     channel, method_sig, args, content = read_timeout(timeout)
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     return self.method_reader.read_method()
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit     raise m
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 110] Connection timed out
2014-09-14 05:25:06.985 10800 TRACE oslo.messaging._drivers.impl_rabbit 
2014-09-14 05:25:06.985 10800 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on svl6-csl-b-rabbitmq-002:5672
2014-09-14 05:25:06.986 10800 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2014-09-14 05:25:08.006 10800 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on svl6-csl-b-rabbitmq-002:5672


######### Cinder also logging similar errors
2014-09-17 03:14:52.935 8477 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'cinder-scheduler': [Errno 104] Connection reset by peer
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     publisher = cls(self.conf, self.channel, topic, **kwargs)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 394, in __init__
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     None, type='fanout', **options)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self.reconnect(channel)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     routing_key=self.routing_key)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 83, in __init__
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self.revive(self._channel)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 214, in revive
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self.declare()
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 100, in declare
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self.exchange.declare()
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 163, in declare
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     nowait=nowait, passive=passive,
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 595, in exchange_declare
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self._send_method((40, 10), args)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 58, in _send_method
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, method_sig, args, content,
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 224, in write_method
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     write_frame(1, channel, payload)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 160, in write_frame
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     pack('>BHI%dsB' % size, frame_type, channel, size, payload, 0xce),
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 309, in sendall
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     tail = self.send(data, flags)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 295, in send
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit     total_sent += fd.send(data[total_sent:], flags)
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
2014-09-17 03:14:52.935 8477 TRACE oslo.messaging._drivers.impl_rabbit 

#### Nova Conductor queue publishing Errors
2014-09-17 03:21:31.834 8965 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'reply_0694324562bd4bb1bde9bd334b636e16': [Errno 104] Connection reset by peer
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     publisher = cls(self.conf, self.channel, topic, **kwargs)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 360, in __init__
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     type='direct', **options)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self.reconnect(channel)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     routing_key=self.routing_key)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 83, in __init__
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self.revive(self._channel)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 214, in revive
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self.declare()
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 100, in declare
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self.exchange.declare()
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 163, in declare
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     nowait=nowait, passive=passive,
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 595, in exchange_declare
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self._send_method((40, 10), args)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 58, in _send_method
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, method_sig, args, content,
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 224, in write_method
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     write_frame(1, channel, payload)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 160, in write_frame
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     pack('>BHI%dsB' % size, frame_type, channel, size, payload, 0xce),
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 309, in sendall
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     tail = self.send(data, flags)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 295, in send
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit     total_sent += fd.send(data[total_sent:], flags)
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
2014-09-17 03:21:31.834 8965 TRACE oslo.messaging._drivers.impl_rabbit 
2014-09-17 03:21:31.836 8965 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on svl6-csl-b-rabbitmq-002:5672
2014-09-17 03:21:31.837 8965 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2014-09-17 03:21:32.873 8965 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on svl6-csl-b-rabbitmq-002:5672

Comment 1 Russell Bryant 2014-09-17 15:38:45 UTC
Are there any corresponding messages in the rabbitmq server logs when this happens?

Comment 2 John Eckersberg 2014-09-17 15:40:35 UTC
Assuming this is with an HA deployment, it's a dupe of bug 1123296.  If this is happening with a non-HA deployment, please let me know and we can reopen this one.

*** This bug has been marked as a duplicate of bug 1123296 ***

Comment 3 Ken Schroeder 2014-09-17 15:47:41 UTC
Yes thanks.  This is an HA deployment.

Comment 4 Ken Schroeder 2014-09-17 15:54:31 UTC
However, after looking at the duplicate bug, we are not running Rabbit behind HAProxy, so I think this case needs to remain open.

Comment 6 Russell Bryant 2014-09-17 16:03:37 UTC
Ken, can you check the rabbitmq log?

Also, do you know if you've changed the ulimit on your rabbitmq servers?  If you're running under load here, it could be hitting the default limit on open file descriptors.  It's kind of just a guess right now ... I think the rabbitmq log would provide a better hint if that was the case.

Comment 7 Ken Schroeder 2014-09-17 16:10:41 UTC
Before I opened the case today, I had increased the system ulimit last night, as there were file handle errors in the rabbit log.  Those errors are now gone, but the connection timeout behavior persists.  The closest thing to an error in the rabbit log now is warnings like this:

=WARNING REPORT==== 17-Sep-2014::05:26:21 ===
closing AMQP connection <0.1776.0> (10.114.194.204:38795 -> 10.114.197.142:5672):
connection_closed_abruptly

Comment 8 Russell Bryant 2014-09-17 16:19:27 UTC
(In reply to Ken Schroeder from comment #7)
> So before I opened the case today, i had increased system ulimit last night
> as there were File Handle errors in the rabbit log.  Those errors are now
> gone but connection timeout behavior persists.  Close thing to an error in
> rabbit log now is warnings

OK, thanks for clarifying.

> =WARNING REPORT==== 17-Sep-2014::05:26:21 ===
> closing AMQP connection <0.1776.0> (10.114.194.204:38795 ->
> 10.114.197.142:5672):
> connection_closed_abruptly

A quick search seems to indicate that this error is related to an unclean connection teardown by the client (TCP connection closed without a proper AMQP connection close).   If there's just one of those and several of the errors you're seeing on the OpenStack side, they're probably unrelated.

Comment 9 Ken Schroeder 2014-09-17 16:35:31 UTC
Doing event correlation between the rabbit logs and the clients showing up in the warnings, I cannot tie them together based on timestamps.

Comment 10 John Eckersberg 2014-09-19 14:31:21 UTC
Can you elaborate a bit more on your HA setup, minus haproxy?  Are you load balancing the rabbit servers behind a VIP via some other means, e.g. hardware load balancer?  Are you doing it client side with kombu, e.g. setting rabbit_hosts=host1,host2,host3?  Something else?
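
For reference, client-side failover with kombu is usually expressed in the OpenStack service config roughly like this (hostnames here are placeholders, not the ones from this deployment):

[DEFAULT]
# list every rabbit node; kombu/oslo.messaging fails over between them
rabbit_hosts=rabbit-001:5672,rabbit-002:5672,rabbit-003:5672
# mirror queues across the cluster (the "HA queues" setup)
rabbit_ha_queues=True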

Comment 11 Ken Schroeder 2014-09-19 15:24:08 UTC
The HA architecture we are using is multiple rabbit nodes with HA Queues and Kombu on the client side.  There is no VIP or load balancer in the messaging architecture.

Comment 12 John Eckersberg 2014-09-29 17:56:05 UTC
Is there possibly a stateful firewall in between the clients and the rabbitmq nodes?  The more I look at this, the more it looks like something in the middle is resetting the connection.  Especially since you are seeing the resets both on the server and client sides.

I know you said you couldn't find any correlation between the client and server logs.  Because rabbitmq does not do heartbeats, the client often will not notice the reset until well after the reset actually occurs.  However, the server will notice earlier, since it tries to push messages to the consumers and fails.  This is a known issue and is being tracked in bug 1129242.

If there is a firewall in the middle that tears down idle connections, it would explain the behavior you are seeing.
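
A rough way to check for half-open connections from both ends (a sketch; assumes the default 5672 port and that iproute's ss is available):

# on a client node: connections to the broker as the client sees them
ss -tn state established '( dport = :5672 )'
# on the rabbit node: connections the broker still considers open
rabbitmqctl list_connections peer_host peer_port state

If the broker lists peers that no longer show up on the client side (or vice versa), something in the middle has silently dropped the connection.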

Comment 13 Ken Schroeder 2014-09-30 14:15:21 UTC
The rabbit instances are running on a Nova KVM hypervisor with OVS and a Neutron provider network.  The only firewall in the data path is OVS security groups, which are open and functioning.

Comment 14 Ken Schroeder 2014-09-30 15:11:27 UTC
What is the proper method to tune the file handle limit for the rabbitmq-server process? It seems adding it to /etc/security/limits.conf does not actually change the limits for the rabbitmq-server process.  Modifying the startup script is not really something that can be managed cleanly by Puppet.


[root@svl6-csl-b-rabbitmq-001 ~]# ps -ef|grep rabbit
rabbitmq  1062     1 11 02:26 ?        01:25:19 /usr/lib64/erlang/erts-5.10.4/bin/beam.smp -W w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.3.5/sbin/../ebin -noshell -noinput -s rabbit boot -sname rabbit@svl6-csl-b-rabbitmq-001 -boot start_sasl -config /etc/rabbitmq/rabbitmq -kernel inet_default_connect_options [{nodelay,true}] -rabbit tcp_listeners [{"auto",5672}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit"} -rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/usr/lib/rabbitmq/lib/rabbitmq_server-3.3.5/sbin/../plugins" -rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/rabbit@svl6-csl-b-rabbitmq-001-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@svl6-csl-b-rabbitmq-001" -kernel inet_dist_listen_min 25672 -kernel inet_dist_listen_max 25672       
[root@svl6-csl-b-rabbitmq-001 ~]# cat /proc/1062/limits |grep file
Max file size             unlimited            unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max open files            1024                 4096                 files    

[root@svl6-csl-b-rabbitmq-001 ~]# su rabbitmq -s /bin/sh -c 'ulimit -n'
100000

Comment 15 Ken Schroeder 2014-09-30 15:15:07 UTC
We have also modified /etc/default/rabbitmq-server, but it would seem that is not having an impact either.
[root@svl6-csl-b-rabbitmq-001 ~]# cat /etc/default/rabbitmq-server 
# This file is sourced by /etc/init.d/rabbitmq-server. Its primary
# reason for existing is to allow adjustment of system limits for the
# rabbitmq-server process.
#
# Maximum number of open file handles. This will need to be increased
# to handle many simultaneous connections. Refer to the system
# documentation for ulimit (in man bash) for more information.
#
ulimit -n 102400

Comment 16 John Eckersberg 2014-09-30 15:22:30 UTC
Under systemd, you must configure the file handle limit under the service unit.  This will increase it to 102400:

cp /usr/lib/systemd/system/rabbitmq-server.service /etc/systemd/system/
sed -i '/^ExecStopPost.*/a LimitNOFILE=102400' /etc/systemd/system/rabbitmq-server.service
systemctl daemon-reload
service rabbitmq-server restart
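
One way to confirm the new limit is actually in effect after the restart (the file_descriptors block should report a total_limit close to 102400):

rabbitmqctl status | grep -A 4 file_descriptors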

Comment 17 John Eckersberg 2014-09-30 15:59:04 UTC
See also bug 1148063 which is to modify the default rabbitmq file handle limits.

Comment 18 Ken Schroeder 2014-10-14 15:05:34 UTC
It would appear that the problems we were having have been resolved by enabling the keepalive option in the rabbitmq config; additionally, we tuned the ipv4 keepalive settings in /etc/sysctl.conf on the rabbitmq servers to much lower values than the defaults.  Since performing those updates, the nova and other service connections have stabilized.  I believe this can be closed.
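
For anyone hitting the same symptoms later, the changes described above amount to roughly the following; the values are illustrative, not necessarily the exact ones used in this deployment:

%% /etc/rabbitmq/rabbitmq.config - enable TCP keepalive on the AMQP listener sockets
[
  {rabbit, [
    {tcp_listen_options, [{backlog, 128}, {nodelay, true}, {keepalive, true}]}
  ]}
].

# /etc/sysctl.conf - probe idle connections well before any middlebox would drop them
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 5

# apply the sysctl changes without a reboot
sysctl -p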

Comment 19 Peter Lemenkov 2016-11-08 14:53:17 UTC
*** Bug 1174929 has been marked as a duplicate of this bug. ***

