Bug 1734203

Summary: RabbitMQ connection timed out every second from nova-compute
Product: Red Hat OpenStack
Reporter: Robin Cernin <rcernin>
Component: rabbitmq-server
Assignee: Peter Lemenkov <plemenko>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: pkomarov
Severity: medium
Priority: medium
Version: 13.0 (Queens)
CC: apevec, hberaud, jeckersb, lhh, lmiccini, nalmond
Keywords: Triaged, ZStream
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Last Closed: 2019-10-25 11:25:07 UTC
Type: Bug

Description Robin Cernin 2019-07-30 01:10:44 UTC
Description of problem:

nova-compute repeatedly logs RabbitMQ connection timeouts, each followed by a reconnection one second later:

oslo.messaging._drivers.impl_rabbit [req-877d602c-46ed-4c84-86a0-2a91161efbc6 - - - - -] [b0bcc530-82da-45ac-8af5-75adee2313dd] AMQP server on overcloud-controller-1.internalapi.localdomain:5672 is unreachable: timed out. Trying again in 1 seconds.: timeout: timed out

2019-07-29 21:25:17.331 1 INFO oslo.messaging._drivers.impl_rabbit [req-877d602c-46ed-4c84-86a0-2a91161efbc6 - - - - -] [b0bcc530-82da-45ac-8af5-75adee2313dd] Reconnected to AMQP server on overcloud-controller-1.internalapi.localdomain:5672 via [amqp] client with port 48808.

2019-07-29 21:27:17.333 1 ERROR oslo.messaging._drivers.impl_rabbit [req-877d602c-46ed-4c84-86a0-2a91161efbc6 - - - - -] [b0bcc530-82da-45ac-8af5-75adee2313dd] AMQP server on overcloud-controller-1.internalapi.localdomain:5672 is unreachable: timed out. Trying again in 1 seconds.: timeout: timed out

2019-07-29 21:27:18.343 1 INFO oslo.messaging._drivers.impl_rabbit [req-877d602c-46ed-4c84-86a0-2a91161efbc6 - - - - -] [b0bcc530-82da-45ac-8af5-75adee2313dd] Reconnected to AMQP server on overcloud-controller-1.internalapi.localdomain:5672 via [amqp] client with port 48820.
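
To make the timing in the excerpt above easier to read, here is a minimal Python sketch that extracts the timestamped timeout and reconnect events and prints the gap between consecutive ones. The regular expression and the helper names (`parse_events`, `gaps`) are assumptions made for illustration, fitted only to the log lines quoted in this report. Lines without a leading timestamp, such as the first one quoted, are skipped.

```python
import re
from datetime import datetime

# Pattern fitted to the impl_rabbit lines quoted in this report (an assumption,
# not an official oslo.messaging log format definition).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \d+ "
    r"(?P<level>[A-Z]+) oslo\.messaging\._drivers\.impl_rabbit .*?"
    r"(?P<event>is unreachable|Reconnected to AMQP server)"
)

PREFIX = ("oslo.messaging._drivers.impl_rabbit "
          "[req-877d602c-46ed-4c84-86a0-2a91161efbc6 - - - - -] "
          "[b0bcc530-82da-45ac-8af5-75adee2313dd] ")
HOST = "overcloud-controller-1.internalapi.localdomain:5672"

# The three timestamped lines from the description, reassembled verbatim.
SAMPLE = [
    "2019-07-29 21:25:17.331 1 INFO " + PREFIX
    + "Reconnected to AMQP server on " + HOST
    + " via [amqp] client with port 48808.",
    "2019-07-29 21:27:17.333 1 ERROR " + PREFIX
    + "AMQP server on " + HOST
    + " is unreachable: timed out. Trying again in 1 seconds.: timeout: timed out",
    "2019-07-29 21:27:18.343 1 INFO " + PREFIX
    + "Reconnected to AMQP server on " + HOST
    + " via [amqp] client with port 48820.",
]


def parse_events(lines):
    """Return (timestamp, kind) pairs; kind is 'timeout' or 'reconnect'."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # no timestamp or not an impl_rabbit connection event
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        kind = "timeout" if m.group("event") == "is unreachable" else "reconnect"
        events.append((ts, kind))
    return events


def gaps(events):
    """Seconds elapsed between consecutive events, labeled 'prev->next'."""
    return [
        (f"{a[1]}->{b[1]}", (b[0] - a[0]).total_seconds())
        for a, b in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for label, seconds in gaps(parse_events(SAMPLE)):
        print(f"{label}: {seconds:.3f} s")
```

On the quoted lines this reports roughly two minutes between a reconnect and the next timeout, and roughly one second between a timeout and the following reconnect. Running it over the full nova-compute.log would show whether that pattern holds for the whole incident.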



Version-Release number of selected component (if applicable):

[overcloud-controller-1]$ grep rabbitmq installed-rpms 
puppet-rabbitmq-8.1.1-0.20180216013831.d4b06b7.el7ost.noarch Mon Mar  4 20:03:41 2019
rabbitmq-server-3.6.15-3.el7ost.noarch                      Mon Mar  4 20:01:05 2019


How reproducible:

Cannot be reproduced after restarting the RabbitMQ docker bundle.

Actual results:

In nova-compute.log we can see:

2019-07-29 21:27:17.333 1 ERROR oslo.messaging._drivers.impl_rabbit [req-877d602c-46ed-4c84-86a0-2a91161efbc6 - - - - -] [b0bcc530-82da-45ac-8af5-75adee2313dd] AMQP server on overcloud-controller-1.internalapi.localdomain:5672 is unreachable: timed out. Trying again in 1 seconds.: timeout: timed out

2019-07-29 21:27:18.343 1 INFO oslo.messaging._drivers.impl_rabbit [req-877d602c-46ed-4c84-86a0-2a91161efbc6 - - - - -] [b0bcc530-82da-45ac-8af5-75adee2313dd] Reconnected to AMQP server on overcloud-controller-1.internalapi.localdomain:5672 via [amqp] client with port 48820.

In the RabbitMQ log we can see errors such as:

Client unexpectedly closed TCP connection.

Expected results:

Additional info:

Still uploading the logs.