Bug 2187966

Summary:	handshake_timeout,frame_header errors in RabbitMQ logs in RHOSP 16.1.8 deployment with internal TLS
Product:	Red Hat OpenStack	Reporter:	Alex Stupnikov <astupnik>
Component:	python-amqp	Assignee:	OSP Team <rhos-maint>
Status:	CLOSED NEXTRELEASE	QA Contact:	Nobody <nobody>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	16.1 (Train)	CC:	apevec, lhh, lmiccini
Target Milestone:	---	Keywords:	Triaged
Target Release:	---
Hardware:	All
OS:	All
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-11-16 08:39:50 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Alex Stupnikov 2023-04-19 09:30:26 UTC

Description of problem:
While investigating a RabbitMQ crash in a customer's deployment, I found numerous errors like:

    2023-04-17 15:07:50.755 [info] <0.22890.66> accepting AMQP connection <0.22890.66> (192.168.1.19:39862 -> 192.168.1.19:5672)
    2023-04-17 15:07:50.755 [error] <0.22890.66> closing AMQP connection <0.22890.66> (192.168.1.19:39862 -> 192.168.1.19:5672):
    {handshake_timeout,handshake}

OR

    2023-04-17 15:07:40.754 [info] <0.22495.66> accepting AMQP connection <0.22495.66> (192.168.1.19:39868 -> 192.168.1.19:5672)
    2023-04-17 15:07:50.753 [error] <0.22495.66> closing AMQP connection <0.22495.66> (192.168.1.19:39868 -> 192.168.1.19:5672):
    {handshake_timeout,frame_header}
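
To gauge how often each variant occurs, the log can be summarized with a short script. The following is a minimal sketch, not a supported tool: it assumes the log layout shown in the two excerpts above (timestamp, level, Erlang pid, message, with the `{handshake_timeout,...}` reason on the following line), and the function name `summarize_handshake_timeouts` is hypothetical. It counts timeouts per phase and, where the matching "accepting" line is present, reports the seconds elapsed between accept and close for that pid.

```python
import re
from collections import Counter
from datetime import datetime

# Start of a RabbitMQ log entry, in the layout shown in the excerpts above.
ENTRY_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<level>\w+)\] (?P<pid><[\d.]+>) (?P<msg>.*)$"
)
# The timeout reason appears on the line after "closing AMQP connection".
REASON_RE = re.compile(r"^\s*\{handshake_timeout,(?P<phase>\w+)\}")


def summarize_handshake_timeouts(lines):
    """Return (Counter of timeout phases, list of (pid, seconds from accept to close))."""
    accepted = {}         # pid -> time the connection was accepted
    pending_close = None  # (pid, time) of a close entry still waiting for its reason line
    phases = Counter()
    durations = []
    for line in lines:
        entry = ENTRY_RE.match(line)
        if entry:
            ts = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S.%f")
            pid, msg = entry["pid"], entry["msg"]
            pending_close = None
            if msg.startswith("accepting AMQP connection"):
                accepted[pid] = ts
            elif msg.startswith("closing AMQP connection"):
                pending_close = (pid, ts)
            continue
        reason = REASON_RE.match(line)
        if reason and pending_close:
            pid, closed_at = pending_close
            phases[reason["phase"]] += 1
            if pid in accepted:
                elapsed = (closed_at - accepted.pop(pid)).total_seconds()
                durations.append((pid, elapsed))
        pending_close = None
    return phases, durations


# Example usage (the path is supplied by the caller):
#     with open(path_to_rabbitmq_log) as fh:
#         phases, durations = summarize_handshake_timeouts(fh)
```

In the second excerpt the close is logged roughly 10 seconds after the accept, which is consistent with RabbitMQ's default handshake timeout of 10 seconds; a cluster of durations near that value would suggest that the client went silent rather than that the broker was slow to respond.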

I collected a tcpdump capture to understand the problem better. Compared with "good" connections, the client appears to stop participating in connection establishment after the initial exchange. Some time ago there was a known issue in python-amqp that affected environments where TLS was used to establish communications:
https://bugs.launchpad.net/oslo.messaging/+bug/1800957
https://review.opendev.org/c/openstack/oslo.messaging/+/638735/1/releasenotes/notes/amqp-tls-issue-57c7f6ea894e03d7.yaml

However, RHOSP 16.1 uses a newer version of python-amqp, so I am reporting this as a bug to request a second look from engineering. I will provide information about the collected data privately.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.8 (Train)

How reproducible:
Errors are generated sporadically in /var/log/containers/rabbitmq/rabbit

Actual results:
Occasional handshake_timeout errors in /var/log/containers/rabbitmq/rabbit

Expected results:
No handshake_timeout errors in RabbitMQ logs

Comment 11 Luca Miccini 2023-11-16 08:39:50 UTC

Fixed in 16.2.