Bug 2187966 - handshake_timeout,frame_header errors in RabbitMQ logs in RHOSP 16.1.8 deployment with internal TLS
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-amqp
Version: 16.1 (Train)
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: OSP Team
QA Contact: Nobody
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-19 09:30 UTC by Alex Stupnikov
Modified: 2023-08-11 20:19 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System                 ID         Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  OSP-24361  0        None      None    None     2023-04-19 09:30:59 UTC

Description Alex Stupnikov 2023-04-19 09:30:26 UTC
Description of problem:
While investigating a RabbitMQ crash in a customer's deployment, I found numerous errors like

    2023-04-17 15:07:50.755 [info] <0.22890.66> accepting AMQP connection <0.22890.66> (192.168.1.19:39862 -> 192.168.1.19:5672)
    2023-04-17 15:07:50.755 [error] <0.22890.66> closing AMQP connection <0.22890.66> (192.168.1.19:39862 -> 192.168.1.19:5672):
    {handshake_timeout,handshake}

OR

    2023-04-17 15:07:40.754 [info] <0.22495.66> accepting AMQP connection <0.22495.66> (192.168.1.19:39868 -> 192.168.1.19:5672)
    2023-04-17 15:07:50.753 [error] <0.22495.66> closing AMQP connection <0.22495.66> (192.168.1.19:39868 -> 192.168.1.19:5672):
    {handshake_timeout,frame_header}
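
To see how often each variant occurs and how long each connection stayed open before RabbitMQ closed it, the log can be scanned with a short script. The following is only a sketch: the function name and regular expressions are illustrative and assume the log line layout shown in the excerpts above, with the timeout reason on the same line as the "closing" message or on the line after it.

```python
import re
import sys
from datetime import datetime

# Matches the "accepting"/"closing" lines as they appear in the excerpts above.
EVENT_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[\w+\] (?P<pid><[\d.]+>) "
    r"(?P<action>accepting|closing) AMQP connection"
)
# Matches the timeout reason, e.g. {handshake_timeout,frame_header}.
REASON_RE = re.compile(r"\{handshake_timeout,(?P<state>\w+)\}")


def find_handshake_timeouts(lines):
    """Yield (pid, state, seconds_since_accept) for each handshake timeout.

    seconds_since_accept is None when no "accepting" line was seen for the pid.
    """
    accepted = {}   # pid -> timestamp of its "accepting" line
    closing = None  # (pid, timestamp) of the most recent "closing" line
    for line in lines:
        event = EVENT_RE.match(line)
        if event:
            ts = datetime.strptime(event["ts"], "%Y-%m-%d %H:%M:%S.%f")
            if event["action"] == "accepting":
                accepted[event["pid"]] = ts
                closing = None
            else:
                closing = (event["pid"], ts)
        reason = REASON_RE.search(line)
        if reason and closing:
            pid, closed_at = closing
            started = accepted.pop(pid, None)
            elapsed = (closed_at - started).total_seconds() if started else None
            yield pid, reason["state"], elapsed
            closing = None


if __name__ == "__main__":
    # Usage: python script.py <rabbitmq log file> [...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for pid, state, elapsed in find_handshake_timeouts(log):
                print(pid, state, elapsed)
```

For the two excerpts above this reports 0.0 seconds for the `handshake` case and 9.999 seconds for the `frame_header` case. The latter is consistent with RabbitMQ's default 10-second handshake timeout, assuming the deployment does not override it.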

I collected a tcpdump capture to understand the problem better. Compared with "good" connections, the client appears to stop participating in connection establishment after the initial exchange. Some time ago there was a known issue in python-amqp affecting environments where TLS was used to establish communications:
https://bugs.launchpad.net/oslo.messaging/+bug/1800957
https://review.opendev.org/c/openstack/oslo.messaging/+/638735/1/releasenotes/notes/amqp-tls-issue-57c7f6ea894e03d7.yaml

However, RHOSP 16.1 uses a newer version of python-amqp. I am reporting this as a bug to request a second look from engineering, and I will provide information about the collected data privately.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.8 (Train)

How reproducible:
Errors are generated sporadically in /var/log/containers/rabbitmq/rabbit

Actual results:
Occasional handshake_timeout errors in /var/log/containers/rabbitmq/rabbit

Expected results:
No handshake_timeout errors in RabbitMQ logs

