Bug 2187966
| Summary: | handshake_timeout,frame_header errors in RabbitMQ logs in RHOSP 16.1.8 deployment with internal TLS | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alex Stupnikov <astupnik> |
| Component: | python-amqp | Assignee: | OSP Team <rhos-maint> |
| Status: | NEW --- | QA Contact: | Nobody <nobody> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.1 (Train) | CC: | apevec, lhh, lmiccini |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
Description of problem:

When investigating a RabbitMQ crash in a customer's deployment, I found numerous errors like:

2023-04-17 15:07:50.755 [info] <0.22890.66> accepting AMQP connection <0.22890.66> (192.168.1.19:39862 -> 192.168.1.19:5672)
2023-04-17 15:07:50.755 [error] <0.22890.66> closing AMQP connection <0.22890.66> (192.168.1.19:39862 -> 192.168.1.19:5672): {handshake_timeout,handshake}

OR

2023-04-17 15:07:40.754 [info] <0.22495.66> accepting AMQP connection <0.22495.66> (192.168.1.19:39868 -> 192.168.1.19:5672)
2023-04-17 15:07:50.753 [error] <0.22495.66> closing AMQP connection <0.22495.66> (192.168.1.19:39868 -> 192.168.1.19:5672): {handshake_timeout,frame_header}

I collected a tcpdump to understand the problem better. Compared with "good connections", the client appears to stop participating in connection establishment after the initial exchange.

Some time ago there was a known issue in python-amqp affecting environments where TLS was used to establish communications:

https://bugs.launchpad.net/oslo.messaging/+bug/1800957
https://review.opendev.org/c/openstack/oslo.messaging/+/638735/1/releasenotes/notes/amqp-tls-issue-57c7f6ea894e03d7.yaml

However, RHOSP 16.1 uses a newer version of python-amqp. I am reporting this as a bug to request a second look from engineering. I will provide information about the collected data privately.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.8 (Train)

How reproducible:
Errors are generated sporadically in /var/log/containers/rabbitmq/rabbit

Actual results:
Occasional handshake_timeout errors in /var/log/containers/rabbitmq/rabbit

Expected results:
No handshake_timeout errors in RabbitMQ logs
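
Additional info:
To gauge how often each variant occurs, the log entries quoted above can be summarized with a small script. The following is only a sketch, not an official tool: the regular expression is derived solely from the four log lines in this report and may not match other RabbitMQ log formats, and the names `LINE_RE`, `SAMPLE` and `summarize_handshake_timeouts` are made up for illustration. It pairs each "accepting" entry with the matching "closing" entry by Erlang process id and reports the stage and the elapsed time.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: this pattern covers only the two line shapes quoted in this report.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<level>\w+)\] <(?P<pid>[\d.]+)> "
    r"(?P<action>accepting|closing) AMQP connection <[\d.]+> "
    r"\((?P<peer>\S+) -> (?P<server>\S+)\)"
    r"(?:: \{handshake_timeout,(?P<stage>\w+)\})?"
)

# The four log lines from the description, used as sample input.
SAMPLE = [
    "2023-04-17 15:07:50.755 [info] <0.22890.66> accepting AMQP connection "
    "<0.22890.66> (192.168.1.19:39862 -> 192.168.1.19:5672)",
    "2023-04-17 15:07:50.755 [error] <0.22890.66> closing AMQP connection "
    "<0.22890.66> (192.168.1.19:39862 -> 192.168.1.19:5672): "
    "{handshake_timeout,handshake}",
    "2023-04-17 15:07:40.754 [info] <0.22495.66> accepting AMQP connection "
    "<0.22495.66> (192.168.1.19:39868 -> 192.168.1.19:5672)",
    "2023-04-17 15:07:50.753 [error] <0.22495.66> closing AMQP connection "
    "<0.22495.66> (192.168.1.19:39868 -> 192.168.1.19:5672): "
    "{handshake_timeout,frame_header}",
]


def summarize_handshake_timeouts(lines):
    """Return (count per timeout stage, list of (peer, stage, seconds))."""
    accepted = {}       # Erlang pid -> timestamp of the "accepting" entry
    stages = Counter()  # timeout stage -> number of occurrences
    durations = []      # (client address, stage, seconds from accept to close)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore lines that do not look like the quoted entries
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        if m["action"] == "accepting":
            accepted[m["pid"]] = ts
        elif m["stage"]:
            # A "closing" entry that carries a handshake_timeout reason
            stages[m["stage"]] += 1
            start = accepted.pop(m["pid"], None)
            if start is not None:
                elapsed = (ts - start).total_seconds()
                durations.append((m["peer"], m["stage"], elapsed))
    return stages, durations


if __name__ == "__main__":
    stages, durations = summarize_handshake_timeouts(SAMPLE)
    for stage, count in stages.items():
        print(f"{stage}: {count}")
    for peer, stage, seconds in durations:
        print(f"{peer} {stage} after {seconds:.3f}s")
```

To run it against a real log, pass an open file object instead of `SAMPLE`. For the frame_header sample the accept and close entries are roughly ten seconds apart, while the two handshake entries carry the same timestamp as quoted.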