Bug 1741267 - heartbeats missed and connection timeout
Summary: heartbeats missed and connection timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-amqp
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z8
Target Release: 14.0 (Rocky)
Assignee: RHOS Maint
QA Contact: pkomarov
URL:
Whiteboard:
Depends On: 1740681
Blocks:
Reported: 2019-08-14 15:59 UTC by Hervé Beraud
Modified: 2019-11-06 16:54 UTC (History)

Fixed In Version: python-amqp-2.3.2-4.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, SSLError timeouts were not handled properly: socket.timeout() was not raised, which could cause rabbitmq driver connections to lock up. This patch ensures SSLError timeouts are treated as socket timeouts, so that oslo.messaging and services no longer log errors related to rabbitmq heartbeats and the connection between the service and the rabbitmq server remains stable.
Clone Of: 1740681
Environment:
Last Closed: 2019-11-06 16:53:57 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:3747 0 None None None 2019-11-06 16:54:14 UTC

Description Hervé Beraud 2019-08-14 15:59:55 UTC
+++ This bug was initially created as a clone of Bug #1740681 +++

Description of problem:

py-amqp does not properly handle SSLError timeouts.

If I'm right, because this case (an SSLError timeout) is not handled specifically, socket.timeout() is sometimes not raised, which can cause the connection to lock up.

This is related to a python bug https://bugs.python.org/issue10272
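
To illustrate the idea (this is only a minimal sketch, not the actual py-amqp code): on Python 2, a timed-out read on an SSL socket surfaces as an SSLError whose message contains "timed out" rather than as socket.timeout. The helper name `recv_normalizing_timeouts` and its arguments are hypothetical and exist only for this example.

```python
import socket
import ssl


def recv_normalizing_timeouts(recv, nbytes):
    """Call recv(nbytes), re-raising SSL read timeouts as socket.timeout.

    Hypothetical helper for illustration only; not py-amqp's API.
    """
    try:
        return recv(nbytes)
    except ssl.SSLError as exc:
        # An SSL read timeout can be reported as SSLError with a
        # "timed out" message (see https://bugs.python.org/issue10272).
        if 'timed out' in str(exc):
            raise socket.timeout(str(exc))
        # Any other SSL failure is a real error and is propagated as-is.
        raise
```

With this kind of normalization, a caller that already handles socket.timeout (for example, a heartbeat loop that simply retries the read) sees SSL timeouts the same way as plain TCP timeouts, while genuine SSL errors still propagate.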

The current version of py-amqp on my freshly deployed version of OSP13 is:

```
$ rpm -qa | grep amqp
python2-amqp-2.3.2-3.el7ost.noarch
```

py-amqp was patched for the SSL issue in version 2.4.1

```
$ git log v2.4.0..v2.4.1 --no-merges --oneline
ba132f4 Bump version: 2.4.0 → 2.4.1
e669e83 Updated changelog.
2356f42 Treat EWOULDBLOCK as timeout (#253)
bf122a0 Always treat SSLError timeouts as socket timeouts (#247)
457b3ba Support float read_timeout/write_timeout (#246)
40e0ef5 Add unit test for SSLTransport _write function (#251)
e45ea3e read_frame python3 compatible for large payloads (#248)
734305d Add unit test for test_wrap_socket_sni (#250)
60acabc Fix crash in basic_publish when broker does not support connection.blocked capability (#244)
f507172 basic_consume() should return consumer tag instead of tuple (#240)
d09a0b0 Parametrize product_version in integration tests (#236)
0f7ffd2 Bump PyPy to 6.0. Add PyPy3 to the build process. (#238)
```

So we don't have the fix (bf122a0 Always treat SSLError timeouts as socket timeouts (#247)) embedded in our version (2.3.2).

The fix was released through this patch https://github.com/celery/py-amqp/pull/247

Possibly this BZ is also related to:
- https://bugzilla.redhat.com/show_bug.cgi?id=1725917
- https://bugzilla.redhat.com/show_bug.cgi?id=1734203
- https://bugzilla.redhat.com/show_bug.cgi?id=1733930
- https://bugs.launchpad.net/ubuntu/+source/oslo.messaging/+bug/1800957

Can you release a new version of py-amqp, based on 2.4.1 or higher, with the patch embedded for OSP13/14?

Version-Release number of selected component (if applicable):

2.3.2-2


How reproducible:

Unknown


Steps to Reproduce:
1. 
2.
3.

Actual results:

In some circumstances, AMQP heartbeats from oslo.messaging can fail and the driver can be disconnected; timeouts and related errors then appear in the service logs (nova-api, for example). See https://bugzilla.redhat.com/show_bug.cgi?id=1725917 for more details.

Expected results:

No error logs related to missed heartbeat and connection timeout

--- Additional comment from Hervé Beraud on 2019-08-14 15:35:57 UTC ---

We can't bump the package version due to the openstack requirements constraints so I'll only backport the needed fix there:

bf122a0 Always treat SSLError timeouts as socket timeouts (#247)

--- Additional comment from Hervé Beraud on 2019-08-14 15:58:14 UTC ---

Fixed in version python-amqp-2.1.4-3.el7ost

Comment 1 Hervé Beraud 2019-08-14 17:41:34 UTC
python-amqp-2.3.2-4.el7ost

Comment 3 pkomarov 2019-10-23 22:02:25 UTC
Verified.

(undercloud) [stack@undercloud-0 ~]$ rhos-release -L
Installed repositories (rhel-7.7):
  14
  ceph-3
  ceph-osd-3
  rhel-7.7
(undercloud) [stack@undercloud-0 ~]$ cat core_puddle_version
2019-10-21.1
(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep amqp
python2-amqp-2.3.2-5.el7ost.noarch

(undercloud) [stack@undercloud-0 ~]$ rpm -q --changelog python2-amqp-2.3.2-5.el7ost.noarch|grep SSL
- Always treat SSLError timeouts as socket timeouts (#247) (rhbz#1741267)

Comment 5 errata-xmlrpc 2019-11-06 16:53:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3747

