Bug 1740681 - heartbeats missed and connection timeout
Summary: heartbeats missed and connection timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-amqp
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z9
Target Release: 13.0 (Queens)
Assignee: RHOS Maint
QA Contact: pkomarov
URL:
Whiteboard:
Duplicates: 1725917
Depends On:
Blocks: 1741267
 
Reported: 2019-08-13 13:22 UTC by Hervé Beraud
Modified: 2019-12-29 18:17 UTC

Fixed In Version: python-amqp-2.3.2-4.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, SSLError timeouts were not handled correctly, which caused connection issues that affected the oslo.messaging RabbitMQ driver and oslo.messaging itself. With this update, SSLError timeouts are treated as socket timeouts, which means that oslo.messaging and services stop logging errors related to RabbitMQ heartbeats, and the connection between services and the RabbitMQ server remains stable.
Clone Of:
Clones: 1741267
Environment:
Last Closed: 2019-11-07 14:04:36 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:3791 0 None None None 2019-11-07 14:04:51 UTC

Internal Links: 1733930 1747226

Description Hervé Beraud 2019-08-13 13:22:55 UTC
Description of problem:

py-amqp does not properly handle SSLError timeouts.

If I'm right, without specific handling of this case (SSLError timeout), socket.timeout was sometimes not raised, causing the connection to lock up.

This is related to a Python bug: https://bugs.python.org/issue10272
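
To illustrate the failure mode described above, here is a minimal sketch of treating an SSLError timeout as a socket timeout. It is not the py-amqp patch itself: the helper names `ssl_timeouts_as_socket_timeouts` and `read_with_timeout` are hypothetical, and matching on the "timed out" message text is an assumption about how the timeout surfaces on affected Python versions.

```python
import socket
import ssl
from contextlib import contextmanager


@contextmanager
def ssl_timeouts_as_socket_timeouts():
    """Hypothetical helper: re-raise SSL timeout errors as socket.timeout."""
    try:
        yield
    except ssl.SSLError as exc:
        # Assumption: on affected Pythons (see issue10272) a timed-out SSL
        # read or write surfaces as SSLError with "timed out" in its message.
        if "timed out" in str(exc):
            raise socket.timeout(str(exc))
        raise  # any other SSL failure propagates unchanged


def read_with_timeout(read_func, nbytes):
    """Call read_func(nbytes); return None on timeout so the caller can retry."""
    try:
        with ssl_timeouts_as_socket_timeouts():
            return read_func(nbytes)
    except socket.timeout:
        return None
```

With this kind of translation, code that only expects socket.timeout (a heartbeat loop, for example) sees a normal timeout and can carry on instead of treating the connection as broken.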

The current version of py-amqp on my freshly deployed version of OSP13 is:

```
$ rpm -qa | grep amqp
python2-amqp-2.3.2-3.el7ost.noarch
```

py-amqp was patched for the SSL issue in version 2.4.1

```
$ git log v2.4.0..v2.4.1 --no-merges --oneline
ba132f4 Bump version: 2.4.0 → 2.4.1
e669e83 Updated changelog.
2356f42 Treat EWOULDBLOCK as timeout (#253)
bf122a0 Always treat SSLError timeouts as socket timeouts (#247)
457b3ba Support float read_timeout/write_timeout (#246)
40e0ef5 Add unit test for SSLTransport _write function (#251)
e45ea3e read_frame python3 compatible for large payloads (#248)
734305d Add unit test for test_wrap_socket_sni (#250)
60acabc Fix crash in basic_publish when broker does not support connection.blocked capability (#244)
f507172 basic_consume() should return consumer tag instead of tuple (#240)
d09a0b0 Parametrize product_version in integration tests (#236)
0f7ffd2 Bump PyPy to 6.0. Add PyPy3 to the build process. (#238)
```

So we don't have the fix (bf122a0 Always treat SSLError timeouts as socket timeouts (#247)) embedded in our version (2.3.2).

The fix was released through this patch https://github.com/celery/py-amqp/pull/247

Possibly this BZ is also related to:
- https://bugzilla.redhat.com/show_bug.cgi?id=1725917
- https://bugzilla.redhat.com/show_bug.cgi?id=1734203
- https://bugzilla.redhat.com/show_bug.cgi?id=1733930
- https://bugs.launchpad.net/ubuntu/+source/oslo.messaging/+bug/1800957

Can you release a new version of py-amqp based on 2.4.1 or higher, with the patch embedded, for OSP13/14?

Version-Release number of selected component (if applicable):

2.3.2-2


How reproducible:

Unknown


Steps to Reproduce:
1. 
2.
3.

Actual results:

In some circumstances, AMQP heartbeats from oslo.messaging can fail and the driver can be disconnected. Timeouts and related errors then appear in service logs (nova-api, for example). See https://bugzilla.redhat.com/show_bug.cgi?id=1725917 for more details.

Expected results:

No error logs related to missed heartbeats or connection timeouts.

Comment 1 Hervé Beraud 2019-08-14 15:35:57 UTC
We can't bump the package version due to the OpenStack requirements constraints, so I'll only backport the needed fix:

bf122a0 Always treat SSLError timeouts as socket timeouts (#247)

Comment 2 Hervé Beraud 2019-08-14 15:58:14 UTC
Fixed in version python-amqp-2.1.4-3.el7ost

Comment 3 Hervé Beraud 2019-08-14 18:01:02 UTC
Cross tagged from OSP14 with python-amqp-2.3.2-4.el7ost

```
$ brew list-tag-history --build=python-amqp-2.3.2-4.el7ost
Wed Aug 14 19:39:34 2019: python-amqp-2.3.2-4.el7ost tagged into rhos-14.0-rhel-7-candidate by hberaud [still active]
Wed Aug 14 19:56:26 2019: python-amqp-2.3.2-4.el7ost tagged into rhos-13.0-rhel-7-candidate by hberaud [still active]
```

Comment 4 Hervé Beraud 2019-10-07 09:26:08 UTC
*** Bug 1725917 has been marked as a duplicate of this bug. ***

Comment 13 pkomarov 2019-10-23 23:38:19 UTC
Verified:

[stack@undercloud-0 ~]$ rhos-release -L
Installed repositories (rhel-7.7):
  13
  ceph-3
  ceph-osd-3
  rhel-7.7
[stack@undercloud-0 ~]$ cat core_puddle_version 
2019-10-23.1[stack@undercloud-0 ~]$ 
[stack@undercloud-0 ~]$ 
[stack@undercloud-0 ~]$  rpm -qa | grep amqp
python2-amqp-2.3.2-5.el7ost.noarch
[stack@undercloud-0 ~]$ 
[stack@undercloud-0 ~]$ rpm -q --changelog python2-amqp-2.3.2-5.el7ost.noarch|grep SSL
- Always treat SSLError timeouts as socket timeouts (#247) (rhbz#1741267)

Comment 15 errata-xmlrpc 2019-11-07 14:04:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3791

