Bug 1819979 - [Backport][RHOSP13] Drop expired connections before retrieving from the queue
Summary: [Backport][RHOSP13] Drop expired connections before retrieving from the queue
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-oslo-cache
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z14
Target Release: 13.0 (Queens)
Assignee: Hervé Beraud
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-02 02:01 UTC by Michele Valsecchi
Modified: 2023-09-07 22:39 UTC
CC List: 5 users

Fixed In Version: python-oslo-cache-1.28.1-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-24 11:41:26 UTC
Target Upstream Version:
Embargoed:




Links
System                 ID        Last Updated
Launchpad              1775341   2020-04-02 02:02:35 UTC
Red Hat Issue Tracker  OSP-3210  2022-08-23 18:37:25 UTC

Description Michele Valsecchi 2020-04-02 02:01:50 UTC
Description of problem:

Due to the way oslo_cache.memcache_pool implements reaping of old connections, whenever acquire() is called,
the connection returned to the caller from the pool can in some cases already be stale (its socket left in CLOSE_WAIT state).
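
The upstream fix [4] ("Drop expired connections before retrieving from the queue") moves the reaping step so it runs before a connection is handed out. Below is a minimal sketch of that pattern, simplified from oslo.cache's _memcache_pool.py; the class, attribute names, and TTL bookkeeping here are approximations, not the exact upstream code:

~~~
import collections
import contextlib
import time

# Simplified stand-in for oslo.cache's pool item; the real pool
# subclasses queue.Queue and tracks more state than shown here.
_PoolItem = collections.namedtuple('_PoolItem', ['ttl', 'connection'])


class ConnectionPool(object):
    def __init__(self, make_connection, unused_timeout):
        self._make_connection = make_connection  # factory for new connections
        self._unused_timeout = unused_timeout    # seconds an idle conn may live
        self._queue = collections.deque()        # oldest items sit at the left

    def _drop_expired_connections(self):
        # Oldest connections are at the left end of the deque, so we can
        # stop at the first one whose TTL has not passed yet.
        now = time.time()
        while self._queue and self._queue[0].ttl < now:
            conn = self._queue.popleft().connection
            conn.close()  # destroy the stale (possibly CLOSE_WAIT) socket

    @contextlib.contextmanager
    def acquire(self):
        # The fix: reap expired connections *before* taking one from the
        # queue, so a caller can no longer be handed a stale connection.
        self._drop_expired_connections()
        try:
            # Reuse the most recently returned (right-end) connection.
            conn = self._queue.pop().connection
        except IndexError:
            conn = self._make_connection()
        try:
            yield conn
        finally:
            self._queue.append(_PoolItem(
                ttl=time.time() + self._unused_timeout,
                connection=conn))
~~~

Before the fix, the expiry sweep ran only when a connection was put back into the pool, so a connection that had sat idle past unused_timeout could still be handed to the next caller; that is how the CLOSE_WAIT sockets described above reach the services.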

Version-Release number of selected component (if applicable):
RHOSP13

How reproducible:
Very hard to reproduce, but the issue is real: the fix [4] for it has already been merged to the master branch [3].

Steps to Reproduce:
See [1] for details

Actual results:
Services relying on oslo.cache will fail. In [1] you can see it is VNC that fails; in my case, the VM cannot boot properly:

~~~
$ openstack server show xxx
+-------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| Field                               | Value                                                                                                                                         |
+-------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| OS-EXT-STS:power_state              | Shutdown                                                                                                                                      |
| OS-EXT-STS:task_state               | None                                                                                                                                          |
| OS-EXT-STS:vm_state                 | error                                                                                                                                         |
| OS-SRV-USG:launched_at              | 2020-01-22T01:43:16.000000                                                                                                                    |
| OS-SRV-USG:terminated_at            | None                                                                                                                                          |
| created                             | 2020-01-22T01:41:12Z                                                                                                                          |
| fault                               | {u'message': u'Unable to get a connection from pool id 139676967376016 after 10 seconds.', u'code': 400, u'created': u'2020-03-18T19:08:11Z'} |
| status                              | ERROR                                                                                                                                         |
+-------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
~~~

Expected results:
Requesting a backport to RHOSP13.

The queens branch [2] should have the fix [4] backported; as we can see, it is already present in the train branch [1].
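
One way to confirm which branches already contain the fix, assuming a local clone of the oslo.cache repository (the hash is the commit from [4]):

~~~
$ git clone https://opendev.org/openstack/oslo.cache && cd oslo.cache
$ git branch -r --contains 43c6279a7eff0df7ab22155fb6c165f551cdcf8d
~~~

At the time of filing, this should list origin/master and origin/stable/train but not origin/stable/queens, matching the state described above.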



[0] https://bugs.launchpad.net/oslo.cache/+bug/1775341
[1] https://github.com/openstack/oslo.cache/blob/stable/train/oslo_cache/_memcache_pool.py#L135
[2] https://github.com/openstack/oslo.cache/blob/stable/queens/oslo_cache/_memcache_pool.py#L137
[3] https://github.com/openstack/oslo.cache/blob/master/oslo_cache/_memcache_pool.py#L135
[4] https://opendev.org/openstack/oslo.cache/commit/43c6279a7eff0df7ab22155fb6c165f551cdcf8d

Comment 2 Hervé Beraud 2020-04-02 14:13:03 UTC
Fixed with python-oslo-cache-1.28.1-2.el7ost

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=27665209
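
For anyone verifying the fix on a deployed node, the installed package can be checked against the Fixed In Version above; assuming the usual noarch build of this pure-Python package, the output would look like:

~~~
$ rpm -q python-oslo-cache
python-oslo-cache-1.28.1-2.el7ost.noarch
~~~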

Comment 15 errata-xmlrpc 2020-06-24 11:41:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2719

