Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1819979

Summary: [Backport][RHOSP13] Drop expired connections before retrieving from the queue
Product: Red Hat OpenStack Reporter: Michele Valsecchi <mvalsecc>
Component: python-oslo-cache Assignee: Hervé Beraud <hberaud>
Status: CLOSED ERRATA QA Contact: nlevinki <nlevinki>
Severity: medium Docs Contact:
Priority: medium    
Version: 13.0 (Queens) CC: apevec, dabarzil, jschluet, lhh, lmiccini
Target Milestone: z14 Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-oslo-cache-1.28.1-2.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-06-24 11:41:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michele Valsecchi 2020-04-02 02:01:50 UTC
Description of problem:

Due to the way oslo_cache.memcache_pool implements reaping of old connections, whenever acquire() is called,
the connection returned to the caller from the pool can in some cases already be stale (in CLOSE_WAIT state).
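The pattern behind the fix[4] can be sketched as follows. This is an illustrative simplification, not the actual oslo.cache code: SimplePool and its methods are hypothetical names, and a real pool would close dropped sockets and create new connections on demand.

```python
import collections
import time


class SimplePool:
    """Illustrative LIFO connection pool; oldest entries sit at the left."""

    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._queue = collections.deque()  # items: (created_at, conn)

    def put(self, conn):
        self._queue.append((time.time(), conn))

    def _drop_expired(self):
        now = time.time()
        # Oldest entries are at the left; stop at the first fresh one.
        while self._queue and now - self._queue[0][0] > self._ttl:
            self._queue.popleft()  # a real pool would also close the socket

    def acquire(self):
        # The fix: reap expired (possibly CLOSE_WAIT) connections *before*
        # retrieving from the queue, so a stale socket is never handed out.
        self._drop_expired()
        if not self._queue:
            return None  # a real pool would create a fresh connection here
        return self._queue.pop()[1]
```

Before the fix, reaping did not happen on this path, so acquire() could return a connection whose server side had already closed it.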

Version-Release number of selected component (if applicable):
RHOSP13

How reproducible:
Very hard to reproduce, but it is definitely real: the master branch has merged[3] the fix[4] for it.

Steps to Reproduce:
See [1] for details

Actual results:
Services relying on Oslo will fail. In [1] it is VNC that fails; in my case, the VM cannot boot properly:

~~~
$ openstack server show xxx
+-------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| Field                               | Value                                                                                                                                         |
+-------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| OS-EXT-STS:power_state              | Shutdown                                                                                                                                      |
| OS-EXT-STS:task_state               | None                                                                                                                                          |
| OS-EXT-STS:vm_state                 | error                                                                                                                                         |
| OS-SRV-USG:launched_at              | 2020-01-22T01:43:16.000000                                                                                                                    |
| OS-SRV-USG:terminated_at            | None                                                                                                                                          |
| created                             | 2020-01-22T01:41:12Z                                                                                                                          |
| fault                               | {u'message': u'Unable to get a connection from pool id 139676967376016 after 10 seconds.', u'code': 400, u'created': u'2020-03-18T19:08:11Z'} |
| status                              | ERROR                                                                                                                                         |
+-------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
~~~

Expected results:
Requesting a backport to RHOSP13.

The [2] queens branch should have [4] backported, as it is already present in the [1] train branch.



[0] https://bugs.launchpad.net/oslo.cache/+bug/1775341
[1] https://github.com/openstack/oslo.cache/blob/stable/train/oslo_cache/_memcache_pool.py#L135
[2] https://github.com/openstack/oslo.cache/blob/stable/queens/oslo_cache/_memcache_pool.py#L137
[3] https://github.com/openstack/oslo.cache/blob/master/oslo_cache/_memcache_pool.py#L135
[4] https://opendev.org/openstack/oslo.cache/commit/43c6279a7eff0df7ab22155fb6c165f551cdcf8d

Comment 2 Hervé Beraud 2020-04-02 14:13:03 UTC
Fixed with python-oslo-cache-1.28.1-2.el7ost

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=27665209

Comment 15 errata-xmlrpc 2020-06-24 11:41:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2719