
Bug 1977711

Summary: oslo.cache's pymemcache backend doesn't recover from socket disconnection
Product: Red Hat OpenStack
Reporter: Damien Ciabrini <dciabrin>
Component: python-oslo-cache
Assignee: Hervé Beraud <hberaud>
Status: CLOSED ERRATA
QA Contact: MilanaLevy <millevy>
Severity: high
Docs Contact:
Priority: high
Version: 16.2 (Train)
CC: alee, apevec, cylopez, dciabrin, elicohen, ggrasza, hberaud, hrybacki, jpretori, jschluet, lhh, lmiccini, millevy
Target Milestone: z4
Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Flags: hberaud: needinfo-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: python-oslo-cache-1.37.1-2.20220111022422.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-12-07 19:21:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1163328, 1988996, 2071945
Bug Blocks: 1761768, 1989732

Description Damien Ciabrini 2021-06-30 10:26:16 UTC
Description of problem:
When oslo.cache is enabled and configured to target pymemcache (e.g. memcached + TLS-e),
pymemcache is managing the sockets that connect to memcached.

With this configuration, there is no automatic retry in pymemcache on socket error
or socket disconnection. Instead, pymemcache closes the invalid socket and raises
an exception up the stack. This makes the oslo.cache call fail, and subsequent
calls also fail until every bad socket has been hit and closed.
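
To illustrate the kind of retry that is missing, here is a minimal sketch using only the Python standard library. It is not the oslo.cache or pymemcache code: FlakyClient and get_with_retry are hypothetical names, and the client only imitates the behaviour described above (drop the bad socket, then raise).

```python
class FlakyClient:
    """Hypothetical stand-in for a cache client that owns pooled sockets."""

    def __init__(self, stale_sockets):
        # Number of pooled sockets whose peer has gone away.
        self.stale_sockets = stale_sockets

    def get(self, key):
        if self.stale_sockets > 0:
            # Imitate the reported behaviour: discard the bad socket, then raise.
            self.stale_sockets -= 1
            raise ConnectionResetError("peer closed the connection")
        return "value-for-" + key


def get_with_retry(client, key, attempts=3):
    """Call client.get(key), retrying when a socket-level error is raised."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return client.get(key)
        except OSError as exc:  # ConnectionResetError is a subclass of OSError
            last_error = exc
    # Every attempt hit a bad socket: surface the last error to the caller.
    raise last_error
```

With two stale sockets and three attempts, the caller never sees the failure; with more stale sockets than attempts, the error still propagates, so the attempt count would have to cover the pool size.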

This can consistently be triggered by:
  1. running "openstack service list" on the overcloud to create connections to memcached

  2. restarting memcached with "systemctl restart tripleo_memcached" to
     force it to close its side of the connected sockets.
     This leaves <x> half-closed sockets on the controller:
     the keystone service still has its side of each socket
     open.

  3. the next call to "openstack service list" fails because
     pymemcache hits a half-closed socket, closes its side, and
     raises an exception

  4. the keystone service recovers only once the remaining <x>-1 half-closed sockets
     have been hit and closed.
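
The sequence above can be modelled with a small toy simulation. It is only a sketch under stated assumptions: PooledConnection, call_through_pool and failures_until_recovery are hypothetical names, each request is assumed to use exactly one pooled connection, and a discarded socket is assumed to be replaced by a healthy one.

```python
from collections import deque


class PooledConnection:
    """Hypothetical pooled connection; half_closed means the peer went away."""

    def __init__(self, half_closed):
        self.half_closed = half_closed


def call_through_pool(pool):
    """Use one connection from the pool for a single request."""
    conn = pool.popleft() if pool else PooledConnection(half_closed=False)
    if conn.half_closed:
        # The bad socket is closed and discarded, and the request fails.
        # Assumption: a fresh, healthy connection takes its place.
        pool.append(PooledConnection(half_closed=False))
        raise ConnectionError("half-closed socket")
    pool.append(conn)
    return "ok"


def failures_until_recovery(stale_count):
    """Count failed requests before the first success after a restart."""
    pool = deque(PooledConnection(half_closed=True) for _ in range(stale_count))
    failures = 0
    while True:
        try:
            call_through_pool(pool)
            return failures
        except ConnectionError:
            failures += 1
```

In this model, a pool holding <x> half-closed sockets produces <x> failed requests before the first one succeeds, which matches steps 3 and 4.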

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. enable keystone cache with pymemcache as backend.

[cache]
backend = dogpile.cache.pymemcache
enabled = true
memcache_servers = 127.0.0.1:11211

2. trigger an API call to that node, e.g.:

openstack service list

3. restart memcached on the node

systemctl restart tripleo_memcached

4. retry the same API call

openstack service list

Actual results:

the last "service list" call will fail with "Internal Server Error (HTTP 500)"

Expected results:

the call should work

Additional info:

Comment 11 Hervé Beraud 2022-04-26 11:47:50 UTC
Fixed in version python-oslo-cache-1.37.1-2.20220111022422.el8ost for rhos-16.2-rhel-8-trunk-candidate

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=44868389

Comment 27 errata-xmlrpc 2022-12-07 19:21:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8794