Description Damien Ciabrini
2021-06-30 10:26:16 UTC
Description of problem:
When oslo.cache is enabled and configured to target pymemcache (e.g. memcached + TLS-e),
pymemcache manages the sockets that connect to memcached.
With this configuration, pymemcache does not retry automatically on socket error
or socket disconnection. Instead, pymemcache closes the invalid socket and raises
an exception up the stack. This makes the oslo.cache call fail, and subsequent
calls keep failing until every bad socket has been hit and closed.
This can consistently be triggered by:
1. Run "openstack service list" on the overcloud to create connections to memcached.
2. Restart memcached with "systemctl restart tripleo_memcached" to
force the server side of each connected socket to close.
This leaves <x> open sockets on the controller:
the keystone service still has its side of each socket
open.
3. The next call to "openstack service list" fails because
pymemcache hits a half-closed socket, closes its side, and
raises an exception.
4. The keystone service recovers only once the remaining <x>-1 half-closed sockets
have been hit and closed.
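The sequence above can be illustrated with a small, self-contained simulation. This is only a sketch: FakePool, StaleSocketError and failures_until_recovery are hypothetical names made up for this example, and the model assumes a single client-side pool of <x> idle connections with no retry. It does not use or reproduce pymemcache's actual code.

```python
from collections import deque


class StaleSocketError(ConnectionError):
    """Raised when a call lands on a half-closed connection (hypothetical)."""


class FakePool:
    """Hypothetical stand-in for a client-side connection pool without retry."""

    def __init__(self, size):
        # Each entry is True while the connection is usable.
        self.idle = deque([True] * size)

    def server_restarted(self):
        # The server closed its side: every idle connection is now half-closed.
        self.idle = deque([False] * len(self.idle))

    def call(self):
        # Reuse an idle connection if there is one, otherwise open a fresh one.
        healthy = self.idle.popleft() if self.idle else True
        if not healthy:
            # The bad connection is dropped and the error reaches the caller.
            raise StaleSocketError("half-closed socket")
        self.idle.append(healthy)
        return "ok"


def failures_until_recovery(pool, max_calls=100):
    """Count the failed calls that happen before the first successful one."""
    failures = 0
    for _ in range(max_calls):
        try:
            pool.call()
            return failures
        except StaleSocketError:
            failures += 1
    return failures


if __name__ == "__main__":
    pool = FakePool(size=4)      # <x> = 4 in this example
    pool.call()                  # step 1: connections are established
    pool.server_restarted()      # step 2: memcached restarts
    print(failures_until_recovery(pool))  # steps 3 and 4
```

Under these assumptions the number of failed calls equals the number of connections that were open when memcached restarted, which matches the behaviour described in steps 3 and 4.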
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Enable the keystone cache with pymemcache as the backend:
   [cache]
   backend = dogpile.cache.pymemcache
   enabled = true
   memcache_servers = 127.0.0.1:11211
2. Trigger an API call to that node, e.g.:
   openstack service list
3. Restart memcached on the node:
   systemctl restart tripleo_memcached
4. Retry the same API call:
   openstack service list
Actual results:
the last "service list" call will fail with "Internal Server Error (HTTP 500)"
Expected results:
the call should work
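One way to get the expected behaviour is to retry the cache call when it fails with a connection-level error, so that a stale socket is discarded and the call is repeated on another connection. The helper below is a minimal sketch of that idea; call_with_retry and its parameters are hypothetical names for illustration, and this is not necessarily how the issue was fixed in oslo.cache or pymemcache.

```python
import time


def call_with_retry(func, attempts=3, delay=0.0,
                    retry_on=(ConnectionError, OSError)):
    """Call func(), retrying when it raises a connection-level error.

    Hypothetical helper: `attempts` is the total number of tries and
    `delay` is the pause in seconds between two tries.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the last error reach the caller.
                raise
            time.sleep(delay)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_get():
        # Simulates two stale sockets followed by a healthy connection.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionResetError("half-closed socket")
        return "cached-value"

    print(call_with_retry(flaky_get, attempts=3))
```

The number of attempts has to be at least as large as the number of stale sockets a single call can encounter; otherwise the error still reaches the caller, only later.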
Additional info:
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:8794