Description of problem:
Some time ago I reported bug #1893205: memcached failed periodically in RHOSP 16.1 deployments with "beefy" controller nodes. It looks like this problem is caused by Neutron, which opens too many connections to memcached and exceeds its connection limit [1]. Upstream bug [2] may be related.

The original bug has three cases attached. Two of them (02782690 and 02829231) have sosreports collected at the time of the failure, so please check the latest sosreports there.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1893205#c13
[2] https://bugs.launchpad.net/oslo.cache/+bug/1888394
Hello,

The root cause is a python-memcached issue. I think it happens because, when python-memcached (re)opens a socket to a server and the `flush_on_reconnect` parameter was passed to it, it issues a `flush_all` [1][2]. This behavior was added during the Ussuri cycle, when oslo.cache started passing the `flush_on_reconnect` parameter to its memcache pooled backend [3]. The goal was to release memcached sockets, which was needed because of a race condition, if I remember correctly [4].

This feature was needed by keystone. We tried to make it optional in oslo.cache [5], but we didn't reach a consensus and the patch has since been abandoned. AFAIK, identical bugs have been reported upstream [6][7].

The root cause is the python-memcached "bug" [1][2][7]; however, I don't know whether this is actually expected behavior for that library, so I'll try to discuss it with the maintainers. Let's move this bug to oslo.cache for now. I'll reopen the abandoned patch [5] and will probably take it over, to try to make this feature optional.

[1] https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L1412,L1413
[2] https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L1487
[3] https://review.opendev.org/c/openstack/oslo.cache/+/644774
[4] https://bugs.launchpad.net/keystonemiddleware/+bug/1892852
[5] https://review.opendev.org/c/openstack/oslo.cache/+/742193/
[6] https://bugs.launchpad.net/keystonemiddleware/+bug/1883659
[7] https://github.com/linsomniac/python-memcached/issues/179
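To make the flush-on-reconnect mechanism concrete, here is a simplified, hypothetical model in Python. It is not python-memcached's actual code: `ModelHost`, `FakeConnection` and `simulate` are illustrative names, the fake connection only records commands instead of talking to memcached, and we assume (as a simplification) that the flush is armed when a server is marked dead and fires on the next socket open.

```python
"""Hypothetical, simplified model of flush_on_reconnect behaviour.

Not python-memcached's real implementation; names are illustrative.
"""


class FakeConnection:
    """Stand-in for a TCP socket: records the commands 'sent' to memcached."""

    def __init__(self, log):
        self.log = log

    def send(self, command):
        self.log.append(command)


class ModelHost:
    """Hypothetical model of one memcached server entry inside a client."""

    def __init__(self, flush_on_reconnect=False):
        self.flush_on_reconnect = flush_on_reconnect
        self.flush_on_next_connect = False
        self.conn = None
        self.sent = []  # every command sent to the server, in order

    def mark_dead(self):
        # Assumption: a failed request drops the socket, and with
        # flush_on_reconnect enabled the next connection is armed to flush.
        if self.flush_on_reconnect:
            self.flush_on_next_connect = True
        self.conn = None

    def get_socket(self):
        # Reuse the open connection if there is one.
        if self.conn is not None:
            return self.conn
        self.conn = FakeConnection(self.sent)
        if self.flush_on_next_connect:
            # flush_all invalidates every item on the server,
            # including items written by other clients.
            self.conn.send("flush_all")
            self.flush_on_next_connect = False
        return self.conn


def simulate(flush_on_reconnect, failures):
    """Send one request, then reconnect `failures` times, one request each."""
    host = ModelHost(flush_on_reconnect)
    host.get_socket().send("get token")
    for _ in range(failures):
        host.mark_dead()
        host.get_socket().send("get token")
    return host.sent


if __name__ == "__main__":
    print(simulate(flush_on_reconnect=True, failures=2))
    print(simulate(flush_on_reconnect=False, failures=2))
```

In this model, enabling `flush_on_reconnect` inserts a `flush_all` before the first request on every new connection, emptying the whole server for all clients that share it; with the flag disabled, reconnects leave the cached data alone.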
According to my discussion with Hervé, it was confirmed that this issue is caused by the python-memcached implementation, which keystonemiddleware uses by default, and that it will be solved by switching to the oslo.cache implementation, which is now tracked in bz 1893205. So I'll close this as a duplicate of that bug.

*** This bug has been marked as a duplicate of bug 1893205 ***