Bug 1915700 - Neutron opens too many connections to memcached and causes DoS situation
Summary: Neutron opens too many connections to memcached and causes DoS situation
Keywords:
Status: CLOSED DUPLICATE of bug 1893205
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-keystonemiddleware
Version: 16.1 (Train)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Lance Bragstad
QA Contact: Jeremy Agee
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-13 09:32 UTC by Alex Stupnikov
Modified: 2024-06-13 23:54 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-04 13:11:44 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 742193 0 None MERGED Do not hardcode flush_on_reconnect, move to oslo.cache config 2021-02-16 18:44:09 UTC
OpenStack gerrit 770738 0 None MERGED Add setting to override max memcached connections 2021-02-16 18:44:09 UTC
OpenStack gerrit 772394 0 None NEW Allow to keystone to override oslo.cache.memcache_pool_flush_on_reconnect 2021-02-16 18:44:10 UTC
Red Hat Issue Tracker OSP-3389 0 None None None 2022-08-23 16:20:07 UTC

Description Alex Stupnikov 2021-01-13 09:32:23 UTC
Description of problem:

Some time ago I reported bug #1893205: memcached fails periodically in RHOSP 16.1 deployments with "beefy" controller nodes. It looks like this problem is caused by Neutron, which opens too many connections to memcached and exceeds its connection limit [1].

Upstream bug [2] may be related.

The original bug has three cases attached; two of them (02782690 and 02829231) have sosreports collected at the time of the failure, so you can check the latest sosreports there.

[1]
https://bugzilla.redhat.com/show_bug.cgi?id=1893205#c13

[2]
https://bugs.launchpad.net/oslo.cache/+bug/1888394
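
A minimal sketch (an editorial illustration, not taken from the sosreports) of how the symptom can be checked with python-memcached's stats support; the address below is a placeholder, since on an OSP controller memcached listens on the internal API network:

    # Sketch: print connection-related memcached stats so the current
    # connection count can be compared against the server's limit.
    # Assumes python-memcached is installed and memcached is reachable
    # at the placeholder address below.
    import memcache

    client = memcache.Client(["127.0.0.1:11211"])
    for server, stats in client.get_stats():
        print(server)
        for key, value in stats.items():
            # keys may be str or bytes depending on the library version
            if "connection" in str(key):
                print("  %s = %s" % (key, value))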

Comment 1 Hervé Beraud 2021-01-13 11:08:39 UTC
Hello,

The root cause is a python-memcached issue. I think it's due to the fact that when python-memcached tries to get a socket and the `flush_on_reconnect` param has been passed to it, it will do a `flush_all` [1][2].
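
A minimal sketch of the parameter in question, assuming python-memcached and a reachable memcached (the address is a placeholder): when `flush_on_reconnect` is set for a server, the client issues a `flush_all` against it after re-establishing a dropped socket.

    # Sketch: the flush_on_reconnect flag discussed above, passed directly
    # to python-memcached; oslo.cache forwards the same flag to this client
    # for the pooled backend (see [3] below).
    import memcache

    client = memcache.Client(
        ["127.0.0.1:11211"],      # placeholder address
        flush_on_reconnect=True,  # the param referenced in [1][2]
    )
    client.set("example-key", "example-value")
    print(client.get("example-key"))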

This behavior was added during Ussuri by passing the `flush_on_reconnect` param to the memcache pooled backend from oslo.cache [3]. The goal was to release memcached sockets; that was needed because of a race condition, if I remember correctly [4].

This feature was needed by Keystone. We tried to make it optional in oslo.cache [5], but we didn't reach a consensus and that patch has since been abandoned...

AFAIK identical bugs have been reported upstream [6][7].

The root cause is the python-memcached "bug" [1][2][7]; however, I don't know if this is really intended behavior of that library, so I'll try to discuss it with the maintainers.
 
Let's move this bug to oslo.cache for now. I'll reopen the abandoned patch [5] (and will likely take ownership of it) to try to make this feature optional.

[1] https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L1412,L1413
[2] https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L1487
[3] https://review.opendev.org/c/openstack/oslo.cache/+/644774
[4] https://bugs.launchpad.net/keystonemiddleware/+bug/1892852
[5] https://review.opendev.org/c/openstack/oslo.cache/+/742193/
[6] https://bugs.launchpad.net/keystonemiddleware/+bug/1883659
[7] https://github.com/linsomniac/python-memcached/issues/179
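
As the Links section above shows, the patch in [5] (gerrit 742193) was later revived and merged, turning `flush_on_reconnect` into a regular oslo.cache option (`memcache_pool_flush_on_reconnect`, see also gerrit 772394). A minimal sketch of what that enables, assuming an oslo.cache version that contains the change; server names are placeholders:

    # Sketch: configure the oslo.cache pooled memcache backend and explicitly
    # opt out of flush_on_reconnect. The override will raise NoSuchOptError
    # if the installed oslo.cache predates gerrit change 742193.
    from oslo_config import cfg
    from oslo_cache import core as cache

    conf = cfg.ConfigOpts()
    cache.configure(conf)   # registers the [cache] option group
    conf(args=[])           # parse (no CLI args) so option values can be read

    conf.set_override("enabled", True, group="cache")
    conf.set_override("backend", "oslo_cache.memcache_pool", group="cache")
    conf.set_override("memcache_servers",
                      ["controller-0:11211", "controller-1:11211"],
                      group="cache")
    conf.set_override("memcache_pool_flush_on_reconnect", False, group="cache")

    region = cache.create_region()
    cache.configure_cache_region(conf, region)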

Comment 17 Takashi Kajinami 2021-07-04 13:11:44 UTC
According to my discussion with Hervé, it was confirmed that this issue is caused by the implementation in python-memcached, which keystonemiddleware uses by default, and that it will be solved by switching to the oslo.cache implementation, which is now tracked in bz 1893205.
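
For illustration, a hedged sketch of that direction: making auth_token use a pooled memcache client rather than one plain python-memcached client per worker. The option names are the standard keystonemiddleware [keystone_authtoken] ones; whether this exact knob is the change tracked in bz 1893205 is for that bug to confirm, and all values below are placeholders.

    # Sketch: wire keystonemiddleware's auth_token with a pooled memcache
    # client. In a real deployment these values live in the
    # [keystone_authtoken] section of the service config (e.g. neutron.conf).
    from keystonemiddleware import auth_token

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]

    conf = {
        "www_authenticate_uri": "http://keystone.internal:5000",
        "auth_type": "password",
        "auth_url": "http://keystone.internal:5000",
        "username": "neutron",
        "password": "secret",
        "project_name": "service",
        "user_domain_name": "Default",
        "project_domain_name": "Default",
        "memcached_servers": "controller-0:11211,controller-1:11211",
        # Use a shared connection pool instead of a python-memcached client
        # per worker.
        "memcache_use_advanced_pool": "true",
        "memcache_pool_maxsize": "10",
    }
    wrapped = auth_token.AuthProtocol(app, conf)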

So I'll close this as a duplicate of that bug.

*** This bug has been marked as a duplicate of bug 1893205 ***

