Bug 1915700 - Neutron opens too many connections to memcached and causes DoS situation
Summary: Neutron opens too many connections to memcached and causes DoS situation
Keywords:
Status: CLOSED DUPLICATE of bug 1893205
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-keystonemiddleware
Version: 16.1 (Train)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Lance Bragstad
QA Contact: Jeremy Agee
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-13 09:32 UTC by Alex Stupnikov
Modified: 2024-06-13 23:54 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-04 13:11:44 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 742193 0 None MERGED Do not hardcode flush_on_reconnect, move to oslo.cache config 2021-02-16 18:44:09 UTC
OpenStack gerrit 770738 0 None MERGED Add setting to override max memcached connections 2021-02-16 18:44:09 UTC
OpenStack gerrit 772394 0 None NEW Allow to keystone to override oslo.cache.memcache_pool_flush_on_reconnect 2021-02-16 18:44:10 UTC
Red Hat Issue Tracker OSP-3389 0 None None None 2022-08-23 16:20:07 UTC

Description Alex Stupnikov 2021-01-13 09:32:23 UTC
Description of problem:

Some time ago I reported bug #1893205: memcached fails periodically in RHOSP 16.1 deployments with "beefy" controller nodes. It looks like this problem is caused by Neutron, which opens too many connections to memcached and exceeds its connection limit [1].

Upstream bug [2] may be related.

The original bug has three cases attached; two of them (02782690 and 02829231) have sosreports collected at the time of the failure, so you can check the latest sosreports there.

[1]
https://bugzilla.redhat.com/show_bug.cgi?id=1893205#c13

[2]
https://bugs.launchpad.net/oslo.cache/+bug/1888394
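
A minimal sketch (an editorial illustration, not taken from the sosreports) of how the symptom can be checked with python-memcached's stats support; the address below is a placeholder, since on an OSP controller memcached listens on the internal API network:

    # Sketch: print connection-related memcached stats so the current
    # connection count can be compared against the server's limit.
    # Assumes python-memcached is installed and memcached is reachable
    # at the placeholder address below.
    import memcache

    client = memcache.Client(["127.0.0.1:11211"])
    for server, stats in client.get_stats():
        print(server)
        for key, value in stats.items():
            # keys may be str or bytes depending on the library version
            if "connection" in str(key):
                print("  %s = %s" % (key, value))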

Comment 1 Hervé Beraud 2021-01-13 11:08:39 UTC
Hello,

The root cause is a python-memcached issue. I think it's due to the fact that when python-memcached tries to get a socket and the `flush_on_reconnect` param has been passed to it, it will do a `flush_all` [1][2].
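
A minimal sketch of the parameter in question, assuming python-memcached and a reachable memcached (the address is a placeholder): when `flush_on_reconnect` is set for a server, the client issues a `flush_all` against it after re-establishing a dropped socket.

    # Sketch: the flush_on_reconnect flag discussed above, passed directly
    # to python-memcached; oslo.cache forwards the same flag to this client
    # for the pooled backend (see [3] below).
    import memcache

    client = memcache.Client(
        ["127.0.0.1:11211"],      # placeholder address
        flush_on_reconnect=True,  # the param referenced in [1][2]
    )
    client.set("example-key", "example-value")
    print(client.get("example-key"))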

This behavior was added during Ussuri by passing the `flush_on_reconnect` param to the memcache pooled backend from oslo.cache [3]. The goal was to release memcached sockets; that was needed because of a race condition, if I remember correctly [4].

This feature was needed by Keystone. We tried to make it optional in oslo.cache [5], but we didn't reach a consensus and that patch has since been abandoned...

AFAIK identical bugs have been reported upstream [6][7].

The root cause is the python-memcached "bug" [1][2][7]; however, I don't know if this is really intended behavior of that library, so I'll try to discuss it with the maintainers.
 
Let's move this bug to oslo.cache for now. I'll reopen the abandoned patch [5] (and will likely take ownership of it) to try to make this feature optional.

[1] https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L1412,L1413
[2] https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L1487
[3] https://review.opendev.org/c/openstack/oslo.cache/+/644774
[4] https://bugs.launchpad.net/keystonemiddleware/+bug/1892852
[5] https://review.opendev.org/c/openstack/oslo.cache/+/742193/
[6] https://bugs.launchpad.net/keystonemiddleware/+bug/1883659
[7] https://github.com/linsomniac/python-memcached/issues/179
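
As the Links section above shows, the patch in [5] (gerrit 742193) was later revived and merged, turning `flush_on_reconnect` into a regular oslo.cache option (`memcache_pool_flush_on_reconnect`, see also gerrit 772394). A minimal sketch of what that enables, assuming an oslo.cache version that contains the change; server names are placeholders:

    # Sketch: configure the oslo.cache pooled memcache backend and explicitly
    # opt out of flush_on_reconnect. The override will raise NoSuchOptError
    # if the installed oslo.cache predates gerrit change 742193.
    from oslo_config import cfg
    from oslo_cache import core as cache

    conf = cfg.ConfigOpts()
    cache.configure(conf)   # registers the [cache] option group
    conf(args=[])           # parse (no CLI args) so option values can be read

    conf.set_override("enabled", True, group="cache")
    conf.set_override("backend", "oslo_cache.memcache_pool", group="cache")
    conf.set_override("memcache_servers",
                      ["controller-0:11211", "controller-1:11211"],
                      group="cache")
    conf.set_override("memcache_pool_flush_on_reconnect", False, group="cache")

    region = cache.create_region()
    cache.configure_cache_region(conf, region)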

Comment 17 Takashi Kajinami 2021-07-04 13:11:44 UTC
According to my discussion with Hervé, it was confirmed that this issue is caused by the implementation in python-memcached, which keystonemiddleware uses by default, and that it will be solved by switching to the oslo.cache implementation, which is now tracked in bz 1893205.
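
For illustration, a hedged sketch of that direction: making auth_token use a pooled memcache client rather than one plain python-memcached client per worker. The option names are the standard keystonemiddleware [keystone_authtoken] ones; whether this exact knob is the change tracked in bz 1893205 is for that bug to confirm, and all values below are placeholders.

    # Sketch: wire keystonemiddleware's auth_token with a pooled memcache
    # client. In a real deployment these values live in the
    # [keystone_authtoken] section of the service config (e.g. neutron.conf).
    from keystonemiddleware import auth_token

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]

    conf = {
        "www_authenticate_uri": "http://keystone.internal:5000",
        "auth_type": "password",
        "auth_url": "http://keystone.internal:5000",
        "username": "neutron",
        "password": "secret",
        "project_name": "service",
        "user_domain_name": "Default",
        "project_domain_name": "Default",
        "memcached_servers": "controller-0:11211,controller-1:11211",
        # Use a shared connection pool instead of a python-memcached client
        # per worker.
        "memcache_use_advanced_pool": "true",
        "memcache_pool_maxsize": "10",
    }
    wrapped = auth_token.AuthProtocol(app, conf)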

So I'll close this as a duplicate of that bug.

*** This bug has been marked as a duplicate of bug 1893205 ***

