1893205 – From time to time memcached stops processing requests and brings down OpenStack control plane


Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	16.1 (Train)
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Damien Ciabrini
QA Contact:	Joe H. Rahme
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1849754 1915700
Depends On:
Blocks:	2046185 2100879 2101864 2101865

Reported:	2020-10-30 14:46 UTC by Alex Stupnikov
Modified:	2024-03-25 16:52 UTC
CC List:	29 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	2046185 2100879
Environment:
Last Closed:	2023-08-11 12:22:42 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1634646	None	None	None	2021-10-08 08:39:45 UTC
Launchpad	1931047	None	None	None	2021-06-07 09:30:20 UTC
OpenStack gerrit	795497	None	NEW	Add support for keystone_authtoken/memcache_use_advanced_pool	2021-06-09 11:24:07 UTC
Red Hat Issue Tracker	OSP-642	None	None	None	2021-11-10 17:31:49 UTC

Description Alex Stupnikov 2020-10-30 14:46:28 UTC

Description of problem:

The customer has a RHOSP 16.1 deployment with beefy controller services. From time to time, memcached stops working on all controller nodes.

Because of bug #1891034, we can't tell what's going on from memcached's perspective. But in the controllers' logs we can see that at some point the memcached healthcheck starts failing intermittently and then breaks completely (no successful healthchecks).

The customer provided sosreports from the controller nodes, collected at the time of the failure. I kindly ask developers to provide troubleshooting tips and help find a workaround.
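
For anyone trying to confirm the symptom by hand, a minimal sketch of a responsiveness probe follows. It is not the healthcheck that the deployment runs; it only assumes that memcached listens on its default port 11211 and answers the text-protocol `version` command. The function names and the timeout value are illustrative.

```python
import socket


def is_version_reply(reply: bytes) -> bool:
    """A healthy memcached answers `version` with a line starting with 'VERSION '."""
    return reply.startswith(b"VERSION ")


def memcached_responds(host: str, port: int = 11211, timeout: float = 2.0) -> bool:
    """Return True if the server answers the `version` command within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(b"version\r\n")
            reply = sock.recv(128)
    except OSError:
        # Refused connections, resets, and timeouts all count as "not responding".
        return False
    return is_version_reply(reply)
```

Running `memcached_responds("<controller address>")` in a loop during an incident would show whether the server accepts connections but never replies (a timeout) or refuses them outright; both cases return False here, so log the exception if that distinction matters.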

Comment 14 Michael Bayer 2021-01-12 16:37:55 UTC

https://bugs.launchpad.net/oslo.cache/+bug/1888394 has been noted and is a likely cause of this; its comments mention neutron too.

Comment 17 Alex Stupnikov 2021-01-13 09:33:07 UTC

Bug #1915700 was reported to investigate Neutron behavior.

Comment 22 Srinivas Atmakuri 2021-01-28 04:06:45 UTC

Does creating a cronjob on the controller nodes to restart the Memcached service every X hours (~12 hrs) sound like a valid workaround?

Can such a procedure be safely implemented in production environments?

Thank you.

Comment 35 Hervé Beraud 2021-05-25 12:13:44 UTC

Hello Yadnesh,

Did you redeploy the stack with that config?

Otherwise, you can update the neutron config directly and restart the service to apply it.
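
After editing a service config by hand, it can help to confirm that the option actually landed in the right section. The following is a minimal sketch that parses config text with Python's standard `configparser`; it assumes the file is plain INI with section headers, as oslo.config files normally are, and the function name and sample values are illustrative. The path of the real neutron config depends on the deployment and is not assumed here.

```python
import configparser


def advanced_pool_enabled(conf_text: str) -> bool:
    """Report whether [keystone_authtoken] memcache_use_advanced_pool is true."""
    # strict=False tolerates repeated keys, which service config files may contain;
    # interpolation=None keeps '%' characters in values from raising errors.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.getboolean(
        "keystone_authtoken", "memcache_use_advanced_pool", fallback=False
    )


if __name__ == "__main__":
    # Illustrative sample only; the address is from a documentation range.
    sample = """
[DEFAULT]
debug = False

[keystone_authtoken]
memcached_servers = 192.0.2.10:11211
memcache_use_advanced_pool = True
"""
    print(advanced_pool_enabled(sample))
```

The check only reads the text; it does not rewrite the file, so it cannot drop repeated keys or comments. The service still needs a restart before a changed value takes effect.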

Comment 45 Takashi Kajinami 2021-06-07 09:27:25 UTC

So far, setting keystone_authtoken/memcache_use_advanced_pool is a valid workaround,
and it is considered to be the actual fix.

I reported a bug[1] in Launchpad against tripleo and submitted a draft patch here[2].
 [1] https://bugs.launchpad.net/tripleo/+bug/1931047
 [2] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/795010

I'd appreciate any feedback on that patch, especially on the following points:
- Currently this patch enables advanced_pool for all services, based on the fact
  that advanced_pool is now recommended.

- A new option is added to tht in case a user wants to switch back to the "legacy pool".
  However, considering that advanced_pool is now recommended, maybe
  we can just hard-code the parameter instead.

Comment 46 Takashi Kajinami 2021-06-08 15:19:47 UTC

So far the following template can be used to try enabling memcache_use_advanced_pool for all overcloud services.

~~~
parameter_defaults:
  ControllerExtraConfig:
    aodh::keystone::authtoken::memcache_use_advanced_pool: true
    barbican::keystone::authtoken::memcache_use_advanced_pool: true
    cinder::keystone::authtoken::memcache_use_advanced_pool: true
    glance::api::authtoken::memcache_use_advanced_pool: true
    gnocchi::keystone::authtoken::memcache_use_advanced_pool: true
    heat::keystone::authtoken::memcache_use_advanced_pool: true
    ironic::api::authtoken::memcache_use_advanced_pool: true
    manila::keystone::authtoken::memcache_use_advanced_pool: true
    neutron::keystone::authtoken::memcache_use_advanced_pool: true
    nova::keystone::authtoken::memcache_use_advanced_pool: true
    nova::metadata::novajoin::authtoken::memcache_use_advanced_pool: true
    octavia::keystone::authtoken::memcache_use_advanced_pool: true
    panko::keystone::authtoken::memcache_use_advanced_pool: true
    placement::keystone::authtoken::memcache_use_advanced_pool: true
~~~

Comment 47 Damien Ciabrini 2021-06-25 13:08:24 UTC

*** Bug 1849754 has been marked as a duplicate of this bug. ***

Comment 48 Takashi Kajinami 2021-07-04 13:11:42 UTC

*** Bug 1915700 has been marked as a duplicate of this bug. ***

Comment 63 aruffin@redhat.com 2022-04-07 17:46:51 UTC

Hello,

I am hoping to get a bit of clarity here. Was this patch rolled out into the latest 16.1 as well as 16.2?

Thank you

Comment 69 Takashi Kajinami 2022-08-03 12:05:24 UTC

There was some confusion caused by the past discussion in this bug, so I'm posting this to clear it up.

The parameter we are changing to work around the problem is used by keystonemiddleware,
which is used only by API processes.
Thus the parameter should be set on controller nodes, and there is NO NEED to set it
on compute nodes or any other nodes where no API processes are running.

Please follow https://bugzilla.redhat.com/show_bug.cgi?id=1893205#c46 .
It should be enough to apply the required change.

Comment 70 Luca Miccini 2023-08-11 12:22:42 UTC

Fixed in 16.2. For 16.1, see Takashi's advice regarding a workaround.

