Bug 1893205
Summary: | From time to time memcached stops processing requests and brings down OpenStack control plane | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Alex Stupnikov <astupnik> | |
Component: | openstack-tripleo-heat-templates | Assignee: | Damien Ciabrini <dciabrin> | |
Status: | CLOSED NEXTRELEASE | QA Contact: | Joe H. Rahme <jhakimra> | |
Severity: | medium | Docs Contact: | ||
Priority: | high | |||
Version: | 16.1 (Train) | CC: | apevec, aruffin, bdobreli, camorris, dciabrin, dhill, dhruv, dsedgmen, enothen, ggrimaux, hberaud, jmelvin, jraju, jschluet, lhh, lmiccini, mbayer, mburns, mgarciac, michal.vasko, michele, msecaur, satmakur, schhabdi, tkajinam, xili, ykulkarn, yusuf, yusufhadiwinata | |
Target Milestone: | --- | Keywords: | Triaged | |
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2046185 2100879 | Environment: | |
Last Closed: | 2023-08-11 12:22:42 UTC | Type: | Bug | |
Bug Blocks: | 2046185, 2100879, 2101864, 2101865 |
Description
Alex Stupnikov
2020-10-30 14:46:28 UTC
https://bugs.launchpad.net/oslo.cache/+bug/1888394 has been noted as a likely cause of this; its comments mention neutron too. Bug #1915700 was reported to investigate the Neutron behavior.

Does creating a cron job on the controller nodes to restart the Memcached service every X hours (~12 hrs) sound like a valid workaround? Can such a procedure be safely implemented in production environments? Thank you.

Hello Yadnesh, did you redeploy the stack with that config? Otherwise, you can directly update the neutron config and restart the service to apply it.

So far, setting keystone_authtoken/memcache_use_advanced_pool to true is a valid workaround, and it is considered to be the actual fix.

I reported a bug[1] in Launchpad against tripleo and submitted a draft patch here[2].

[1] https://bugs.launchpad.net/tripleo/+bug/1931047
[2] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/795010

I'd appreciate any feedback on that patch, especially on the following points:

- Currently this patch enables advanced_pool for all services, based on the fact that advanced_pool is now recommended.
- A new option is added to tht in case a user wants to switch back to the "legacy pool". However, considering that advanced_pool is now recommended, maybe we can just hard-code the parameter instead.

So far the following template can be used to try enabling memcache_use_advanced_pool for all overcloud services.
~~~
parameter_defaults:
  ControllerExtraConfig:
    aodh::keystone::authtoken::memcache_use_advanced_pool: true
    barbican::keystone::authtoken::memcache_use_advanced_pool: true
    cinder::keystone::authtoken::memcache_use_advanced_pool: true
    glance::api::authtoken::memcache_use_advanced_pool: true
    gnocchi::keystone::authtoken::memcache_use_advanced_pool: true
    heat::keystone::authtoken::memcache_use_advanced_pool: true
    ironic::api::authtoken::memcache_use_advanced_pool: true
    manila::keystone::authtoken::memcache_use_advanced_pool: true
    neutron::keystone::authtoken::memcache_use_advanced_pool: true
    nova::keystone::authtoken::memcache_use_advanced_pool: true
    nova::metadata::novajoin::authtoken::memcache_use_advanced_pool: true
    octavia::keystone::authtoken::memcache_use_advanced_pool: true
    panko::keystone::authtoken::memcache_use_advanced_pool: true
    placement::keystone::authtoken::memcache_use_advanced_pool: true
~~~

*** Bug 1849754 has been marked as a duplicate of this bug. ***

*** Bug 1915700 has been marked as a duplicate of this bug. ***

Hello, I am hoping to get a bit of clarity here. Was this patch rolled out into the latest 16.1 as well as 16.2? Thank you.

The past discussion in this bug caused some confusion, so I'm posting this to clear it up. The parameter we are changing to work around the problem is used by keystonemiddleware, which is used only by API processes. Thus the parameter should be set on controller nodes, and there is NO NEED to set it on compute nodes or any other nodes where no API processes are running. Please follow https://bugzilla.redhat.com/show_bug.cgi?id=1893205#c46 . It should be enough to apply the required change.

Fixed in 16.2; for 16.1, see Takashi's advice regarding a workaround.
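
One way to confirm that the workaround has taken effect on a controller is to parse a service's configuration file and look at the `[keystone_authtoken]` section. The Python sketch below is illustrative only and is not part of the patch. The helper name `advanced_pool_enabled` is hypothetical, it assumes the file is plain INI that `configparser` can read, and it treats an absent option as "not enabled", which may not match the effective default of the installed keystonemiddleware version.

```python
import configparser
import sys


def advanced_pool_enabled(conf_text: str) -> bool:
    """Return True if [keystone_authtoken]/memcache_use_advanced_pool is set to true.

    Hypothetical helper: an absent section or option is reported as False
    (an assumption of this sketch, not necessarily the library default).
    """
    # strict=False tolerates repeated options; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.getboolean(
        "keystone_authtoken", "memcache_use_advanced_pool", fallback=False
    )


if __name__ == "__main__":
    # Pass the configuration files to check as command-line arguments.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            enabled = advanced_pool_enabled(fh.read())
        state = "enabled" if enabled else "not enabled"
        print(f"{path}: memcache_use_advanced_pool {state}")
```

The script only reads the files it is given. Which files to pass depends on where the deployment keeps the configuration that the API services actually load.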