Bug 1794595 - Undercloud keystone performance issue leads to stack update failure at scale [NEEDINFO]
Summary: Undercloud keystone performance issue leads to stack update failure at scale
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-keystone
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z2
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: RHOS Maint
QA Contact: Jeremy Agee
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-23 22:50 UTC by Sai Sindhur Malleni
Modified: 2020-05-14 12:15 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200315025718.033aae9.el8ost, puppet-keystone-15.4.1-0.20200312191822.abeb879.el8ost, puppet-openstacklib-15.4.1-0.20200310152952.ae52363.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-14 12:15:29 UTC
Target Upstream Version:
jamsmith: needinfo? (rhos-maint)




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 702031 | 0 | None | MERGED | Fix performance regression due to reduced number of keystone workers | 2020-12-03 19:55:06 UTC
OpenStack gerrit | 711762 | 0 | None | MERGED | Fix performance regression due to reduced number of keystone workers | 2020-12-03 19:55:06 UTC
OpenStack gerrit | 711763 | 0 | None | MERGED | Have doubled workers for keystone service | 2020-12-03 19:55:32 UTC
OpenStack gerrit | 711764 | 0 | None | MERGED | Update the number of keystone workers | 2020-12-03 19:55:05 UTC
Red Hat Product Errata | RHBA-2020:2114 | 0 | None | None | None | 2020-05-14 12:15:55 UTC

Description Sai Sindhur Malleni 2020-01-23 22:50:24 UTC
Description of problem: When scaling out an overcloud from 244 to 251 nodes, we see the deployment fail due to a keystone performance issue.
ERROR:

https://gist.githubusercontent.com/smalleni/3a17d8d516fd0139dee8e07d519b48ea/raw/c0918532dfea3eff973461fe7ac03413d39c57f6/gistfile1.txt

We have an undercloud with 64 logical cores; however, only 12 keystone workers were deployed, based on the ::os_workers fact defined here: https://github.com/openstack/puppet-openstacklib/blob/master/lib/facter/os_workers.rb

In previous releases, 24 workers would be deployed (12 keystone-main and 12 keystone-admin). Since keystone v3, those actions have been consolidated within a single "keystone" worker, so TripleO now sets a lower total number of "keystone" workers than it did when it set keystone-main and keystone-admin workers separately.

On bumping the number of keystone workers on my undercloud from 12 to 24, the heat stack update proceeded.
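
The arithmetic behind the 12-versus-24 figures can be sketched as follows. This is only an illustration, not the actual fact code: it assumes the ::os_workers fact takes a quarter of the logical cores and clamps the result between 2 and 12, which is consistent with the 12 workers observed on this 64-core undercloud. The function names, and the doubling helper that mirrors the manual bump to 24, are hypothetical.

```python
def os_workers(processor_count: int, floor: int = 2, cap: int = 12) -> int:
    """Assumed rule: a quarter of the logical cores, clamped to [floor, cap]."""
    return min(max(processor_count // 4, floor), cap)


def keystone_workers_doubled(processor_count: int) -> int:
    """Hypothetical helper: double the general worker count, as the manual bump did."""
    return 2 * os_workers(processor_count)


if __name__ == "__main__":
    # Print cores, assumed default workers, and doubled workers for a few sizes.
    for cores in (4, 16, 64):
        print(cores, os_workers(cores), keystone_workers_doubled(cores))
```

Under these assumptions the cap, not the core count, is the limiting factor on this undercloud: 64 // 4 is 16, which is clamped to 12, and doubling gives the 24 workers that allowed the stack update to proceed.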



Version-Release number of selected component (if applicable):
16
RHOS_TRUNK-16.0-RHEL-8-20200113.n.0

How reproducible:
100%

Steps to Reproduce:
1. Deploy undercloud with defaults
2. Deploy a large overcloud
3. Scale the overcloud

Actual results:
Scale out fails

Expected results:
Scale out should succeed

Additional info:
A related keystone performance drop in the overcloud is tracked in this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1789495

It is possible that the fix for that bug will also help with this one.

Comment 12 errata-xmlrpc 2020-05-14 12:15:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2114

