Description Sai Sindhur Malleni
2020-01-23 22:50:24 UTC
Description of problem: When scaling out an overcloud from 244 to 251 nodes, the deployment fails due to a keystone performance issue.
ERROR:
https://gist.githubusercontent.com/smalleni/3a17d8d516fd0139dee8e07d519b48ea/raw/c0918532dfea3eff973461fe7ac03413d39c57f6/gistfile1.txt
We have an undercloud with 64 logical cores; however, only 12 keystone workers were deployed, based on the ::os_workers fact here: https://github.com/openstack/puppet-openstacklib/blob/master/lib/facter/os_workers.rb
In previous releases, 24 workers would be deployed (12 keystone-main and 12 keystone-admin). Because these have been consolidated into a single "keystone" service since keystone v3, TripleO now sets a lower total number of keystone workers than when it set keystone-main and keystone-admin workers separately.
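The worker arithmetic described above can be sketched as follows. This is a minimal Python illustration, not the actual Ruby fact: it assumes ::os_workers is half the logical core count, floored at 2 and capped at 12 (the cap is consistent with the 12 workers observed on the 64-core undercloud). The function names are hypothetical.

```python
def os_workers(logical_cores: int) -> int:
    """Hypothetical re-implementation of the ::os_workers heuristic.

    Assumption: half the logical cores, never fewer than 2, capped at 12.
    """
    return min(max(logical_cores // 2, 2), 12)


def total_keystone_workers(logical_cores: int, split_services: bool) -> int:
    """Total keystone workers on the undercloud.

    split_services=True models older releases with separate keystone-main and
    keystone-admin services, each sized by os_workers; False models the single
    consolidated "keystone" service.
    """
    per_service = os_workers(logical_cores)
    return 2 * per_service if split_services else per_service


if __name__ == "__main__":
    cores = 64
    print("os_workers:", os_workers(cores))
    print("previous releases (main + admin):", total_keystone_workers(cores, True))
    print("consolidated keystone:", total_keystone_workers(cores, False))
```

Under these assumptions, a 64-core undercloud gets 12 workers per service: 24 in total with the old split layout, but only 12 with the consolidated service, which matches the drop reported here.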
After bumping the number of keystone workers on my undercloud from 12 to 24, the heat stack update proceeded.
Version-Release number of selected component (if applicable):
16
RHOS_TRUNK-16.0-RHEL-8-20200113.n.0
How reproducible:
100%
Steps to Reproduce:
1. Deploy undercloud with defaults
2. Deploy a large overcloud
3. Scale the overcloud
Actual results:
Scale out fails
Expected results:
Scale out should succeed
Additional info:
There is a related keystone performance drop in the overcloud, tracked in this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1789495
It is possible that the fix for that bug will also help with this one.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2114
Comment 13 Red Hat Bugzilla
2023-09-14 05:50:38 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days