Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1430766

Summary: Deployment fails with stack still stuck in CREATE_IN_PROGRESS because of messaging timeout-Keystone pegged
Product: Red Hat OpenStack
Reporter: Sai Sindhur Malleni <smalleni>
Component: rhosp-director
Assignee: Angus Thomas <athomas>
Status: CLOSED DUPLICATE
QA Contact: Amit Ugol <augol>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 10.0 (Newton)
CC: aschultz, dbecker, mburns, morazi, nkinder, raywang, rhel-osp-director-maint
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: scale_lab
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-21 19:47:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Keystone Pegged none

Description Sai Sindhur Malleni 2017-03-09 14:52:18 UTC
Created attachment 1261600 [details]
Keystone Pegged

Description of problem: We currently deploy the undercloud with one keystone admin process and one keystone main process in httpd. However, during deployment of a 51-node overcloud (OC), keystone is pegged to the point that even simple commands like nova list do not return in a reasonable time. Because keystone is pegged, the deployment appears to exit when heat times out waiting for an RPC message. Based on suggestions from Alex Krzos (who has done extensive keystone performance work), I bumped the number of processes to 24, which is the logical core count, for both admin and main while keeping threads equal to 1, and the deployment went through smoothly. As a precaution I also bumped rpc_response_timeout in heat.conf from 600 to 1200, but I think that was not necessary, as keystone was clearly the bottleneck. Even if it isn't advisable to deploy with a process count equal to the logical core count, we definitely need more than 1 process.
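
For illustration only, the tuning described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the fix that was eventually shipped: the helper names are hypothetical, the daemon process names keystone_admin and keystone_main are assumed rather than taken from this report, and the floor of 2 and cap of 24 simply encode "more than 1 process" and the logical core count used in this deployment. The output is a mod_wsgi WSGIDaemonProcess line with the processes= and threads= options.

```python
import os


def suggested_processes(logical_cores=None, cap=24):
    """Pick a keystone WSGI process count (hypothetical policy).

    Floor of 2 reflects "more than 1 process"; cap of 24 is the
    logical core count used in this report, not a recommendation.
    """
    cores = logical_cores if logical_cores is not None else (os.cpu_count() or 1)
    return max(2, min(cores, cap))


def wsgi_daemon_directive(name, processes, threads=1):
    """Render a minimal mod_wsgi WSGIDaemonProcess line.

    `name` is an assumed daemon process group name, e.g. keystone_admin.
    """
    if processes < 1 or threads < 1:
        raise ValueError("processes and threads must be >= 1")
    return f"WSGIDaemonProcess {name} processes={processes} threads={threads}"


if __name__ == "__main__":
    count = suggested_processes(logical_cores=24)
    for name in ("keystone_admin", "keystone_main"):  # assumed names
        print(wsgi_daemon_directive(name, count))
```

With 24 logical cores this renders one directive per keystone vhost with processes=24 and threads=1, matching the values that let the deployment complete here.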


Version-Release number of selected component (if applicable):
RHOSP 10 Puddle 2017-03-03.1

How reproducible:
100% on large deploys

Steps to Reproduce:
1. Deploy the overcloud (OC) with the default undercloud (UC) configuration

Actual results:
Deploy just exits due to RPC timeout (it doesn't even mark the stack as failed)

Expected results:
Deploy should go through

Additional info:
Heat RPC timeout and deploy exiting: https://gist.github.com/smalleni/a90133782d4f903ef995339293a45b8f

Comment 3 Nathan Kinder 2017-11-21 19:47:50 UTC
This has been addressed in OSP11 and a backport is pending for OSP 10.z.  Closing this as a duplicate of the existing OSP 10 bug for this issue.

*** This bug has been marked as a duplicate of bug 1435472 ***