Bug 1433942

Summary: The validator "Check the number of processes" fails for a default undercloud installation
Product: Red Hat OpenStack
Reporter: Udi Kalifon <ukalifon>
Component: openstack-tripleo-validations
Assignee: Florian Fuchs <flfuchs>
Status: CLOSED ERRATA
QA Contact: Ola Pavlenko <opavlenk>
Severity: medium
Priority: unspecified
Version: 11.0 (Ocata)
CC: jjoyce, jpichon, jrist, jschluet, markmc, slinaber, tvignaud
Target Milestone: rc
Keywords: Triaged
Target Release: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-validations-5.5.0-1.el7ost
Last Closed: 2017-05-17 20:09:43 UTC
Type: Bug

Description Udi Kalifon 2017-03-20 12:21:34 UTC
Description of problem:
I installed OSP 11 on bare metal and ran the validator "Check the number of OpenStack processes on undercloud". It fails with:

Task 'Verify the number of running processes per OpenStack service' failed:
Host: localhost
Message: There are 9 heat-engine processes running. Having more than 8 risks running out of memory.

Task 'Verify the number of running processes per OpenStack service' failed:
Host: localhost
Message: There are 9 nova-api processes running. Having more than 8 risks running out of memory.
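
For illustration, here is a minimal sketch of the kind of per-service process count the validation performs. The service names and the limit of 8 come from the messages above; the use of psutil and the helper name are assumptions, not the actual Ansible implementation:

# Hypothetical sketch of the per-service process count check. The real
# validation is an Ansible role in openstack-tripleo-validations; psutil
# and these names are assumptions for illustration only.
import psutil

MAX_PROCESSES = 8  # the limit reported in the failure messages above
SERVICES = ["heat-engine", "nova-api"]

def count_processes(name):
    """Count running processes whose command line mentions the service name."""
    return sum(
        1
        for proc in psutil.process_iter(["cmdline"])
        if any(name in part for part in (proc.info["cmdline"] or []))
    )

for service in SERVICES:
    running = count_processes(service)
    if running > MAX_PROCESSES:
        print("There are %d %s processes running. Having more than %d risks "
              "running out of memory." % (running, service, MAX_PROCESSES))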


Version-Release number of selected component (if applicable):
openstack-tripleo-validations-5.4.0-4.el7ost.noarch
openstack-tripleo-common-6.0.1-0.20170307123121.2c9fa69.el7ost.noarch
puppet-tripleo-6.3.0-1.el7ost.noarch


How reproducible:
100%


Steps to Reproduce:
1. Install a default undercloud and run the validator


Actual results:
Validator fails.


Expected results:
A default installation should comply with our own guidelines and recommendations, so the validator should pass when the user has not made any configuration changes.

Comment 2 Florian Fuchs 2017-04-03 13:17:37 UTC
The maximum number of processes is currently hard-coded to 4, but some services (like nova or heat) set their number of workers to higher values, especially on hosts with a larger number of CPUs.

I suggest setting the maximum number in this validation to the number of CPUs on the undercloud.

Upstream patch: https://review.openstack.org/#/c/452746/
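
For context, a minimal sketch of the CPU-based ceiling suggested above, assuming Python's multiprocessing.cpu_count() as the source of the CPU count (the actual patch may derive both values differently):

# Hypothetical sketch of the suggested CPU-based limit. Services such as
# nova-api and heat-engine default their worker count to the number of
# CPUs, so a host with many CPUs exceeds a fixed limit.
import multiprocessing

def process_limit():
    # Suggested ceiling: one process per CPU on the undercloud.
    return multiprocessing.cpu_count()

def check(service, running):
    limit = process_limit()
    if running > limit:
        return "FAILED: %d %s processes, limit is %d" % (running, service, limit)
    return "OK: %d %s processes, limit is %d" % (running, service, limit)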

Comment 4 Florian Fuchs 2017-04-10 20:00:34 UTC
Updated the patch with a different fix, based on review feedback.

Comment 6 Florian Fuchs 2017-04-11 11:25:59 UTC
(In reply to Florian Fuchs from comment #2)
> The maximum number of processes is currently hard-coded to 4, but some
> services (like nova or heat) set their number of workers to higher values,
> especially on hosts with a larger number of CPUs.
> 
> I suggest setting the maximum number in this validation to the number of
> CPUs on the undercloud.
> 
> Upstream patch: https://review.openstack.org/#/c/452746/

Correction: the maximum is 8, not 4. Also, considering the original intention of the validation, my suggestion to set the maximum equal to the number of CPUs doesn't make much sense: that is exactly the condition the validation is supposed to catch in the first place (services eating up memory on hosts with a large number of CPUs). So, until a better way is found to realistically assess a meaningful maximum number of processes per service, the proposed upstream fix makes the validation succeed with warnings instead of letting it fail.
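
A minimal sketch of the warn-instead-of-fail behaviour described here; the function name and result keys are assumptions for illustration, not the actual module interface from the upstream patch:

# Hypothetical sketch of the "warn instead of fail" change described above.
def check_process_count(service, running, limit=8):
    result = {"service": service, "failed": False, "warnings": []}
    if running > limit:
        # Previously this condition failed the validation outright;
        # after the fix it only records a warning.
        result["warnings"].append(
            "There are %d %s processes running. Having more than %d risks "
            "running out of memory." % (running, service, limit)
        )
    return result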

Comment 9 Jason E. Rist 2017-04-11 15:41:25 UTC
Removing Regression based on the logic in comment #5; removing blocker since that was the intended action before comment #7.

Comment 11 Udi Kalifon 2017-04-27 13:16:18 UTC
Verified in openstack-tripleo-validations-5.5.0-1.el7ost.noarch. The validator passes with a warning.

Comment 12 errata-xmlrpc 2017-05-17 20:09:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245