Bug 1361285

Summary:	Glance deployed with single worker
Product:	Red Hat OpenStack	Reporter:	Sai Sindhur Malleni <smalleni>
Component:	openstack-tripleo-heat-templates	Assignee:	Jiri Stransky <jstransk>
Status:	CLOSED ERRATA	QA Contact:	Avi Avraham <aavraham>
Severity:	unspecified	Docs Contact:
Priority:	medium
Version:	10.0 (Newton)	CC:	cyril, dbecker, egafford, eglynn, emacchi, fpercoco, jason.dobies, jschluet, jstransk, jtaleric, mburns, mcornea, morazi, pgrist, rhel-osp-director-maint, scohen, smalleni, srevivo
Target Milestone:	ga	Keywords:	Triaged
Target Release:	10.0 (Newton)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openstack-tripleo-heat-templates-5.0.0-1.1.el7ost	Doc Type:	Enhancement
Doc Text:	Feature: Glance configured with more workers by default. Reason: Improved performance. Result: Glance API and Registry gets deployed with more workers by default. The count is automatically scaled depending on the number of processors.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-12-14 15:47:33 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Sai Sindhur Malleni 2016-07-28 16:55:36 UTC

Description of problem:
Glance-API and Glance-registry are each deployed with a single worker per controller. This happens even though glance-api.conf and glance-registry.conf have workers set to 0 by default (which should ideally translate to workers = # of cores on the machine). FWIW, packstack deploys with workers=cores. So essentially there are two problems here:
1. Why is glance deploying with 1 worker when workers is set to 0 in the config file? (Every service I can think of defaults to # of cores when workers=0.)
2. Performance hit with a single worker

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP 9
2. Check for glance processes
3. Deployed with 1 process each for api and registry

Actual results:
1 worker per glance process

Expected results:
# of workers= # of cores

Additional info:
On a 1 controller + 2 compute setup, I was able to run a few Rally benchmarks via our tool Browbeat. The controller had 32 cores. The scenario was to create and delete an image in glance (cirros). At concurrency=64, we see that the default deployment with 1 worker errors out 54% of the time and has a 95th percentile of 60 seconds to create an image, vs 30 seconds in the case of workers = # of cores (32).
Rally Results:
1 worker each for api and registry: http://10.12.23.106:9000/20160728-162548/all-rally-run-0.html#/GlanceImages.create_and_delete_image-2

32 workers each for api and registry: http://10.12.23.106:9000/20160728-160417/all-rally-run-0.html#/GlanceImages.create_and_delete_image-2

Comment 2 Sai Sindhur Malleni 2016-07-28 17:57:28 UTC

Changed component from openstack-tripleo to director as it was the downstream product I was testing.

Comment 4 Sai Sindhur Malleni 2016-07-29 13:48:52 UTC

The previous Rally results used glance image create with the image URL being external to our network. So I believe the bad times we see in some cases are due to the image download itself rather than to the image create. I retested by hosting the image locally and the timings improved (no timeouts seen even with a single worker). But the timings are much better when # of workers = cores, as seen below. The scenario is glance image create and delete at concurrency=64.

1 worker: http://10.12.23.106:9000/20160728-183814/all-rally-run-0.html#/GlanceImages.create_and_delete_image-2

32 workers(no. of cores): http://10.12.23.106:9000/20160728-185531/all-rally-run-0.html#/GlanceImages.create_and_delete_image-2

Comment 5 Flavio Percoco 2016-07-29 15:41:20 UTC

I can confirm that setting `workers` to 0 does not mean it'll use the N# of cores. In Glance 0 is translated to a single pool. In order to make it use the number of cores, the workers option should be set to None. While this behavior is not what one would expect, it's been like that since Glance was created and our config files default to None.

The problem seems to be in tripleo as it's setting the value for this option to 0[0] and it should probably be changed there.

[0] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/glance-api.yaml#L38-L41
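
The semantics described in this comment can be sketched as follows. This is a minimal illustration, not Glance's actual code: the function names `resolve_worker_count` and `describe` are hypothetical, and `os.cpu_count()` stands in for whatever CPU-count helper Glance really uses.

```python
import os
from typing import Optional


def resolve_worker_count(configured: Optional[int]) -> int:
    """Return how many worker processes a given `workers` value implies.

    None -> one worker per CPU (option left unset)
    0    -> no forked workers; requests are served by a single pool
    N>0  -> exactly N forked workers
    """
    if configured is None:
        # Assumption: os.cpu_count() approximates the real CPU-count helper.
        return os.cpu_count() or 1
    if configured < 0:
        raise ValueError("workers must be >= 0")
    return configured


def describe(configured: Optional[int]) -> str:
    """Human-readable summary of the resulting process layout."""
    count = resolve_worker_count(configured)
    if count == 0:
        return "single in-process pool (no forked workers)"
    return f"{count} forked worker process(es)"
```

Under this reading, a template default of 0 never reaches the "one worker per CPU" branch, which matches the single-worker deployments reported above.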

Comment 6 Sai Sindhur Malleni 2016-08-01 13:03:57 UTC

So, we do want to move to deploying with workers=cores?

Comment 7 Flavio Percoco 2016-08-02 09:15:58 UTC

I'd recommend not setting `workers` at all in the configuration file. Just let Glance find the number of cores itself.

Comment 8 Flavio Percoco 2016-08-02 09:36:30 UTC

Moving this to puppet as this might need to be fixed there.

Comment 9 Steven Hardy 2016-08-02 09:40:22 UTC

(In reply to Flavio Percoco from comment #5)
> I can confirm that setting `workers` to 0 does not mean it'll use the N# of
> cores. In Glance 0 is translated to a single pool. In order to make it use
> the number of cores, the workers option should be set to None. While this
> behavior is not what one would expect, it's been like that since Glance was
> created and our config files default to None.
> 
> The problem seems to be in tripleo as it's setting the value for this option
> to 0[0] and it should probably be changed there.
> 
> [0]
> https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/
> services/glance-api.yaml#L38-L41

I can't see any way to fix it in the template because None/null is not a valid default value.

So the options are either:

1. make glance treat 0 the same as null, which I think is consistent with how most other projects now do things?

2. special-case this in puppet-glance to do the same (e.g. translate the 0 into None)

We can't really remove this interface completely because it will break backwards compatibility for users of the templates.
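
Option 2 amounts to a small translation step between the template value and the rendered config file. The sketch below shows the idea in Python rather than Puppet; `translate_template_workers` and `render_workers_line` are hypothetical names, and the real change would live in the Puppet module or the template.

```python
from typing import Optional


def translate_template_workers(value: Optional[int]) -> Optional[int]:
    """Map the template's 0 (or an unset value) to None, i.e. 'leave unset'."""
    if value is None or value == 0:
        return None
    return value


def render_workers_line(value: Optional[int]) -> str:
    """Return the config line to write, or an empty string to omit the option."""
    workers = translate_template_workers(value)
    if workers is None:
        # Omitting the option lets the service fall back to its own default.
        return ""
    return f"workers = {workers}\n"
```

This keeps the existing template parameter working for users who set an explicit positive count, while a value of 0 no longer pins the service to a single worker.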

Comment 10 Emilien Macchi 2016-08-02 12:04:39 UTC

I would:

1) submit the bug upstream in launchpad/puppet-glance

2) change default values in glance::api::workers to be $os_service_default, so the value will be unset by default and we'll rely on what Glance uses by default.

Please move the bug upstream and close it.

Comment 11 Emilien Macchi 2016-08-02 15:46:04 UTC

Just a note: I am not working on the bug presently; I just gave some direction on how to proceed here.

Comment 12 Emilien Macchi 2016-08-02 17:45:15 UTC

So it's clearly a bug in TripleO Heat templates or in Glance, as Steven rightly mentioned.
It is not a bug in puppet-glance, because puppet-glance provides the interface to configure the number of workers with a default value of the number of processors, which is done by many other modules.

So we need to either patch TripleO to change the default GlanceWorkers parameter from 0 to something (None?) or Glance to accept '0' string value.

I'm moving the bug out from puppet-glance, to openstack-glance, but feel free to move it where you think it needs to be fixed.

Comment 13 Joe Talerico 2016-08-02 18:20:51 UTC

@Emilien et al. - it looks like 0 is reserved for testing/profiling/etc. [1]

This is a bug with our puppet modules, we should not even set workers.

[1] https://github.com/openstack/glance/blob/474d8d05c438d7f6934019489195416993c2e013/glance/common/wsgi.py#L317

Comment 14 Emilien Macchi 2016-08-02 18:24:10 UTC

> This is a bug with our puppet modules, we should not even set workers.

That is wrong, the option is available in Glance:
https://github.com/openstack/glance/blob/474d8d05c438d7f6934019489195416993c2e013/glance/common/wsgi.py#L83-L86

puppet-glance just provides the interface to configure it or not. If you think we should not configure it, just set it to $os_service_default in puppet-glance or set it to "undef" in tripleo.

Comment 15 Flavio Percoco 2016-08-02 19:25:41 UTC

(In reply to Emilien Macchi from comment #14)
> > This is a bug with our puppet modules, we should not even set workers.
> 
> That is wrong, the option is available in Glance:
> https://github.com/openstack/glance/blob/
> 474d8d05c438d7f6934019489195416993c2e013/glance/common/wsgi.py#L83-L86

Yeah but the default (None) is the right one to keep as it sets the value to the number of CPUs. We don't set that in our RPMs either.

> 
> puppet-glance just provides the interface to configure it or not. If you
> think we should not configure it, just set it to $os_service_default in
> puppet-glance or set it to "undef" in tripleo.

++

This sounds like the right solution.

Comment 16 Elise Gafford 2016-08-29 14:47:09 UTC

Hi Jiri,

What's the status on this one? Is there a patch related to this? Is it still possible to fix in TripleO for 10? Trying to determine whether we need to push this to 11 and triage there.

Thanks!
- Elise

Comment 17 Jiri Stransky 2016-08-29 15:03:00 UTC

Hi Elise, there's no patch i know of. I see the BZ has been triaged into Storage DFG so this probably hasn't been assigned to a correct dev yet. (I'm just the default person in the assignee field for all t-h-t BZs.)

Regarding a possible fix for 10 -- i think it would either have to land before Thursday's OpenStack feature freeze, or if this arguably causes performance issues, then it might be considered valid for backporting to stable/newton even after feature freeze.

Comment 18 Sai Sindhur Malleni 2016-08-29 15:12:38 UTC

I can see a patch submitted upstream: https://review.openstack.org/#/c/350219/

Comment 19 Cyril Roelandt 2016-09-02 14:48:30 UTC

I have to agree with Flavio here: if you really want to use #cores, you should set the option to 'None' (see glance.common.wsgi.get_num_workers). Fixing this in TripleO seems to be the right thing to do.

Comment 20 Elise Gafford 2016-09-12 15:23:36 UTC

Hi Jiri,

Moving to ON_DEV to reflect your progress.

Comment 24 Avi Avraham 2016-12-08 13:51:13 UTC

Verified:
[root@undercloud-0 ~]# rpm -q openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-5.1.0-7.el7ost.noarch
# The number of cores
[root@undercloud-0 ~]# nproc
4
[root@controller-0 ~]# grep ^workers /etc/glance/*
/etc/glance/glance-api.conf:workers = 4
/etc/glance/glance-registry.conf:workers = 4
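
The manual check above could be scripted roughly as follows. This is a sketch under assumptions: it assumes `workers` lives in the `[DEFAULT]` section of the Glance config files, and the helper names `configured_workers` and `matches_cpu_count` are hypothetical.

```python
import configparser
import os
from typing import Optional


def configured_workers(conf_text: str) -> Optional[int]:
    """Return the `workers` value from [DEFAULT], or None if it is not set."""
    # interpolation=None avoids errors on literal '%' characters in values;
    # strict=False tolerates duplicate sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    raw = parser.defaults().get("workers")
    return None if raw is None else int(raw)


def matches_cpu_count(conf_text: str, cpu_count: Optional[int] = None) -> bool:
    """True if the configured worker count equals the CPU count."""
    expected = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return configured_workers(conf_text) == expected
```

To be meaningful, the CPU count passed in should come from the same host as the config file being checked, for example by running the script on the controller against `/etc/glance/glance-api.conf`.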

Comment 26 errata-xmlrpc 2016-12-14 15:47:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html