Description of problem:
Glance-API and Glance-registry are each deployed with a single worker per controller. This is in spite of glance-api.conf and glance-registry.conf having workers set to 0 by default (which should ideally translate to workers = # of cores on the machine). FWIW, packstack deploys with workers=cores. So essentially two problems here:
1. Why is glance deploying with 1 worker when workers is set to 0 in the config file (every service I can think of defaults to # of cores when workers=0)?
2. Performance hit with a single worker
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Deploy OSP 9
2. Check for glance processes
3. Deployed with 1 process each for api and registry
Actual results:
1 worker per glance process
Expected results:
# of workers = # of cores
On a 1 controller + 2 compute setup, I was able to run a few Rally benchmarks via our tool Browbeat. The controller had 32 cores. The scenario was to create and delete an image in glance (cirros). At concurrency=64, the default deployment with 1 worker errors out 54% of the time and has a 95th-percentile time of 60 seconds to create an image, vs 30 seconds with workers = # of cores (32).
1 worker each for api and registry: http://10.12.23.106:9000/20160728-162548/all-rally-run-0.html#/GlanceImages.create_and_delete_image-2
32 workers each for api and registry: http://10.12.23.106:9000/20160728-160417/all-rally-run-0.html#/GlanceImages.create_and_delete_image-2
Changed component from openstack-tripleo to director as it was the downstream product I was testing.
The previous Rally results used glance image create with the image URL external to our network. So I believe the bad times we saw in some cases were due to the image download itself rather than the image create. I retested by hosting the image locally and the timings improved (no timeouts seen even with a single worker). But the timings are still much better when # of workers = # of cores, as seen below. The scenario is glance image create and delete at concurrency=64.
1 worker: http://10.12.23.106:9000/20160728-183814/all-rally-run-0.html#/GlanceImages.create_and_delete_image-2
32 workers (no. of cores): http://10.12.23.106:9000/20160728-185531/all-rally-run-0.html#/GlanceImages.create_and_delete_image-2
I can confirm that setting `workers` to 0 does not mean it'll use the number of cores. In Glance, 0 is translated to a single pool. In order to make it use the number of cores, the workers option should be set to None. While this behavior is not what one would expect, it's been like that since Glance was created and our config files default to None.
The problem seems to be in tripleo as it's setting the value for this option to 0 and it should probably be changed there.
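The distinction described in this comment can be sketched as follows. This is a minimal illustration, not Glance's actual code: `resolve_workers` is a hypothetical helper, and it assumes only what the comment states (an unset/None value means one worker per core, while 0 means a single process).

```python
import os
from typing import Optional


def resolve_workers(configured: Optional[int], cpu_count: Optional[int] = None) -> int:
    """Return how many processes would serve requests (hypothetical helper).

    Assumed semantics, taken from the comment above:
      None -> one worker per CPU core
      0    -> a single process
      N>0  -> exactly N workers
    """
    if cpu_count is None:
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        cpu_count = os.cpu_count() or 1
    if configured is None:
        return cpu_count
    if configured == 0:
        return 1
    return configured


print(resolve_workers(None, cpu_count=32))  # option left unset
print(resolve_workers(0, cpu_count=32))     # value the templates currently set
```

Under these assumptions, the 32-core controller from the report would get 32 workers with the option unset and a single process with `workers = 0`.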
So, we do want to move to deploying with workers=cores?
I'd recommend not setting `workers` at all in the configuration file. Just let Glance find the number of cores itself.
Moving this to puppet as this might need to be fixed there.
(In reply to Flavio Percoco from comment #5)
> I can confirm that setting `workers` to 0 does not mean it'll use the N# of
> cores. In Glance 0 is translated to a single pool. In order to make it use
> the number of cores, the workers option should be set to None. While this
> behavior is not what one would expect, it's been like that since Glance was
> created and our config files default to None.
> The problem seems to be in tripleo as it's setting the value for this option
> to 0 and it should probably be changed there.
I can't see any way to fix it in the template because None/null is not a valid default value.
So options are either
1. make glance treat 0 the same as null, which I think is consistent with how most other projects now do things?
2. special-case this in puppet-glance to do the same (e.g. translate the 0 into None)
We can't really remove this interface completely because it will break backwards compatibility for users of the templates.
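Either option amounts to the same normalization step, wherever it ends up living. A minimal sketch in Python, assuming a hypothetical `normalize_workers` helper (this is not an existing Glance or puppet-glance function, and a real puppet-glance change would be written in Puppet):

```python
from typing import Optional, Union


def normalize_workers(raw: Union[int, str, None]) -> Optional[int]:
    """Map 0 (or "0") to None so the service falls back to its own default.

    Hypothetical helper for illustration only.
    """
    if raw is None:
        return None
    if isinstance(raw, str) and raw.strip() == "":
        # An empty string is treated the same as an unset option.
        return None
    value = int(raw)
    if value < 0:
        raise ValueError("workers must not be negative")
    return None if value == 0 else value


print(normalize_workers(0))    # treated as "unset"
print(normalize_workers("8"))  # explicit values pass through unchanged
```

With this in place, existing templates that pass 0 keep working, and the service is left to pick its own worker count.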
1) submit the bug upstream in launchpad/puppet-glance
2) change default values in glance::api::workers to be $os_service_default, so the value will be unset by default and we'll rely on what Glance uses by default.
Please move the bug upstream and close it.
Just a note, I am not working on the bug presently; I just gave some direction on how to proceed here.
So it's clearly a bug in TripleO Heat templates or in Glance, as Steven mentioned.
It is not a bug in puppet-glance, because puppet-glance provides the interface to configure the number of workers with a default value of the number of processors, which is done by many other modules.
So we need to either patch TripleO to change the default GlanceWorkers parameter from 0 to something (None?) or Glance to accept '0' string value.
I'm moving the bug out from puppet-glance, to openstack-glance, but feel free to move it where you think it needs to be fixed.
@Emilien et al. - it looks like 0 is reserved for testing/profiling/etc.
This is a bug with our puppet modules, we should not even set workers.
> This is a bug with our puppet modules, we should not even set workers.
That is wrong, the option is available in Glance:
puppet-glance just provides the interface to configure it or not. If you think we should not configure it, just set it to $os_service_default in puppet-glance or set it to "undef" in tripleo.
(In reply to Emilien Macchi from comment #14)
> > This is a bug with our puppet modules, we should not even set workers.
> That is wrong, the option is available in Glance:
Yeah but the default (None) is the right one to keep as it sets the value to the number of CPUs. We don't set that in our RPMs either.
> puppet-glance just provides the interface to configure it or not. If you
> think we should not configure it, just set it to $os_service_default in
> puppet-glance or set it to "undef" in tripleo.
This sounds like the right solution.
What's the status on this one? Is there a patch related to this? Is it still possible to fix in TripleO for 10? Trying to determine whether we need to push this to 11 and triage there.
Hi Elise, there's no patch I know of. I see the BZ has been triaged into Storage DFG so this probably hasn't been assigned to a correct dev yet. (I'm just the default person in the assignee field for all t-h-t BZs.)
Regarding a possible fix for 10 -- I think it would either have to land before Thursday's OpenStack feature freeze, or if this arguably causes performance issues, then it might be considered valid for backporting to stable/newton even after feature freeze.
I can see a patch submitted upstream: https://review.openstack.org/#/c/350219/
I have to agree with Flavio here: if you really want to use #cores, you should set the option to 'None' (see glance.common.wsgi.get_num_workers). Fixing this in TripleO seems to be the right thing to do.
Moving to ON_DEV to reflect your progress.
[root@undercloud-0 ~]# rpm -q openstack-tripleo-heat-templates
# The number of cores
[root@undercloud-0 ~]# nproc
[root@controller-0 ~]# grep ^workers /etc/glance/*
/etc/glance/glance-api.conf:workers = 4
/etc/glance/glance-registry.conf:workers = 4
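The same check can be scripted. The sketch below is only an illustration of the verification above: `read_workers` is a hypothetical helper, and it assumes the option lives in the `[DEFAULT]` section of an INI-style file at the paths shown in the grep output.

```python
import configparser
import os
from typing import Optional


def read_workers(conf_text: str) -> Optional[int]:
    """Return the [DEFAULT] workers value from INI-style text, or None if unset."""
    # interpolation=None keeps literal '%' characters from raising errors;
    # strict=False tolerates duplicated options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    raw = parser.defaults().get("workers")
    if raw is None or not raw.strip():
        return None
    return int(raw)


if __name__ == "__main__":
    # Paths taken from the grep output above.
    for path in ("/etc/glance/glance-api.conf", "/etc/glance/glance-registry.conf"):
        try:
            with open(path) as handle:
                workers = read_workers(handle.read())
        except (OSError, configparser.Error) as exc:
            print(f"{path}: cannot read ({exc})")
            continue
        print(f"{path}: workers={workers}, cores={os.cpu_count()}")
```

A `workers` value equal to the core count, or an unset option, would both be consistent with the fix discussed in this report.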
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.