Bug 1421729

Summary: Utilization data for OSP cloud instances does not show up
Product: Red Hat CloudForms Management Engine
Reporter: michael_rasoulian <michael_rasoulian>
Component: Appliance
Assignee: Marek Aufart <maufart>
Status: CLOSED CURRENTRELEASE
QA Contact: Ola Pavlenko <opavlenk>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 5.7.0
CC: aarapov, abellott, arkady_kanevsky, cdevine, christopher_dearborn, cpelland, dajohnso, david_paterson, dcain, gtanzill, jhardy, john_terpstra, John_walsh, kurt_hey, lavenel, mandreou, manisha_tripathy, mburns, michael_rasoulian, morazi, obarenbo, randy_perryman, scohen, simaishi, smerrow, sreichar, tzumainn, wayne_allen
Target Milestone: GA
Keywords: TestOnly, ZStream
Target Release: 5.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 5.8.0.2
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1415544
: 1422241
Environment:
Last Closed: 2017-06-12 17:11:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Openstack
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1415544, 1422241
Attachments:
Description                     Flags
evm.log                         none
ceilometer meter-list output    none

Description michael_rasoulian 2017-02-13 14:40:24 UTC
Created attachment 1249872
evm.log

Description of problem:
After adding OpenStack as a cloud provider and enabling capacity & utilization (C&U), utilization data for the cloud instances never becomes available. I waited well over 48 hours after enabling C&U, and the data still does not show up in the CF interface.


Version-Release number of selected component (if applicable):
CF 4.2
OSP 9.0

How reproducible:
Always

Steps to Reproduce:
1. Add OpenStack as a Cloud provider.
2. Enable capacity & utilization.
3. Wait 24 hours for data to populate.
4. Generate a report for utilization of cloud instances (CPU usage, mem usage, etc.) or try to view individual utilization charts per instance.  

Actual results:
The reports do not generate any records.  The utilization graphs are unavailable (the "Utilization" link is disabled).

Expected results:
Reports, dashboard, individual instance charts should all show utilization data.

Additional info:
This is working fine in CF 4.1.

Comment 2 Tzu-Mainn Chen 2017-02-13 16:36:22 UTC
Hi!  Would it be possible to run the following openstack CLI commands on the overcloud?

* ceilometer meter-list
* gnocchi metric list

Comment 3 michael_rasoulian 2017-02-13 16:42:17 UTC
Created attachment 1249937
ceilometer meter-list output

ceilometer meter-list output attached

gnocchi metric list output is blank

Comment 4 Tzu-Mainn Chen 2017-02-13 18:42:56 UTC
Thanks!  Here's what's going on:

The upstream telemetry team is in the process of deprecating Ceilometer.  The metrics portion of the API is being replaced by Gnocchi.  To compensate, CF checks whether the Gnocchi service exists in the Keystone catalog.  If it does, CF uses Gnocchi; if not, it uses Ceilometer.

In OSP9, the Gnocchi service is enabled, but by default metrics collection in the overcloud is done through Ceilometer.  So CF sees Gnocchi, uses it, and finds nothing.  (In OSP10 the default is switched to Gnocchi and metrics appear in CF).
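The observations so far (a populated ceilometer meter-list, a blank gnocchi metric list, and a registered Gnocchi service) can be mapped to this explanation with a small sketch. The function name, its parameters, and the returned strings are illustrative assumptions and are not part of CloudForms or any OpenStack client.

```python
def diagnose(gnocchi_registered: bool, ceilometer_meters: int, gnocchi_metrics: int) -> str:
    """Classify the telemetry state from three observations.

    gnocchi_registered: whether Gnocchi appears in the Keystone catalog.
    ceilometer_meters:  number of rows seen in `ceilometer meter-list`.
    gnocchi_metrics:    number of rows seen in `gnocchi metric list`.
    """
    if gnocchi_registered and gnocchi_metrics == 0 and ceilometer_meters > 0:
        # The situation in this bug: CF picks Gnocchi, the data is elsewhere.
        return "mismatch: CF queries Gnocchi, but the data is in Ceilometer"
    if gnocchi_registered and gnocchi_metrics > 0:
        return "ok: CF queries Gnocchi and Gnocchi has metrics"
    if not gnocchi_registered and ceilometer_meters > 0:
        return "ok: CF falls back to Ceilometer and Ceilometer has meters"
    return "no data: the backend that CF would query has no metrics"
```

Calling diagnose(True, 120, 0), where 120 is a made-up meter count, returns the "mismatch" string, which corresponds to the default OSP9 overcloud described above.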

The suggested workaround from the Telemetry team is to set the following when deploying the overcloud:

 CeilometerMeterDispatcher: 'gnocchi'

I believe you already use a ceilometer.yaml file to enable event storage, correct?  If so, then you can just update it to the following:

parameter_defaults:
  CeilometerStoreEvents: true
  CeilometerMeterDispatcher: 'gnocchi'

Ronnie, can you try this as well?

Comment 5 arkady kanevsky 2017-02-13 19:13:34 UTC
TM,
thanks for quick response.
Is the change you are proposing for the CF config for OpenStack or for the OpenStack config?

I am not aware that JS-6.x configures gnocchi at all. But let the DellEMC team confirm that.

In any case why is CF checking on what is in config instead of what services are running?

Comment 6 Tzu-Mainn Chen 2017-02-13 19:28:15 UTC
Hi Arkady!  I should have been more specific: this change is made when deploying the overcloud; it has nothing to do with CloudForms.  This sort of deployment customization should already be done in order to enable event storage in Ceilometer; this workaround just adds one line to that customization in order to set Gnocchi as the metrics backend instead of Ceilometer.

Comment 7 michael_rasoulian 2017-02-13 21:15:17 UTC
Are you saying that ceilometer then is effectively not supported in CF 4.2/4.2.1?  When adding the OpenStack cloud provider in CF, should ceilometer not be selected for the event endpoint?

I'm in the process of trying the above configuration change, but just want to get clarity on where CF stands in relation to support for ceilometer.  I have to run an update on my overcloud to implement it.

Comment 8 Tzu-Mainn Chen 2017-02-13 21:38:47 UTC
Not quite.  As of 4.2, CF supports *both* Ceilometer and Gnocchi for its metrics API.  This is independent of selecting Ceilometer as the event endpoint, as Gnocchi only replaces the Ceilometer metrics portion of the telemetry API.

What CF does in 4.2+ is the following:

a) Check to see if Gnocchi is registered as a service in the Keystone catalog
b) If so, use the Gnocchi API
c) If not, use the Ceilometer API

What happens in OSP9 is that by default, the overcloud registers the Gnocchi service but still stores metrics in Ceilometer.  The workaround in Comment 4 makes it so that when you create an overcloud, it will store metrics in Gnocchi instead.

OSP10+ resolves this discrepancy by automatically using Gnocchi for metrics as a default.

Let me know if this doesn't make sense!
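Steps a) to c) and the OSP9 mismatch can be sketched as follows. This is a simplified model, not the CloudForms implementation: the Overcloud class, the function names, and the catalog contents are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Overcloud:
    name: str
    catalog: FrozenSet[str]   # service names registered in Keystone (illustrative)
    metrics_stored_in: str    # "ceilometer" or "gnocchi"


def pick_metrics_api(catalog: FrozenSet[str]) -> str:
    """Steps a) to c): prefer Gnocchi when it is registered, else Ceilometer."""
    return "gnocchi" if "gnocchi" in catalog else "ceilometer"


def cf_finds_metrics(cloud: Overcloud) -> bool:
    """CF only sees data if the API it picks is the one holding the metrics."""
    return pick_metrics_api(cloud.catalog) == cloud.metrics_stored_in


BOTH = frozenset({"ceilometer", "gnocchi"})

# OSP9 default: Gnocchi is registered, but metrics are stored in Ceilometer.
OSP9_DEFAULT = Overcloud("osp9-default", BOTH, "ceilometer")
# OSP9 with CeilometerMeterDispatcher: 'gnocchi' (the Comment 4 workaround).
OSP9_WORKAROUND = Overcloud("osp9-dispatcher-gnocchi", BOTH, "gnocchi")
# OSP10 default: metrics are stored in Gnocchi.
OSP10_DEFAULT = Overcloud("osp10-default", BOTH, "gnocchi")
```

In this model cf_finds_metrics is False only for OSP9_DEFAULT, which is the case reported in this bug.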

Comment 9 arkady kanevsky 2017-02-13 22:06:28 UTC
Changing OpenStack configuration is not an option.
That is what has been tested and validated for JS-6.x.
We cannot go back and retest OpenStack for CF especially with Upgrade.

What is registered for keystone and what services are running in openstack are not the same thing. It is bad design to use keystone registration for decision making on where to collect data.

If we do not have an option that works in the CF-4.2 configuration, a quick patch for it (by today), or some other workaround, we are pulling the plug on CF support for JS.

Will revisit for JS-10.

Comment 10 Randy Perryman 2017-02-13 23:38:21 UTC
Question: what is OSP 10+ in relation to OSP 10 GA?

(In reply to Tzu-Mainn Chen from comment #8)
> Not quite.  As of 4.2, CF supports *both* Ceilometer and Gnocchi for its
> metrics API.  This is independent of selecting Ceilometer as the event
> endpoint, as Gnocchi only replaces the Ceilometer metrics portion of the
> telemetry API.
> 
> What CF does in 4.2+ is the following:
> 
> a) Check to see if Gnocchi is registered as a service in the Keystone catalog
> b) If so, use the Gnocchi API
> c) If not, use the Ceilometer API
> 
> What happens in OSP9 is that by default, the overcloud registers the Gnocchi
> service but still stores metrics in Ceilometer.  The workaround in Comment 4
> makes it so that when you create an overcloud, it will store metrics in
> Gnocchi instead.
> 
> OSP10+ resolves this discrepancy by automatically using Gnocchi for metrics
> as a default.
> 
> Let me know if this doesn't make sense!

Comment 11 Tzu-Mainn Chen 2017-02-14 00:55:45 UTC
Ah, I just meant OSP 10 and above.

Comment 12 Tzu-Mainn Chen 2017-02-14 01:09:04 UTC
Arkady, this is the solution that we discussed with the Telemetry team for a pretty tricky deprecation situation.  You're right that having a registered API endpoint in Keystone is not the same thing as what OpenStack services are actually running, but that's not what's happening here.  Both Gnocchi and Ceilometer are running *and* are registered in Keystone; however data is being fed into Ceilometer instead of Gnocchi.

What this configuration change does is simply feed data into Gnocchi instead, which I believe is the preferred Telemetry option anyway; there's a reason Ceilometer is being deprecated!

I'll talk to the engineer who worked on this feature to see if there are additional workarounds possible.

Comment 13 Tzu-Mainn Chen 2017-02-14 01:14:13 UTC
Also, just to be sure - are you saying that additional parameters for deploying the overcloud are a no-go?  If so, can we just verify that you're already setting the parameter that allows events to be collected by Ceilometer?  I know that's off by default in OSP9, but the method for turning it on when deploying an overcloud is documented.  Thanks!

Comment 14 arkady kanevsky 2017-02-14 02:32:33 UTC
Tzu-Mainn,
That is correct. JS-6.0.1 is OSP9 based and we do configure it to use ceilometer and not gnocchi. David, can you please confirm? It is possible that the OSPd telemetry puppet is registering gnocchi with keystone. Gnocchi is not part of the JS-6.x release. Needless to say, CF-4.1 works just fine with that configuration and initial test results were uploaded to https://bugzilla.redhat.com/show_bug.cgi?id=1415544.

The OpenStack portion of JS-6.0.1 testing and validation has already been done. It is the same as JS-6.0, which was released almost 3 months ago, except for a few bug fixes. But the upgrade from JS-5 to JS-6 took a very long time, which is why the JS-6.0.1 release slipped by more than 2 months. Thus, no changes to the OpenStack configuration can be made now.

Comment 15 Tzu-Mainn Chen 2017-02-14 03:59:39 UTC
Ah, I see.  Okay, we'll look at solutions on the CloudForms side.

There's one other possible solution, though I'm not sure whether it would qualify as too invasive: simply add an extra step that deletes the Gnocchi service from Keystone by running 'openstack service delete gnocchi'.  If CloudForms no longer sees the gnocchi service in Keystone, it'll fall back to Ceilometer.
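One way to confirm that the deletion took effect is to inspect the service list afterwards. The sketch below assumes the list was captured with 'openstack service list -f json' and that each entry carries a "Name" key; the helper name and the sample data are made up for illustration.

```python
import json


def service_registered(service_list_json: str, name: str = "gnocchi") -> bool:
    """Return True if a service with the given name appears in the JSON list.

    Assumes the input is a JSON array of objects with a "Name" key, as
    produced by `openstack service list -f json` (an assumption here).
    """
    services = json.loads(service_list_json)
    return any(str(entry.get("Name", "")).lower() == name.lower() for entry in services)


# Made-up sample data; the IDs are placeholders, not real service IDs.
SAMPLE_BEFORE = (
    '[{"ID": "aaa", "Name": "ceilometer", "Type": "metering"},'
    ' {"ID": "bbb", "Name": "gnocchi", "Type": "metric"}]'
)
SAMPLE_AFTER = '[{"ID": "aaa", "Name": "ceilometer", "Type": "metering"}]'
```

With these samples, service_registered(SAMPLE_BEFORE) is True and service_registered(SAMPLE_AFTER) is False; in the second case CloudForms would be expected to fall back to Ceilometer.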

Comment 16 arkady kanevsky 2017-02-14 04:03:49 UTC
That may be possible if it:
1. Is part of the CF deployment guide for JS-6.0.1
2. Has no side effect on OpenStack operation (expect so)
3. Does not cause upgrade issues from OSP9 to OSP10.

We need the OSP upgrade team to comment on #3 (Mike Orazi?).

Comment 17 Tzu-Mainn Chen 2017-02-14 06:33:53 UTC
I assume that 1) is possible to update even now?  2) should be true, as gnocchi isn't being used as the metrics store in OSP9 (causing this entire issue).  I imagine 3) shouldn't be a huge issue, as worst case the gnocchi service just has to be re-registered under Keystone, but confirmation from the upgrades team would be great!

Comment 18 Marios Andreou 2017-02-14 15:15:40 UTC
Hi Tzumainn, Arkady,

so wrt the upgrade there is one 'gnocchi specific' thing we do for mitaka->newton and which may be impacted here but we'd need someone from the telemetry team to confirm ('gnocchi-upgrade' in [5] more on this later).

The upgrade of the controllers happens in two steps... the "controller upgrade step", where the logic in [1][2][3][4][5][6] happens and then the later converge step where the puppet config is re-applied as defined by the tripleo-heat-templates/puppet modules you are using (i.e. as would happen during a normal stack update). So it could end up re-creating/configuring gnocchi under keystone if that is what your tripleo-heat-templates and puppet-tripleo are saying should happen (pointing out/fyi).

The thing we'd need telemetry to confirm is whether the invocation of gnocchi-upgrade would be impacted by this... we are explicitly calling it in [5] as part of mitaka->newton. From a quick look, and assuming I've understood correctly that 'gnocchi-upgrade' is like [7] because of [8], it doesn't appear that it would cause problems, but that is just a first pass and it would be better for the telemetry folks to comment.

Other than these things, no, I can't see why it would cause problems, but we'd need to test it of course.

hope that helps for now? thanks 

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh
[2] https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh
[3] https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_controller_pacemaker_3.sh
[4] https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_controller_pacemaker_4.sh
[5] https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_controller_pacemaker_5.sh
[6] https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_controller_pacemaker_6.sh
[7] https://github.com/openstack/gnocchi/blob/50433d83829f058f4d19bc94cae2e258ab7efe79/gnocchi/cli.py#L50
[8] https://github.com/openstack/gnocchi/blob/c658dbb21cb05e953087f0718573ab1be03124fd/setup.cfg#L130

Comment 19 arkady kanevsky 2017-02-14 15:55:52 UTC
Adding Wayne to take a look at gnocchi upgrade comments.

Comment 20 Tzu-Mainn Chen 2017-02-14 16:28:21 UTC
I talked to an engineer on the Telemetry team.  He commented that upgrades should be fine as long as Gnocchi was re-added into Keystone before the upgrade process.

However, without Gnocchi, Aodh won't work in OSP9.  Are you guys using alarms?  If so, we can try and come up with a CF-only solution.  Marek has proposed one already: add a setting to CloudForms.  By default, this setting will use the current CF behaviour; it can also be set to Gnocchi or Ceilometer to force either service.

The downside is if Gnocchi or Ceilometer is set, and CF has multiple cloud providers registered, then it potentially won't be able to read metrics from all of them - say, if one cloud provider is OSP9/Ceilometer based and another is OSP10/Gnocchi based.

Let me know what you think of these solutions.  We'll be working on the patch for the latter solution in the meantime in case that's what you decide.
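The trade-off of the proposed setting can be sketched as follows. This is a model under stated assumptions, not the code from the patch: the value names "auto", "gnocchi", and "ceilometer" follow the wording used in this thread, while the function names and the provider data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

VALID_SETTINGS = ("auto", "gnocchi", "ceilometer")


def resolve_backend(setting: str, catalog: Set[str]) -> str:
    """Pick the metrics API for one provider under a CF-wide setting."""
    if setting not in VALID_SETTINGS:
        raise ValueError(f"unknown setting: {setting!r}")
    if setting == "auto":
        # Current CF behaviour: use Gnocchi if Keystone lists it.
        return "gnocchi" if "gnocchi" in catalog else "ceilometer"
    return setting  # forced value, applied to every provider


def readable_providers(setting: str, providers: Dict[str, Tuple[Set[str], str]]) -> List[str]:
    """Names of providers whose metrics store matches the API CF would query."""
    return sorted(
        name
        for name, (catalog, stored_in) in providers.items()
        if resolve_backend(setting, catalog) == stored_in
    )


# Hypothetical mixed environment: (Keystone catalog, where metrics are stored).
PROVIDERS = {
    "osp9": ({"ceilometer", "gnocchi"}, "ceilometer"),
    "osp10": ({"ceilometer", "gnocchi"}, "gnocchi"),
}
```

In this model, forcing "ceilometer" reads only the OSP9 provider and forcing "gnocchi" reads only the OSP10 provider, which is the downside described above. Leaving the setting on "auto" reads both only if the gnocchi service is removed from the OSP9 provider's Keystone catalog.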

Comment 21 Marek Aufart 2017-02-14 19:32:24 UTC
Adding an upstream PR with the CF-side fix described in Comment #20.

https://github.com/ManageIQ/manageiq/pull/13918

Comment 22 michael_rasoulian 2017-02-14 19:37:19 UTC
After removing the gnocchi service, I am seeing metrics data populate in CF.  I see the data in the logs and when generating a chargeback report.

However, I'm assuming the fix above makes this unnecessary?  What CF release would that be merged into?

Comment 23 Tzu-Mainn Chen 2017-02-14 19:48:37 UTC
The fix above does make this unnecessary; it'll provide a way for you to force the use of ceilometer for metrics collection.  Note that this will prevent CF from collecting metrics from two different OpenStack cloud providers if one uses gnocchi and the other uses ceilometer; to handle that case you'll want to leave the setting on 'auto' and remove the gnocchi service from keystone.

We estimate that this fix can make it into 4.2.1.

Comment 24 michael_rasoulian 2017-02-14 19:58:20 UTC
Thanks Tzu-Mainn.  Can we still expect to get a build of 4.2.1 today, including this fix?

Comment 26 Tzu-Mainn Chen 2017-02-14 21:04:44 UTC
4.2.1 is being built right now.  I'll check to see if it'll be ready today.

Comment 27 Tzu-Mainn Chen 2017-02-14 22:21:58 UTC
It looks like it may slip until tomorrow.

Comment 28 michael_rasoulian 2017-02-15 14:49:19 UTC
Thanks for all your timely support Tzu-Mainn.  Please let me know once the build is ready.

Comment 29 Ronnie Rasouli 2017-05-11 07:06:14 UTC
Tested on RHOS11 + CF 5.8.0.13: utilization data appears for cloud instances.
Note that gnocchi must be configured for metric collection in the advanced settings.

Comment 30 David Paterson 2017-06-20 14:09:08 UTC
I could use a recap on this one.

As of OSP9, ceilometer and gnocchi are both enabled by default, but metrics are stored in ceilometer.

With OSP 10, metrics are now stored in Gnocchi by default?

If so, what is the upgrade path when going from 9 to 10?  That is still not clear to me.  What steps need to be taken so that the customer does not lose data and CloudForms doesn't break?

Comment 31 Tzu-Mainn Chen 2017-06-20 14:30:59 UTC
Hi!  In OSP10, metrics are accessed through Gnocchi for the overcloud and Ceilometer for the undercloud.  You'll want to revert the setting to 'auto', and things should just work after that.