Bug 2133027 - [RHOSP 17.1] aodh uses deprecated gnocchi api to aggregate metrics and doesn't work properly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-aodh
Version: 17.1 (Wallaby)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Target Release: 17.1
Assignee: Yadnesh Kulkarni
QA Contact: Leonid Natapov
Docs Contact: mgeary
URL:
Whiteboard:
Depends On:
Blocks: 2093375 2133029 2133030
Reported: 2022-10-07 14:15 UTC by Leif Madsen
Modified: 2023-08-16 01:12 UTC (History)

Fixed In Version: openstack-aodh-12.0.1-1.20221111060713.ef6b6c8.el9ost python-aodhclient-2.2.0-1.20221118091109.b747ae3.el9ost
Doc Type: Bug Fix
Doc Text:
The Alarming service (aodh) uses the deprecated gnocchi API to aggregate metrics, which results in incorrect metric measures of CPU usage in gnocchi. With this update, dynamic aggregation in gnocchi supports the ability to make re-aggregations of existing metrics and the ability to manipulate and transform metrics as required. CPU time in gnocchi is correctly calculated.
Clone Of:
: 2133029 (view as bug list)
Environment:
Last Closed: 2023-08-16 01:12:16 UTC
Target Upstream Version:
Embargoed:




Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Launchpad.net | 1946793 | 0 | None | None | None | 2022-10-07 14:18:02 UTC |
| OpenStack gerrit | 829870 | 0 | None | MERGED | gnocchi: Use Dynamic Aggregates API | 2022-10-07 14:18:02 UTC |
| OpenStack gerrit | 863796 | 0 | None | MERGED | Ignore Gnocchi API error when the metric is not yet created | 2022-11-18 10:08:31 UTC |
| Red Hat Issue Tracker | OSP-19245 | 0 | None | None | None | 2022-10-07 14:20:18 UTC |
| Red Hat Product Errata | RHEA-2023:4577 | 0 | None | None | None | 2023-08-16 01:12:53 UTC |

Description Leif Madsen 2022-10-07 14:15:51 UTC
Contents originally posted in upstream issue tracker at https://bugs.launchpad.net/aodh/+bug/1946793

~~~~

The gnocchi API docs describe two API methods for aggregating metrics:

1. /v1/aggregation/metric?

See: https://gnocchi.osci.io/rest.html#aggregation-across-metrics-deprecated

This one is deprecated.

2. /v1/aggregates?

See: https://gnocchi.osci.io/rest.html#dynamic-aggregates

aodh uses the first (deprecated) one to aggregate metrics, for example:

```
        if isinstance(start, datetime.datetime):
            start = start.isoformat()
        if isinstance(stop, datetime.datetime):
            stop = stop.isoformat()

        params = dict(start=start, stop=stop, aggregation=aggregation,
                      reaggregation=reaggregation, granularity=granularity,
                      needed_overlap=needed_overlap, groupby=groupby,
                      refresh=refresh, resample=resample, fill=fill)
        if query is None:
            for metric in metrics:
                self._ensure_metric_is_uuid(metric)
            params['metric'] = metrics
            measures = self._get("v1/aggregation/metric",
                                 params=params).json()
```
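
For comparison with the deprecated call above, here is a minimal sketch of how a request to the second endpoint (`/v1/aggregates`) might be assembled. It only builds the path and JSON body and sends nothing. The function name is hypothetical, and the shape of the `operations` expression follows my reading of the gnocchi dynamic aggregates documentation, so treat it as an assumption rather than as what aodh or gnocchiclient actually emits.

```python
import json
from urllib.parse import urlencode


def build_dynamic_aggregates_request(metric_ids, aggregation, start, stop,
                                     granularity):
    """Return (path_with_query, json_body) for a POST to /v1/aggregates.

    Hypothetical helper for illustration; not part of aodh or gnocchiclient.
    """
    # One "(<metric id> <aggregation>)" pair per metric to be combined.
    pairs = " ".join(f"({metric_id} {aggregation})" for metric_id in metric_ids)
    # Assumed expression shape: aggregate the listed metrics across series.
    operations = f"(aggregate {aggregation} (metric {pairs}))"
    # The time window and granularity travel as query parameters.
    query = urlencode({"start": start, "stop": stop,
                       "granularity": granularity})
    return f"/v1/aggregates?{query}", json.dumps({"operations": operations})


# Placeholder metric IDs; real ones would be metric UUIDs.
path, body = build_dynamic_aggregates_request(
    ["metric-a", "metric-b"], "mean",
    "2022-10-07T14:00:00", "2022-10-07T15:00:00", 300)
print(path)
print(body)
```

The main visible difference from the deprecated call is that the metrics and the aggregation are expressed in one `operations` expression in the request body instead of as `metric`, `aggregation`, and `reaggregation` query parameters.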

aodh doesn't work properly in our production environment after upgraded to Ussuri.

When there is only 1 instance, aodh works properly and alarms can be triggered when the load on the instance is higher than the threshold.

However, after the stack is scaled up and a second instance is created, the average CPU usage that the aodh evaluator gets from gnocchi is not correct. The metric measures are sometimes negative.
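
CPU usage cannot be negative, so a negative aggregated value is an easy symptom to check for. The following is a minimal sketch, assuming the measures are available as `[timestamp, granularity, value]` triples; the function name and the sample values are made up for illustration and are not taken from the affected environment.

```python
def find_negative_measures(measures):
    """Return (timestamp, value) pairs whose value is below zero.

    `measures` is assumed to be an iterable of
    [timestamp, granularity, value] triples.
    """
    return [(timestamp, value)
            for timestamp, _granularity, value in measures
            if value < 0]


# Made-up sample data, for illustration only.
sample = [
    ["2022-10-07T14:00:00+00:00", 300.0, 12.5],
    ["2022-10-07T14:05:00+00:00", 300.0, -3.0],
    ["2022-10-07T14:10:00+00:00", 300.0, 0.0],
]
for timestamp, value in find_negative_measures(sample):
    print(f"negative measure at {timestamp}: {value}")
```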

I manually pulled the metrics with the gnocchi command-line client.

The aggregation of metrics is correct with the command:

```
openstack metric aggregates
```

It uses the new API in the backend.

The aggregation of metrics is not correct with the command:

```
openstack metric measures aggregation
```

It uses the deprecated API, which is the one aodh uses.

Comment 2 Leif Madsen 2022-10-07 14:18:54 UTC
The changes are merged upstream but are currently blocked on merge at Xena. They still need further backports to Wallaby and Train, as well as downstream imports.

Comment 3 Leif Madsen 2022-10-07 14:30:02 UTC
Yadnesh and I were both able to reproduce this downstream and upstream in our respective environments. It blocks administrators from using Aodh for alarming in the autoscaling use case.

Comment 12 Leonid Natapov 2023-05-07 15:29:41 UTC
No negative values in aodh logs

python3-aodh-12.0.1-1.20221215060810.9ac090b.el9ost.noarch
openstack-aodh-common-12.0.1-1.20221215060810.9ac090b.el9ost.noarch
openstack-aodh-api-12.0.1-1.20221215060810.9ac090b.el9ost.noarch

Comment 21 errata-xmlrpc 2023-08-16 01:12:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

