Contents originally posted in upstream issue tracker at https://bugs.launchpad.net/aodh/+bug/1946793

~~~~

The Gnocchi API docs describe 2 API methods to aggregate metrics:

1. /v1/aggregation/metric?
   See: https://gnocchi.osci.io/rest.html#aggregation-across-metrics-deprecated
   This one is deprecated.
2. /v1/aggregates?
   See: https://gnocchi.osci.io/rest.html#dynamic-aggregates

aodh uses the 1st one to aggregate metrics, for example:

```
if isinstance(start, datetime.datetime):
    start = start.isoformat()
if isinstance(stop, datetime.datetime):
    stop = stop.isoformat()

params = dict(start=start, stop=stop, aggregation=aggregation,
              reaggregation=reaggregation, granularity=granularity,
              needed_overlap=needed_overlap, groupby=groupby,
              refresh=refresh, resample=resample, fill=fill)
if query is None:
    for metric in metrics:
        self._ensure_metric_is_uuid(metric)
    params['metric'] = metrics
    measures = self._get("v1/aggregation/metric",
                         params=params).json()
```

aodh doesn't work properly in our production environment after upgrading to Ussuri.

When there is only 1 instance, aodh works properly and alarms are triggered when the load on the instance is higher than the threshold. However, after the stack is scaled up and the second instance is created, the average CPU usage that the aodh evaluator gets from gnocchi is not correct. The metric measures are sometimes negative.

I manually pulled the metrics with the gnocchi command.

The aggregation of metrics is correct with the command
```
openstack metric aggregates
```
It uses the new API in the backend.

The aggregation of metrics is not correct with the command
```
openstack metric measures aggregation
```
It uses the deprecated API, which is the one aodh is using.
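For comparison, the two request shapes can be sketched as below. This is only an illustration, not aodh or gnocchiclient code: `deprecated_request` and `dynamic_request` are hypothetical helpers, the default aggregation and granularity are arbitrary, and the `operations` expression assumes the dynamic aggregates syntax described in the Gnocchi docs linked above, so it should be checked against the Gnocchi version in use.

```python
def deprecated_request(metric_ids, aggregation="mean", granularity=300):
    """Describe a call to the deprecated cross-metric aggregation endpoint."""
    return {
        "method": "GET",
        "path": "v1/aggregation/metric",
        "params": {
            "metric": list(metric_ids),
            "aggregation": aggregation,
            "granularity": granularity,
        },
        "json": None,
    }


def dynamic_request(metric_ids, aggregation="mean", granularity=300):
    """Describe the equivalent call to the dynamic aggregates endpoint."""
    # Assumed expression syntax:
    #   (aggregate <agg> (metric (<id> <agg>) (<id> <agg>) ...))
    refs = " ".join(f"({m} {aggregation})" for m in metric_ids)
    return {
        "method": "POST",
        "path": "v1/aggregates",
        "params": {"granularity": granularity},
        "json": {"operations": f"(aggregate {aggregation} (metric {refs}))"},
    }
```

The helpers only build a description of each request, so the two shapes can be compared or logged without a running Gnocchi endpoint.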
Changes are merged upstream, but the Xena backport is currently blocked on merge. Further backports to Wallaby and Train are still needed, along with downstream imports.
Yadnesh and I were both able to reproduce this downstream and upstream in our respective environments. The bug blocks administrators from using Aodh for alarming in the autoscaling use case.
No negative values in aodh logs.

python3-aodh-12.0.1-1.20221215060810.9ac090b.el9ost.noarch
openstack-aodh-common-12.0.1-1.20221215060810.9ac090b.el9ost.noarch
openstack-aodh-api-12.0.1-1.20221215060810.9ac090b.el9ost.noarch
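The same check can also be run against measures pulled directly from Gnocchi rather than the aodh logs. A minimal sketch, assuming measures arrive as `[timestamp, granularity, value]` triples; `negative_measures` is a hypothetical helper and the sample values are made up for illustration.

```python
def negative_measures(measures):
    """Return the measures whose aggregated value is below zero."""
    # Each measure is assumed to be [timestamp, granularity, value].
    return [m for m in measures if m[2] is not None and m[2] < 0]


# Illustrative data only, not taken from a real deployment.
sample = [
    ["2023-01-01T00:00:00+00:00", 300.0, 12.5],
    ["2023-01-01T00:05:00+00:00", 300.0, -3.0],
]
bad = negative_measures(sample)
```

An empty result from `negative_measures` corresponds to the "no negative values" outcome above.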
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577