Created attachment 1942553: ceilometer config

OSP17.1 | Ceilometer doesn't send data to Gnocchi.

I have a freshly installed OSP17.1 with two instances up and running and ceilometer configured to send data to gnocchi. The `gnocchi metric list` command returns an empty list. After restarting the ceilometer container, metrics start to flow.

Workaround: restart the ceilometer container.

Attached files:
1. ceilometer conf files
2. ceilometer logs
3. gnocchi logs
The archive policy used in the configuration is `ceilometer-high`:
~~~
# cat pipeline.yaml
---
sources:
    - name: meter_source
      meters:
          - "*"
      sinks:
          - meter_sink
sinks:
    - name: meter_sink
      publishers:
          - gnocchi://?filter_project=service&archive_policy=ceilometer-high
          - notifier://172.17.1.73:5666/?driver=amqp&topic=osp17-metering

# cat event_pipeline.yaml
---
sources:
    - name: event_source
      events:
          - "*"
      sinks:
          - event_sink
sinks:
    - name: event_sink
      transformers:
      triggers:
      publishers:
          - gnocchi://?filter_project=service&archive_policy=ceilometer-high
          - notifier://172.17.1.73:5666/?driver=amqp&topic=osp17-event
~~~

No such archive policy exists in gnocchi; it should have been generated during "ceilometer-upgrade". However, nothing complains or logs anything about the incoming metrics having an undefined archive policy.
~~~
$ openstack metric archive-policy list
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
| name   | back_window | definition                                                            | aggregation_methods             |
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
| bool   | 3600        | - timespan: 365 days, 0:00:00, granularity: 0:00:01, points: 31536000 | last                            |
| high   | 0           | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | min, mean, count, max, sum, std |
|        |             | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      |                                 |
|        |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| low    | 0           | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | min, mean, count, max, sum, std |
| medium | 0           | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      | min, mean, count, max, sum, std |
|        |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
~~~

Upon restarting the notification agent on one of the ctrl nodes, the missing policies were created:
~~~
$ openstack metric archive-policy list
+----------------------+-------------+-----------------------------------------------------------------------+---------------------------------+
| name                 | back_window | definition                                                            | aggregation_methods             |
+----------------------+-------------+-----------------------------------------------------------------------+---------------------------------+
| bool                 | 3600        | - timespan: 365 days, 0:00:00, granularity: 0:00:01, points: 31536000 | last                            |
| ceilometer-high      | 0           | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | mean                            |
|                      |             | - timespan: 1 day, 0:00:00, granularity: 0:01:00, points: 1440        |                                 |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| ceilometer-high-rate | 0           | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | mean, rate:mean                 |
|                      |             | - timespan: 1 day, 0:00:00, granularity: 0:01:00, points: 1440        |                                 |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| ceilometer-low       | 0           | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | mean                            |
| ceilometer-low-rate  | 0           | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | mean, rate:mean                 |
| high                 | 0           | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | mean, count, max, min, sum, std |
|                      |             | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      |                                 |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| low                  | 0           | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | mean, count, max, min, sum, std |
| medium               | 0           | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      | mean, count, max, min, sum, std |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
+----------------------+-------------+-----------------------------------------------------------------------+---------------------------------+
~~~
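The mismatch between the publisher URLs and the policy list can be checked mechanically. The following is only a rough sketch, not part of ceilometer or gnocchi: `archive_policy_from_publisher` and `missing_policies` are hypothetical helper names, and the policy names are copied by hand from the two listings in this comment.

```python
from urllib.parse import parse_qs, urlsplit


def archive_policy_from_publisher(url):
    """Return the archive_policy query parameter of a publisher URL, or None."""
    values = parse_qs(urlsplit(url).query).get("archive_policy")
    return values[0] if values else None


def missing_policies(publisher_urls, existing_policy_names):
    """List policies referenced by gnocchi:// publishers but absent from the given names."""
    existing = set(existing_policy_names)
    missing = []
    for url in publisher_urls:
        if not url.startswith("gnocchi://"):
            continue  # only gnocchi publishers reference an archive policy
        policy = archive_policy_from_publisher(url)
        if policy is not None and policy not in existing and policy not in missing:
            missing.append(policy)
    return missing


# Publisher URLs as they appear in pipeline.yaml.
publishers = [
    "gnocchi://?filter_project=service&archive_policy=ceilometer-high",
    "notifier://172.17.1.73:5666/?driver=amqp&topic=osp17-metering",
]
# Policy names from the listing taken before the restart, and after it.
before_restart = ["bool", "high", "low", "medium"]
after_restart = before_restart + [
    "ceilometer-high",
    "ceilometer-high-rate",
    "ceilometer-low",
    "ceilometer-low-rate",
]

print(missing_policies(publishers, before_restart))
print(missing_policies(publishers, after_restart))
```

With the names from the first listing the check reports `ceilometer-high` as missing; with the names from the second listing it reports nothing.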
Couldn't reproduce this issue. Closing as NOTABUG. If it reproduces consistently, I will file a new BZ.
It seems that during deployment, keystone didn't respond to ceilometer's request to obtain the gnocchi endpoint using gnocchiclient [1]
~~~
2023-05-08 18:33:49.147 14 WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting http://172.17.1.82:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.ConnectTimeout: Request to http://172.17.1.82:5000 timed out
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base [-] Unable to load publisher gnocchi://?filter_project=service&archive_policy=ceilometer-high: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Request to http://172.17.1.82:5000 timed out
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base Traceback (most recent call last):
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     six.raise_from(e, None)
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "<string>", line 3, in raise_from
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     httplib_response = conn.getresponse()
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/http/client.py", line 1377, in getresponse
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     response.begin()
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/http/client.py", line 320, in begin
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     version, status, reason = self._read_status()
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/http/client.py", line 281, in _read_status
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/socket.py", line 704, in readinto
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     return self._sock.recv_into(b)
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base socket.timeout: timed out
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base
~~~

Since ceilometer couldn't get gnocchiclient [2] with proper auth values, it couldn't create the necessary archive policies [3].

Restarting the agent_notification service after deployment fixes this, because by that time keystone is healthy and responding.

This seems intermittent because the ceilometer and gnocchi services are spawned during steps 4 and 5, by which time keystone should be completely operational.

[1] https://github.com/openstack/ceilometer/blob/stable/wallaby/ceilometer/gnocchi_client.py#L36-L39
[2] https://github.com/openstack/ceilometer/blob/stable/wallaby/ceilometer/publisher/gnocchi.py#L216-L217
[3] https://github.com/openstack/ceilometer/blob/stable/wallaby/ceilometer/publisher/gnocchi.py#L252
Since we saw this issue happening again, I am going to resurrect this bug. I will lower the priority and severity to medium, since there is a clear workaround for this issue and it seems to happen only sometimes. Probably a race condition?
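If this really is a startup timing problem between the notification agent and keystone, one possible mitigation would be to retry client creation with a backoff instead of giving up on the first timeout. The sketch below only illustrates that idea and is not how ceilometer behaves today: `get_client_with_retry` and `make_client` are hypothetical names, and the built-in `ConnectionError` stands in for the keystoneauth timeout exception.

```python
import time


def get_client_with_retry(make_client, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call make_client() until it succeeds or the attempts run out.

    make_client is a hypothetical zero-argument factory standing in for
    whatever builds the gnocchi client. ConnectionError stands in for the
    timeout raised while keystone is not yet answering.
    """
    for attempt in range(1, attempts + 1):
        try:
            return make_client()
        except ConnectionError:
            if attempt == attempts:
                raise  # keystone never came up; surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # wait 1s, 2s, 4s, ...


# Simulated keystone that times out twice and then answers.
calls = {"n": 0}


def flaky_factory():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("Request to keystone timed out")
    return "gnocchi-client"


# Record the requested delays instead of actually sleeping.
delays = []
client = get_client_with_retry(flaky_factory, sleep=delays.append)
print(client, delays)
```

Passing `sleep` as a parameter keeps the example fast to run; a real caller would leave the default `time.sleep`.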
Bulk moving target milestone to GA after the release of Beta on 14th June '23.
Shifting this to 17.1 z2, since z1 is only for urgent bugs and this has missed beta and GA.
Root cause of this issue:
~~~
2023-08-06 09:47:24.788 14 WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting http://172.17.1.57:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.ConnectTimeout: Request to http://172.17.1.57:5000 timed out
2023-08-06 09:47:24.792 14 ERROR ceilometer.pipeline.base [-] Unable to load publisher gnocchi://?filter_project=service&archive_policy=ceilometer-high: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Request to http://172.17.1.57:5000 timed out
~~~

When the notification service is initialized while the keystone service is not available, ceilometer cannot fetch the endpoint for gnocchi and assumes the gnocchi publisher is invalid. If ceilometer fails to load a publisher, it will not send metrics to it, which is why no metrics are found in gnocchi. Restarting the notification service reloads all publishers, which fixes this issue.
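The failure mode can be modelled in a few lines. This is a toy illustration, not ceilometer code: `load_publishers` and `make_factory` are hypothetical names, and it assumes that only the gnocchi publisher depends on keystone at load time.

```python
import logging

LOG = logging.getLogger("pipeline_sketch")


def load_publishers(urls, factory):
    """Build publishers once; log and skip any that fail to load."""
    loaded = []
    for url in urls:
        try:
            loaded.append(factory(url))
        except Exception:
            LOG.error("Unable to load publisher %s", url)
    return loaded


def make_factory(keystone):
    """Return a toy publisher factory tied to a mutable keystone state dict."""
    def factory(url):
        # Assumption of this model: only gnocchi:// needs keystone to load.
        if url.startswith("gnocchi://") and not keystone["up"]:
            raise ConnectionError("keystone timed out")
        return url  # the URL itself stands in for a publisher object
    return factory


urls = [
    "gnocchi://?filter_project=service&archive_policy=ceilometer-high",
    "notifier://172.17.1.73:5666/?driver=amqp&topic=osp17-metering",
]
keystone = {"up": False}
factory = make_factory(keystone)

at_startup = load_publishers(urls, factory)     # keystone still down
keystone["up"] = True                           # keystone becomes healthy later
after_restart = load_publishers(urls, factory)  # a restart loads everything again

print(at_startup)
print(after_restart)
```

In this model the list built at startup never regains the gnocchi entry on its own, even after keystone recovers; only loading the publishers again, as a service restart does, brings it back.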