Bug 2167428 - [RHOSP 17.1] Ceilometer doesn't send data to Gnocchi. [NEEDINFO]
Summary: [RHOSP 17.1] Ceilometer doesn't send data to Gnocchi.
Keywords:
Status: ON_DEV
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: 17.1
Assignee: Yadnesh Kulkarni
QA Contact: Leonid Natapov
Docs Contact: mgeary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-02-06 15:41 UTC by Leonid Natapov
Modified: 2023-08-14 14:18 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
During a new deployment, the keystone service is often not available when the agent-notification service is initializing. This prevents ceilometer from discovering the gnocchi endpoint. As a result, metrics are not sent to gnocchi.
Clone Of:
Environment:
Last Closed: 2023-04-17 12:05:41 UTC
Target Upstream Version:
Embargoed:
jamsmith: needinfo? (ykulkarn)


Attachments
ceilometer config (1.04 KB, application/gzip)
2023-02-06 15:41 UTC, Leonid Natapov


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 885690 0 None MERGED Make multiple attempts to obtain gnocchiclient 2023-08-07 05:25:04 UTC
Red Hat Issue Tracker OSP-22090 0 None None None 2023-02-06 15:43:45 UTC

Description Leonid Natapov 2023-02-06 15:41:40 UTC
Created attachment 1942553
ceilometer config

OSP17.1 | Ceilometer doesn't send data to Gnocchi.

I have a freshly installed OSP17.1 with two instances up and running and ceilometer configured to send data to gnocchi. The "gnocchi metric list" command returns an empty list.
After restarting the ceilometer container, metrics start to flow.

Workaround:
Restart the ceilometer container.

Attached files:
1. ceilometer conf files
2. ceilometer logs
3. gnocchi logs

Comment 3 Yadnesh Kulkarni 2023-02-07 08:19:18 UTC
The archive policy used in the configuration is `ceilometer-high`
~~~
# cat pipeline.yaml 
---
sources:
    - name: meter_source
      meters:
          - "*"
      sinks:
          - meter_sink
sinks:
    - name: meter_sink
      publishers:
          - gnocchi://?filter_project=service&archive_policy=ceilometer-high
          - notifier://172.17.1.73:5666/?driver=amqp&topic=osp17-metering

# cat event_pipeline.yaml 
---
sources:
    - name: event_source
      events:
          - "*"
      sinks:
          - event_sink
sinks:
    - name: event_sink
      transformers:
      triggers:
      publishers:
          - gnocchi://?filter_project=service&archive_policy=ceilometer-high
          - notifier://172.17.1.73:5666/?driver=amqp&topic=osp17-event
~~~

No such archive policy exists in gnocchi, although it should have been created during "ceilometer-upgrade".
However, ceilometer doesn't complain or log anything about the incoming metrics having an undefined archive policy.
~~~
$ openstack metric archive-policy list
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
| name   | back_window | definition                                                            | aggregation_methods             |
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
| bool   |        3600 | - timespan: 365 days, 0:00:00, granularity: 0:00:01, points: 31536000 | last                            |
| high   |           0 | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | min, mean, count, max, sum, std |
|        |             | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      |                                 |
|        |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| low    |           0 | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | min, mean, count, max, sum, std |
| medium |           0 | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      | min, mean, count, max, sum, std |
|        |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
~~~
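
The mismatch described above can be checked mechanically. The following is only a minimal sketch, assuming the publisher URLs from pipeline.yaml and the policy names from the listing have already been collected into plain Python lists. The helper names `archive_policy_of` and `missing_policies` are hypothetical and are not part of ceilometer or gnocchiclient.

```python
from urllib.parse import parse_qs, urlsplit


def archive_policy_of(publisher_url):
    """Return the archive_policy query parameter of a publisher URL, or None."""
    query = parse_qs(urlsplit(publisher_url).query)
    values = query.get("archive_policy")
    return values[0] if values else None


def missing_policies(publisher_urls, existing_policies):
    """List policies referenced by gnocchi:// publishers but absent from Gnocchi."""
    wanted = {
        archive_policy_of(url)
        for url in publisher_urls
        if urlsplit(url).scheme == "gnocchi"
    }
    wanted.discard(None)  # publishers without an explicit archive_policy
    return sorted(wanted - set(existing_policies))


# Publisher URLs copied from the pipeline.yaml shown in this comment.
publishers = [
    "gnocchi://?filter_project=service&archive_policy=ceilometer-high",
    "notifier://172.17.1.73:5666/?driver=amqp&topic=osp17-metering",
]
# Policy names copied from the archive-policy listing taken before the restart.
existing = ["bool", "high", "low", "medium"]

print(missing_policies(publishers, existing))  # prints ['ceilometer-high']
```

Only the gnocchi:// publisher is inspected, because the notifier:// publisher does not reference an archive policy.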


After restarting the notification agent on one of the controller nodes, the missing policies were created:
~~~
$ openstack metric archive-policy list
+----------------------+-------------+-----------------------------------------------------------------------+---------------------------------+
| name                 | back_window | definition                                                            | aggregation_methods             |
+----------------------+-------------+-----------------------------------------------------------------------+---------------------------------+
| bool                 |        3600 | - timespan: 365 days, 0:00:00, granularity: 0:00:01, points: 31536000 | last                            |
| ceilometer-high      |           0 | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | mean                            |
|                      |             | - timespan: 1 day, 0:00:00, granularity: 0:01:00, points: 1440        |                                 |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| ceilometer-high-rate |           0 | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | mean, rate:mean                 |
|                      |             | - timespan: 1 day, 0:00:00, granularity: 0:01:00, points: 1440        |                                 |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| ceilometer-low       |           0 | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | mean                            |
| ceilometer-low-rate  |           0 | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | mean, rate:mean                 |
| high                 |           0 | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | mean, count, max, min, sum, std |
|                      |             | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      |                                 |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| low                  |           0 | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | mean, count, max, min, sum, std |
| medium               |           0 | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      | mean, count, max, min, sum, std |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
+----------------------+-------------+-----------------------------------------------------------------------+---------------------------------+
~~~

Comment 7 Leonid Natapov 2023-04-17 12:05:41 UTC
Couldn't reproduce this issue. Closing as NOTABUG.
If it reproduces consistently, I will file a new BZ.

Comment 8 Yadnesh Kulkarni 2023-05-09 14:21:45 UTC
It seems that during deployment, keystone didn't respond to ceilometer's request to obtain the gnocchi endpoint using gnocchiclient [1]:
~~~
2023-05-08 18:33:49.147 14 WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting http://172.17.1.82:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.ConnectTimeout: Request to http://172.17.1.82:5000 timed out
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base [-] Unable to load publisher gnocchi://?filter_project=service&archive_policy=ceilometer-high: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Request to http://172.17.1.82:5000 timed out
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base Traceback (most recent call last):
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     six.raise_from(e, None)
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "<string>", line 3, in raise_from
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     httplib_response = conn.getresponse()
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/http/client.py", line 1377, in getresponse
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     response.begin()
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/http/client.py", line 320, in begin
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     version, status, reason = self._read_status()
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/http/client.py", line 281, in _read_status
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/socket.py", line 704, in readinto
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     return self._sock.recv_into(b)
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base socket.timeout: timed out
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base 
~~~

Since ceilometer couldn't get a gnocchiclient [2] with proper auth values, it couldn't create the necessary archive policies [3].

Restarting the agent_notification service after deployment fixes this because by that time keystone is healthy and responding. The failure seems intermittent because the ceilometer and gnocchi services are spawned during deployment steps 4 and 5, by which point keystone should normally be completely operational.

[1] https://github.com/openstack/ceilometer/blob/stable/wallaby/ceilometer/gnocchi_client.py#L36-L39
[2] https://github.com/openstack/ceilometer/blob/stable/wallaby/ceilometer/publisher/gnocchi.py#L216-L217
[3] https://github.com/openstack/ceilometer/blob/stable/wallaby/ceilometer/publisher/gnocchi.py#L252
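
To tell whether a given node hit this failure, the notification agent log can be scanned for the "Unable to load publisher" error shown above. This is a rough sketch, assuming the log lines keep the format of the excerpt in this comment. The function name `failed_publishers` is hypothetical, and the regular expression is derived only from that excerpt.

```python
import re

# Matches the ERROR line emitted by ceilometer.pipeline.base in the excerpt above.
PUBLISHER_FAILURE = re.compile(
    r"ERROR ceilometer\.pipeline\.base \[-\] "
    r"Unable to load publisher (?P<url>\S+): "
)


def failed_publishers(log_lines):
    """Return publisher URLs that failed to load, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = PUBLISHER_FAILURE.search(line)
        if match and match.group("url") not in seen:
            seen.append(match.group("url"))
    return seen


# Two lines copied from the log excerpt in this comment.
sample = [
    "2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base [-] Unable to "
    "load publisher gnocchi://?filter_project=service&archive_policy="
    "ceilometer-high: keystoneauth1.exceptions.discovery.DiscoveryFailure: "
    "Could not find versioned identity endpoints when attempting to "
    "authenticate. Please check that your auth_url is correct. Request to "
    "http://172.17.1.82:5000 timed out",
    "2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base "
    "socket.timeout: timed out",
]

print(failed_publishers(sample))
```

A non-empty result means the agent started without that publisher, so the restart workaround applies.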

Comment 9 Leonid Natapov 2023-05-09 16:30:44 UTC
Since we saw this issue happening again, I am going to resurrect this bug. I will lower the priority and severity to medium since there is a clear workaround for this issue and since it seems to happen only sometimes. Probably a race condition?

Comment 10 Lukas Svaty 2023-06-16 08:13:29 UTC
Bulk moving target milestone to GA after the release of Beta on 14th June '23.

Comment 11 Leif Madsen 2023-06-16 18:26:29 UTC
Shifting this to 17.1 z2 because z1 is only for urgent bugs and this has missed beta and GA.

Comment 13 Yadnesh Kulkarni 2023-08-07 05:25:04 UTC
Root cause of this issue:

~~~
2023-08-06 09:47:24.788 14 WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting http://172.17.1.57:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.ConnectTimeout: Request to http://172.17.1.57:5000 timed out

2023-08-06 09:47:24.792 14 ERROR ceilometer.pipeline.base [-] Unable to load publisher gnocchi://?filter_project=service&archive_policy=ceilometer-high: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Request to http://172.17.1.57:5000 timed out
~~~

When the notification service is initialized while the keystone service is not available, ceilometer is unable to fetch the gnocchi endpoint and assumes the gnocchi publisher is invalid. If ceilometer fails to load a publisher, it will not send metrics to it, which is why no metrics are found in gnocchi. Restarting the notification service reloads all publishers, which fixes this issue.
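
The linked Gerrit change is titled "Make multiple attempts to obtain gnocchiclient". The snippet below is not that patch. It is only a generic sketch of the retry idea, assuming the client is built by some zero-argument callable. The names `call_with_retries` and `factory`, and the default attempt count and delay, are hypothetical.

```python
import time


def call_with_retries(factory, attempts=5, delay=2.0, sleep=time.sleep):
    """Call factory() until it succeeds or attempts run out.

    The exception from the final attempt is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except Exception:
            if attempt == attempts:
                raise
            # Give the identity service time to come up before trying again.
            sleep(delay)
```

With such a wrapper, a keystone timeout during the first seconds of deployment would delay loading the publisher rather than cause it to be dropped. The `sleep` parameter exists only so the delay can be replaced in tests.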

