Bug 2167428

Summary: [RHOSP 17.1] Ceilometer doesn't send data to Gnocchi.
Product: Red Hat OpenStack
Reporter: Leonid Natapov <lnatapov>
Component: openstack-ceilometer
Assignee: Yadnesh Kulkarni <ykulkarn>
Status: ON_DEV
QA Contact: Leonid Natapov <lnatapov>
Severity: medium
Docs Contact: mgeary <mgeary>
Priority: medium
Version: 17.1 (Wallaby)
CC: apevec, erpeters, jamsmith, lmadsen, mrunge, ykulkarn
Target Milestone: z2
Keywords: Reopened, Triaged
Target Release: 17.1
Flags: jamsmith: needinfo? (ykulkarn)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
During a new deployment, the keystone service is often not available when the agent-notification service is initializing. This prevents ceilometer from discovering the gnocchi endpoint. As a result, metrics are not sent to gnocchi.
Last Closed: 2023-04-17 12:05:41 UTC
Type: Bug
Attachments:
Description Flags
ceilometer config none

Description Leonid Natapov 2023-02-06 15:41:40 UTC
Created attachment 1942553
ceilometer config

OSP17.1 | Ceilometer doesn't send data to Gnocchi.

I have a freshly installed OSP 17.1 with two instances up and running, and ceilometer configured to send data to gnocchi. The `gnocchi metric list` command returns an empty list.
After restarting the ceilometer container, metrics start to flow.

Workaround:
Restart the ceilometer container.

Attached files:
1. ceilometer conf files
2. ceilometer logs
3. gnocchi logs

Comment 3 Yadnesh Kulkarni 2023-02-07 08:19:18 UTC
The archive policy used in the configuration is `ceilometer-high`:
~~~
# cat pipeline.yaml 
---
sources:
    - name: meter_source
      meters:
          - "*"
      sinks:
          - meter_sink
sinks:
    - name: meter_sink
      publishers:
          - gnocchi://?filter_project=service&archive_policy=ceilometer-high
          - notifier://172.17.1.73:5666/?driver=amqp&topic=osp17-metering

# cat event_pipeline.yaml 
---
sources:
    - name: event_source
      events:
          - "*"
      sinks:
          - event_sink
sinks:
    - name: event_sink
      transformers:
      triggers:
      publishers:
          - gnocchi://?filter_project=service&archive_policy=ceilometer-high
          - notifier://172.17.1.73:5666/?driver=amqp&topic=osp17-event
~~~

No such archive policy exists in gnocchi; it should have been created during "ceilometer-upgrade".
However, nothing is logged about incoming metrics having an undefined archive policy:
~~~
$ openstack metric archive-policy list
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
| name   | back_window | definition                                                            | aggregation_methods             |
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
| bool   |        3600 | - timespan: 365 days, 0:00:00, granularity: 0:00:01, points: 31536000 | last                            |
| high   |           0 | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | min, mean, count, max, sum, std |
|        |             | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      |                                 |
|        |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| low    |           0 | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | min, mean, count, max, sum, std |
| medium |           0 | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      | min, mean, count, max, sum, std |
|        |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
~~~
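
A mismatch like this can be caught by comparing the `archive_policy` named in each gnocchi publisher URL with the policies gnocchi actually reports. The following is only a minimal sketch, not part of ceilometer: the helper name `missing_archive_policies` is hypothetical, and the existing policy names are hard-coded from the listing above instead of being fetched from the API.

```python
from urllib.parse import parse_qs, urlsplit


def missing_archive_policies(publisher_urls, existing_policies):
    """Return archive policies that gnocchi:// publishers reference but that do not exist.

    Hypothetical helper for illustration only; it is not a ceilometer function.
    """
    missing = set()
    for url in publisher_urls:
        parts = urlsplit(url)
        if parts.scheme != "gnocchi":
            continue  # e.g. notifier:// publishers carry no archive policy
        for policy in parse_qs(parts.query).get("archive_policy", []):
            if policy not in existing_policies:
                missing.add(policy)
    return sorted(missing)


# Publisher URLs copied from pipeline.yaml above.
publishers = [
    "gnocchi://?filter_project=service&archive_policy=ceilometer-high",
    "notifier://172.17.1.73:5666/?driver=amqp&topic=osp17-metering",
]
# Policy names copied from the archive-policy listing above.
existing = {"bool", "high", "low", "medium"}

print(missing_archive_policies(publishers, existing))
```

With the values from this report, the sketch reports `ceilometer-high` as missing.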


Upon restarting the notification agent on one of the controller nodes, the missing policies were created:
~~~
$ openstack metric archive-policy list
+----------------------+-------------+-----------------------------------------------------------------------+---------------------------------+
| name                 | back_window | definition                                                            | aggregation_methods             |
+----------------------+-------------+-----------------------------------------------------------------------+---------------------------------+
| bool                 |        3600 | - timespan: 365 days, 0:00:00, granularity: 0:00:01, points: 31536000 | last                            |
| ceilometer-high      |           0 | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | mean                            |
|                      |             | - timespan: 1 day, 0:00:00, granularity: 0:01:00, points: 1440        |                                 |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| ceilometer-high-rate |           0 | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | mean, rate:mean                 |
|                      |             | - timespan: 1 day, 0:00:00, granularity: 0:01:00, points: 1440        |                                 |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| ceilometer-low       |           0 | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | mean                            |
| ceilometer-low-rate  |           0 | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | mean, rate:mean                 |
| high                 |           0 | - timespan: 1:00:00, granularity: 0:00:01, points: 3600               | mean, count, max, min, sum, std |
|                      |             | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      |                                 |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
| low                  |           0 | - timespan: 30 days, 0:00:00, granularity: 0:05:00, points: 8640      | mean, count, max, min, sum, std |
| medium               |           0 | - timespan: 7 days, 0:00:00, granularity: 0:01:00, points: 10080      | mean, count, max, min, sum, std |
|                      |             | - timespan: 365 days, 0:00:00, granularity: 1:00:00, points: 8760     |                                 |
+----------------------+-------------+-----------------------------------------------------------------------+---------------------------------+
~~~

Comment 7 Leonid Natapov 2023-04-17 12:05:41 UTC
Couldn't reproduce this issue. Closing as NOTABUG.
If it reproduces consistently, I will file a new BZ.

Comment 8 Yadnesh Kulkarni 2023-05-09 14:21:45 UTC
It seems that during deployment, keystone didn't respond to ceilometer's request to obtain the gnocchi endpoint using gnocchiclient [1]:
~~~
2023-05-08 18:33:49.147 14 WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting http://172.17.1.82:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.ConnectTimeout: Request to http://172.17.1.82:5000 timed out
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base [-] Unable to load publisher gnocchi://?filter_project=service&archive_policy=ceilometer-high: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Request to http://172.17.1.82:5000 timed out
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base Traceback (most recent call last):
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     six.raise_from(e, None)
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "<string>", line 3, in raise_from
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     httplib_response = conn.getresponse()
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/http/client.py", line 1377, in getresponse
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     response.begin()
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/http/client.py", line 320, in begin
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     version, status, reason = self._read_status()
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/http/client.py", line 281, in _read_status
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base   File "/usr/lib64/python3.9/socket.py", line 704, in readinto
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base     return self._sock.recv_into(b)
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base socket.timeout: timed out
2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base 
~~~
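
To check whether a given node hit this failure, the agent log can be scanned for the "Unable to load publisher" error. The snippet below is a rough sketch under assumptions: the function name `failed_publishers` is made up for illustration, the regular expression is derived only from the log line shown above, and the sample lines are abridged copies of that excerpt.

```python
import re

# Pattern derived from the ceilometer.pipeline.base error in the excerpt above.
PUBLISHER_ERROR = re.compile(
    r"ERROR ceilometer\.pipeline\.base \[-\] Unable to load publisher (\S+?): "
)


def failed_publishers(log_lines):
    """Return the publisher URLs that failed to load, in order of first appearance."""
    found = []
    for line in log_lines:
        match = PUBLISHER_ERROR.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


# Abridged lines from the log excerpt above (the tail of each message is trimmed).
sample_log = [
    "2023-05-08 18:33:49.147 14 WARNING keystoneauth.identity.generic.base [-] "
    "Failed to discover available identity versions when contacting "
    "http://172.17.1.82:5000.",
    "2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base [-] "
    "Unable to load publisher "
    "gnocchi://?filter_project=service&archive_policy=ceilometer-high: "
    "keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find "
    "versioned identity endpoints when attempting to authenticate.",
    "2023-05-08 18:33:49.150 14 ERROR ceilometer.pipeline.base "
    "Traceback (most recent call last):",
]

print(failed_publishers(sample_log))
```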

Since ceilometer couldn't get a gnocchiclient [2] with proper auth values, it couldn't create the necessary archive policies [3].

Restarting the agent_notification service after deployment fixes this because by that time keystone is healthy and responding. This seems intermittent because the ceilometer and gnocchi services are spawned during steps 4 and 5, by which time keystone should be fully operational.

[1] https://github.com/openstack/ceilometer/blob/stable/wallaby/ceilometer/gnocchi_client.py#L36-L39
[2] https://github.com/openstack/ceilometer/blob/stable/wallaby/ceilometer/publisher/gnocchi.py#L216-L217
[3] https://github.com/openstack/ceilometer/blob/stable/wallaby/ceilometer/publisher/gnocchi.py#L252
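
Since the workaround depends on keystone responding before the agent is restarted, a small wait loop can make that ordering explicit. This is a sketch, not a supported tool: `http_probe` and `wait_until_ready` are hypothetical names, the retry counts are arbitrary, and the demo uses a fake probe so it runs without a deployment.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, so it is reachable
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timed out, or unreachable


def wait_until_ready(probe, attempts=10, delay=5.0, sleep=time.sleep):
    """Call probe() until it returns True.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. `sleep` is injectable so the loop can be tested quickly.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


# Demo with a fake probe: keystone "answers" on the third attempt.
responses = iter([False, False, True])
print(wait_until_ready(lambda: next(responses), delay=0, sleep=lambda s: None))
```

Against a real deployment, the probe would be something like `lambda: http_probe("http://172.17.1.82:5000")`, using the auth URL from the log above, and the agent_notification service would be restarted only after the call returns a value other than `None`.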

Comment 9 Leonid Natapov 2023-05-09 16:30:44 UTC
Since we saw this issue happen again, I am going to resurrect this bug. I will lower the priority and severity to medium since there is a clear workaround and the issue seems to happen only sometimes. Probably a race condition?

Comment 10 Lukas Svaty 2023-06-16 08:13:29 UTC
Bulk moving target milestone to GA after the release of Beta on 14th June '23.

Comment 11 Leif Madsen 2023-06-16 18:26:29 UTC
Shifting this to 17.1 z2 because z1 is only for urgent bugs and this has missed beta and GA.

Comment 13 Yadnesh Kulkarni 2023-08-07 05:25:04 UTC
Root cause of this issue:

~~~
2023-08-06 09:47:24.788 14 WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting http://172.17.1.57:5000. Attempting to parse version from URL.: keystoneauth1.exceptions.connection.ConnectTimeout: Request to http://172.17.1.57:5000 timed out

2023-08-06 09:47:24.792 14 ERROR ceilometer.pipeline.base [-] Unable to load publisher gnocchi://?filter_project=service&archive_policy=ceilometer-high: keystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Request to http://172.17.1.57:5000 timed out
~~~

When the notification service initializes while the keystone service is unavailable, ceilometer cannot fetch the gnocchi endpoint and assumes the gnocchi publisher is invalid. If ceilometer fails to load a publisher, it does not send metrics to it, which is why no metrics are found in gnocchi. Restarting the notification service reloads all publishers, which fixes the issue.
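
The behaviour described above can be illustrated with a toy model. This is not ceilometer's code: `ToySink` and `loader` are invented for illustration and only mimic the idea that publishers are loaded once at start-up and that a publisher which fails to load is skipped until the next restart.

```python
class ToySink:
    """Toy model of a sink whose publishers are loaded once, at construction."""

    def __init__(self, urls, loader):
        self.publishers = {}
        self.failed = []
        for url in urls:
            try:
                self.publishers[url] = loader(url)
            except Exception:
                # A publisher that cannot be loaded is dropped, not retried.
                self.failed.append(url)

    def publish(self, sample):
        """Send the sample to every loaded publisher and return their URLs."""
        for send in self.publishers.values():
            send(sample)
        return list(self.publishers)


keystone_up = False
received = {"gnocchi": [], "notifier": []}


def loader(url):
    """Pretend loader: the gnocchi publisher needs keystone, the notifier does not."""
    scheme = url.split("://", 1)[0]
    if scheme == "gnocchi" and not keystone_up:
        raise ConnectionError("request to keystone timed out")
    return received[scheme].append


urls = ["gnocchi://", "notifier://"]

sink = ToySink(urls, loader)   # agent starts while keystone is down
keystone_up = True             # keystone recovers afterwards
sink.publish("cpu")            # gnocchi is still skipped: it was never loaded

sink = ToySink(urls, loader)   # "restart": publishers are loaded again
sink.publish("memory")

print(received)
```

In this model, the sample published before the restart reaches only the notifier, and the one published after the restart reaches both publishers.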