Bug 1983912

Summary: STF | Ceilometer metrics are not delivered to the STF server after STF server was down and came up again.
Product: Red Hat OpenStack
Reporter: Leonid Natapov <lnatapov>
Component: openstack-ceilometer
Assignee: Yadnesh Kulkarni <ykulkarn>
Status: NEW
QA Contact: Leonid Natapov <lnatapov>
Severity: medium
Priority: medium
Version: 16.2 (Train)
CC: akashavkin, apevec, csibbitt, lhh, lmadsen, mrunge, ykulkarn
Target Milestone: z2
Keywords: Reopened, TestOnly, Triaged, ZStream
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-03-16 12:36:25 UTC
Type: Bug

Description Leonid Natapov 2021-07-20 07:11:14 UTC
STF | Ceilometer metrics are not delivered to the STF server after the STF server goes down and comes back up.

Scenario:

1. An OSP deployment with STF successfully sends Ceilometer metrics to the STF server.
2. The OCP cluster hosting the STF server goes down for several hours.
3. Ceilometer cannot send metrics because the STF server is down, and a "timed out" message appears in the Ceilometer logs.
4. The STF server comes back up, but Ceilometer still cannot send metrics and logs the same "timed out" message.

Metrics started reaching the server side only after the metrics_qdr container was restarted manually.
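
The report does not say how the container was restarted. As a rough sketch of the workaround only, assuming podman is the container runtime on the node and that the container is named metrics_qdr as above, the restart could be scripted like this. The helper names and the dry-run default are illustrative, not part of any OSP tooling.

```python
import subprocess


def restart_command(container="metrics_qdr", runtime="podman"):
    """Build the restart command; runtime and container name are assumptions."""
    return [runtime, "restart", container]


def restart_container(dry_run=True, **kwargs):
    """Return the command, and run it only when dry_run is False."""
    cmd = restart_command(**kwargs)
    if not dry_run:
        # check=True raises CalledProcessError if the runtime reports a failure.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run by default: only print what would be executed.
    print(" ".join(restart_container()))
```

Running it with dry_run=False needs sufficient privileges on the node where the container lives.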

-----------------------------
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging [-] Could not send notification to osp162-metering. Payload={'message_id': '8f39945d-261b-4f95-9ec0-36e6e215ed52', 'publisher_id': 'telemetry.publisher.controller-1.redhat.local', 'event_type': 'metering', 'priority': 'SAMPLE', 'payload': [{'source': 'openstack', 'counter_name': 'disk.device.read.requests', 'counter_type': 'cumulative', 'counter_unit': 'request', 'counter_volume': 861, 'user_id': '0182fc3ff5eb4bf79ce6fa4cf2c57e04', 'project_id': 'bcffa5d120ee4f1cad7ed883856735b0', 'resource_id': 'c666091a-1856-46d3-930e-fe6c07df43a0-vda', 'timestamp': '2021-07-19T17:20:16.942127', 'resource_metadata': {'display_name': 'workload_instance_1', 'name': 'instance-00000005', 'instance_id': 'c666091a-1856-46d3-930e-fe6c07df43a0', 'instance_type': 'workload_flavor_1', 'host': 'e18e558a7c9da2666a88a59abf307a2f6306cc4b2b878b58ecf5ce0f', 'instance_host': 'compute-1.redhat.local', 'flavor': {'id': '42bb29f4-6de8-4352-9839-3593636a71ff', 'name': 'workload_flavor_1', 'vcpus': 1, 'ram': 512, 'disk': 5, 'ephemeral': 0, 'swap': 0}, 'status': 'active', 'state': 'running', 'task_state': '', 'image': {'id': '936668a5-1dad-4ce7-a484-1be3d4acff0c'}, 'image_ref': '936668a5-1dad-4ce7-a484-1be3d4acff0c', 'image_ref_url': None, 'architecture': 'x86_64', 'os_type': 'hvm', 'vcpus': 1, 'memory_mb': 512, 'disk_gb': 5, 'ephemeral_gb': 0, 'root_gb': 5, 'disk_name': 'vda'}, 'message_id': '9612dd84-e8b5-11eb-8263-5254007a033a', 'monotonic_time': None, 'message_signature': '4cf94a7b9134baba0e06f5f6ac75bbe009762295fe6c5e2e1dce57a50c90e258'}], 'timestamp': '2021-07-19 17:20:17.061598'}: oslo_messaging.exceptions.MessageDeliveryFailure: Notify message sent to <Target topic=osp162-metering.sample> failed: timed out
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging Traceback (most recent call last):
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/notify/messaging.py", line 69, in notify
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging     retry=retry)
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 136, in _send_notification
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging     retry=retry)
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_amqp1.py", line 295, in wrap
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging     return func(self, *args, **kws)
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_amqp1.py", line 397, in send_notification
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging     raise rc
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging oslo_messaging.exceptions.MessageDeliveryFailure: Notify message sent to <Target topic=osp162-metering.sample> failed: timed out
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging
2021-07-19 17:20:47.797 16 WARNING oslo_messaging._drivers.amqp1_driver.controller [-] Notify message sent to <Target topic=osp162-metering.sample> failed: timed out
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging [-] Could not send notification to osp162-metering. Payload={'message_id': 'f2841cd9-2b8e-430f-a76c-23f55ddc0571', 'publisher_id': 'telemetry.publisher.controller-1.redhat.local', 'event_type': 'metering', 'priority': 'SAMPLE', 'payload': [{'source': 'openstack', 'counter_name': 'network.incoming.packets', 'counter_type': 'cumulative', 'counter_unit': 'packet', 'counter_volume': 89, 'user_id': '0182fc3ff5eb4bf79ce6fa4cf2c57e04', 'project_id': 'bcffa5d120ee4f1cad7ed883856735b0', 'resource_id': 'instance-00000005-c666091a-1856-46d3-930e-fe6c07df43a0-tapb0001a7d-b8', 'timestamp': '2021-07-19T17:20:16.981129', 'resource_metadata': {'display_name': 'workload_instance_1', 'name': 'tapb0001a7d-b8', 'instance_id': 'c666091a-1856-46d3-930e-fe6c07df43a0', 'instance_type': 'workload_flavor_1', 'host': 'e18e558a7c9da2666a88a59abf307a2f6306cc4b2b878b58ecf5ce0f', 'instance_host': 'compute-1.redhat.local', 'flavor': {'id': '42bb29f4-6de8-4352-9839-3593636a71ff', 'name': 'workload_flavor_1', 'vcpus': 1, 'ram': 512, 'disk': 5, 'ephemeral': 0, 'swap': 0}, 'status': 'active', 'state': 'running', 'task_state': '', 'image': {'id': '936668a5-1dad-4ce7-a484-1be3d4acff0c'}, 'image_ref': '936668a5-1dad-4ce7-a484-1be3d4acff0c', 'image_ref_url': None, 'architecture': 'x86_64', 'os_type': 'hvm', 'vcpus': 1, 'memory_mb': 512, 'disk_gb': 5, 'ephemeral_gb': 0, 'root_gb': 5, 'mac': 'fa:16:3e:5c:01:b8', 'fref': None, 'parameters': {'interfaceid': 'b0001a7d-b859-4e06-8c7b-66d7dc2d55f6', 'bridge': 'br-int'}, 'vnic_name': 'tapb0001a7d-b8'}, 'message_id': '96180ab6-e8b5-11eb-8263-5254007a033a', 'monotonic_time': None, 'message_signature': '861cd13453fee69c45ecab15a3a4625aab8a07ea3531ac943b13b2663e793dfd'}], 'timestamp': '2021-07-19 17:20:17.100501'}: oslo_messaging.exceptions.MessageDeliveryFailure: Notify message sent to <Target topic=osp162-metering.sample> failed: timed out
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging Traceback (most recent call last):
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/notify/messaging.py", line 69, in notify
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging     retry=retry)
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 136, in _send_notification
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging     retry=retry)
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_amqp1.py", line 295, in wrap
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging     return func(self, *args, **kws)
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_amqp1.py", line 397, in send_notification
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging     raise rc
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging oslo_messaging.exceptions.MessageDeliveryFailure: Notify message sent to <Target topic=osp162-metering.sample> failed: timed out
2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging
2021-07-19 17:20:47.799 16 WARNING oslo_messaging._drivers.amqp1_driver.controller [-] Notify message sent to <Target topic=osp162-metering.sample> failed: timed out
2021-07-19 17:20:47.799 16 ERROR oslo_messaging.notify.messaging [-] Could not send notification to osp162-metering. Payload={'message_id': '8fac7376-665d-45eb-94aa-9ad9a150b2b2', 'publisher_id': 'telemetry.publisher.controller-1.redhat.local', 'event_type': 'metering', 'priority': 'SAMPLE', 'payload': [{'source': 'openstack', 'counter_name': 'disk.device.read.bytes', 'counter_type': 'cumulative', 'counter_unit': 'B', 'counter_volume': 23407104, 'user_id': '0182fc3ff5eb4bf79ce6fa4cf2c57e04', 'project_id': 'bcffa5d120ee4f1cad7ed883856735b0', 'resource_id': 'be278c8b-b0eb-4c5d-bd00-fbc5f3305421-vda', 'timestamp': '2021-07-19T17:20:17.073441', 'resource_metadata': {'display_name': 'workload_instance_0', 'name': 'instance-00000002', 'instance_id': 'be278c8b-b0eb-4c5d-bd00-fbc5f3305421', 'instance_type': 'workload_flavor_0', 'host': 'fbf73fd808d0c19d9d9931c48e3fe04034ad6a659cb2364237bb62bb', 'instance_host': 'compute-0.redhat.local', 'flavor': {'id': 'f76ffd91-b230-48f8-b9c6-5cc97398dc96', 'name': 'workload_flavor_0', 'vcpus': 1, 'ram': 512, 'disk': 5, 'ephemeral': 0, 'swap': 0}, 'status': 'active', 'state': 'running', 'task_state': '', 'image': {'id': '6910f5ab-f613-4a25-b81f-1df38faa22b8'}, 'image_ref': '6910f5ab-f613-4a25-b81f-1df38faa22b8', 'image_ref_url': None, 'architecture': 'x86_64', 'os_type': 'hvm', 'vcpus': 1, 'memory_mb': 512, 'disk_gb': 5, 'ephemeral_gb': 0, 'root_gb': 5, 'disk_name': 'vda'}, 'message_id': '962761d2-e8b5-11eb-a6c7-5254005b0f0d', 'monotonic_time': None, 'message_signature': 'af66eb9857c9254280a99dbf6f5332a2dc19099e458978942dc678b5c8262a75'}, {'source': 'openstack', 'counter_name': 'disk.device.read.bytes', 'counter_type': 'cumulative', 'counter_unit': 'B', 'counter_volume': 23407104, 'user_id': '0182fc3ff5eb4bf79ce6fa4cf2c57e04', 'project_id': 'bcffa5d120ee4f1cad7ed883856735b0', 'resource_id': 'b7d0ad79-c92f-4ea8-a66d-02d000c2f28c-vda', 'timestamp': '2021-07-19T17:20:17.073441', 'resource_metadata': {'display_name': 'leonid', 'name': 'instance-00000008', 'instance_id': 'b7d0ad79-c92f-4ea8-a66d-02d000c2f28c', 'instance_type': 'workload_flavor_0', 'host': 'fbf73fd808d0c19d9d9931c48e3fe04034ad6a659cb2364237bb62bb', 'instance_host': 'compute-0.redhat.local', 'flavor': {'id': 'f76ffd91-b230-48f8-b9c6-5cc97398dc96', 'name': 'workload_flavor_0', 'vcpus': 1, 'ram': 512, 'disk': 5, 'ephemeral': 0, 'swap': 0}, 'status': 'active', 'state': 'running', 'task_state': '', 'image': {'id': '6910f5ab-f613-4a25-b81f-1df38faa22b8'}, 'image_ref': '6910f5ab-f613-4a25-b81f-1df38faa22b8', 'image_ref_url': None, 'architecture': 'x86_64', 'os_type': 'hvm', 'vcpus': 1, 'memory_mb': 512, 'disk_gb': 5, 'ephemeral_gb': 0, 'root_gb': 5, 'disk_name': 'vda'}, 'message_id': '96289eb2-e8b5-11eb-a6c7-5254005b0f0d', 'monotonic_time': None, 'message_signature': 'dbe9149ee8276d68f1be58762e09920796f0ae8823fb0d6f0baf579e56682266'}], 'timestamp': '2021-07-19 17:20:17.629814'}: oslo_messaging.exceptions.MessageDeliveryFailure: Notify message sent to <Target topic=osp162-metering.sample> failed: timed out
2021-07-19 17:20:47.799 16 ERROR oslo_messaging.notify.messaging Traceback (most recent call last):
2021-07-19 17:20:47.799 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/notify/messaging.py", line 69, in notify
2021-07-19 17:20:47.799 16 ERROR oslo_messaging.notify.messaging     retry=retry)
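
To check whether the timeouts continue after the STF server is back, log lines like the ones above can be summarized with a small script. The sketch below is not part of Ceilometer or oslo.messaging. It assumes the log format matches the excerpt, and it counts only the one-line WARNING entries from the amqp1 driver so that each failed notification is counted once (the ERROR line and its traceback repeat the same message). The function and variable names are made up for illustration.

```python
import re
from collections import Counter

# One WARNING line per failed notification, in the format seen in the excerpt.
FAILURE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \d+ WARNING "
    r"oslo_messaging\._drivers\.amqp1_driver\.controller \[-\] "
    r"Notify message sent to <Target topic=(?P<topic>[^>]+)> "
    r"failed: (?P<reason>.+)$"
)


def summarize_failures(lines):
    """Return (Counter keyed by (topic, reason), first timestamp, last timestamp)."""
    counts = Counter()
    first = last = None
    for line in lines:
        match = FAILURE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # ERROR lines, tracebacks and unrelated entries are skipped
        counts[(match.group("topic"), match.group("reason"))] += 1
        if first is None:
            first = match.group("ts")
        last = match.group("ts")
    return counts, first, last


# Lines copied from the excerpt: two WARNING entries and one ERROR entry.
SAMPLE = [
    "2021-07-19 17:20:47.797 16 WARNING oslo_messaging._drivers.amqp1_driver.controller [-] "
    "Notify message sent to <Target topic=osp162-metering.sample> failed: timed out",
    "2021-07-19 17:20:47.797 16 ERROR oslo_messaging.notify.messaging     raise rc",
    "2021-07-19 17:20:47.799 16 WARNING oslo_messaging._drivers.amqp1_driver.controller [-] "
    "Notify message sent to <Target topic=osp162-metering.sample> failed: timed out",
]

if __name__ == "__main__":
    counts, first, last = summarize_failures(SAMPLE)
    for (topic, reason), n in counts.items():
        print(f"{topic}: {n} x '{reason}' between {first} and {last}")
```

If the last timestamp keeps advancing after the STF server has recovered, the sender is still timing out, which is the behaviour described in step 4.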

Comment 11 Leonid Natapov 2023-03-16 12:36:25 UTC
I am closing this issue as NOTABUG because I ran some tests with OSP 17.1 and STF 1.5. After I restarted the worker nodes on the OCP cluster and they came back up, I could see metrics flowing from both ceilometer and collectd. If the described issue is reproduced at some point, I will file a bug.