Bug 1819066 - ceilometer-compute stop polling some metrics after Timeout in libvirt
Summary: ceilometer-compute stop polling some metrics after Timeout in libvirt
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: z12
Target Release: 13.0 (Queens)
Assignee: Matthias Runge
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-31 06:52 UTC by Takashi Kajinami
Modified: 2023-09-07 22:37 UTC

Fixed In Version: openstack-ceilometer-10.0.1-10.el7ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-24 12:08:39 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 716513 | 0 | None | MERGED | Temporary failures should be treated as temporary. | 2021-02-03 22:32:49 UTC
Red Hat Product Errata | RHBA-2020:2726 | 0 | None | None | None | 2020-06-24 12:08:45 UTC

Description Takashi Kajinami 2020-03-31 06:52:24 UTC
Description of problem:

ceilometer-compute polls performance information such as CPU or memory usage via libvirt.
However, when it fails to get information from libvirt because of a timeout within libvirt,
it treats the timeout as a PollsterPermanentError and stops polling some metrics.
We then need to restart it to make it poll the disabled metrics again.

Because the timeout is a transient issue in most cases, ceilometer-compute should
keep polling the metric even if a poll failed because of a timeout.
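
To illustrate the requested behavior, here is a minimal sketch of a polling loop that disables a resource only on a permanent error and retries it after a transient one. This is not ceilometer's actual code: the exception classes, `poll_cycle`, and `make_flaky_reader` are hypothetical names invented for this example, and the real pollster interface may differ.

```python
class PermanentPollError(Exception):
    """Hypothetical: the metric can never be collected for this resource."""


class TransientPollError(Exception):
    """Hypothetical: the metric is unavailable right now, e.g. a timeout."""


def poll_cycle(resources, read_metric, disabled):
    """Poll every resource once and return the collected samples.

    `disabled` is a set that persists across cycles. Only permanent
    errors add a resource to it; transient errors skip a single cycle.
    """
    samples = {}
    for resource in resources:
        if resource in disabled:
            continue
        try:
            samples[resource] = read_metric(resource)
        except PermanentPollError:
            # Give up on this resource until the agent is restarted.
            disabled.add(resource)
        except TransientPollError:
            # Skip this cycle only; the resource is polled again next time.
            pass
    return samples


def make_flaky_reader(fail_on_calls):
    """Build a reader that raises a transient error on the given call numbers."""
    calls = {"n": 0}

    def read_metric(resource):
        calls["n"] += 1
        if calls["n"] in fail_on_calls:
            raise TransientPollError("timeout while talking to the hypervisor")
        return 42

    return read_metric
```

With a reader that times out only on its first call, the first cycle returns no sample for the instance, the disabled set stays empty, and the second cycle collects the metric again without a restart.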


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Matthias Runge 2020-03-31 12:55:05 UTC
Removing pollsters with permanent errors from the pollster list seems a sane decision to me.

Can you please provide a few more details and logs?

Comment 4 Matthias Runge 2020-04-01 07:16:11 UTC
Proposed upstream patch: https://review.opendev.org/716513

Let's see how it goes.

The other question is why these timeouts occur.

Comment 7 Matthias Runge 2020-04-09 14:02:01 UTC
Backport to Train:
https://review.opendev.org/718705

Comment 9 Matthias Runge 2020-05-07 14:16:17 UTC
Apparently, in some situations the libvirt call can time out. Previously, the timeout was treated as a permanent error; with this patch, it is treated as transient.

This scenario probably happens under heavy load, but I am not sure how to provoke it.

Comment 16 Leonid Natapov 2020-06-08 11:40:07 UTC
openstack-ceilometer-central-10.0.1-10.el7ost.noarch

Basic telemetry sanity testing was done.

Comment 19 errata-xmlrpc 2020-06-24 12:08:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2726

