
Bug 1819066

Summary: ceilometer-compute stops polling some metrics after a timeout in libvirt
Product: Red Hat OpenStack
Reporter: Takashi Kajinami <tkajinam>
Component: openstack-ceilometer
Assignee: Matthias Runge <mrunge>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Docs Contact:
Priority: medium
Version: 13.0 (Queens)
CC: apevec, jbadiapa, mgarciac, pkilambi
Target Milestone: z12
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-ceilometer-10.0.1-10.el7ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-06-24 12:08:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Takashi Kajinami 2020-03-31 06:52:24 UTC
Description of problem:

ceilometer-compute polls performance data such as CPU and memory usage through libvirt.
However, when a libvirt call fails because of a timeout inside libvirt, ceilometer-compute
treats the timeout as a PollsterPermanentError and stops polling the affected metrics.
The service then has to be restarted before it polls the disabled metrics again.

Because the timeout is a transient issue in most cases, ceilometer-compute should
keep polling the metric even if a poll fails because of a timeout.
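
To illustrate the behaviour described above, here is a minimal, self-contained sketch. It is not ceilometer code: the names PermanentError, Poller, and make_flaky_inspector are hypothetical stand-ins, and the sketch only assumes that a resource which hits a permanent error is dropped from later polling cycles.

```python
class PermanentError(Exception):
    """Hypothetical stand-in for a permanent pollster error."""


class Poller:
    """Toy polling loop: a resource that fails permanently is never retried."""

    def __init__(self, inspect):
        self.inspect = inspect   # callable returning a sample for a resource
        self.disabled = set()    # resources dropped until the process restarts

    def poll(self, resources):
        samples = {}
        for res in resources:
            if res in self.disabled:
                continue
            try:
                samples[res] = self.inspect(res)
            except PermanentError:
                self.disabled.add(res)
        return samples


def make_flaky_inspector():
    """Return an inspector whose first call for 'vm-1' fails, then recovers."""
    state = {"failed_once": False}

    def inspect(res):
        if res == "vm-1" and not state["failed_once"]:
            state["failed_once"] = True
            # The reported behaviour: a timeout surfaces as a permanent error.
            raise PermanentError("timeout talking to the hypervisor")
        return 42

    return inspect


poller = Poller(make_flaky_inspector())
first = poller.poll(["vm-1", "vm-2"])    # vm-1 times out once
second = poller.poll(["vm-1", "vm-2"])   # backend has recovered, vm-1 is still skipped
print(first, second, poller.disabled)
```

In this sketch the second cycle would succeed for vm-1 if it were attempted, but the resource stays disabled until the Poller object is recreated, which corresponds to restarting the service.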


How reproducible:
Always


Comment 2 Matthias Runge 2020-03-31 12:55:05 UTC
Removing pollsters with permanent errors from the pollster list seems a sane decision to me.

Could you please provide more details and logs?

Comment 4 Matthias Runge 2020-04-01 07:16:11 UTC
Proposed upstream patch: https://review.opendev.org/716513

Let's see how it goes.

The other question is why these timeouts occur.

Comment 7 Matthias Runge 2020-04-09 14:02:01 UTC
Backport to Train:
https://review.opendev.org/718705

Comment 9 Matthias Runge 2020-05-07 14:16:17 UTC
Apparently, in some situations the libvirt call can time out. Previously, the timeout was treated as a permanent error; with this patch, it should be treated as transient.
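
As a rough sketch of the permanent-versus-transient distinction (this is not the upstream patch; BackendTimeout, UnsupportedMetric, is_transient, and poll_once are hypothetical names chosen for illustration), a polling loop could disable a resource only for errors that will not go away on retry:

```python
class BackendTimeout(Exception):
    """Hypothetical stand-in for a timeout reported by the hypervisor API."""


class UnsupportedMetric(Exception):
    """Hypothetical stand-in for an error that will not go away on retry."""


def is_transient(exc):
    """Timeouts are worth retrying; everything else is treated as permanent."""
    return isinstance(exc, BackendTimeout)


def poll_once(resources, inspect, disabled):
    """Poll each enabled resource; disable it only on a permanent failure."""
    samples = {}
    for res in resources:
        if res in disabled:
            continue
        try:
            samples[res] = inspect(res)
        except (BackendTimeout, UnsupportedMetric) as exc:
            if not is_transient(exc):
                disabled.add(res)   # permanent: stop polling this resource
            # transient: skip this cycle only and retry on the next one
    return samples


calls = {"vm-1": 0}


def inspect(res):
    """Toy inspector: 'vm-1' times out on its first call, 'vm-3' never works."""
    if res == "vm-1":
        calls["vm-1"] += 1
        if calls["vm-1"] == 1:
            raise BackendTimeout("timed out")
    if res == "vm-3":
        raise UnsupportedMetric("metric not available")
    return 42


disabled = set()
first = poll_once(["vm-1", "vm-2", "vm-3"], inspect, disabled)
second = poll_once(["vm-1", "vm-2", "vm-3"], inspect, disabled)
print(first, second, disabled)
```

Here vm-1 misses one cycle and is polled again on the next, while vm-3 is disabled after its first failure.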

This scenario probably happens under heavy load, but I am not sure how to provoke this.

Comment 16 Leonid Natapov 2020-06-08 11:40:07 UTC
openstack-ceilometer-central-10.0.1-10.el7ost.noarch

Basic telemetry sanity testing was done.

Comment 19 errata-xmlrpc 2020-06-24 12:08:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2726