Description of problem:
ceilometer-compute polls performance metrics such as CPU and memory via libvirt. However, when a libvirt call times out, ceilometer-compute treats the timeout as a PollsterPermanentError and stops polling the affected metrics. The service then has to be restarted before it polls the disabled metrics again. Because the timeout is a transient issue in most cases, ceilometer-compute should keep polling the metric even after a timeout.
Version-Release number of selected component (if applicable):
How reproducible: Always
Steps to Reproduce: 1. 2. 3.
Actual results:
Expected results:
Additional info:
Removing pollsters with permanent errors from the pollster list seems like a sane decision to me. Can you please provide a few more details and logs?
Proposed upstream patch: https://review.opendev.org/716513 Let's see how it goes. The other question is why these timeouts occur.
Backport to Train: https://review.opendev.org/718705
Apparently, under some circumstances, the libvirt call can time out. Previously, the timeout was treated as a permanent error; with this patch, it should be treated as transient. This scenario probably happens under heavy load, but I am not sure how to provoke it.
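To illustrate the difference between the two behaviours, here is a minimal sketch in plain Python. It is not the ceilometer code and not the upstream patch: `Poller`, `PermanentPollError`, and the `fetch` callable are hypothetical names, and the built-in `TimeoutError` only stands in for whatever error the libvirt call actually raises on timeout. The idea is that a permanent error removes the resource from future polling cycles, while a timeout only skips the current cycle.

```python
class PermanentPollError(Exception):
    """Hypothetical: the resource can never be polled, so stop trying."""


class Poller:
    """Hypothetical polling loop, not ceilometer's actual implementation."""

    def __init__(self, fetch):
        # fetch is an assumed callable: fetch(resource) -> value.
        self.fetch = fetch
        # Resources dropped after a permanent error; they stay dropped
        # until the process is restarted.
        self.disabled = set()

    def poll(self, resources):
        """Run one polling cycle and return {resource: value}."""
        samples = {}
        for res in resources:
            if res in self.disabled:
                continue
            try:
                samples[res] = self.fetch(res)
            except TimeoutError:
                # Transient: no sample this cycle, but the resource is
                # polled again on the next cycle.
                continue
            except PermanentPollError:
                # Permanent: never poll this resource again.
                self.disabled.add(res)
        return samples
```

With this split, a resource whose fetch times out once is simply missing from that cycle's samples and shows up again on the next call to `poll()`, so no restart is needed.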
openstack-ceilometer-central-10.0.1-10.el7ost.noarch
Basic telemetry sanity testing was done.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2726