Bug 2092088

Summary: [RHOSP 16.2] ceilometer-agent-compute can be started on compute nodes before libvirt, which causes ceilometer to fail to collect virt metrics
Product: Red Hat OpenStack
Reporter: Leonid Natapov <lnatapov>
Component: openstack-tripleo-heat-templates
Assignee: Yadnesh Kulkarni <ykulkarn>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Docs Contact: Joanne O'Flynn <joflynn>
Priority: high
Version: 16.2 (Train)
CC: apevec, jamsmith, lmadsen, mburns, ramishra, ykulkarn
Target Milestone: z4
Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.6.1-2.20221010235131.e0d438c.el8ost
Doc Type: Bug Fix
Doc Text:
This update fixes a bug that prevented the ceilometer-agent-compute service from collecting libvirt-related metrics.

Previously, the libvirt service started after the ceilometer-agent-compute service, which resulted in "Permission denied" failures and loss of metrics data. Now the libvirt service starts before the ceilometer-agent-compute service, so ceilometer-agent-compute can collect metrics properly.
Last Closed: 2022-12-07 19:23:13 UTC
Type: Bug
Bug Blocks: 2130078    

Description Leonid Natapov 2022-05-31 18:46:40 UTC
ceilometer-agent-compute can be started on compute nodes before libvirt has populated the corresponding directories under /var/run/libvirt, which causes ceilometer to fail to collect virt metrics.

I observed this behavior on a recent OSP 16.2 deployment, where ceilometer fails to collect virt metrics for instances.

The error that I see in the ceilometer compute log file:

2022-05-29 16:59:40.289 15 DEBUG ceilometer.compute.virt.libvirt.utils [-] Connecting to libvirt: qemu:///system new_libvirt_connection /usr/lib/python3.6/site-packages/ceilometer/compute/virt/libvirt/utils.py:87
2022-05-29 16:59:40.290 15 DEBUG ceilometer.polling.manager [-] Skip loading extension for perf.cache.misses: Failed to connect socket to '/var/run/libvirt/libvirt-sock-ro': Permission denied _catch_extension_load_error /usr/lib/python3.6/sit

Restarting the ceilometer_agent_compute container resolves the problem.
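
To tell whether a node is in the failing state before restarting the container, one option is to probe the read-only libvirt socket named in the log above. The following is a minimal sketch, not part of ceilometer: the function name probe_unix_socket and its return strings are hypothetical, and it assumes the script runs in the same context as the agent (same user, same mounts), since permissions may differ elsewhere.

```python
import errno
import socket

# Path taken from the log line above; adjust it if your deployment differs.
LIBVIRT_RO_SOCKET = "/var/run/libvirt/libvirt-sock-ro"


def probe_unix_socket(path, timeout=2.0):
    """Try to connect to a UNIX stream socket and classify the outcome.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except FileNotFoundError:
        # The socket file has not been created (yet).
        return "missing"
    except PermissionError:
        # Matches the "Permission denied" failure reported in this bug.
        return "permission denied"
    except ConnectionRefusedError:
        # The socket file exists but nothing is listening on it.
        return "refused"
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
    return "ok"


if __name__ == "__main__":
    print(LIBVIRT_RO_SOCKET, "->", probe_unix_socket(LIBVIRT_RO_SOCKET))
```

A result of "permission denied" or "missing" would be consistent with the agent having started before libvirt was ready. A result of "ok" only shows that the socket is reachable now, not that the agent loaded its libvirt extensions at startup.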

python3-ceilometer-13.1.3-2.20210802103828.20756c9.el8ost.noarch
openstack-ceilometer-common-13.1.3-2.20210802103828.20756c9.el8ost.noarch
openstack-ceilometer-polling-13.1.3-2.20210802103828.20756c9.el8ost.noarch
openstack-ceilometer-compute-13.1.3-2.20210802103828.20756c9.el8ost.noarch

Comment 13 Leonid Natapov 2022-10-31 15:51:03 UTC
[Unit]
Description=ceilometer_agent_compute container
After=paunch-container-shutdown.service
Wants=tripleo_nova_libvirt.service
After=tripleo_nova_libvirt.service


No "Permission denied" errors are logged.
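
The ordering shown in the unit above can also be checked programmatically. The following is a rough sketch, assuming the unit text has already been read into a string. The helper names unit_dependencies and starts_after are hypothetical, and the parser covers only plain Key=Value lines in the [Unit] section, not drop-in files or line continuations.

```python
def unit_dependencies(unit_text):
    """Collect After= and Wants= entries from the [Unit] section.

    Hypothetical helper; repeated keys are merged, as in the unit above.
    """
    deps = {"After": set(), "Wants": set()}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in deps:
            # A single line may list several space-separated units.
            deps[key].update(value.split())
    return deps


def starts_after(unit_text, other):
    """Return True if the unit declares After=<other>."""
    return other in unit_dependencies(unit_text)["After"]


# Unit text copied from the comment above.
UNIT_TEXT = """\
[Unit]
Description=ceilometer_agent_compute container
After=paunch-container-shutdown.service
Wants=tripleo_nova_libvirt.service
After=tripleo_nova_libvirt.service
"""

print(starts_after(UNIT_TEXT, "tripleo_nova_libvirt.service"))  # True
```

Note that After= only orders startup and Wants= only pulls the other unit in, so the check above confirms the ordering declaration, not that libvirt was actually healthy when the agent started.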

Comment 19 errata-xmlrpc 2022-12-07 19:23:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8794