Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1868372

Summary: collectd-virt plugin doesn't work with latest libvirt
Product: Red Hat Enterprise Virtualization Manager
Reporter: Gajanan <gchakkar>
Component: collectd
Assignee: Aviv Litman <alitman>
Status: CLOSED ERRATA
QA Contact: Guilherme Santos <gdeolive>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.3.10
CC: cnagarka, emarcus, gdeolive, jortialc, lleistne, lsvaty, michal.skrivanek, mperina, rlondhe
Target Milestone: ovirt-4.5.1
Keywords: Reopened
Target Release: ---
Flags: gdeolive: testing_plan_complete+
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: collectd-5.12.0-7
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-07-14 12:41:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Metrics
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2015543
Bug Blocks:

Description Gajanan 2020-08-12 12:06:46 UTC
Description of problem:

Issue reported: 

I have an installation of RHEV Metrics Store that collects data from the engine and from a number of RHVH hosts.
On hosts where no VMs are running, collectd runs fine.
On hosts where at least 1 VM is running, collectd generates the following errors in journalctl -u collectd -f:
collectd[1027]: virt plugin: Array index out of bounds: tag_index = 10
collectd[1027]: virt plugin: Array index out of bounds: tag_index = 10
collectd[1027]: virt plugin: Array index out of bounds: tag_index = 10
...
These messages repeat indefinitely, which sends an enormous amount of data to Elasticsearch.

The problem seems to be described here: https://github.com/collectd/collectd/pull/2168

The Ansible script configure_ovirt_machines_for_metrics.sh, which deploys collectd and rsyslog on the hosts, runs without errors.
The configuration of collectd on hypervisors where the problem appears and on hypervisors where the problem doesn't appear is identical.
The rhvh node layer on all hosts is the same: rhvh-4.3.9.2-0.20200324.0+1

~~~
* collectd.service - Collectd statistics daemon
   Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-08-04 16:31:18 CEST; 1 day 15h ago
     Docs: man:collectd(1)
           man:collectd.conf(5)
 Main PID: 24714 (collectd)
    Tasks: 11
   CGroup: /system.slice/collectd.service
           `-24714 /usr/sbin/collectd

Aug 06 08:21:29 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:29 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:39 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:39 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:49 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:49 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:59 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:59 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:22:09 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:22:09 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
~~~

Impact of the issue: 
The OpenShift stack is overloaded by the volume of incoming messages, which makes Kibana unusable.
The collectd daemons are currently stopped on all hosts because they put too much load on the network and on the OpenShift node.
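
For triage, the repeated journal messages quoted in the description can be tallied per tag_index. The following is only a minimal sketch: the regular expression is derived from the message text shown in this report, and the function name count_tag_errors is made up for illustration.

```python
import re
from collections import Counter

# Pattern built from the virt plugin message quoted in this report.
TAG_ERROR = re.compile(r"virt plugin: Array index out of bounds: tag_index = (\d+)")


def count_tag_errors(lines):
    """Return a Counter mapping tag_index -> number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = TAG_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Sample lines copied from the journal excerpts in this report.
sample = [
    "Aug 06 08:21:29 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10",
    "Aug 06 08:21:29 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10",
    "Aug 06 08:21:39 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10",
    "Nov 10 15:59:30 <hostname> collectd[97406]: virt plugin: reader virt-0 initialized",
]

print(count_tag_errors(sample))
```

On a real host the same function could be fed the lines of a saved journal file, or sys.stdin when the journal output is piped into the script.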

Comment 1 Shirly Radco 2020-08-12 12:28:23 UTC
Hi, what is the collectd version?

Comment 2 Gajanan 2020-08-12 15:15:21 UTC
(In reply to Shirly Radco from comment #1)
> Hi, what is the collectd version?

collectd-5.8.1-3.el7ost.x86_64

Comment 3 Sandro Bonazzola 2020-09-17 13:47:51 UTC
Hi, what is the libvirt version?

Comment 4 Chetan Nagarkar 2020-09-21 05:56:18 UTC
(In reply to Sandro Bonazzola from comment #3)
> Hi, what is the libvirt version?

Hello Sandro, 

libvirt-4.5.0-33.el7.x86_64

Comment 6 Guilherme Santos 2020-11-10 14:48:52 UTC
I have tried to reproduce on RHV 4.3.11 and I didn't face the issue:
# journalctl -u collectd -f
-- Logs begin at Mon 2020-11-09 17:49:02 IST. --
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "swap" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "df" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "aggregation" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "processes" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "write_syslog" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: write_syslog plugin: Invalid configuration option: MessageFormat.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: Systemd detected, trying to signal readyness.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com systemd[1]: Started Collectd statistics daemon.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: virt plugin: reader virt-0 initialized
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: Initialization complete, entering read-loop.

Host has two VMs running (the metrics-store-installer and the master0) and is running on collectd-5.8.1-3.el7ost.x86_64 and libvirt-4.5.0-36.el7_9.3.x86_64

Seems like the issue is not present in these versions

Comment 10 Martin Perina 2020-12-10 14:50:23 UTC
Feel free to reopen and provide the requested information.

Comment 13 rohit londhe 2021-10-19 03:24:21 UTC
Hi,

We are facing a problem on an OpenStack compute node:

~~~
virt plugin: Array index out of bounds: tag_index = 11
virt plugin: Array index out of bounds: tag_index = 12
~~~

The problem is here.
https://github.com/collectd/collectd/blob/main/src/virt.c#L946

libvirt keeps extending its API, but collectd has not caught up.

libvirt-daemon-6.0.0-25.5
collectd-virt-5.11.0-8

Let us know if more supporting information is needed.
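
The failure mode described in this comment can be pictured with a small sketch. It is an illustration under stated assumptions, not collectd's actual code: it assumes the plugin maps each numeric tag reported by libvirt to a name through a fixed-size table, and the table contents, its length of 10, and the names tag_name and submit are all hypothetical. When the library starts reporting indices beyond the table, every read hits the bounds check.

```python
# Hypothetical tag table: the names and the length are illustrative only,
# not copied from collectd or libvirt. Valid indices are 0..9.
KNOWN_TAGS = ["tag_%d" % i for i in range(10)]


def tag_name(tag_index, known_tags=KNOWN_TAGS):
    """Return the name for tag_index, or None if the table does not know it."""
    if 0 <= tag_index < len(known_tags):
        return known_tags[tag_index]
    return None


def submit(stats, known_tags=KNOWN_TAGS):
    """Split (tag_index, value) pairs into named metrics and unknown indices."""
    named = {}
    unknown = set()
    for tag_index, value in stats:
        name = tag_name(tag_index, known_tags)
        if name is None:
            # Record the index once instead of reporting it on every read.
            unknown.add(tag_index)
        else:
            named[name] = value
    return named, sorted(unknown)


# A newer library reports indices the table was never extended to cover.
named, unknown = submit([(0, 1.0), (9, 2.0), (10, 3.0), (12, 4.0)])
print(named)
print(unknown)
```

Collecting unknown indices in a set, rather than logging each occurrence, is one possible way to avoid the message flood described in this bug; whether the shipped fix takes that approach is not stated in this report.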

Comment 14 Martin Perina 2021-10-19 13:23:59 UTC
The customer case is not related to RHV, so I've created BZ2015543 to track the progress on the OpenStack side, which maintains collectd. RHV can still benefit from the updated collectd-virt plugin.

Comment 25 Guilherme Santos 2022-06-24 11:53:14 UTC
Verified on virt-engine-4.5.1.2-0.11.el8ev.noarch

Metrics sanity and regression tests ran successfully with collectd-virt-5.12.0-7.2.el8ev.x86_64 installed.

Comment 29 errata-xmlrpc 2022-07-14 12:41:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) [ovirt-4.5.1] update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5583