1868372 – collectd-virt plugin doesn't work with latest libvirt

Summary: collectd-virt plugin doesn't work with latest libvirt

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	collectd
Sub Component:
Version:	4.3.10
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	ovirt-4.5.1
Target Release:	---
Assignee:	Aviv Litman
QA Contact:	Guilherme Santos
Docs Contact:
URL:
Whiteboard:
Depends On:	2015543
Blocks:

Reported:	2020-08-12 12:06 UTC by Gajanan
Modified:	2024-03-25 16:17 UTC
CC List:	9 users
Fixed In Version:	collectd-5.12.0-7
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-07-14 12:41:13 UTC
oVirt Team:	Metrics
Target Upstream Version:
Embargoed:
Flags:	gdeolive: testing_plan_complete+

Links
System	ID	Priority	Status	Summary	Last Updated
Github	oVirt ovirt-engine pull 176	None	Merged	collectd-virt plugin doesn't work with latest libvirt	2022-04-08 11:56:46 UTC
Github	oVirt ovirt-host pull 10	None	Merged	FIX: collectd-virt plugin doesn't work with latest libvirt	2022-04-08 11:56:45 UTC
Red Hat Knowledge Base (Solution)	5314071	None	None	None	2020-08-12 12:16:58 UTC
Red Hat Product Errata	RHBA-2022:5583	None	None	None	2022-07-14 12:41:29 UTC

Description Gajanan 2020-08-12 12:06:46 UTC

Description of problem:

Issue reported: 

I have an installation of RHEV Metrics Store that collects data from the engine and from a number of RHVH hosts.
On hosts where no VMs are running, collectd runs fine.
On hosts where at least 1 VM is running, collectd generates the following errors in journalctl -u collectd -f:
collectd[1027]: virt plugin: Array index out of bounds: tag_index = 10
collectd[1027]: virt plugin: Array index out of bounds: tag_index = 10
collectd[1027]: virt plugin: Array index out of bounds: tag_index = 10
...
indefinitely, which sends an enormous amount of data to Elasticsearch.

The problem seems to be described here: https://github.com/collectd/collectd/pull/2168

The Ansible script configure_ovirt_machines_for_metrics.sh, which deploys collectd and rsyslog on the hosts, runs without errors.
The collectd configuration is identical on hypervisors where the problem appears and on those where it does not.
The rhvh node layer on all hosts is the same: rhvh-4.3.9.2-0.20200324.0+1

~~~
* collectd.service - Collectd statistics daemon
   Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-08-04 16:31:18 CEST; 1 day 15h ago
     Docs: man:collectd(1)
           man:collectd.conf(5)
 Main PID: 24714 (collectd)
    Tasks: 11
   CGroup: /system.slice/collectd.service
           `-24714 /usr/sbin/collectd

Aug 06 08:21:29 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:29 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:39 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:39 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:49 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:49 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:59 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:59 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:22:09 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:22:09 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
~~~
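
To gauge how much noise a host produces, the repeated error lines can be tallied per tag_index. The following is a minimal Python sketch under stated assumptions, not part of collectd or of the metrics tooling: the function name `count_tag_index_errors` is hypothetical, and the pattern is derived only from the message format shown in the excerpt above.

```python
import re
from collections import Counter

# Pattern built from the error text shown in the journal excerpt above.
ERROR_RE = re.compile(r"virt plugin: Array index out of bounds: tag_index = (\d+)")


def count_tag_index_errors(lines):
    """Return a Counter mapping each reported tag_index to its occurrence count."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Three lines copied from the excerpt, plus one placeholder line that must not match.
sample = [
    "Aug 06 08:21:29 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10",
    "Aug 06 08:21:29 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10",
    "Aug 06 08:21:39 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10",
    "placeholder line without the error text",
]
counts = count_tag_index_errors(sample)
print(dict(counts))
```

To run it against a real host, the lines could be read from the output of journalctl -u collectd instead of the hard-coded sample.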

Impact of the issue: 
The OpenShift stack is overloaded by the volume of incoming messages, which makes Kibana unusable.
The collectd daemons are currently stopped on all hosts because they put too much load on the network and on the OpenShift node.

Comment 1 Shirly Radco 2020-08-12 12:28:23 UTC

Hi, what is the collectd version?

Comment 2 Gajanan 2020-08-12 15:15:21 UTC

(In reply to Shirly Radco from comment #1)
> Hi, what is the collectd version?

collectd-5.8.1-3.el7ost.x86_64

Comment 3 Sandro Bonazzola 2020-09-17 13:47:51 UTC

Hi, what is the libvirt version?

Comment 4 Chetan Nagarkar 2020-09-21 05:56:18 UTC

(In reply to Sandro Bonazzola from comment #3)
> Hi, what is the libvirt version?

Hello Sandro, 

libvirt-4.5.0-33.el7.x86_64

Comment 6 Guilherme Santos 2020-11-10 14:48:52 UTC

I tried to reproduce this on RHV 4.3.11 and did not hit the issue:
# journalctl -u collectd -f
-- Logs begin at Mon 2020-11-09 17:49:02 IST. --
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "swap" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "df" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "aggregation" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "processes" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "write_syslog" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: write_syslog plugin: Invalid configuration option: MessageFormat.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: Systemd detected, trying to signal readyness.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com systemd[1]: Started Collectd statistics daemon.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: virt plugin: reader virt-0 initialized
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: Initialization complete, entering read-loop.

The host has two VMs running (the metrics-store-installer and the master0) and runs collectd-5.8.1-3.el7ost.x86_64 and libvirt-4.5.0-36.el7_9.3.x86_64.

It seems the issue is not present in these versions.

Comment 10 Martin Perina 2020-12-10 14:50:23 UTC

Feel free to reopen and provide the requested information.

Comment 13 rohit londhe 2021-10-19 03:24:21 UTC

Hi,

We are facing a problem on an OpenStack compute node:

~~~
virt plugin: Array index out of bounds: tag_index = 11
virt plugin: Array index out of bounds: tag_index = 12
~~~

The problem is here.
https://github.com/collectd/collectd/blob/main/src/virt.c#L946

libvirt keeps extending its API, and collectd has not caught up.

libvirt-daemon-6.0.0-25.5
collectd-virt-5.11.0-8
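
The failure mode described here, a consumer with a fixed table receiving indices added later by the producer, can be illustrated with a small Python sketch. This is an illustration under assumptions, not the collectd source: the table `KNOWN_TAGS`, its size of 10, and the function `lookup_tag` are hypothetical and only mirror the shape of the logged message.

```python
# Hypothetical table of tag names known to an older consumer (indices 0..9).
# The size 10 is an assumption chosen for illustration only.
KNOWN_TAGS = ["tag_%d" % i for i in range(10)]


def lookup_tag(tag_index, known_tags=KNOWN_TAGS):
    """Return (name, error); error is set when the index is outside the table."""
    if not 0 <= tag_index < len(known_tags):
        return None, "Array index out of bounds: tag_index = %d" % tag_index
    return known_tags[tag_index], None


# A newer producer may report indices the consumer has never heard of.
for index in (3, 10, 11, 12):
    name, error = lookup_tag(index)
    print(error if error else name)
```

With a bounds check like this the consumer does not crash, but it logs an error on every read for every unknown index, which would match the steady stream of messages reported above.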

Let us know if you need any more supporting information.

Comment 14 Martin Perina 2021-10-19 13:23:59 UTC

The customer case is not related to RHV, so I've created BZ2015543 to track the progress on the OpenStack side, which maintains collectd. RHV can still benefit from the updated collectd-virt plugin.

Comment 25 Guilherme Santos 2022-06-24 11:53:14 UTC

Verified on virt-engine-4.5.1.2-0.11.el8ev.noarch

metrics sanity and regression tests ran successfully with collectd-virt-5.12.0-7.2.el8ev.x86_64 installed.

Comment 29 errata-xmlrpc 2022-07-14 12:41:13 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) [ovirt-4.5.1] update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5583