Description of problem:

Issue reported:
I have an installation of RHEV Metrics Store that collects data from the engine and from a number of RHVH hosts. On hosts where no VMs are running, collectd runs fine. On hosts where at least 1 VM is running, collectd generates the following errors in journalctl -u collectd -f:

collectd[1027]: virt plugin: Array index out of bounds: tag_index = 10
collectd[1027]: virt plugin: Array index out of bounds: tag_index = 10
collectd[1027]: virt plugin: Array index out of bounds: tag_index = 10
... indefinitely, which sends an enormous amount of data to Elasticsearch.

The problem seems to be described here: https://github.com/collectd/collectd/pull/2168

The Ansible script configure_ovirt_machines_for_metrics.sh to deploy collectd and rsyslog on the hosts runs fine without errors. The configuration of collectd on hypervisors where the problem appears and on hypervisors where the problem doesn't appear is identical.

The rhvh node layer on all hosts is the same: rhvh-4.3.9.2-0.20200324.0+1

~~~
* collectd.service - Collectd statistics daemon
   Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-08-04 16:31:18 CEST; 1 day 15h ago
     Docs: man:collectd(1)
           man:collectd.conf(5)
 Main PID: 24714 (collectd)
    Tasks: 11
   CGroup: /system.slice/collectd.service
           `-24714 /usr/sbin/collectd

Aug 06 08:21:29 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:29 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:39 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:39 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:49 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:49 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:59 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:21:59 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:22:09 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
Aug 06 08:22:09 <hostname> collectd[24714]: virt plugin: Array index out of bounds: tag_index = 10
~~~

Impact of the issue:
The impact is that the openshift stack is overloaded because of the amount of messages coming in and so Kibana is unusable. Right now collectd daemons are stopped on all hosts because it puts too much burden on the network and on the openshift node.
Hi, what is the collectd version?
(In reply to Shirly Radco from comment #1)
> Hi, what is the collectd version?

collectd-5.8.1-3.el7ost.x86_64
Hi, what is the libvirt version?
(In reply to Sandro Bonazzola from comment #3)
> Hi, what is the libvirt version?

Hello Sandro,

libvirt-4.5.0-33.el7.x86_64
I tried to reproduce this on RHV 4.3.11 and did not hit the issue:

# journalctl -u collectd -f
-- Logs begin at Mon 2020-11-09 17:49:02 IST. --
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "swap" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "df" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "aggregation" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "processes" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: plugin_load: plugin "write_syslog" successfully loaded.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: write_syslog plugin: Invalid configuration option: MessageFormat.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: Systemd detected, trying to signal readyness.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com systemd[1]: Started Collectd statistics daemon.
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: virt plugin: reader virt-0 initialized
Nov 10 15:59:30 metrics-ge-1-host-03.lab.eng.tlv2.redhat.com collectd[97406]: Initialization complete, entering read-loop.

The host has two VMs running (the metrics-store-installer and the master0) and is running collectd-5.8.1-3.el7ost.x86_64 and libvirt-4.5.0-36.el7_9.3.x86_64.

It seems the issue is not present with these versions.
Feel free to reopen and provide the requested information.
Hi,

We are facing a problem on an OpenStack compute node:

~~~
virt plugin: Array index out of bounds: tag_index = 11
virt plugin: Array index out of bounds: tag_index = 12
~~~

The problem is here:
https://github.com/collectd/collectd/blob/main/src/virt.c#L946

While libvirt keeps extending its API, collectd has not caught up.

libvirt-daemon-6.0.0-25.5
collectd-virt-5.11.0-8

Let us know if more supporting information is needed.
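To make the reported mismatch concrete, here is a minimal Python sketch of the failure mode. It assumes the error comes from a fixed-size name table indexed by a tag value that libvirt supplies. It is not collectd's code: the table contents and the names `KNOWN_TAGS`, `lookup_strict` and `make_lookup_once` are hypothetical, and the table size of 10 is chosen only so that index 10 is the first one rejected, matching the messages in this report.

```python
# Hypothetical illustration only; not taken from collectd's virt.c.
# Placeholder names for tag indices 0..9 that the plugin "knows" about.
KNOWN_TAGS = ["tag_%d" % i for i in range(10)]

ERROR_FMT = "virt plugin: Array index out of bounds: tag_index = %d"


def lookup_strict(tag_index, log):
    """Return the tag name, logging an error for every unknown index."""
    if 0 <= tag_index < len(KNOWN_TAGS):
        return KNOWN_TAGS[tag_index]
    log.append(ERROR_FMT % tag_index)
    return None


def make_lookup_once():
    """Build a lookup that reports each unknown index only the first time."""
    seen = set()

    def lookup(tag_index, log):
        if 0 <= tag_index < len(KNOWN_TAGS):
            return KNOWN_TAGS[tag_index]
        if tag_index not in seen:
            seen.add(tag_index)
            log.append(ERROR_FMT % tag_index)
        return None

    return lookup


if __name__ == "__main__":
    strict_log, once_log = [], []
    lookup_once = make_lookup_once()
    # Simulate three read intervals in which a newer library version
    # reports tags 10, 11 and 12 in addition to a known one.
    for _ in range(3):
        for tag in (9, 10, 11, 12):
            lookup_strict(tag, strict_log)
            lookup_once(tag, once_log)
    print(len(strict_log), len(once_log))
```

The strict variant logs one line per unknown tag on every read interval, which is the kind of steady log stream described in this report. The second variant is only one possible mitigation for the log volume; it does not collect the new statistics, which requires extending the table itself.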
The customer case is not related to RHV, so I've created BZ2015543 to track the progress on the OpenStack side, which maintains collectd. RHV can still benefit from an updated collectd-virt plugin.
Verified on virt-engine-4.5.1.2-0.11.el8ev.noarch: metrics sanity and regression tests ran successfully with collectd-virt-5.12.0-7.2.el8ev.x86_64 installed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) [ovirt-4.5.1] update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:5583