Bug 2015543

Summary: collectd-virt plugin doesn't work with latest libvirt
Product: Red Hat OpenStack
Reporter: Martin Perina <mperina>
Component: collectd
Assignee: Emma Foley <efoley>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Docs Contact: Joanne O'Flynn <joflynn>
Priority: high
Version: 16.1 (Train)
CC: alisci, alitman, efoley, jbadiapa, lars, lmadsen, mmagr, mrunge, rbruzzon, rlondhe, ssigwald, vkoul
Target Milestone: z9
Keywords: Triaged, ZStream
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: collectd-5.11.0-10.el8ost
Doc Type: Bug Fix
Doc Text:
Libvirt was updated and now provides more memory metrics. Because of these API changes, collectd was incompatible with the newer libvirt, which could cause the application to crash. With this update, collectd matches the API changes, exposes hugepage usage through the virt plugin, and no longer crashes when pulling the virt metrics.
Last Closed: 2022-12-07 20:25:25 UTC
Type: Bug
Bug Depends On: 2038881    
Bug Blocks: 1868372    

Description Martin Perina 2021-10-19 13:18:46 UTC
Creating this bug as a result of reopening https://bugzilla.redhat.com/show_bug.cgi?id=1868372#c13

> Hi,
> 
> We are facing a problem on openstack compute node:
> 
> ~~~
> virt plugin: Array index out of bounds: tag_index = 11
> virt plugin: Array index out of bounds: tag_index = 12
> ~~~
> 
> The problem is here.
> https://github.com/collectd/collectd/blob/main/src/virt.c#L946
> 
> While libvirt keeps extending its API, collectd didn't catch up.
> 
> libvirt-daemon-6.0.0-25.5
> collectd-virt-5.11.0-8
> 
> Do let us know if you need more supporting information.

Comment 1 Matthias Runge 2021-10-19 19:02:50 UTC
Which OSP version is affected? ... how to reproduce?

Comment 2 Martin Perina 2021-10-20 06:24:05 UTC
(In reply to Matthias Runge from comment #1)
> Which OSP version is affected? ... how to reproduce?

I have no additional information; I only created this bug because the attached customer case was added to an already closed RHV bug.
Rohit, as it's your case, could you please reply?

This is relevant from the RHV point of view, as RHV always uses the latest libvirt from AV for EL8, and we still support sending data from RHV hypervisors through collectd to an external Elasticsearch.

Comment 3 rohit londhe 2021-10-20 07:26:45 UTC
Hello Matthias,

As per the customer's update:

No specific steps to reproduce. It just happens after deployment.
This is the guide I follow to enable STF.
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/service_telemetry_framework_1.3/index

It's OSP 16.1.


Here are some more details:

()[root@compute /]# virsh dommemstat instance-00000XYZ
actual 524288
swap_in 0
swap_out 0
major_fault 0
minor_fault 74765
unused 441644
available 489032
usable 459116
last_update 1634605069
disk_caches 19096
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 270968

https://github.com/collectd/collectd/blob/main/src/virt.c#L940
static const char *tags[] = {"swap_in",        "swap_out",   "major_fault",
                               "minor_fault",    "unused",     "available",
                               "actual_balloon", "rss",        "usable",
                               "last_update",    "disk_caches"};

Those two hugepage stats (hugetlb_pgalloc and hugetlb_pgfail) are not supported by collectd.
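The mismatch can be illustrated with a minimal, self-contained sketch. It copies the 11-entry table above and assumes the libvirt numbering from the virDomainMemoryStatTags documentation, where VIR_DOMAIN_MEMORY_STAT_HUGETLB_PGALLOC is 11 and VIR_DOMAIN_MEMORY_STAT_HUGETLB_PGFAIL is 12. The helpers old_tag_name() and demo_old_lookup() are hypothetical illustrations, not collectd functions.

```c
#include <assert.h>
#include <stddef.h>

/* Pre-fix tag table, as quoted from collectd's virt.c (11 entries,
 * covering libvirt memory-stat tags 0..10 only). */
static const char *old_tags[] = {"swap_in",        "swap_out",   "major_fault",
                                 "minor_fault",    "unused",     "available",
                                 "actual_balloon", "rss",        "usable",
                                 "last_update",    "disk_caches"};

#define N_OLD_TAGS (sizeof(old_tags) / sizeof(old_tags[0]))

/* Hypothetical helper: returns the collectd name for a libvirt tag index,
 * or NULL when the index is past the end of the table. Without such a
 * check, old_tags[11] or old_tags[12] would read past the array. */
static const char *old_tag_name(int tag_index) {
  if (tag_index < 0 || (size_t)tag_index >= N_OLD_TAGS)
    return NULL;
  return old_tags[tag_index];
}

/* Demonstrates the failure mode, assuming libvirt reports
 * HUGETLB_PGALLOC as tag 11 and HUGETLB_PGFAIL as tag 12. */
static void demo_old_lookup(void) {
  assert(N_OLD_TAGS == 11);
  assert(old_tag_name(10) == old_tags[10]); /* "disk_caches" is known */
  assert(old_tag_name(11) == NULL);         /* hugetlb_pgalloc: no entry */
  assert(old_tag_name(12) == NULL);         /* hugetlb_pgfail: no entry */
}
```

Any index at or beyond the table size has no name, which is consistent with the "tag_index = 11" and "tag_index = 12" messages in the original report.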

Comment 4 Matthias Runge 2021-10-20 07:54:51 UTC
Thank you, I found the API change in libvirt, see https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStatTags
However, I don't have insight into where this version is deployed.
It looks like we need to fix 16.1 and all later versions.

The following quick-and-dirty (and also untested) patch should fix the issue:

diff --git a/src/virt.c b/src/virt.c
@@ -937,10 +951,11 @@ static void memory_submit(virDomainPtr dom, gauge_t value) {
 
 static void memory_stats_submit(gauge_t value, virDomainPtr dom,
                                 int tag_index) {
-  static const char *tags[] = {"swap_in",        "swap_out",   "major_fault",
-                               "minor_fault",    "unused",     "available",
-                               "actual_balloon", "rss",        "usable",
-                               "last_update",    "disk_caches"};
+  static const char *tags[] = {
+      "swap_in",        "swap_out",    "major_fault",    "minor_fault",
+      "unused",         "available",   "actual_balloon", "rss",
+      "usable",         "last_update", "disk_caches",    "hugetlb_pgalloc",
+      "hugetlb_pgfail"};
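Because the tag name is looked up purely by position, a copy-paste slip in a table like this (for example, repeating a name) would silently mislabel a metric. A minimal sketch of the corrected 13-entry table, with a compile-time size check and a runtime uniqueness check, might look like the following. KNOWN_MEMORY_STAT_TAGS is an assumption standing in for libvirt's VIR_DOMAIN_MEMORY_STAT_NR, hardcoded so the sketch compiles without libvirt headers; tag_name(), tags_are_unique() and check_tags() are hypothetical helpers, not the fix that shipped in collectd-5.11.0-10.el8ost.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 13 mirrors VIR_DOMAIN_MEMORY_STAT_NR for the libvirt
 * versions discussed here (tags 0..12). A real build would use the
 * libvirt constant instead. */
#define KNOWN_MEMORY_STAT_TAGS 13

/* Tag names indexed by libvirt's virDomainMemoryStatTags values. */
static const char *tags[] = {
    "swap_in",        "swap_out",    "major_fault",    "minor_fault",
    "unused",         "available",   "actual_balloon", "rss",
    "usable",         "last_update", "disk_caches",    "hugetlb_pgalloc",
    "hugetlb_pgfail"};

#define N_TAGS (sizeof(tags) / sizeof(tags[0]))

/* Fail the build if the table and the expected tag count drift apart. */
_Static_assert(N_TAGS == KNOWN_MEMORY_STAT_TAGS,
               "memory stat tag table is out of sync with libvirt");

/* Returns 1 if every name is distinct, 0 if any name appears twice. */
static int tags_are_unique(const char *const *names, size_t n) {
  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (strcmp(names[i], names[j]) == 0)
        return 0;
  return 1;
}

/* Bounds-checked lookup: indices from an even newer libvirt return NULL
 * so the caller can skip them instead of reading past the array. */
static const char *tag_name(int tag_index) {
  if (tag_index < 0 || (size_t)tag_index >= N_TAGS)
    return NULL;
  return tags[tag_index];
}

/* Sanity checks for the corrected table. */
static void check_tags(void) {
  assert(tags_are_unique(tags, N_TAGS));
  assert(strcmp(tag_name(11), "hugetlb_pgalloc") == 0);
  assert(strcmp(tag_name(12), "hugetlb_pgfail") == 0);
  assert(tag_name(13) == NULL);
}
```

The static assertion only catches a wrong entry count; the uniqueness check is what would flag a duplicated name, so the two complement each other.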

Comment 5 Matthias Runge 2021-10-20 13:28:18 UTC
In order to reproduce the issue, you'll need to add
ExtraStats "memory"
to the virt plugin configuration.

Comment 8 rohit londhe 2021-10-23 02:38:22 UTC
Hello,

"memory" is already included by default, so there is no need to add it explicitly.
https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/metrics/collectd-container-puppet.yaml#L473

Comment 16 Matthias Runge 2022-01-10 15:52:38 UTC
*** Bug 2038881 has been marked as a duplicate of this bug. ***

Comment 17 Matthias Runge 2022-01-17 07:04:30 UTC
Moving this to MODIFIED, since the build is available.

Comment 19 Riccardo Bruzzone 2022-01-25 14:04:12 UTC
Hello,
Our customer is asking for an update on the case associated with this bug.
Is there an estimate of when the planned fix will also be available in RHOSP 16.2?

Thank you so much in advance

Comment 24 Leonid Natapov 2022-11-01 11:35:49 UTC
No "Array index out of bounds" message in collectd.log on compute nodes. OSP16.1z9

Comment 30 errata-xmlrpc 2022-12-07 20:25:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795