Bug 2143235 - memory utilization of virtnodedevd.service is constantly growing
Summary: memory utilization of virtnodedevd.service is constantly growing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: CentOS Stream
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Michal Privoznik
QA Contact: yalzhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-11-16 12:21 UTC by Jaroslav Pulchart
Modified: 2023-05-09 08:09 UTC
CC List: 13 users

Fixed In Version: libvirt-8.10.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 07:27:15 UTC
Type: Bug
Target Upstream Version: 8.10.0
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-139580 0 None None None 2022-11-16 12:34:02 UTC
Red Hat Product Errata RHBA-2023:2171 0 None None None 2023-05-09 07:27:30 UTC

Description Jaroslav Pulchart 2022-11-16 12:21:45 UTC
Description of problem:

Memory utilization of virtnodedevd.service is constantly growing. After two weeks it uses 1.2 GB of RAM, whereas the initial amount (after a fresh start) is just 16 MB.

Version-Release number of selected component (if applicable):
libvirt-*-8.7.0-1.el9

How reproducible:

1/ virtnodedevd.service just keeps running for several days
2/ observe the memory utilization grow


Steps to Reproduce:
1. Enable the virtnodedevd service or socket
2. We use openstack-nova-compute, which keeps a socket connection to virtnodedevd, and as a result it is never stopped by timeout. However, there is no communication between them.

Actual results:

Service memory utilization keeps growing, e.g. "Memory: 1.2G" after 2 weeks and 1 day.

# systemctl status virtnodedevd.service
● virtnodedevd.service - Virtualization nodedev daemon
     Loaded: loaded (/usr/lib/systemd/system/virtnodedevd.service; disabled; vendor preset: disabled)
     Active: active (running) since Tue 2022-11-01 08:30:26 CET; 2 weeks 1 day ago
TriggeredBy: ● virtnodedevd-ro.socket
             ● virtnodedevd-admin.socket
             ● virtnodedevd.socket
       Docs: man:virtnodedevd(8)
             https://libvirt.org
   Main PID: 6223 (virtnodedevd)
      Tasks: 19 (limit: 127531)
     Memory: 1.2G
        CPU: 2h 35min 27.934s
     CGroup: /system.slice/virtnodedevd.service
             └─6223 /usr/sbin/virtnodedevd --timeout 120



Expected results:

Memory utilization is low and constant (like immediately after service restart):

# systemctl restart virtnodedevd.service
# systemctl status virtnodedevd.service
● virtnodedevd.service - Virtualization nodedev daemon
     Loaded: loaded (/usr/lib/systemd/system/virtnodedevd.service; disabled; vendor preset: disabled)
     Active: active (running) since Wed 2022-11-16 13:16:47 CET; 1s ago
TriggeredBy: ● virtnodedevd-admin.socket
             ● virtnodedevd-ro.socket
             ● virtnodedevd.socket
       Docs: man:virtnodedevd(8)
             https://libvirt.org
   Main PID: 3077832 (virtnodedevd)
      Tasks: 19 (limit: 127531)
     Memory: 15.9M
        CPU: 166ms
     CGroup: /system.slice/virtnodedevd.service
             └─3077832 /usr/sbin/virtnodedevd --timeout 120

Additional info:
n/a

Comment 1 Jaroslav Pulchart 2022-11-21 08:32:47 UTC
I set a MemoryMax limit of 100 MB for the virtnodedevd.service service.

  cat /etc/systemd/system/virtnodedevd.service.d/service.conf
  [Service]
  MemoryMax=100M


My expectation was "the process will be OOM killed in approx 24 h"; however, it balances at around 100 MB of memory usage instead, without issue:

	# systemctl status virtnodedevd.service
	● virtnodedevd.service - Virtualization nodedev daemon
		 Loaded: loaded (/usr/lib/systemd/system/virtnodedevd.service; disabled; vendor preset: disabled)
		Drop-In: /etc/systemd/system/virtnodedevd.service.d
		         └─service.conf
		 Active: active (running) since Fri 2022-11-18 19:28:09 CET; 2 days ago
	TriggeredBy: ● virtnodedevd-admin.socket
		         ● virtnodedevd-ro.socket
		         ● virtnodedevd.socket
		   Docs: man:virtnodedevd(8)
		         https://libvirt.org
	   Main PID: 624476 (virtnodedevd)
		  Tasks: 19 (limit: 206089)
		 Memory: 93.2M (max: 100.0M available: 6.7M)
		    CPU: 41min 31.152s
		 CGroup: /system.slice/virtnodedevd.service
		         └─624476 /usr/sbin/virtnodedevd --timeout 20


	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 93.3M (max: 100.0M available: 6.6M)

	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 93.3M (max: 100.0M available: 6.6M)

	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 94.0M (max: 100.0M available: 5.9M)

	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 94.5M (max: 100.0M available: 5.4M)

	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 94.2M (max: 100.0M available: 5.7M)

	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 94.2M (max: 100.0M available: 5.7M)

	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 94.2M (max: 100.0M available: 5.7M)

	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 94.6M (max: 100.0M available: 5.3M)

	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 94.9M (max: 100.0M available: 5.0M)

	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 96.1M (max: 100.0M available: 3.8M)

	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 95.8M (max: 100.0M available: 4.1M)

	# systemctl status virtnodedevd.service | grep Memory:
		 Memory: 95.7M (max: 100.0M available: 4.2M)


Does anybody know why the process consumes all available memory (GBs) if no memory limit is set, yet it is capable of working without any issue under a low MemoryMax limit, with balanced memory usage?

Comment 2 yalzhang@redhat.com 2022-11-24 07:35:41 UTC
Hi Jaroslav, could you please help to collect some logs about it? For example, run "journalctl -u virtnodedevd" and check if there is any clue.
Also, is there any VM running on the system, or any heavy workload on it? The virtnodedevd service has been running continuously for more than 2 weeks, but it should time out and become inactive if no related function is called within 120 s.

Comment 3 Jaroslav Pulchart 2022-11-24 13:11:10 UTC
> Could you please help to collect some logs about it?

Logs are "empty", we can see starting/stopping as I was doing service restart:

# journalctl -u virtnodedevd
Nov 16 15:41:36 cmp0096.na3.pcigdc.com systemd[1]: Starting Virtualization nodedev daemon...
Nov 16 15:41:36 cmp0096.na3.pcigdc.com systemd[1]: Started Virtualization nodedev daemon.
Nov 16 20:49:12 cmp0096.na3.pcigdc.com systemd[1]: Stopping Virtualization nodedev daemon...
Nov 16 20:49:12 cmp0096.na3.pcigdc.com systemd[1]: virtnodedevd.service: Deactivated successfully.
Nov 16 20:49:12 cmp0096.na3.pcigdc.com systemd[1]: Stopped Virtualization nodedev daemon.
Nov 16 20:49:12 cmp0096.na3.pcigdc.com systemd[1]: virtnodedevd.service: Consumed 1min 50.893s CPU time.
Nov 16 20:49:12 cmp0096.na3.pcigdc.com systemd[1]: Starting Virtualization nodedev daemon...
Nov 16 20:49:12 cmp0096.na3.pcigdc.com systemd[1]: Started Virtualization nodedev daemon.
Nov 16 20:51:12 cmp0096.na3.pcigdc.com systemd[1]: virtnodedevd.service: Deactivated successfully.
Nov 18 19:00:45 cmp0096.na3.pcigdc.com systemd[1]: Starting Virtualization nodedev daemon...
Nov 18 19:00:45 cmp0096.na3.pcigdc.com systemd[1]: Started Virtualization nodedev daemon.


> And is there any vm running on the system? Or is there any heavy workload on this system? 

The situation does not depend on VM deployment. It is observed on a host that was empty (no VMs) with 0% utilization.

> Since the virtnodedevd services is continously running for more than 2 weeks. The virtnodedevd service will timeout and be inactive if any related function is not called during 120s.


It is not deactivated. The OpenStack Nova compute service keeps a connection to it. I tried to lower the timeout to 20 s to ensure it would be deactivated, without any luck.

I added some extra debug logs into OpenStack Nova and saw that Nova periodically runs functions on libvirt's connection, approximately every 30 s, such as "conn.listAllDevices()" (but not only that). See: https://github.com/openstack/nova/blob/stable/yoga/nova/virt/libvirt/host.py#L1520
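
[Editorial illustration, not code from this report: a minimal C client sketch against the public libvirt API that mimics what nova-compute is described as doing here, keeping one connection open so virtnodedevd never reaches its --timeout and listing node devices every 30 seconds. The URI and interval are assumptions based on the comments above.]

/* repro.c - illustrative sketch only: keep one read-only libvirt connection
 * open and list node devices every 30 seconds.
 * Build (assuming libvirt-devel is installed):
 *   gcc repro.c -o repro $(pkg-config --cflags --libs libvirt)
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpenReadOnly("qemu:///system");
    if (!conn) {
        fprintf(stderr, "failed to connect to libvirt\n");
        return 1;
    }

    for (;;) {
        virNodeDevicePtr *devs = NULL;
        int ndevs = virConnectListAllNodeDevices(conn, &devs, 0);

        if (ndevs >= 0) {
            for (int i = 0; i < ndevs; i++)
                virNodeDeviceFree(devs[i]);
            free(devs);
        }

        sleep(30); /* roughly the polling interval mentioned above */
    }

    /* never reached; the connection stays open, as with nova-compute */
    virConnectClose(conn);
    return 0;
}

While this runs, watching "systemctl status virtnodedevd | grep Memory:" should show whether memory grows per call, as reported.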

Comment 4 yalzhang@redhat.com 2022-11-25 04:07:00 UTC
I have tested on libvirt-8.9.0-2.el9.x86_64 with the scenarios below:
1. Run "virsh nodedev-list" 1000 times and check the memory occupied by the virtnodedevd service; the occupied memory increased from 13.9M to 24.0M after 8 min.
2. Test for a memory leak with valgrind.

Details:
1. Start virtnodedevd and check the memory; it's 13.9M. Then run "virsh nodedev-list" 1000 times and check the memory occupied by the virtnodedevd service again.
# cat test.sh
#!/bin/sh
# Check virtnodedevd memory before and after 1000 "virsh nodedev-list" calls.
systemctl start virtnodedevd
systemctl status virtnodedevd
i=0
while [ $i -ne 1000 ]
do
    virsh nodedev-list
    i=$(($i+1))
    echo "$i"
done
systemctl status virtnodedevd

# sh test.sh
......
● virtnodedevd.service - Virtualization nodedev daemon
     Loaded: loaded (/usr/lib/systemd/system/virtnodedevd.service; disabled; vendor preset: disabled)
     Active: active (running) since Thu 2022-11-24 22:36:12 EST; 7s ago
TriggeredBy: ● virtnodedevd-admin.socket
             ● virtnodedevd.socket
             ● virtnodedevd-ro.socket
       Docs: man:virtnodedevd(8)
             https://libvirt.org
   Main PID: 3547 (virtnodedevd)
      Tasks: 19 (limit: 407705)
     Memory: 13.9M
        CPU: 253ms
     CGroup: /system.slice/virtnodedevd.service
             └─3547 /usr/sbin/virtnodedevd --timeout 120

......after 1000 times run "virsh nodedev-list"......
● virtnodedevd.service - Virtualization nodedev daemon
     Loaded: loaded (/usr/lib/systemd/system/virtnodedevd.service; disabled; vendor preset: disabled)
     Active: active (running) since Thu 2022-11-24 22:52:22 EST; 8min ago
TriggeredBy: ● virtnodedevd-admin.socket
             ● virtnodedevd.socket
             ● virtnodedevd-ro.socket
       Docs: man:virtnodedevd(8)
             https://libvirt.org
   Main PID: 9631 (virtnodedevd)
      Tasks: 19 (limit: 407705)
     Memory: 24.0M
        CPU: 1min 2.165s
     CGroup: /system.slice/virtnodedevd.service
             └─9631 /usr/sbin/virtnodedevd --timeout 120

Nov 24 22:52:22 dell-per740xd-19.lab.eng.pek2.redhat.com systemd[1]: Starting Virtualization nodedev daemon...
Nov 24 22:52:22 dell-per740xd-19.lab.eng.pek2.redhat.com systemd[1]: Started Virtualization nodedev daemon.


Test with valgrind:
1. Stop the virtnodedevd service and sockets:
# systemctl status virtnodedevd
○ virtnodedevd.service - Virtualization nodedev daemon
     Loaded: loaded (/usr/lib/systemd/system/virtnodedevd.service; disabled; vendor preset: disabled)
     Active: inactive (dead) since Thu 2022-11-24 23:03:03 EST; 16s ago
   Duration: 10min 40.890s
TriggeredBy: ○ virtnodedevd-admin.socket
             ○ virtnodedevd.socket
             ○ virtnodedevd-ro.socket
       Docs: man:virtnodedevd(8)
             https://libvirt.org
    Process: 9631 ExecStart=/usr/sbin/virtnodedevd $VIRTNODEDEVD_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 9631 (code=exited, status=0/SUCCESS)
        CPU: 1min 2.371s

2. Run valgrind in one terminal:
# valgrind --leak-check=full  virtnodedevd
==15745== Memcheck, a memory error detector
==15745== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==15745== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==15745== Command: virtnodedevd
==15745== 

3. Run "nodedev-list" in another terminal:
# virsh nodedev-list

4. Check the info in the first terminal; there is a memory leak (full log attached):
==15745== LEAK SUMMARY:
==15745==    definitely lost: 384 bytes in 12 blocks
==15745==    indirectly lost: 4,563 bytes in 174 blocks
==15745==      possibly lost: 896 bytes in 2 blocks
==15745==    still reachable: 1,059,261 bytes in 13,765 blocks
==15745==         suppressed: 0 bytes in 0 blocks
==15745== Reachable blocks (those to which a pointer was found) are not shown.
==15745== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==15745== 
==15745== For lists of detected and suppressed errors, rerun with: -s
==15745== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)

Comment 6 Peter Krempa 2022-11-25 08:51:02 UTC
The only real leak from the attached log seems to be:

==15745== 4,684 (192 direct, 4,492 indirect) bytes in 8 blocks are definitely lost in loss record 2,395 of 2,403
==15745==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==15745==    by 0x4D39320: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6800.4)
==15745==    by 0x4999762: virPCIVPDParse (virpcivpd.c:656)
==15745==    by 0x497A7F8: virPCIDeviceGetVPD (virpci.c:2691)
==15745==    by 0x4A23DB7: UnknownInlinedFun (node_device_conf.c:3084)
==15745==    by 0x4A23DB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3117)
==15745==    by 0x1900CABF: UnknownInlinedFun (node_device_udev.c:415)
==15745==    by 0x1900CABF: UnknownInlinedFun (node_device_udev.c:1399)
==15745==    by 0x1900CABF: udevAddOneDevice (node_device_udev.c:1564)
==15745==    by 0x1900DFDD: UnknownInlinedFun (node_device_udev.c:1638)
==15745==    by 0x1900DFDD: UnknownInlinedFun (node_device_udev.c:1692)
==15745==    by 0x1900DFDD: nodeStateInitializeEnumerate (node_device_udev.c:2017)
==15745==    by 0x4991F08: virThreadHelper (virthread.c:256)
==15745==    by 0x5136801: start_thread (in /usr/lib64/libc.so.6)
==15745==    by 0x50D6313: clone (in /usr/lib64/libc.so.6)

The others are single-shot allocations via virOnce/pthread_once.

The leak itself (~4 KiB) doesn't explain the ~10 MiB increase in consumed memory as accounted by systemd, though.
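
[Editorial aside, illustrative sketch only and not libvirt code: allocations guarded by pthread_once (which virOnce wraps) happen exactly once per process lifetime, so valgrind may report them, but they cannot account for memory that keeps growing with the number of API calls.]

/* once_demo.c - build with: gcc -pthread once_demo.c -o once_demo */
#include <pthread.h>
#include <stdlib.h>

static pthread_once_t once = PTHREAD_ONCE_INIT;
static char *table;

static void init_table(void)
{
    table = malloc(4096); /* allocated once for the whole process lifetime */
}

static void api_call(void)
{
    /* no matter how many times api_call() runs, init_table() runs only
     * once, so this allocation does not scale with the number of calls */
    pthread_once(&once, init_table);
    /* ... use table ... */
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        api_call(); /* memory does not grow with i */
    return 0;
}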

Comment 7 Jaroslav Pulchart 2022-11-25 09:27:17 UTC
My guess is that the issue is not a memory leak.

The reason (from my point of view):

when I set MemoryMax:

  cat /etc/systemd/system/virtnodedevd.service.d/service.conf
  [Service]
  MemoryMax=100M

then it uses up to MemoryMax RAM only, keeping memory usage balanced (freeing something, then allocating) in a controlled way around the MemoryMax size. That would not be possible in the case of a memory leak, as the process would not be able to free memory and systemd would kill it as soon as it reached MemoryMax. That is not observed, if I'm correct.

Comment 8 chhu 2022-11-30 02:05:10 UTC
Hi, Jaroslav

Which OpenStack are you using? It's not Red Hat OpenStack 17.0, right?

In RHOSP 17.0, it uses kolla: tripleo_nova_virtnodedevd.service starts the nova_virtnodedevd container, and
kolla_start runs "command": "/usr/sbin/virtqemud --config /etc/libvirt/virtqemud.conf" to start the virtnodedevd

Comment 9 chhu 2022-11-30 02:08:22 UTC
Hi, Jaroslav

Which OpenStack are you using? It's not Red Hat OpenStack 17.0, right?

In RHOSP 17.0, it uses kolla: tripleo_nova_virtnodedevd.service starts the nova_virtnodedevd container, and
kolla_start runs "command": "/usr/sbin/virtnodedevd --config /etc/libvirt/virtnodedevd.conf" to start the virtnodedevd

Comment 11 Jaroslav Pulchart 2022-11-30 06:46:20 UTC
I would like to avoid a discussion about OpenStack versions and distributions. This report is about libvirt's virtnodedevd memory utilization growing when any kind of service uses it for a long period of time.

Comment 13 yalzhang@redhat.com 2022-11-30 10:13:57 UTC
Hi Peter, could you please help to check the attachment in comment 12? The file can be checked with massif-visualizer or ms_print. It was captured on one of my systems; just running "virsh nodedev-list" continuously can reproduce the memory growth issue. Many thanks to Luyao for helping to debug the issue and capture the log file. From the log file, Luyao said that something needs to be fixed in virPCIVPDResourceCustomUpsertValue.

Comment 14 Daniel Berrangé 2022-11-30 10:25:30 UTC
(In reply to yalzhang from comment #13)
> Hi Peter, could you please help to check the attachment in comment 12? The
> file can be checked with massif-visualizer or ms_print. It's catched on one
> of my system. Just run "virsh nodedev-list" continously can reproduce the
> memory grow issue. Many thanks to Luyao's help to debug the issue and catch
> the log file. And Luyao said that something needs to be fixed in
> virPCIVPDResourceCustomUpsertValue from the log file.

If it is VPD related, that could explain why only some people see a leak: VPD information depends on the hardware present, and certain NICs will have it IIRC.

Comment 15 yalzhang@redhat.com 2022-11-30 12:29:06 UTC
(In reply to Daniel Berrangé from comment #14)
> (In reply to yalzhang from comment #13)
> > Hi Peter, could you please help to check the attachment in comment 12? The
> > file can be checked with massif-visualizer or ms_print. It's catched on one
> > of my system. Just run "virsh nodedev-list" continously can reproduce the
> > memory grow issue. Many thanks to Luyao's help to debug the issue and catch
> > the log file. And Luyao said that something needs to be fixed in
> > virPCIVPDResourceCustomUpsertValue from the log file.
> 
> If it is VPD related that could explain why only some people see a leak -
> VPD information depends on the hardware present - certain NICs will have it
> IIRC.

Yes, I cannot reproduce the "memory grow" issue on an old desktop (checked just now; there is no VPD info for its NIC), but it can be reproduced on a Beaker server with modern NICs.

Comment 16 Michal Privoznik 2022-11-30 13:23:14 UTC
I am able to reproduce on a machine with a VPD PCI device and got these stack traces:

==62886== 479 (24 direct, 455 indirect) bytes in 1 blocks are definitely lost in loss record 2,119 of 2,164
==62886==    at 0x486D0CC: calloc (vg_replace_malloc.c:1328)
==62886==    by 0x4E4B047: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6800.4)
==62886==    by 0x49DA84B: virPCIVPDParse (virpcivpd.c:656)
==62886==    by 0x49AC5C3: virPCIDeviceGetVPD (virpci.c:2691)
==62886==    by 0x4A96F23: virNodeDeviceGetPCIVPDDynamicCap (node_device_conf.c:3081)
==62886==    by 0x4A97083: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3114)
==62886==    by 0x4A95C6B: virNodeDeviceUpdateCaps (node_device_conf.c:2681)
==62886==    by 0xC1D887F: nodeDeviceGetXMLDesc (node_device_driver.c:355)
==62886==    by 0x4C52093: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:287)
==62886==    by 0x154693: remoteDispatchNodeDeviceGetXMLDesc (remote_daemon_dispatch_stubs.h:15681)
==62886==    by 0x1545FB: remoteDispatchNodeDeviceGetXMLDescHelper (remote_daemon_dispatch_stubs.h:15658)
==62886==    by 0x4ACECC3: virNetServerProgramDispatchCall (virnetserverprogram.c:428)
==62886== 
==62886== 958 (48 direct, 910 indirect) bytes in 2 blocks are definitely lost in loss record 2,135 of 2,164
==62886==    at 0x486D0CC: calloc (vg_replace_malloc.c:1328)
==62886==    by 0x4E4B047: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6800.4)
==62886==    by 0x49DA84B: virPCIVPDParse (virpcivpd.c:656)
==62886==    by 0x49AC5C3: virPCIDeviceGetVPD (virpci.c:2691)
==62886==    by 0x4A96F23: virNodeDeviceGetPCIVPDDynamicCap (node_device_conf.c:3081)
==62886==    by 0x4A97083: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3114)
==62886==    by 0xC1DDBB7: udevProcessPCI (node_device_udev.c:415)
==62886==    by 0xC1E0463: udevGetDeviceDetails (node_device_udev.c:1399)
==62886==    by 0xC1E09BB: udevAddOneDevice (node_device_udev.c:1564)
==62886==    by 0xC1E0CA7: udevProcessDeviceListEntry (node_device_udev.c:1638)
==62886==    by 0xC1E0E47: udevEnumerateDevices (node_device_udev.c:1692)
==62886==    by 0xC1E17EB: nodeStateInitializeEnumerate (node_device_udev.c:2019)
==62886== 
==62886== 2,874 (144 direct, 2,730 indirect) bytes in 6 blocks are definitely lost in loss record 2,152 of 2,164
==62886==    at 0x486D0CC: calloc (vg_replace_malloc.c:1328)
==62886==    by 0x4E4B047: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6800.4)
==62886==    by 0x49DA84B: virPCIVPDParse (virpcivpd.c:656)
==62886==    by 0x49AC5C3: virPCIDeviceGetVPD (virpci.c:2691)
==62886==    by 0x4A96F23: virNodeDeviceGetPCIVPDDynamicCap (node_device_conf.c:3081)
==62886==    by 0x4A97083: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3114)
==62886==    by 0x4A95C6B: virNodeDeviceUpdateCaps (node_device_conf.c:2681)
==62886==    by 0x4A98EEB: virNodeDeviceObjMatch (virnodedeviceobj.c:877)
==62886==    by 0x4A9943F: virNodeDeviceObjListExportCallback (virnodedeviceobj.c:948)
==62886==    by 0x496F303: virHashForEach (virhash.c:367)
==62886==    by 0x4A9959B: virNodeDeviceObjListExport (virnodedeviceobj.c:982)

Comment 17 Michal Privoznik 2022-11-30 14:00:50 UTC
Alright, I have a fix for the VPD problem. However, I'm not sure whether that's the one causing this bug. Jaroslav, could you confirm that 'virsh nodedev-list --cap vpd' prints something out? Alternatively, I can provide a build with my fix if you want to test that.

Comment 18 Jaroslav Pulchart 2022-11-30 14:35:57 UTC
Michal, the output of 'virsh nodedev-list --cap vpd' is:

pci_0000_41_00_0
pci_0000_41_00_1
pci_0000_63_00_0
pci_0000_63_00_1

Comment 19 Michal Privoznik 2022-11-30 15:24:51 UTC
Perfect, so that's very likely it. If you want to test my fix, I've made a scratch build here:

https://mprivozn.fedorapeople.org/rpms/nodedev/

Comment 20 Jaroslav Pulchart 2022-11-30 16:41:03 UTC
Thanks Michal,

I took your source package, built it in our Koji, and installed it on one of the spare servers:

# rpm -qa | grep libvirt
python3-libvirt-8.7.0-1.el9.x86_64
libvirt-libs-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-storage-core-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-network-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-nwfilter-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-config-nwfilter-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-config-network-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-storage-disk-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-storage-iscsi-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-storage-logical-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-storage-mpath-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-storage-rbd-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-storage-scsi-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-storage-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-interface-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-nodedev-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-qemu-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-daemon-driver-secret-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-client-8.9.0-3.el9_rc.8f4280bca8.x86_64
libvirt-8.9.0-3.el9_rc.8f4280bca8.x86_64


Then I removed the MemoryMax systemd service limit and restarted the service. Current situation after 33 minutes of running:

# systemctl status virtnodedevd.service 
● virtnodedevd.service - Virtualization nodedev daemon
     Loaded: loaded (/usr/lib/systemd/system/virtnodedevd.service; disabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/virtnodedevd.service.d
             └─service.conf
     Active: active (running) since Wed 2022-11-30 17:02:39 CET; 33min ago
TriggeredBy: ● virtnodedevd-admin.socket
             ● virtnodedevd-ro.socket
             ● virtnodedevd.socket
       Docs: man:virtnodedevd(8)
             https://libvirt.org
   Main PID: 6031 (virtnodedevd)
      Tasks: 19 (limit: 206089)
     Memory: 21.8M
        CPU: 26.408s
     CGroup: /system.slice/virtnodedevd.service
             └─6031 /usr/sbin/virtnodedevd --timeout 120

So let's see tomorrow how it looks.

Comment 21 yalzhang@redhat.com 2022-12-01 00:55:11 UTC
I have tried the scratch build on one of my systems with VPD NICs, which can reproduce the issue, and it works well. After 9 hours of running "virsh nodedev-list" (more than 60000 times), the occupied memory is around 20M.
# systemctl status virtnodedevd
● virtnodedevd.service - Virtualization nodedev daemon
     Loaded: loaded (/usr/lib/systemd/system/virtnodedevd.service; disabled; vendor preset: disabled)
     Active: active (running) since Wed 2022-11-30 10:49:49 EST; 9h ago
TriggeredBy: ● virtnodedevd.socket
             ● virtnodedevd-ro.socket
             ● virtnodedevd-admin.socket
       Docs: man:virtnodedevd(8)
             https://libvirt.org
   Main PID: 141029 (virtnodedevd)
      Tasks: 19 (limit: 407718)
     Memory: 19.4M
......

Comment 22 Jaroslav Pulchart 2022-12-01 06:28:56 UTC
So far so good. After 14 h of running we are oscillating around 22.5 MB:

# systemctl status virtnodedevd.service 
● virtnodedevd.service - Virtualization nodedev daemon
     Loaded: loaded (/usr/lib/systemd/system/virtnodedevd.service; disabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/virtnodedevd.service.d
             └─service.conf
     Active: active (running) since Wed 2022-11-30 17:02:39 CET; 14h ago
TriggeredBy: ● virtnodedevd-admin.socket
             ● virtnodedevd-ro.socket
             ● virtnodedevd.socket
       Docs: man:virtnodedevd(8)
             https://libvirt.org
   Main PID: 6031 (virtnodedevd)
      Tasks: 19 (limit: 206089)
     Memory: 22.3M
        CPU: 10min 14.499s
     CGroup: /system.slice/virtnodedevd.service
             └─6031 /usr/sbin/virtnodedevd --timeout 120

Comment 23 Michal Privoznik 2022-12-01 07:42:43 UTC
Perfect! I've merged the patch as:

commit 64d32118540aca3d42bc5ee21c8b780cafe04bfa
Author:     Michal Prívozník <mprivozn>
AuthorDate: Wed Nov 30 14:53:21 2022 +0100
Commit:     Michal Prívozník <mprivozn>
CommitDate: Thu Dec 1 08:38:01 2022 +0100

    node_device_conf: Avoid memleak in virNodeDeviceGetPCIVPDDynamicCap()
    
    The virNodeDeviceGetPCIVPDDynamicCap() function is called from
    virNodeDeviceGetPCIDynamicCaps() and therefore has to be a wee
    bit more clever about adding VPD capability. Namely, it has to
    remove the old one before adding a new one. This is how other
    functions called from virNodeDeviceGetPCIDynamicCaps() behave
    as well.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2143235
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Peter Krempa <pkrempa>

v8.10.0-rc2-8-g64d3211854
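
[Editorial illustration with hypothetical names, not the actual libvirt patch: the pattern the commit message describes is to drop the previously attached dynamic capability before re-querying it, so repeated refreshes of the same device do not accumulate parsed VPD data. A minimal, self-contained sketch of that idea:]

/* cap_refresh.c - illustrative only; the real fix lives in
 * virNodeDeviceGetPCIVPDDynamicCap() in node_device_conf.c. */
#include <stdlib.h>

enum { CAP_VPD = 1 }; /* hypothetical capability type id */

typedef struct Cap Cap;
struct Cap {
    int type;   /* e.g. CAP_VPD */
    void *data; /* dynamically parsed payload (stands in for VPD data) */
    Cap *next;
};

/* Remove and free any existing capability of the given type before a
 * refresh re-adds it; without this step, every refresh of a VPD-capable
 * device leaks the previously parsed payload. */
static void capListRemoveType(Cap **head, int type)
{
    Cap **prev = head;
    while (*prev) {
        Cap *cur = *prev;
        if (cur->type == type) {
            *prev = cur->next;
            free(cur->data);
            free(cur);
        } else {
            prev = &cur->next;
        }
    }
}

int main(void)
{
    Cap *caps = NULL;
    /* simulate two refreshes of the same device's VPD capability */
    for (int i = 0; i < 2; i++) {
        capListRemoveType(&caps, CAP_VPD); /* drop stale CAP_VPD, if any */
        Cap *c = calloc(1, sizeof(*c));
        c->type = CAP_VPD;
        c->data = malloc(64);              /* stands in for parsed VPD */
        c->next = caps;
        caps = c;
    }
    capListRemoveType(&caps, CAP_VPD);     /* cleanup */
    return 0;
}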

Comment 24 yalzhang@redhat.com 2022-12-05 01:25:46 UTC
Tested with libvirt-8.10.0-1.el9.x86_64; the issue is fixed.

Comment 28 Jaroslav Pulchart 2022-12-06 12:49:23 UTC
I can confirm that 8.10.0-1 is OK. The virtnodedevd.service consumes 18.2M of RAM after 17 h of running (no growth).

Comment 31 yalzhang@redhat.com 2022-12-08 05:05:00 UTC
Moving the bug to VERIFIED based on the above verification.

Comment 33 errata-xmlrpc 2023-05-09 07:27:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171

