Bug 2165428 - [memory leak] libvirt hits a memory leak when starting the service
Summary: [memory leak] libvirt hits a memory leak when starting the service
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: libvirt
Version: 8.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Michal Privoznik
QA Contact: liang cong
URL:
Whiteboard:
Depends On:
Blocks: 2144443
 
Reported: 2023-01-30 01:38 UTC by mhou
Modified: 2023-05-16 09:02 UTC
CC List: 9 users

Fixed In Version: libvirt-8.0.0-15.module+el8.8.0+18023+bf5b754e
Doc Type: Bug Fix
Doc Text:
Cause: When libvirtd is started via socket activation, an internal structure that holds the socket FDs obtained from systemd is allocated, but due to a bug the structure was never freed.
Consequence: libvirtd leaked memory on startup.
Fix: The bug was fixed and the structure is now freed.
Result: No memory leak occurs.
Clone Of:
Environment:
Last Closed: 2023-05-16 08:18:36 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
valgrind log (5.58 MB, text/plain)
2023-01-30 01:38 UTC, mhou
no flags Details
valgrind of libvirt-8.0.0-15.el8_rc.98293d506d (5.84 MB, text/plain)
2023-01-31 02:50 UTC, mhou
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-146840 0 None None None 2023-01-30 01:41:28 UTC
Red Hat Product Errata RHSA-2023:2757 0 None None None 2023-05-16 08:19:53 UTC

Description mhou 2023-01-30 01:38:20 UTC
Created attachment 1941034 [details]
valgrind log

Description of problem:
Valgrind detects a memory leak when libvirtd is started.

Version-Release number of selected component (if applicable):
Corresponding libvirt package versions:
libvirt-daemon-driver-storage-iscsi-direct-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-nwfilter-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-storage-core-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-storage-rbd-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-qemu-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-config-network-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-libs-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-storage-gluster-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-storage-mpath-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-interface-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-glib-3.0.0-1.el8.x86_64
libvirt-client-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-storage-disk-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-storage-logical-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-storage-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-secret-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-kvm-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
python3-libvirt-8.0.0-2.module+el8.8.0+16781+9f4724c2.x86_64
libvirt-daemon-driver-network-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-storage-scsi-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-config-nwfilter-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-storage-iscsi-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
libvirt-daemon-driver-nodedev-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64

kernel version: 4.18.0-452.el8.x86_64

How reproducible: 100%


Steps to Reproduce:
1. Configure hugepages and CPU isolation as below:
[root@dell-per740-77 ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-452.el8.x86_64 root=/dev/mapper/rhel_dell--per740--77-root ro pci=realloc resume=/dev/mapper/rhel_dell--per740--77-swap rd.lvm.lv=rhel_dell-per740-77/root rd.lvm.lv=rhel_dell-per740-77/swap console=ttyS0,115200n81 skew_tick=1 isolcpus=managed_irq,domain,2,26,3,27,4,28,5,29,6,30,7,31,8,32,9,33,10,34,11,35,12,36,13,37,14,38,15,39,16,40,17,41,18,42,19,43,20,44,21,45,22,46,23,47 intel_pstate=disable nosoftlockup tsc=reliable nohz=on nohz_full=2,26,3,27,4,28,5,29,6,30,7,31,8,32,9,33,10,34,11,35,12,36,13,37,14,38,15,39,16,40,17,41,18,42,19,43,20,44,21,45,22,46,23,47 rcu_nocbs=2,26,3,27,4,28,5,29,6,30,7,31,8,32,9,33,10,34,11,35,12,36,13,37,14,38,15,39,16,40,17,41,18,42,19,43,20,44,21,45,22,46,23,47 irqaffinity=0,1,24,25 crashkernel=1G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G default_hugepagesz=1G hugepagesz=1G hugepages=48 intel_iommu=on iommu=pt intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable idle=poll rcu_nocb_poll
[root@dell-per740-77 ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46
node 0 size: 31559 MB
node 0 free: 891 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47
node 1 size: 31184 MB
node 1 free: 5831 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 


2. Reboot the test server and wait for it to come back, then add the hugetlbfs group to qemu.conf:
echo "group = 'hugetlbfs'" >> /etc/libvirt/qemu.conf

3. Enable valgrind in /etc/systemd/system/libvirtd.service:
ExecStart=/usr/bin/valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --log-file=/var/log/valgrind/libvirtd.log /usr/sbin/libvirtd $LIBVIRTD_ARGS

4. Create the log directory, reload systemd, and restart the libvirtd service:
mkdir /var/log/valgrind/
systemctl daemon-reload
systemctl restart libvirtd

Actual results:
1. Memory leaks are reported in /var/log/valgrind/libvirtd.log. For the full log, see the attachment.

Expected results:
1. No leak errors are reported in /var/log/valgrind/libvirtd.log.

Additional info:
1. This issue was found during RHEL 8.8 performance testing.
Both the RT kernel and the stock kernel produce a memory-related call trace during the test.

stock kernel(4.18.0-452.el8.x86_64): https://beaker.engineering.redhat.com/jobs/7472403
call trace: https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/01/74724/7472403/13295330/console.log

RT kernel(kernel-rt-4.18.0-451.rt7.237.el8)
https://beaker.engineering.redhat.com/jobs/7455177
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/01/74551/7455177/13270299/console.log

Comment 2 Michal Privoznik 2023-01-30 11:46:22 UTC
I believe this issue was fixed by the following upstream commit:

commit 8a1915c4d6c33669dcb390d0708cb6e5d651770d
Author:     Peng Liang <liangpeng10>
AuthorDate: Wed Mar 2 17:22:05 2022 +0800
Commit:     Michal Prívozník <mprivozn>
CommitDate: Fri Mar 4 10:53:03 2022 +0100

    rpc: Fix memory leak of fds
    
    In virSystemdActivationClaimFDs, the memory of ent->fds has been stolen
    and stored in fds, but fds is never freed, which causes a memory leak.
    Fix it by declaring fds as g_autofree.
    
    Reported-by: Jie Tang <tangjie18>
    Signed-off-by: Peng Liang <liangpeng10>
    Reviewed-by: Michal Privoznik <mprivozn>

v8.2.0-rc1~203

Let me create a scratch build with that patch on top.
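
For illustration only (the names and struct layout below are hypothetical, not libvirt's actual code), a minimal standalone glib program showing the pattern the commit message describes: ownership of a heap-allocated FD array is "stolen" into a local pointer, and declaring that pointer g_autofree makes glib free it when it goes out of scope.

#include <glib.h>

typedef struct {
    int *fds;      /* heap-allocated array of socket FDs handed over by systemd */
    size_t nfds;
} ActivationEntry;

/* Transfer ownership of ent->fds to the caller (the "steal" mentioned above). */
static void
claim_fds(ActivationEntry *ent, int **out_fds, size_t *out_nfds)
{
    *out_fds = g_steal_pointer(&ent->fds);
    *out_nfds = ent->nfds;
    ent->nfds = 0;
}

int main(void)
{
    ActivationEntry ent = { g_new(int, 2), 2 };
    ent.fds[0] = 3;
    ent.fds[1] = 4;

    /* Without g_autofree this local pointer would leak the stolen array,
     * which is exactly the pattern valgrind flagged; with it, glib frees
     * the array automatically when fds goes out of scope. */
    g_autofree int *fds = NULL;
    size_t nfds = 0;
    claim_fds(&ent, &fds, &nfds);

    for (size_t i = 0; i < nfds; i++)
        g_print("fd[%zu] = %d\n", i, fds[i]);
    return 0;
}

(Compile with, for example: gcc demo.c $(pkg-config --cflags --libs glib-2.0); the file name is assumed.)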

Comment 5 mhou 2023-01-31 02:18:47 UTC
This issue can be reproduced without any special kernel command-line options. Here are the simplified test steps.

libvirt version: libvirt-8.0.0-14.module+el8.8.0+17806+11a519fc.x86_64
kernel version: 4.18.0-453.rt7.239.el8.x86_64/ 4.18.0-452.el8.x86_64

kernel commandline:
# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-453.rt7.239.el8.x86_64 root=/dev/mapper/rhel_dell--per740--77-root ro resume=/dev/mapper/rhel_dell--per740--77-swap rd.lvm.lv=rhel_dell-per740-77/root rd.lvm.lv=rhel_dell-per740-77/swap console=ttyS0,115200n81 crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M

1. Add valgrind to the libvirtd systemd unit:
vim /usr/lib/systemd/system/libvirtd.service
ExecStart=/usr/bin/valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --log-file=/var/log/valgrind/libvirtd.log /usr/sbin/libvirtd $LIBVIRTD_ARGS

mkdir /var/log/valgrind/
systemctl daemon-reload
systemctl restart libvirtd

2. Check for memory errors in /var/log/valgrind/libvirtd.log:
==3735== Memcheck, a memory error detector
==3735== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==3735== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==3735== Command: /usr/sbin/libvirtd --timeout 120
==3735== Parent PID: 1
==3735==
==3824==
==3824== HEAP SUMMARY:
==3824==     in use at exit: 2,797,365 bytes in 15,994 blocks
==3824==   total heap usage: 56,421 allocs, 40,427 frees, 11,310,246 bytes allocated
==3824==
.......
==3735== LEAK SUMMARY:
==3735==    definitely lost: 12 bytes in 3 blocks
==3735==    indirectly lost: 0 bytes in 0 blocks
==3735==      possibly lost: 3,040 bytes in 26 blocks
==3735==    still reachable: 1,256,142 bytes in 13,720 blocks
==3735==                       of which reachable via heuristic:
==3735==                         length64           : 1,464 bytes in 30 blocks
==3735==                         newarray           : 1,824 bytes in 34 blocks
==3735==         suppressed: 0 bytes in 0 blocks
==3735== 
==3735== For lists of detected and suppressed errors, rerun with: -s
==3735== ERROR SUMMARY: 27 errors from 27 contexts (suppressed: 0 from 0)

Comment 6 mhou 2023-01-31 02:45:08 UTC
This issue can still be reproduced on libvirt-8.0.0-15.el8_rc.98293d506d.x86_64. I have uploaded the full valgrind log as an attachment.

Test kernel: 4.18.0-452.el8.x86_64

Here is libvirtd.service. 
[root@dell-per740-60 ~]# cat /usr/lib/systemd/system/libvirtd.service 
[Unit]
Description=Virtualization daemon
Requires=virtlogd.socket
Requires=virtlockd.socket
# Use Wants instead of Requires so that users
# can disable these three .socket units to revert
# to a traditional non-activation deployment setup
Wants=libvirtd.socket
Wants=libvirtd-ro.socket
Wants=libvirtd-admin.socket
Wants=systemd-machined.service
Before=libvirt-guests.service
After=network.target
After=firewalld.service
After=iptables.service
After=ip6tables.service
After=dbus.service
After=iscsid.service
After=apparmor.service
After=local-fs.target
After=remote-fs.target
After=systemd-logind.service
After=systemd-machined.service
After=xencommons.service
Conflicts=xendomains.service
Documentation=man:libvirtd(8)
Documentation=https://libvirt.org

[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/libvirtd
#ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS
ExecStart=/usr/bin/valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --log-file=/var/log/valgrind/libvirtd.log /usr/sbin/libvirtd $LIBVIRTD_ARGS
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
# At least 1 FD per guest, often 2 (eg qemu monitor + qemu agent).
# eg if we want to support 4096 guests, we'll typically need 8192 FDs
# If changing this, also consider virtlogd.service & virtlockd.service
# limits which are also related to number of guests
LimitNOFILE=8192
# The cgroups pids controller can limit the number of tasks started by
# the daemon, which can limit the number of domains for some hypervisors.
# A conservative default of 8 tasks per guest results in a TasksMax of
# 32k to support 4096 guests.
TasksMax=32768
# With cgroups v2 there is no devices controller anymore, we have to use
# eBPF to control access to devices.  In order to do that we create a eBPF
# hash MAP which locks memory.  The default map size for 64 devices together
# with program takes 12k per guest.  After rounding up we will get 64M to
# support 4096 guests.
LimitMEMLOCK=64M

[Install]
WantedBy=multi-user.target
Also=virtlockd.socket
Also=virtlogd.socket
Also=libvirtd.socket
Also=libvirtd-ro.socket

Comment 7 mhou 2023-01-31 02:50:10 UTC
Created attachment 1941210 [details]
valgrind of libvirt-8.0.0-15.el8_rc.98293d506d

Comment 8 Michal Privoznik 2023-01-31 08:43:37 UTC
(In reply to mhou from comment #7)
> Created attachment 1941210 [details]
> valgrind of libvirt-8.0.0-15.el8_rc.98293d506d

From the attachment:

==10089== Memcheck, a memory error detector
==10089== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==10089== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==10089== Command: /usr/sbin/libvirtd --timeout 120
==10089== Parent PID: 1
==10089== 
...
==10162== 16,384 bytes in 1 blocks are definitely lost in loss record 3,531 of 3,553
==10162==    at 0x4C3CE4B: calloc (vg_replace_malloc.c:1328)
==10162==    by 0x592463D: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.5600.4)
==10162==    by 0x4F659D2: virProcessNamespaceAvailable (virprocess.c:1498)
==10162==    by 0x2698B48D: qemuDomainNamespaceAvailable (qemu_namespace.c:876)
==10162==    by 0x268EA127: virQEMUDriverConfigNew (qemu_conf.c:289)
==10162==    by 0x2690C0F3: qemuStateInitialize (qemu_driver.c:608)
==10162==    by 0x5130D4E: virStateInitialize (libvirt.c:658)
==10162==    by 0x5130D4E: virStateInitialize (libvirt.c:640)
==10162==    by 0x131B93: daemonRunStateInit (remote_daemon.c:616)
==10162==    by 0x4F788CA: virThreadHelper (virthread.c:241)
==10162==    by 0x8EA41C9: start_thread (in /usr/lib64/libpthread-2.28.so)
==10162==    by 0x666DE72: clone (in /usr/lib64/libc-2.28.so)
...
==10162== LEAK SUMMARY:
==10162==    definitely lost: 16,384 bytes in 1 blocks
==10162==    indirectly lost: 0 bytes in 0 blocks
==10162==      possibly lost: 13,880 bytes in 80 blocks
==10162==    still reachable: 2,745,458 bytes in 15,476 blocks
==10162==                       of which reachable via heuristic:
==10162==                         length64           : 1,464 bytes in 30 blocks
==10162==                         newarray           : 1,824 bytes in 34 blocks
==10162==         suppressed: 0 bytes in 0 blocks
==10162== 
==10162== For lists of detected and suppressed errors, rerun with: -s
==10162== ERROR SUMMARY: 57 errors from 57 contexts (suppressed: 0 from 0)
...
==10089== LEAK SUMMARY:
==10089==    definitely lost: 0 bytes in 0 blocks
==10089==    indirectly lost: 0 bytes in 0 blocks
==10089==      possibly lost: 3,040 bytes in 26 blocks
==10089==    still reachable: 1,262,961 bytes in 13,856 blocks
==10089==                       of which reachable via heuristic:
==10089==                         length64           : 1,464 bytes in 30 blocks
==10089==                         newarray           : 1,824 bytes in 34 blocks
==10089==         suppressed: 0 bytes in 0 blocks
==10089== 
==10089== For lists of detected and suppressed errors, rerun with: -s
==10089== ERROR SUMMARY: 26 errors from 26 contexts (suppressed: 0 from 0)


Let me explain what's happening here. The libvirtd process was started (with PID 10089), then it forked, creating a child (PID 10162). In that child a memory leak occurred, but there is no memleak in the parent. Memleaks in a child process are okay (as long as the child is not a long-running process, like QEMU). In fact, valgrind acknowledges this by providing the --child-silent-after-fork= command-line argument, and looking at the valgrind command line from comment 6 I don't see it passed. Therefore, I think the proper valgrind command line should look like this:

  ExecStart=/usr/bin/valgrind --tool=memcheck --child-silent-after-fork=yes --leak-check=full --show-leak-kinds=all --log-file=/var/log/valgrind/libvirtd.log /usr/sbin/libvirtd $LIBVIRTD_ARGS
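
For illustration, a minimal standalone program (not libvirt code, file name assumed) that produces the same pattern: the parent allocates nothing, while a short-lived forked child leaks a small buffer. Running it under "valgrind --leak-check=full" prints one leak summary per PID; adding --child-silent-after-fork=yes suppresses the child's report.

#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* child: allocate 64 bytes, never free them, exit immediately */
        void *leak = malloc(64);
        (void)leak;
        _exit(0);
    }
    /* parent: allocates nothing, so its own leak summary stays clean */
    waitpid(pid, NULL, 0);
    return 0;
}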

Anyway, there's no memleak occurring in the main process (10089). There are some "possibly lost" segments, but there's almost nothing we can do about them as they come from libraries that libvirt links with (e.g. libpciaccess) - they often allocate memory on first API call and keep it for the sake of subsequent API calls.

IOW, as long as there's:

==10089== LEAK SUMMARY:
==10089==    definitely lost: 0 bytes in 0 block

no memleak was detected. If you want to suppress those 'still reachable' lines, you need to tweak the --show-leak-kinds= argument: for instance, instead of --show-leak-kinds=all, use --show-leak-kinds=definite.

Comment 9 Michal Privoznik 2023-01-31 08:48:58 UTC
To POST:

https://gitlab.com/redhat/rhel/src/libvirt/-/merge_requests/81

Comment 12 mhou 2023-01-31 09:16:10 UTC
Hello Michal

Following your suggestions, I tuned the valgrind parameters as below.
# cat /etc/systemd/system/libvirtd.service.d/override.conf 
[Service]
ExecStart="/usr/bin/valgrind --tool=memcheck --child-silent-after-fork=yes --leak-check=full --show-leak-kinds=definite --log-file=/var/log/valgrind/libvirtd.log /usr/sbin/libvirtd $LIBVIRTD_ARGS"

# tail -f /var/log/valgrind/libvirtd.log 

==12511== Memcheck, a memory error detector
==12511== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==12511== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==12511== Command: /usr/sbin/libvirtd --timeout 120
==12511== Parent PID: 1
==12511== 

test libvirt version:
libvirt-8.0.0-15.el8_rc.98293d506d.x86_64

Comment 13 liang cong 2023-02-01 02:32:33 UTC
Tested on scratch build: libvirt-8.0.0-15.el8_rc.98293d506d.x86_64

Test steps:
1. Edit the libvirtd start command as suggested in comment 8, like below:
# cat /usr/lib/systemd/system/libvirtd.service | grep ExecStart
ExecStart=/usr/bin/valgrind --tool=memcheck --child-silent-after-fork=yes --leak-check=full --show-leak-kinds=definite --log-file=/var/log/valgrind/libvirtd.log /usr/sbin/libvirtd $LIBVIRTD_ARGS

2. Restart service
# systemctl daemon-reload
# systemctl restart libvirtd

3. Start a VM; I then got the error below:
# virsh start vm1
error: Failed to start domain 'vm1'
error: can't connect to virtlogd: Cannot write data: Broken pipe

4. After waiting about 5 minutes, I got the log below in /var/log/valgrind/libvirtd.log, with no "definitely lost" records.

==56808== Memcheck, a memory error detector
==56808== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==56808== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==56808== Command: /usr/sbin/libvirtd --timeout 120
==56808== Parent PID: 1
==56808== 
==56808== Warning: noted but unhandled ioctl 0xaea3 with no size/direction hints.
==56808==    This could cause spurious value errors to appear.
==56808==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==56808== 
==56808== HEAP SUMMARY:
==56808==     in use at exit: 1,332,708 bytes in 15,076 blocks
==56808==   total heap usage: 629,677 allocs, 614,601 frees, 1,578,755,870 bytes allocated
==56808== 
==56808== LEAK SUMMARY:
==56808==    definitely lost: 0 bytes in 0 blocks
==56808==    indirectly lost: 0 bytes in 0 blocks
==56808==      possibly lost: 3,056 bytes in 26 blocks
==56808==    still reachable: 1,314,068 bytes in 14,940 blocks
==56808==                       of which reachable via heuristic:
==56808==                         length64           : 1,464 bytes in 30 blocks
==56808==                         newarray           : 1,824 bytes in 34 blocks
==56808==         suppressed: 0 bytes in 0 blocks
==56808== Reachable blocks (those to which a pointer was found) are not shown.
==56808== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==56808== 
==56808== For lists of detected and suppressed errors, rerun with: -s
==56808== ERROR SUMMARY: 26 errors from 26 contexts (suppressed: 0 from 0)


Hi Michal,

For the error I hit at step 3: if I use the default service start command "/usr/sbin/libvirtd $LIBVIRTD_ARGS" in /usr/lib/systemd/system/libvirtd.service, everything works fine, so the error is caused by the valgrind command. Do you know a solution that would make libvirt work as usual under valgrind?
And for the test above, the result means there is no memory leak in the libvirtd parent process, right?

Comment 14 Michal Privoznik 2023-02-01 07:35:33 UTC
(In reply to liang cong from comment #13)
> Hi Michal,
> 
> For the error I hit at step 3: if I use the default service start command
> "/usr/sbin/libvirtd $LIBVIRTD_ARGS" in
> /usr/lib/systemd/system/libvirtd.service, everything works fine, so the
> error is caused by the valgrind command. Do you know a solution that would
> make libvirt work as usual under valgrind?

I suspect this is because of SELinux. I think that when libvirtd is run under valgrind, the process gets a different label than when run without it (maybe libvirtd under valgrind runs as unconfined?). The solution might be to use runcon:

ExecStart=runcon system_u:system_r:virtd_t:s0-s0:c0.c1023 valgrind ... libvirtd ...

> And for the test above, the result means there is no memory leak in the
> libvirtd parent process, right?

Right. "definitely lost: 0 bytes in 0 blocks" means no memleak.

Comment 17 liang cong 2023-02-01 09:24:38 UTC
Hi Michal,
Thanks for your info. I followed your suggestion, but when I restart the libvirtd service I get this error:
# systemctl restart libvirtd
Job for libvirtd.service failed because the control process exited with error code.
See "systemctl status libvirtd.service" and "journalctl -xe" for details.

Using "journalctl -xe" for details, I see error info like the below:
SELinux is preventing runcon from entrypoint access on the file /usr/bin/valgrind.
                                                                                
 *****  Plugin catchall (100. confidence) suggests   **************************
                                                                                
If you believe that runcon should be allowed entrypoint access on the valgrind file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do allow this access for now by executing:
# ausearch -c 'runcon' --raw | audit2allow -M my-runcon
# semodule -X 300 -i my-runcon.pp



I followed the instructions and ran:
# ausearch -c 'runcon' --raw | audit2allow -M my-runcon
# semodule -X 300 -i my-runcon.pp

Then systemctl restart libvirtd succeeded.

After that, I did a basic guest lifecycle test including: start, reboot, suspend, resume, save, restore, managedsave, destroy. After finishing all the tests I got the "definitely lost" records in the log below:

==60686== Memcheck, a memory error detector
==60686== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==60686== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==60686== Command: /usr/sbin/libvirtd --timeout 120
==60686== Parent PID: 1
==60686== 
==60686== Warning: noted but unhandled ioctl 0xaea3 with no size/direction hints.
==60686==    This could cause spurious value errors to appear.
==60686==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==60686== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints.
==60686==    This could cause spurious value errors to appear.
==60686==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==60686== 
==60686== HEAP SUMMARY:
==60686==     in use at exit: 1,347,319 bytes in 15,253 blocks
==60686==   total heap usage: 1,053,238 allocs, 1,037,985 frees, 1,635,916,091 bytes allocated
==60686== 
==60686== 45 bytes in 4 blocks are definitely lost in loss record 1,631 of 2,872
==60686==    at 0x4C38135: malloc (vg_replace_malloc.c:381)
==60686==    by 0x59245E5: g_malloc (in /usr/lib64/libglib-2.0.so.0.5600.4)
==60686==    by 0x593E2D2: g_strdup (in /usr/lib64/libglib-2.0.so.0.5600.4)
==60686==    by 0x4F27850: ??? (in /usr/lib64/libvirt.so.0.8000.0)
==60686==    by 0x33AC72A5: ??? (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AC7693: ??? (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AC8B4D: qemuDomainBuildNamespace (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AD3E1B: qemuProcessLaunch (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AD9204: qemuProcessStart (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33A76AC4: ??? (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33A770FE: ??? (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x5145CD6: virDomainCreate (in /usr/lib64/libvirt.so.0.8000.0)
==60686== 
==60686== 45 bytes in 4 blocks are definitely lost in loss record 1,632 of 2,872
==60686==    at 0x4C38135: malloc (vg_replace_malloc.c:381)
==60686==    by 0x59245E5: g_malloc (in /usr/lib64/libglib-2.0.so.0.5600.4)
==60686==    by 0x593E2D2: g_strdup (in /usr/lib64/libglib-2.0.so.0.5600.4)
==60686==    by 0x4F27850: ??? (in /usr/lib64/libvirt.so.0.8000.0)
==60686==    by 0x33AC72A5: ??? (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AC7693: ??? (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AC8B4D: qemuDomainBuildNamespace (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AD3E1B: qemuProcessLaunch (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AD9204: qemuProcessStart (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33ADDB2E: qemuSaveImageStartVM (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33A6B61D: ??? (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x5134D7C: virDomainRestore (in /usr/lib64/libvirt.so.0.8000.0)
==60686== 
==60686== 45 bytes in 4 blocks are definitely lost in loss record 1,633 of 2,872
==60686==    at 0x4C38135: malloc (vg_replace_malloc.c:381)
==60686==    by 0x59245E5: g_malloc (in /usr/lib64/libglib-2.0.so.0.5600.4)
==60686==    by 0x593E2D2: g_strdup (in /usr/lib64/libglib-2.0.so.0.5600.4)
==60686==    by 0x4F27850: ??? (in /usr/lib64/libvirt.so.0.8000.0)
==60686==    by 0x33AC72A5: ??? (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AC7693: ??? (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AC8B4D: qemuDomainBuildNamespace (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AD3E1B: qemuProcessLaunch (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33AD9204: qemuProcessStart (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33ADDB2E: qemuSaveImageStartVM (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33A76CC2: ??? (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686==    by 0x33A770FE: ??? (in /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so)
==60686== 
==60686== LEAK SUMMARY:
==60686==    definitely lost: 135 bytes in 12 blocks
==60686==    indirectly lost: 0 bytes in 0 blocks
==60686==      possibly lost: 3,056 bytes in 26 blocks
==60686==    still reachable: 1,328,544 bytes in 15,105 blocks
==60686==                       of which reachable via heuristic:
==60686==                         length64           : 1,464 bytes in 30 blocks
==60686==                         newarray           : 1,824 bytes in 34 blocks
==60686==         suppressed: 0 bytes in 0 blocks
==60686== Reachable blocks (those to which a pointer was found) are not shown.
==60686== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==60686== 
==60686== For lists of detected and suppressed errors, rerun with: -s
==60686== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from 0)


After checking the log, I think these "definitely lost" records come not from the service restart but from the guest operations (start, reboot, suspend, resume, save, restore, managedsave, destroy).

So could you help identify:
1. whether my way of starting the service is OK?
2. whether the recorded memory leak is related to this bug or needs a new bug filed?
Thanks a lot.

Comment 18 Michal Privoznik 2023-02-01 09:50:45 UTC
(In reply to liang cong from comment #17)
> So could you help identify:
> 1. whether my way of starting the service is OK?

Yes, that looks okay.

> 2. whether the recorded memory leak is related to this bug or needs a new bug filed?

No, it's new. Can you please attach your domain XML? I wasn't able to reproduce with any of my domains.

Comment 20 liang cong 2023-02-01 11:18:59 UTC
(In reply to Michal Privoznik from comment #18)
> (In reply to liang cong from comment #17)
> > So could you help identify:
> > 1. whether my way of starting the service is OK?
> 
> Yes, that looks okay.
> 
> > 2. whether the recorded memory leak is related to this bug or needs a new bug filed?
> 
> No, it's new. Can you please attach your domain XML? I wasn't able to
> reproduce with any of my domains.

I attached the domain xml as comment 19.
And I would verify this bug and for the new issue I may open a new bug, thx a lot.

Comment 21 mhou 2023-02-01 12:25:13 UTC
Hello Liang

Please add me to the new bug. Many thanks.

Comment 22 Michal Privoznik 2023-02-01 15:33:53 UTC
(In reply to liang cong from comment #20)

> I attached the domain xml as comment 19.
> And I would verify this bug and for the new issue I may open a new bug, thx
> a lot.

I'm still unable to reproduce. Can you please:

1) make sure debuginfo for whole libvirt is installed? Those ??? frames in the valgrind output suggest that some debuginfos are missing.
2) attach whole valgrind output?

Thanks.

Comment 23 liang cong 2023-02-02 01:19:12 UTC
(In reply to Michal Privoznik from comment #22)
> (In reply to liang cong from comment #20)
> 
> > I attached the domain xml as comment 19.
> > And I would verify this bug and for the new issue I may open a new bug, thx
> > a lot.
> 
> I'm still unable to reproduce. Can you please:
> 
> 1) make sure debuginfo for whole libvirt is installed? Those ??? frames in
> the valgrind output suggest that some debuginfos are missing.
> 2) attach whole valgrind output?
> 
> Thanks.

Firstly, I would verify this bug, then try to reproduce on formal build libvirt-8.0.0-15.module+el8.8.0+18023+bf5b754e, if reproducible, a new bug will be filed.

Comment 24 liang cong 2023-02-02 03:31:11 UTC
Verified on build:
# rpm -q libvirt
libvirt-8.0.0-15.module+el8.8.0+18023+bf5b754e.x86_64

Test steps:
1. Edit libvirtd start cmd as below:
# cat /usr/lib/systemd/system/libvirtd.service | grep ExecStart
ExecStart=/usr/bin/valgrind --tool=memcheck --child-silent-after-fork=yes --leak-check=full --show-leak-kinds=definite --log-file=/var/log/valgrind/libvirtd.log /usr/sbin/libvirtd $LIBVIRTD_ARGS

2. Restart service
# systemctl daemon-reload
# systemctl restart libvirtd

3. Wait about 3 minutes for libvirtd to exit by itself, then check the valgrind log /var/log/valgrind/libvirtd.log:
# cat /var/log/valgrind/libvirtd.log 
==41262== Memcheck, a memory error detector
==41262== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==41262== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==41262== Command: /usr/sbin/libvirtd --timeout 120
==41262== Parent PID: 1
==41262== 
==41262== Warning: noted but unhandled ioctl 0xaea3 with no size/direction hints.
==41262==    This could cause spurious value errors to appear.
==41262==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==41262== 
==41262== HEAP SUMMARY:
==41262==     in use at exit: 1,312,376 bytes in 14,892 blocks
==41262==   total heap usage: 475,738 allocs, 460,846 frees, 1,549,063,621 bytes allocated
==41262== 
==41262== LEAK SUMMARY:
==41262==    definitely lost: 0 bytes in 0 blocks
==41262==    indirectly lost: 0 bytes in 0 blocks
==41262==      possibly lost: 3,048 bytes in 26 blocks
==41262==    still reachable: 1,293,856 bytes in 14,757 blocks
==41262==                       of which reachable via heuristic:
==41262==                         length64           : 1,464 bytes in 30 blocks
==41262==                         newarray           : 1,824 bytes in 34 blocks
==41262==         suppressed: 0 bytes in 0 blocks
==41262== Reachable blocks (those to which a pointer was found) are not shown.
==41262== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==41262== 
==41262== For lists of detected and suppressed errors, rerun with: -s
==41262== ERROR SUMMARY: 26 errors from 26 contexts (suppressed: 0 from 0)

From the log, no memory-leak-related "definitely lost" records are found.

Comment 25 Michal Privoznik 2023-02-02 07:52:55 UTC
(In reply to liang cong from comment #23)
> 
> Firstly, I would verify this bug, then try to reproduce on formal build
> libvirt-8.0.0-15.module+el8.8.0+18023+bf5b754e, if reproducible, a new bug
> will be filed.

I've managed to reproduce. My earlier attempts failed, because I tried on RHEL-9 (don't understand why, since this is filed against RHEL-8). Sorry about that. There is a memleak, yes:

==2583== 45 bytes in 4 blocks are definitely lost in loss record 1,605 of 2,809
==2583==    at 0x4C38135: malloc (vg_replace_malloc.c:381)
==2583==    by 0x59245E5: g_malloc (in /usr/lib64/libglib-2.0.so.0.5600.4)
==2583==    by 0x593E2D2: g_strdup (in /usr/lib64/libglib-2.0.so.0.5600.4)
==2583==    by 0x4F27840: virFileGetMountSubtreeImpl (virfile.c:1997)
==2583==    by 0x26DB22A5: qemuDomainGetPreservedMounts (qemu_namespace.c:140)
==2583==    by 0x26DB2693: qemuNamespaceMknodPaths (qemu_namespace.c:1266)
==2583==    by 0x26DB3B4D: qemuDomainBuildNamespace (qemu_namespace.c:677)
==2583==    by 0x26DBEE1B: qemuProcessLaunch (qemu_process.c:7517)
==2583==    by 0x26DC4204: qemuProcessStart (qemu_process.c:7834)
==2583==    by 0x26D61AC4: qemuDomainObjStart.constprop.59 (qemu_driver.c:6365)
==2583==    by 0x26D620FE: qemuDomainCreateWithFlags (qemu_driver.c:6416)
==2583==    by 0x5145CA6: virDomainCreate (libvirt-domain.c:6721)

This was fixed upstream by the following commit:

https://gitlab.com/libvirt/libvirt/-/commit/bca7a53333ead7c1afd178728de74c2977cd4b5e

which is part of v8.10.0 release. I wonder whether we should use this bug to backport the patch, or create a new one.
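
For illustration only (hypothetical helper names, not the actual libvirt functions involved), the general shape of this second leak: a NULL-terminated string list whose elements come from g_strdup() is handed to a caller that never frees it; keeping it in a g_auto(GStrv) variable, or calling g_strfreev() on every exit path, releases both the elements and the array.

#include <glib.h>

/* Hypothetical stand-in for a helper that duplicates a list of mount
 * paths; the caller owns the returned NULL-terminated array. */
static char **
copy_mount_list(const char *const *paths, size_t n)
{
    char **list = g_new0(char *, n + 1);
    for (size_t i = 0; i < n; i++)
        list[i] = g_strdup(paths[i]);   /* each element is heap-allocated */
    return list;
}

int main(void)
{
    const char *paths[] = { "/dev", "/dev/pts", "/dev/shm" };

    /* Without g_auto(GStrv) (or an explicit g_strfreev() on every exit
     * path), both the strings and the array itself would show up as
     * "definitely lost", as in the trace above. */
    g_auto(GStrv) mounts = copy_mount_list(paths, 3);

    for (size_t i = 0; mounts[i] != NULL; i++)
        g_print("%s\n", mounts[i]);
    return 0;
}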

Comment 26 liang cong 2023-02-02 08:31:09 UTC
(In reply to Michal Privoznik from comment #25)
> (In reply to liang cong from comment #23)
> > 
> > Firstly, I would verify this bug, then try to reproduce on formal build
> > libvirt-8.0.0-15.module+el8.8.0+18023+bf5b754e, if reproducible, a new bug
> > will be filed.
> 
> I've managed to reproduce. My earlier attempts failed, because I tried on
> RHEL-9 (don't understand why, since this is filed against RHEL-8). Sorry
> about that. There is a memleak, yes:
> 
> ==2583== 45 bytes in 4 blocks are definitely lost in loss record 1,605 of
> 2,809
> ==2583==    at 0x4C38135: malloc (vg_replace_malloc.c:381)
> ==2583==    by 0x59245E5: g_malloc (in /usr/lib64/libglib-2.0.so.0.5600.4)
> ==2583==    by 0x593E2D2: g_strdup (in /usr/lib64/libglib-2.0.so.0.5600.4)
> ==2583==    by 0x4F27840: virFileGetMountSubtreeImpl (virfile.c:1997)
> ==2583==    by 0x26DB22A5: qemuDomainGetPreservedMounts
> (qemu_namespace.c:140)
> ==2583==    by 0x26DB2693: qemuNamespaceMknodPaths (qemu_namespace.c:1266)
> ==2583==    by 0x26DB3B4D: qemuDomainBuildNamespace (qemu_namespace.c:677)
> ==2583==    by 0x26DBEE1B: qemuProcessLaunch (qemu_process.c:7517)
> ==2583==    by 0x26DC4204: qemuProcessStart (qemu_process.c:7834)
> ==2583==    by 0x26D61AC4: qemuDomainObjStart.constprop.59
> (qemu_driver.c:6365)
> ==2583==    by 0x26D620FE: qemuDomainCreateWithFlags (qemu_driver.c:6416)
> ==2583==    by 0x5145CA6: virDomainCreate (libvirt-domain.c:6721)
> 
> This was fixed upstream by the following commit:
> 
> https://gitlab.com/libvirt/libvirt/-/commit/
> bca7a53333ead7c1afd178728de74c2977cd4b5e
> 
> which is part of v8.10.0 release. I wonder whether we should use this bug to
> backport the patch, or create a new one.

Hi Michal, I filed another bug, bug #2166573, to track this issue; please take a look.

Comment 28 errata-xmlrpc 2023-05-16 08:18:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2757

