Created attachment 1591910 [details] the script, domain xml&cmdline, origin data, leak graph Version: libvirt-5.5.0-1.module+el8.1.0+3580+d7f6488d.x86_64 qemu-kvm-4.0.0-5.module+el8.1.0+3622+5812d9bf.x86_64 About 666 bytes leak per cycle. +++ This bug was initially created as a clone of Bug #1731329 +++ Description of problem: As subject Version-Release number of selected component (if applicable): libvirt-4.5.0-23.virtcov.el7.x86_64 qemu-kvm-rhev-2.12.0-33.el7.x86_64 How reproducible: 100% Steps to Reproduce: 1. Prepare a running vm 2. Run the python script(after install psutils and matplotlib), to create snapshot and do blockcommit for the vm: # ./main.py -e "./snapshot-commit.sh PC" -c 2000 -p "`pidof qemu-kvm libvirtd`" -i 0.2 3. Check the libvirtd ram usage graph, which shows a rising sawteeth line: Han Han <hhan> Attachments10:29 AM (4 hours ago) to s3-bug-review, yisun Description of problem: As subject Version-Release number of selected component (if applicable): libvirt-4.5.0-23.virtcov.el7.x86_64 qemu-kvm-rhev-2.12.0-33.el7.x86_64 How reproducible: 100% Steps to Reproduce: 1. Prepare a running vm 2. Run the python script(after install psutils and matplotlib), to create snapshot and do blockcommit for the vm: # ./main.py -e "./snapshot-commit.sh PC" -c 2000 -p "`pidof qemu-kvm libvirtd`" -i 0.2 3. Check the libvirtd ram usage graph, which shows a rising sawteeth line: See the libvirt.png Actual results: As above. About 1540 Byte leak every cycle. Expected results: No memory leak --- Additional comment from RHEL Product and Program Management on 2019-07-19 06:45:47 UTC --- Since this bug report was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.
./main.py -e "./snapshot-commit.sh PC" -c 2000 -p "`pidof qemu-kvm libvirtd`" -i 0.2 BTW, the command means execute ./snapshot-commit.sh for 2000 cycles, monitor ram usage of qemu-kvm and libvirtd, 0.2s interval before the next cycle.
Yesterday I've identified a memory leak which originated from the re-detection of storage metadata from the disk after a blockjob (commit/pull/copy) finished. The valgrind dump referenced virStorageFileGetMetadataInternal/virStorageFileGetMetadataRecurse/virStorageFileGetMetadata call chain where I've identified few pointer being overwritten. Re-running it with the patch reverted results in virStorageSourceCopy being mentioned. In my experiments the leaked memory amount was not as high as you claim (only max 200 bytes) but I didn't see any other memory leak while running with valgrind. commit f0430d069af991475de6fa83ed62a45f8669c645 Author: Peter Krempa <pkrempa> Date: Thu Jul 18 16:32:44 2019 +0200 util: storage: Don't leak metadata on repeated calls of virStorageFileGetMetadata When querying storage metadata after a block job we re-run virStorageFileGetMetadata on the top level storage file. This means that the workers (virStorageFileGetMetadataInternal) must not overwrite any pointers without freeing them. This was not considered for src->compat and src->features. Fix it and add a comment mentioning that. One more commit may be required for clean backport: commit 5b8e64f0bcbbab826cff5be1b0adb000923abfb4 Author: Peter Krempa <pkrempa> Date: Thu Jul 18 16:30:18 2019 +0200 util: storage: Clean up label use in virStorageFileGetMetadataInternal The function does not do any cleanup, so replace the 'cleanup' label with return of -1 and the 'done' label with return of 0.
(In reply to Peter Krempa from comment #2) > Yesterday I've identified a memory leak which originated from the > re-detection of storage metadata from the disk after a blockjob > (commit/pull/copy) finished. > > The valgrind dump referenced > virStorageFileGetMetadataInternal/virStorageFileGetMetadataRecurse/ > virStorageFileGetMetadata call chain where I've identified few pointer being > overwritten. Re-running it with the patch reverted results in > virStorageSourceCopy being mentioned. > > In my experiments the leaked memory amount was not as high as you claim > (only max 200 bytes) but I didn't see any other memory leak while running > with valgrind. > > commit f0430d069af991475de6fa83ed62a45f8669c645 > Author: Peter Krempa <pkrempa> > Date: Thu Jul 18 16:32:44 2019 +0200 > > util: storage: Don't leak metadata on repeated calls of > virStorageFileGetMetadata > > When querying storage metadata after a block job we re-run > virStorageFileGetMetadata on the top level storage file. This means that > the workers (virStorageFileGetMetadataInternal) must not overwrite any > pointers without freeing them. > > This was not considered for src->compat and src->features. Fix it and > add a comment mentioning that. > > One more commit may be required for clean backport: > > commit 5b8e64f0bcbbab826cff5be1b0adb000923abfb4 > Author: Peter Krempa <pkrempa> > Date: Thu Jul 18 16:30:18 2019 +0200 > > util: storage: Clean up label use in virStorageFileGetMetadataInternal > > The function does not do any cleanup, so replace the 'cleanup' label > with return of -1 and the 'done' label with return of 0. Hi Peter, How did you manage to start vm when libvirtd started by valgrind? I faced a error when started by valgrind: # valgrind --leak-check=full libvirtd # virsh start pc error: Failed to start domain pc error: internal error: child reported (status=125): Unable to enter mount namespace: Function not implemented It is caused by setns failure. The syscall setns is not supported by valgrind now. Do you have any workaround for that issue?
I turned off namespace support: # grep namespace /etc/libvirt/qemu.conf namespaces = [ ]
The following libvirt definite lost pop up when doing blockcommit with '--shallow --active --pivot' option: ==12573== 32 bytes in 1 blocks are definitely lost in loss record 949 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x83029D4: _dlerror_run (dlerror.c:141) ==12573== by 0x8302259: dlopen@@GLIBC_2.2.5 (dlopen.c:87) ==12573== by 0x532C63A: virModuleLoadFile (virmodule.c:51) ==12573== by 0x532C63A: virModuleLoad (virmodule.c:117) ==12573== by 0x5363CED: virStorageFileLoadBackendModule.constprop.1 (virstoragefilebackend.c:65) ==12573== by 0x5363D3A: virStorageFileBackendOnceInit (virstoragefilebackend.c:76) ==12573== by 0x5363D3A: virStorageFileBackendOnce (virstoragefilebackend.c:86) ==12573== by 0x7CB9E56: __pthread_once_slow (pthread_once.c:116) ==12573== by 0x5363E91: virStorageFileBackendInitialize (virstoragefilebackend.c:86) ==12573== by 0x5363E91: virStorageFileBackendForType (virstoragefilebackend.c:118) ==12573== by 0x535F43F: virStorageFileGetBackendForSupportCheck.part.16 (virstoragefile.c:4431) ==12573== by 0x535F4C7: virStorageFileGetBackendForSupportCheck (virstoragefile.c:4454) ==12573== by 0x535F4C7: virStorageFileSupportsBackingChainTraversal (virstoragefile.c:4444) ==12573== by 0x536372C: virStorageFileGetMetadataRecurse (virstoragefile.c:4857) ==12573== by 0x5363B07: virStorageFileGetMetadata (virstoragefile.c:4980) ==12573== 80 bytes in 1 blocks are definitely lost in loss record 1,565 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x52DFF57: virAlloc (viralloc.c:143) ==12573== by 0x5305FFF: virLastErrorObject (virerror.c:247) ==12573== by 0x530668C: virResetLastError (virerror.c:506) ==12573== by 0x54FE208: virDomainLookupByName (libvirt-domain.c:414) ==12573== by 0x159342: remoteDispatchDomainLookupByName (remote_daemon_dispatch_stubs.h:7388) ==12573== by 0x159342: remoteDispatchDomainLookupByNameHelper (remote_daemon_dispatch_stubs.h:7366) ==12573== by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435) ==12573== by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302) ==12573== by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137) ==12573== by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158) ==12573== by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163) ==12573== by 0x536C2CB: virThreadHelper (virthread.c:206) ==12573== by 0x7CB22DD: start_thread (pthread_create.c:486) ==12573== by 0x8601462: clone (clone.S:95) ==12573== ==12573== 80 bytes in 1 blocks are definitely lost in loss record 1,566 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x52DFF57: virAlloc (viralloc.c:143) ==12573== by 0x5305FFF: virLastErrorObject (virerror.c:247) ==12573== by 0x530668C: virResetLastError (virerror.c:506) ==12573== by 0x551F5AC: virDomainSnapshotGetXMLDesc (libvirt-domain-snapshot.c:276) ==12573== by 0x14ABE9: remoteDispatchDomainSnapshotGetXMLDesc (remote_daemon_dispatch_stubs.h:12003) ==12573== by 0x14ABE9: remoteDispatchDomainSnapshotGetXMLDescHelper (remote_daemon_dispatch_stubs.h:11973) ==12573== by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435) ==12573== by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302) ==12573== by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137) ==12573== by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158) ==12573== by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163) ==12573== by 0x536C2CB: virThreadHelper (virthread.c:206) ==12573== by 0x7CB22DD: start_thread (pthread_create.c:486) ==12573== by 0x8601462: clone (clone.S:95) ==12573== 80 bytes in 1 blocks are definitely lost in loss record 1,567 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x52DFF57: virAlloc (viralloc.c:143) ==12573== by 0x5305FFF: virLastErrorObject (virerror.c:247) ==12573== by 0x530668C: virResetLastError (virerror.c:506) ==12573== by 0x5505130: virDomainGetState (libvirt-domain.c:2476) ==12573== by 0x147EFD: remoteDispatchDomainGetState (remote_daemon_dispatch.c:4288) ==12573== by 0x147EFD: remoteDispatchDomainGetStateHelper (remote_daemon_dispatch_stubs.h:6540) ==12573== by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435) ==12573== by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302) ==12573== by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137) ==12573== by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158) ==12573== by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163) ==12573== by 0x536C2CB: virThreadHelper (virthread.c:206) ==12573== by 0x7CB22DD: start_thread (pthread_create.c:486) ==12573== by 0x8601462: clone (clone.S:95) ==12573== 82 bytes in 2 blocks are definitely lost in loss record 1,570 of 2,669 ==12573== at 0x4C30EDB: malloc (vg_replace_malloc.c:309) ==12573== by 0x86127A7: __vasprintf_chk (vasprintf_chk.c:80) ==12573== by 0x5364B98: UnknownInlinedFun (stdio2.h:213) ==12573== by 0x5364B98: virVasprintfInternal (virstring.c:740) ==12573== by 0x5364C82: virAsprintfInternal (virstring.c:761) ==12573== by 0x53E8501: virNetworkObjGetPortStatusDir (virnetworkobj.c:1619) ==12573== by 0x53E88E2: virNetworkObjLoadAllPorts (virnetworkobj.c:1854) ==12573== by 0x53E88E2: virNetworkObjLoadAllState (virnetworkobj.c:1095) ==12573== by 0x20E636B9: networkStateInitialize (bridge_driver.c:660) ==12573== by 0x54FC214: virStateInitialize (libvirt.c:653) ==12573== by 0x12E11E: daemonRunStateInit (remote_daemon.c:792) ==12573== by 0x536C2F9: virThreadHelper (virthread.c:206) ==12573== by 0x7CB22DD: start_thread (pthread_create.c:486) ==12573== by 0x8601462: clone (clone.S:95) ==12573== 240 bytes in 3 blocks are definitely lost in loss record 2,017 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x52DFF57: virAlloc (viralloc.c:143) ==12573== by 0x5305FFF: virLastErrorObject (virerror.c:247) ==12573== by 0x530668C: virResetLastError (virerror.c:506) ==12573== by 0x54FC517: virConnectOpen (libvirt.c:1107) ==12573== by 0x15B326: remoteDispatchConnectOpen (remote_daemon_dispatch.c:1978) ==12573== by 0x15B326: remoteDispatchConnectOpenHelper (remote_daemon_dispatch_stubs.h:3388) ==12573== by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435) ==12573== by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302) ==12573== by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137) ==12573== by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158) ==12573== by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163) ==12573== by 0x536C2CB: virThreadHelper (virthread.c:206) ==12573== by 0x7CB22DD: start_thread (pthread_create.c:486) ==12573== by 0x8601462: clone (clone.S:95) ==12573== ==12573== 968 bytes in 1 blocks are definitely lost in loss record 2,350 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x6D26E1E: xmlGetGlobalState (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x6D266B4: __xmlKeepBlanksDefaultValue (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x6CB6100: xmlKeepBlanksDefault (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x53CB4C8: virDomainDefParse (domain_conf.c:21402) ==12573== by 0x3329AD7E: qemuDomainDefineXMLFlags (qemu_driver.c:7597) ==12573== by 0x550EE13: virDomainDefineXMLFlags (libvirt-domain.c:6197) ==12573== by 0x1392E5: remoteDispatchDomainDefineXMLFlags (remote_daemon_dispatch_stubs.h:4783) ==12573== by 0x1392E5: remoteDispatchDomainDefineXMLFlagsHelper (remote_daemon_dispatch_stubs.h:4761) ==12573== by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435) ==12573== by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302) ==12573== by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137) ==12573== by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158) ==12573== by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163) ==12573== by 0x536C2CB: virThreadHelper (virthread.c:206) ==12573== 1,936 bytes in 2 blocks are definitely lost in loss record 2,479 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x6D26E1E: xmlGetGlobalState (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x6D266B4: __xmlKeepBlanksDefaultValue (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x6CB6100: xmlKeepBlanksDefault (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x53CB4C8: virDomainDefParse (domain_conf.c:21402) ==12573== by 0x53CB5D2: virDomainDefCopy (domain_conf.c:29291) ==12573== by 0x53CB647: virDomainObjSetDefTransient (domain_conf.c:3680) ==12573== by 0x53CB647: virDomainObjSetDefTransient (domain_conf.c:3668) ==12573== by 0x3325060C: qemuProcessInit (qemu_process.c:5569) ==12573== by 0x33252074: qemuProcessStart (qemu_process.c:7096) ==12573== by 0x332B275F: qemuDomainObjStart.constprop.50 (qemu_driver.c:7496) ==12573== by 0x332B2D48: qemuDomainCreateWithFlags (qemu_driver.c:7549) ==12573== by 0x550F816: virDomainCreate (libvirt-domain.c:6520)
Created attachment 1591945 [details] The full leak report of comment5
The first hunk is consistent with what the patch fixes. The rest does not seem to be related to blockjobs.
As I believe that the commit mentioned in Comment #2 fixes the issue and it is pushed upstream I'm moving this to POST. v5.5.0-204-gf0430d069a
Created attachment 1606405 [details] 20000 rounds of blockcommit
verified on libvirt-5.6.0-1.module+el8.1.0+3890+4d3d259c.x86_64 with 20000 cycle, and libvirtd's memory is stable. As attachment attachment 1606405 [details]
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3723