Bug 1731332
| Summary: | Memory leak when external snapshot create and shallow blockcommit cycle | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Han Han <hhan> | ||||||
| Component: | libvirt | Assignee: | Peter Krempa <pkrempa> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | yisun | ||||||
| Severity: | unspecified | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 8.1 | CC: | dyuan, fjin, jdenemar, libvirt-maint, lmen, pkrempa, rbalakri, virt-bugs, xuzhang, yalzhang, yisun | ||||||
| Target Milestone: | rc | Flags: | knoel:
mirror+
|
||||||
| Target Release: | 8.0 | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | libvirt-5.6.0-1.el8 | Doc Type: | If docs needed, set a value | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | 1731329 | Environment: | |||||||
| Last Closed: | 2019-11-06 07:17:49 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | 1731329 | ||||||||
| Bug Blocks: | |||||||||
| Attachments: |
|
||||||||
|
Description
Han Han
2019-07-19 06:53:51 UTC
./main.py -e "./snapshot-commit.sh PC" -c 2000 -p "`pidof qemu-kvm libvirtd`" -i 0.2 BTW, the command means execute ./snapshot-commit.sh for 2000 cycles, monitor ram usage of qemu-kvm and libvirtd, 0.2s interval before the next cycle. Yesterday I've identified a memory leak which originated from the re-detection of storage metadata from the disk after a blockjob (commit/pull/copy) finished.
The valgrind dump referenced virStorageFileGetMetadataInternal/virStorageFileGetMetadataRecurse/virStorageFileGetMetadata call chain where I've identified few pointer being overwritten. Re-running it with the patch reverted results in virStorageSourceCopy being mentioned.
In my experiments the leaked memory amount was not as high as you claim (only max 200 bytes) but I didn't see any other memory leak while running with valgrind.
commit f0430d069af991475de6fa83ed62a45f8669c645
Author: Peter Krempa <pkrempa>
Date: Thu Jul 18 16:32:44 2019 +0200
util: storage: Don't leak metadata on repeated calls of virStorageFileGetMetadata
When querying storage metadata after a block job we re-run
virStorageFileGetMetadata on the top level storage file. This means that
the workers (virStorageFileGetMetadataInternal) must not overwrite any
pointers without freeing them.
This was not considered for src->compat and src->features. Fix it and
add a comment mentioning that.
One more commit may be required for clean backport:
commit 5b8e64f0bcbbab826cff5be1b0adb000923abfb4
Author: Peter Krempa <pkrempa>
Date: Thu Jul 18 16:30:18 2019 +0200
util: storage: Clean up label use in virStorageFileGetMetadataInternal
The function does not do any cleanup, so replace the 'cleanup' label
with return of -1 and the 'done' label with return of 0.
(In reply to Peter Krempa from comment #2) > Yesterday I've identified a memory leak which originated from the > re-detection of storage metadata from the disk after a blockjob > (commit/pull/copy) finished. > > The valgrind dump referenced > virStorageFileGetMetadataInternal/virStorageFileGetMetadataRecurse/ > virStorageFileGetMetadata call chain where I've identified few pointer being > overwritten. Re-running it with the patch reverted results in > virStorageSourceCopy being mentioned. > > In my experiments the leaked memory amount was not as high as you claim > (only max 200 bytes) but I didn't see any other memory leak while running > with valgrind. > > commit f0430d069af991475de6fa83ed62a45f8669c645 > Author: Peter Krempa <pkrempa> > Date: Thu Jul 18 16:32:44 2019 +0200 > > util: storage: Don't leak metadata on repeated calls of > virStorageFileGetMetadata > > When querying storage metadata after a block job we re-run > virStorageFileGetMetadata on the top level storage file. This means that > the workers (virStorageFileGetMetadataInternal) must not overwrite any > pointers without freeing them. > > This was not considered for src->compat and src->features. Fix it and > add a comment mentioning that. > > One more commit may be required for clean backport: > > commit 5b8e64f0bcbbab826cff5be1b0adb000923abfb4 > Author: Peter Krempa <pkrempa> > Date: Thu Jul 18 16:30:18 2019 +0200 > > util: storage: Clean up label use in virStorageFileGetMetadataInternal > > The function does not do any cleanup, so replace the 'cleanup' label > with return of -1 and the 'done' label with return of 0. Hi Peter, How did you manage to start vm when libvirtd started by valgrind? I faced a error when started by valgrind: # valgrind --leak-check=full libvirtd # virsh start pc error: Failed to start domain pc error: internal error: child reported (status=125): Unable to enter mount namespace: Function not implemented It is caused by setns failure. The syscall setns is not supported by valgrind now. Do you have any workaround for that issue? I turned off namespace support: # grep namespace /etc/libvirt/qemu.conf namespaces = [ ] The following libvirt definite lost pop up when doing blockcommit with '--shallow --active --pivot' option: ==12573== 32 bytes in 1 blocks are definitely lost in loss record 949 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x83029D4: _dlerror_run (dlerror.c:141) ==12573== by 0x8302259: dlopen@@GLIBC_2.2.5 (dlopen.c:87) ==12573== by 0x532C63A: virModuleLoadFile (virmodule.c:51) ==12573== by 0x532C63A: virModuleLoad (virmodule.c:117) ==12573== by 0x5363CED: virStorageFileLoadBackendModule.constprop.1 (virstoragefilebackend.c:65) ==12573== by 0x5363D3A: virStorageFileBackendOnceInit (virstoragefilebackend.c:76) ==12573== by 0x5363D3A: virStorageFileBackendOnce (virstoragefilebackend.c:86) ==12573== by 0x7CB9E56: __pthread_once_slow (pthread_once.c:116) ==12573== by 0x5363E91: virStorageFileBackendInitialize (virstoragefilebackend.c:86) ==12573== by 0x5363E91: virStorageFileBackendForType (virstoragefilebackend.c:118) ==12573== by 0x535F43F: virStorageFileGetBackendForSupportCheck.part.16 (virstoragefile.c:4431) ==12573== by 0x535F4C7: virStorageFileGetBackendForSupportCheck (virstoragefile.c:4454) ==12573== by 0x535F4C7: virStorageFileSupportsBackingChainTraversal (virstoragefile.c:4444) ==12573== by 0x536372C: virStorageFileGetMetadataRecurse (virstoragefile.c:4857) ==12573== by 0x5363B07: virStorageFileGetMetadata (virstoragefile.c:4980) ==12573== 80 bytes in 1 blocks are definitely lost in loss record 1,565 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x52DFF57: virAlloc (viralloc.c:143) ==12573== by 0x5305FFF: virLastErrorObject (virerror.c:247) ==12573== by 0x530668C: virResetLastError (virerror.c:506) ==12573== by 0x54FE208: virDomainLookupByName (libvirt-domain.c:414) ==12573== by 0x159342: remoteDispatchDomainLookupByName (remote_daemon_dispatch_stubs.h:7388) ==12573== by 0x159342: remoteDispatchDomainLookupByNameHelper (remote_daemon_dispatch_stubs.h:7366) ==12573== by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435) ==12573== by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302) ==12573== by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137) ==12573== by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158) ==12573== by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163) ==12573== by 0x536C2CB: virThreadHelper (virthread.c:206) ==12573== by 0x7CB22DD: start_thread (pthread_create.c:486) ==12573== by 0x8601462: clone (clone.S:95) ==12573== ==12573== 80 bytes in 1 blocks are definitely lost in loss record 1,566 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x52DFF57: virAlloc (viralloc.c:143) ==12573== by 0x5305FFF: virLastErrorObject (virerror.c:247) ==12573== by 0x530668C: virResetLastError (virerror.c:506) ==12573== by 0x551F5AC: virDomainSnapshotGetXMLDesc (libvirt-domain-snapshot.c:276) ==12573== by 0x14ABE9: remoteDispatchDomainSnapshotGetXMLDesc (remote_daemon_dispatch_stubs.h:12003) ==12573== by 0x14ABE9: remoteDispatchDomainSnapshotGetXMLDescHelper (remote_daemon_dispatch_stubs.h:11973) ==12573== by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435) ==12573== by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302) ==12573== by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137) ==12573== by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158) ==12573== by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163) ==12573== by 0x536C2CB: virThreadHelper (virthread.c:206) ==12573== by 0x7CB22DD: start_thread (pthread_create.c:486) ==12573== by 0x8601462: clone (clone.S:95) ==12573== 80 bytes in 1 blocks are definitely lost in loss record 1,567 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x52DFF57: virAlloc (viralloc.c:143) ==12573== by 0x5305FFF: virLastErrorObject (virerror.c:247) ==12573== by 0x530668C: virResetLastError (virerror.c:506) ==12573== by 0x5505130: virDomainGetState (libvirt-domain.c:2476) ==12573== by 0x147EFD: remoteDispatchDomainGetState (remote_daemon_dispatch.c:4288) ==12573== by 0x147EFD: remoteDispatchDomainGetStateHelper (remote_daemon_dispatch_stubs.h:6540) ==12573== by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435) ==12573== by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302) ==12573== by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137) ==12573== by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158) ==12573== by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163) ==12573== by 0x536C2CB: virThreadHelper (virthread.c:206) ==12573== by 0x7CB22DD: start_thread (pthread_create.c:486) ==12573== by 0x8601462: clone (clone.S:95) ==12573== 82 bytes in 2 blocks are definitely lost in loss record 1,570 of 2,669 ==12573== at 0x4C30EDB: malloc (vg_replace_malloc.c:309) ==12573== by 0x86127A7: __vasprintf_chk (vasprintf_chk.c:80) ==12573== by 0x5364B98: UnknownInlinedFun (stdio2.h:213) ==12573== by 0x5364B98: virVasprintfInternal (virstring.c:740) ==12573== by 0x5364C82: virAsprintfInternal (virstring.c:761) ==12573== by 0x53E8501: virNetworkObjGetPortStatusDir (virnetworkobj.c:1619) ==12573== by 0x53E88E2: virNetworkObjLoadAllPorts (virnetworkobj.c:1854) ==12573== by 0x53E88E2: virNetworkObjLoadAllState (virnetworkobj.c:1095) ==12573== by 0x20E636B9: networkStateInitialize (bridge_driver.c:660) ==12573== by 0x54FC214: virStateInitialize (libvirt.c:653) ==12573== by 0x12E11E: daemonRunStateInit (remote_daemon.c:792) ==12573== by 0x536C2F9: virThreadHelper (virthread.c:206) ==12573== by 0x7CB22DD: start_thread (pthread_create.c:486) ==12573== by 0x8601462: clone (clone.S:95) ==12573== 240 bytes in 3 blocks are definitely lost in loss record 2,017 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x52DFF57: virAlloc (viralloc.c:143) ==12573== by 0x5305FFF: virLastErrorObject (virerror.c:247) ==12573== by 0x530668C: virResetLastError (virerror.c:506) ==12573== by 0x54FC517: virConnectOpen (libvirt.c:1107) ==12573== by 0x15B326: remoteDispatchConnectOpen (remote_daemon_dispatch.c:1978) ==12573== by 0x15B326: remoteDispatchConnectOpenHelper (remote_daemon_dispatch_stubs.h:3388) ==12573== by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435) ==12573== by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302) ==12573== by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137) ==12573== by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158) ==12573== by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163) ==12573== by 0x536C2CB: virThreadHelper (virthread.c:206) ==12573== by 0x7CB22DD: start_thread (pthread_create.c:486) ==12573== by 0x8601462: clone (clone.S:95) ==12573== ==12573== 968 bytes in 1 blocks are definitely lost in loss record 2,350 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x6D26E1E: xmlGetGlobalState (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x6D266B4: __xmlKeepBlanksDefaultValue (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x6CB6100: xmlKeepBlanksDefault (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x53CB4C8: virDomainDefParse (domain_conf.c:21402) ==12573== by 0x3329AD7E: qemuDomainDefineXMLFlags (qemu_driver.c:7597) ==12573== by 0x550EE13: virDomainDefineXMLFlags (libvirt-domain.c:6197) ==12573== by 0x1392E5: remoteDispatchDomainDefineXMLFlags (remote_daemon_dispatch_stubs.h:4783) ==12573== by 0x1392E5: remoteDispatchDomainDefineXMLFlagsHelper (remote_daemon_dispatch_stubs.h:4761) ==12573== by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435) ==12573== by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302) ==12573== by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137) ==12573== by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158) ==12573== by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163) ==12573== by 0x536C2CB: virThreadHelper (virthread.c:206) ==12573== 1,936 bytes in 2 blocks are definitely lost in loss record 2,479 of 2,669 ==12573== at 0x4C331EA: calloc (vg_replace_malloc.c:762) ==12573== by 0x6D26E1E: xmlGetGlobalState (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x6D266B4: __xmlKeepBlanksDefaultValue (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x6CB6100: xmlKeepBlanksDefault (in /usr/lib64/libxml2.so.2.9.7) ==12573== by 0x53CB4C8: virDomainDefParse (domain_conf.c:21402) ==12573== by 0x53CB5D2: virDomainDefCopy (domain_conf.c:29291) ==12573== by 0x53CB647: virDomainObjSetDefTransient (domain_conf.c:3680) ==12573== by 0x53CB647: virDomainObjSetDefTransient (domain_conf.c:3668) ==12573== by 0x3325060C: qemuProcessInit (qemu_process.c:5569) ==12573== by 0x33252074: qemuProcessStart (qemu_process.c:7096) ==12573== by 0x332B275F: qemuDomainObjStart.constprop.50 (qemu_driver.c:7496) ==12573== by 0x332B2D48: qemuDomainCreateWithFlags (qemu_driver.c:7549) ==12573== by 0x550F816: virDomainCreate (libvirt-domain.c:6520) Created attachment 1591945 [details] The full leak report of comment5 The first hunk is consistent with what the patch fixes. The rest does not seem to be related to blockjobs. As I believe that the commit mentioned in Comment #2 fixes the issue and it is pushed upstream I'm moving this to POST. v5.5.0-204-gf0430d069a Created attachment 1606405 [details]
20000 rounds of blockcommit
verified on libvirt-5.6.0-1.module+el8.1.0+3890+4d3d259c.x86_64
with 20000 cycle, and libvirtd's memory is stable. As attachment attachment 1606405 [details]
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3723 |