Bug 1731332 - Memory leak when external snapshot create and shallow blockcommit cycle
Summary: Memory leak when external snapshot create and shallow blockcommit cycle
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.1
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: 8.0
Assignee: Peter Krempa
QA Contact: yisun
URL:
Whiteboard:
Depends On: 1731329
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-07-19 06:53 UTC by Han Han
Modified: 2019-11-06 07:18 UTC (History)
11 users (show)

Fixed In Version: libvirt-5.6.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1731329
Environment:
Last Closed: 2019-11-06 07:17:49 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)
The full leak report of comment5 (29.32 KB, text/plain)
2019-07-19 10:26 UTC, Han Han
no flags Details
20000 rounds of blockcommit (34.27 KB, image/svg+xml)
2019-08-21 06:43 UTC, yisun
no flags Details


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:3723 None None None 2019-11-06 07:18:16 UTC

Description Han Han 2019-07-19 06:53:51 UTC
Created attachment 1591910 [details]
the script, domain xml&cmdline, origin data, leak graph

Version:
libvirt-5.5.0-1.module+el8.1.0+3580+d7f6488d.x86_64
qemu-kvm-4.0.0-5.module+el8.1.0+3622+5812d9bf.x86_64

About 666 bytes leak per cycle.

+++ This bug was initially created as a clone of Bug #1731329 +++

Description of problem:
As subject

Version-Release number of selected component (if applicable):
libvirt-4.5.0-23.virtcov.el7.x86_64
qemu-kvm-rhev-2.12.0-33.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a running vm
2. Run the python script(after install psutils and matplotlib), to create snapshot and do blockcommit for the vm:

# ./main.py -e "./snapshot-commit.sh PC" -c 2000 -p "`pidof qemu-kvm libvirtd`" -i 0.2
3. Check the libvirtd ram usage graph, which shows a rising sawteeth line:

Han Han <hhan@redhat.com>
	
Attachments10:29 AM (4 hours ago)
	
to s3-bug-review, yisun
Description of problem:
As subject

Version-Release number of selected component (if applicable):
libvirt-4.5.0-23.virtcov.el7.x86_64
qemu-kvm-rhev-2.12.0-33.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a running vm
2. Run the python script(after install psutils and matplotlib), to create snapshot and do blockcommit for the vm:

# ./main.py -e "./snapshot-commit.sh PC" -c 2000 -p "`pidof qemu-kvm libvirtd`" -i 0.2
3. Check the libvirtd ram usage graph, which shows a rising sawteeth line:
See the libvirt.png


Actual results:
As above. About 1540 Byte leak every cycle.

Expected results:
No memory leak

--- Additional comment from RHEL Product and Program Management on 2019-07-19 06:45:47 UTC ---

Since this bug report was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

Comment 1 Han Han 2019-07-19 07:12:52 UTC
./main.py -e "./snapshot-commit.sh PC" -c 2000 -p "`pidof qemu-kvm libvirtd`" -i 0.2
BTW, the command means execute ./snapshot-commit.sh for 2000 cycles, monitor ram usage of qemu-kvm and libvirtd, 0.2s interval before the next cycle.

Comment 2 Peter Krempa 2019-07-19 08:20:30 UTC
Yesterday I've identified a memory leak which originated from the re-detection of storage metadata from the disk after a blockjob (commit/pull/copy) finished.

The valgrind dump referenced virStorageFileGetMetadataInternal/virStorageFileGetMetadataRecurse/virStorageFileGetMetadata call chain where I've identified few pointer being overwritten. Re-running it with the patch reverted results in virStorageSourceCopy being mentioned.

In my experiments the leaked memory amount was not as high as you claim (only max 200 bytes) but I didn't see any other memory leak while running with valgrind.

commit f0430d069af991475de6fa83ed62a45f8669c645
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Thu Jul 18 16:32:44 2019 +0200

    util: storage: Don't leak metadata on repeated calls of virStorageFileGetMetadata
    
    When querying storage metadata after a block job we re-run
    virStorageFileGetMetadata on the top level storage file. This means that
    the workers (virStorageFileGetMetadataInternal) must not overwrite any
    pointers without freeing them.
    
    This was not considered for src->compat and src->features. Fix it and
    add a comment mentioning that.

One more commit may be required for clean backport:

commit 5b8e64f0bcbbab826cff5be1b0adb000923abfb4
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Thu Jul 18 16:30:18 2019 +0200

    util: storage: Clean up label use in virStorageFileGetMetadataInternal
    
    The function does not do any cleanup, so replace the 'cleanup' label
    with return of -1 and the 'done' label with return of 0.

Comment 3 Han Han 2019-07-19 08:53:56 UTC
(In reply to Peter Krempa from comment #2)
> Yesterday I've identified a memory leak which originated from the
> re-detection of storage metadata from the disk after a blockjob
> (commit/pull/copy) finished.
> 
> The valgrind dump referenced
> virStorageFileGetMetadataInternal/virStorageFileGetMetadataRecurse/
> virStorageFileGetMetadata call chain where I've identified few pointer being
> overwritten. Re-running it with the patch reverted results in
> virStorageSourceCopy being mentioned.
> 
> In my experiments the leaked memory amount was not as high as you claim
> (only max 200 bytes) but I didn't see any other memory leak while running
> with valgrind.
> 
> commit f0430d069af991475de6fa83ed62a45f8669c645
> Author: Peter Krempa <pkrempa@redhat.com>
> Date:   Thu Jul 18 16:32:44 2019 +0200
> 
>     util: storage: Don't leak metadata on repeated calls of
> virStorageFileGetMetadata
>     
>     When querying storage metadata after a block job we re-run
>     virStorageFileGetMetadata on the top level storage file. This means that
>     the workers (virStorageFileGetMetadataInternal) must not overwrite any
>     pointers without freeing them.
>     
>     This was not considered for src->compat and src->features. Fix it and
>     add a comment mentioning that.
> 
> One more commit may be required for clean backport:
> 
> commit 5b8e64f0bcbbab826cff5be1b0adb000923abfb4
> Author: Peter Krempa <pkrempa@redhat.com>
> Date:   Thu Jul 18 16:30:18 2019 +0200
> 
>     util: storage: Clean up label use in virStorageFileGetMetadataInternal
>     
>     The function does not do any cleanup, so replace the 'cleanup' label
>     with return of -1 and the 'done' label with return of 0.

Hi Peter, 
How did you manage to start vm when libvirtd started by valgrind? 
I faced a error when started by valgrind:
# valgrind --leak-check=full libvirtd

# virsh start pc
error: Failed to start domain pc
error: internal error: child reported (status=125): Unable to enter mount namespace: Function not implemented

It is caused by setns failure. The syscall setns is not supported by valgrind now. Do you have any workaround for that issue?

Comment 4 Peter Krempa 2019-07-19 09:44:14 UTC
I turned off namespace support:

# grep namespace /etc/libvirt/qemu.conf
namespaces = [ ]

Comment 5 Han Han 2019-07-19 10:24:38 UTC
The following libvirt definite lost pop up when doing blockcommit with '--shallow --active --pivot' option:
==12573== 32 bytes in 1 blocks are definitely lost in loss record 949 of 2,669                                                                                              
==12573==    at 0x4C331EA: calloc (vg_replace_malloc.c:762)                                                                                                                 
==12573==    by 0x83029D4: _dlerror_run (dlerror.c:141)                                                                                                                     
==12573==    by 0x8302259: dlopen@@GLIBC_2.2.5 (dlopen.c:87)                                                                                                                
==12573==    by 0x532C63A: virModuleLoadFile (virmodule.c:51)                                                                                                               
==12573==    by 0x532C63A: virModuleLoad (virmodule.c:117)                                                                                                                  
==12573==    by 0x5363CED: virStorageFileLoadBackendModule.constprop.1 (virstoragefilebackend.c:65)                                                                         
==12573==    by 0x5363D3A: virStorageFileBackendOnceInit (virstoragefilebackend.c:76)                                                                                       
==12573==    by 0x5363D3A: virStorageFileBackendOnce (virstoragefilebackend.c:86)                                                                                           
==12573==    by 0x7CB9E56: __pthread_once_slow (pthread_once.c:116)                                                                                                         
==12573==    by 0x5363E91: virStorageFileBackendInitialize (virstoragefilebackend.c:86)                                                                                     
==12573==    by 0x5363E91: virStorageFileBackendForType (virstoragefilebackend.c:118)                                                                                       
==12573==    by 0x535F43F: virStorageFileGetBackendForSupportCheck.part.16 (virstoragefile.c:4431)                                                                          
==12573==    by 0x535F4C7: virStorageFileGetBackendForSupportCheck (virstoragefile.c:4454)                                                                                  
==12573==    by 0x535F4C7: virStorageFileSupportsBackingChainTraversal (virstoragefile.c:4444)                                                                              
==12573==    by 0x536372C: virStorageFileGetMetadataRecurse (virstoragefile.c:4857)                                                                                         
==12573==    by 0x5363B07: virStorageFileGetMetadata (virstoragefile.c:4980)   

==12573== 80 bytes in 1 blocks are definitely lost in loss record 1,565 of 2,669                                                                                            
==12573==    at 0x4C331EA: calloc (vg_replace_malloc.c:762)                                                                                                                 
==12573==    by 0x52DFF57: virAlloc (viralloc.c:143)                                                                                                                        
==12573==    by 0x5305FFF: virLastErrorObject (virerror.c:247)                                                                                                              
==12573==    by 0x530668C: virResetLastError (virerror.c:506)                                                                                                               
==12573==    by 0x54FE208: virDomainLookupByName (libvirt-domain.c:414)                                                                                                     
==12573==    by 0x159342: remoteDispatchDomainLookupByName (remote_daemon_dispatch_stubs.h:7388)                                                                            
==12573==    by 0x159342: remoteDispatchDomainLookupByNameHelper (remote_daemon_dispatch_stubs.h:7366)                                                                      
==12573==    by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435)                                                                                      
==12573==    by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302)                                                                                          
==12573==    by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137)                                                                                                      
==12573==    by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158)                                                                                                       
==12573==    by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163)                                                                                                        
==12573==    by 0x536C2CB: virThreadHelper (virthread.c:206)                                                                                                                
==12573==    by 0x7CB22DD: start_thread (pthread_create.c:486)                                                                                                              
==12573==    by 0x8601462: clone (clone.S:95)                                                                                                                               
==12573==                                                                                                                                                                   
==12573== 80 bytes in 1 blocks are definitely lost in loss record 1,566 of 2,669                                                                                            
==12573==    at 0x4C331EA: calloc (vg_replace_malloc.c:762)                                                                                                                 
==12573==    by 0x52DFF57: virAlloc (viralloc.c:143)                                                                                                                        
==12573==    by 0x5305FFF: virLastErrorObject (virerror.c:247)                                                                                                              
==12573==    by 0x530668C: virResetLastError (virerror.c:506)                                                                                                               
==12573==    by 0x551F5AC: virDomainSnapshotGetXMLDesc (libvirt-domain-snapshot.c:276)                                                                                      
==12573==    by 0x14ABE9: remoteDispatchDomainSnapshotGetXMLDesc (remote_daemon_dispatch_stubs.h:12003)                                                                     
==12573==    by 0x14ABE9: remoteDispatchDomainSnapshotGetXMLDescHelper (remote_daemon_dispatch_stubs.h:11973)                                                               
==12573==    by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435)                                                                                      
==12573==    by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302)                                                                                          
==12573==    by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137)                                                                                                      
==12573==    by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158)                                                                                                       
==12573==    by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163)                                                                                                        
==12573==    by 0x536C2CB: virThreadHelper (virthread.c:206)                                                                                                                
==12573==    by 0x7CB22DD: start_thread (pthread_create.c:486)                                                                                                              
==12573==    by 0x8601462: clone (clone.S:95)  

==12573== 80 bytes in 1 blocks are definitely lost in loss record 1,567 of 2,669                                                                                            
==12573==    at 0x4C331EA: calloc (vg_replace_malloc.c:762)                                                                                                                 
==12573==    by 0x52DFF57: virAlloc (viralloc.c:143)                                                                                                                        
==12573==    by 0x5305FFF: virLastErrorObject (virerror.c:247)                                                                                                              
==12573==    by 0x530668C: virResetLastError (virerror.c:506)                                                                                                               
==12573==    by 0x5505130: virDomainGetState (libvirt-domain.c:2476)                                                                                                        
==12573==    by 0x147EFD: remoteDispatchDomainGetState (remote_daemon_dispatch.c:4288)                                                                                      
==12573==    by 0x147EFD: remoteDispatchDomainGetStateHelper (remote_daemon_dispatch_stubs.h:6540)                                                                          
==12573==    by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435)                                                                                      
==12573==    by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302)                                                                                          
==12573==    by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137)                                                                                                      
==12573==    by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158)                                                                                                       
==12573==    by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163)                                                                                                        
==12573==    by 0x536C2CB: virThreadHelper (virthread.c:206)                                                                                                                
==12573==    by 0x7CB22DD: start_thread (pthread_create.c:486)                                                                                                              
==12573==    by 0x8601462: clone (clone.S:95)     

==12573== 82 bytes in 2 blocks are definitely lost in loss record 1,570 of 2,669                                                                                            
==12573==    at 0x4C30EDB: malloc (vg_replace_malloc.c:309)                                                                                                                 
==12573==    by 0x86127A7: __vasprintf_chk (vasprintf_chk.c:80)                                                                                                             
==12573==    by 0x5364B98: UnknownInlinedFun (stdio2.h:213)                                                                                                                 
==12573==    by 0x5364B98: virVasprintfInternal (virstring.c:740)                                                                                                           
==12573==    by 0x5364C82: virAsprintfInternal (virstring.c:761)                                                                                                            
==12573==    by 0x53E8501: virNetworkObjGetPortStatusDir (virnetworkobj.c:1619)                                                                                             
==12573==    by 0x53E88E2: virNetworkObjLoadAllPorts (virnetworkobj.c:1854)                                                                                                 
==12573==    by 0x53E88E2: virNetworkObjLoadAllState (virnetworkobj.c:1095)                                                                                                 
==12573==    by 0x20E636B9: networkStateInitialize (bridge_driver.c:660)                                                                                                    
==12573==    by 0x54FC214: virStateInitialize (libvirt.c:653)                                                                                                               
==12573==    by 0x12E11E: daemonRunStateInit (remote_daemon.c:792)                                                                                                          
==12573==    by 0x536C2F9: virThreadHelper (virthread.c:206)                                                                                                                
==12573==    by 0x7CB22DD: start_thread (pthread_create.c:486)                                                                                                              
==12573==    by 0x8601462: clone (clone.S:95)  

==12573== 240 bytes in 3 blocks are definitely lost in loss record 2,017 of 2,669                                                                                           
==12573==    at 0x4C331EA: calloc (vg_replace_malloc.c:762)                                                                                                                 
==12573==    by 0x52DFF57: virAlloc (viralloc.c:143)                                                                                                                        
==12573==    by 0x5305FFF: virLastErrorObject (virerror.c:247)                                                                                                              
==12573==    by 0x530668C: virResetLastError (virerror.c:506)                                                                                                               
==12573==    by 0x54FC517: virConnectOpen (libvirt.c:1107)                                                                                                                  
==12573==    by 0x15B326: remoteDispatchConnectOpen (remote_daemon_dispatch.c:1978)                                                                                         
==12573==    by 0x15B326: remoteDispatchConnectOpenHelper (remote_daemon_dispatch_stubs.h:3388)                                                                             
==12573==    by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435)                                                                                      
==12573==    by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302)                                                                                          
==12573==    by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137)                                                                                                      
==12573==    by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158)                                                                                                       
==12573==    by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163)                                                                                                        
==12573==    by 0x536C2CB: virThreadHelper (virthread.c:206)                                                                                                                
==12573==    by 0x7CB22DD: start_thread (pthread_create.c:486)                                                                                                              
==12573==    by 0x8601462: clone (clone.S:95)                                                                                                                               
==12573==                                      

==12573== 968 bytes in 1 blocks are definitely lost in loss record 2,350 of 2,669                                                                                           
==12573==    at 0x4C331EA: calloc (vg_replace_malloc.c:762)                                                                                                                 
==12573==    by 0x6D26E1E: xmlGetGlobalState (in /usr/lib64/libxml2.so.2.9.7)                                                                                               
==12573==    by 0x6D266B4: __xmlKeepBlanksDefaultValue (in /usr/lib64/libxml2.so.2.9.7)                                                                                     
==12573==    by 0x6CB6100: xmlKeepBlanksDefault (in /usr/lib64/libxml2.so.2.9.7)                                                                                            
==12573==    by 0x53CB4C8: virDomainDefParse (domain_conf.c:21402)
==12573==    by 0x3329AD7E: qemuDomainDefineXMLFlags (qemu_driver.c:7597)
==12573==    by 0x550EE13: virDomainDefineXMLFlags (libvirt-domain.c:6197)
==12573==    by 0x1392E5: remoteDispatchDomainDefineXMLFlags (remote_daemon_dispatch_stubs.h:4783)
==12573==    by 0x1392E5: remoteDispatchDomainDefineXMLFlagsHelper (remote_daemon_dispatch_stubs.h:4761)
==12573==    by 0x5445CC3: virNetServerProgramDispatchCall (virnetserverprogram.c:435)
==12573==    by 0x5445CC3: virNetServerProgramDispatch (virnetserverprogram.c:302)
==12573==    by 0x544AEAB: virNetServerProcessMsg (virnetserver.c:137)    
==12573==    by 0x544AEAB: virNetServerHandleJob (virnetserver.c:158)
==12573==    by 0x536CFBF: virThreadPoolWorker (virthreadpool.c:163) 
==12573==    by 0x536C2CB: virThreadHelper (virthread.c:206)  

==12573== 1,936 bytes in 2 blocks are definitely lost in loss record 2,479 of 2,669
==12573==    at 0x4C331EA: calloc (vg_replace_malloc.c:762)
==12573==    by 0x6D26E1E: xmlGetGlobalState (in /usr/lib64/libxml2.so.2.9.7)
==12573==    by 0x6D266B4: __xmlKeepBlanksDefaultValue (in /usr/lib64/libxml2.so.2.9.7)                                                                                    
==12573==    by 0x6CB6100: xmlKeepBlanksDefault (in /usr/lib64/libxml2.so.2.9.7)
==12573==    by 0x53CB4C8: virDomainDefParse (domain_conf.c:21402)
==12573==    by 0x53CB5D2: virDomainDefCopy (domain_conf.c:29291)
==12573==    by 0x53CB647: virDomainObjSetDefTransient (domain_conf.c:3680)
==12573==    by 0x53CB647: virDomainObjSetDefTransient (domain_conf.c:3668)
==12573==    by 0x3325060C: qemuProcessInit (qemu_process.c:5569)
==12573==    by 0x33252074: qemuProcessStart (qemu_process.c:7096)
==12573==    by 0x332B275F: qemuDomainObjStart.constprop.50 (qemu_driver.c:7496)
==12573==    by 0x332B2D48: qemuDomainCreateWithFlags (qemu_driver.c:7549)
==12573==    by 0x550F816: virDomainCreate (libvirt-domain.c:6520)

Comment 6 Han Han 2019-07-19 10:26:21 UTC
Created attachment 1591945 [details]
The full leak report of comment5

Comment 7 Peter Krempa 2019-07-19 12:07:33 UTC
The first hunk is consistent with what the patch fixes. The rest does not seem to be related to blockjobs.

Comment 9 Peter Krempa 2019-07-19 12:47:45 UTC
As I believe that the commit mentioned in Comment #2 fixes the issue and it is pushed upstream I'm moving this to POST.

v5.5.0-204-gf0430d069a

Comment 11 yisun 2019-08-21 06:43:15 UTC
Created attachment 1606405 [details]
20000 rounds of blockcommit

Comment 12 yisun 2019-08-21 06:43:42 UTC
verified on libvirt-5.6.0-1.module+el8.1.0+3890+4d3d259c.x86_64
with 20000 cycle, and libvirtd's memory is stable. As attachment attachment 1606405 [details]

Comment 14 errata-xmlrpc 2019-11-06 07:17:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723


Note You need to log in before you can comment on or make changes to this bug.