Bug 707155

Summary: Libvirt is taking 2.4G of memory after running 62 domains for 5 days
Product: Red Hat Enterprise Linux 6
Version: 6.1
Component: libvirt
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Moran Goldboim <mgoldboi>
Assignee: Michal Privoznik <mprivozn>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dallan, dpal, dyuan, gren, gsun, rwu, vbian, veillard, yupzhang
Target Milestone: rc
Hardware: Unspecified
OS: Linux
Fixed In Version: libvirt-0.9.4-rc2-1.el6
Doc Type: Bug Fix
Last Closed: 2011-12-06 11:09:40 UTC

Attachments: leak ods

Description Moran Goldboim 2011-05-24 08:53:07 UTC
Description of problem:
After running libvirt with around 62 domains for a couple of days, memory usage climbs to about 2.5 GB (2.4 GB in this case; on other servers it reached 2.9 GB). It looks like libvirt has a memory leak. No actions were performed on the domains apart from possible migrations initiated by rhevm.

top - 11:28:17 up 4 days, 19:43,  1 user,  load average: 0.20, 0.37, 0.36
Tasks: 1879 total,   4 running, 1875 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.1%us,  1.9%sy,  0.0%ni, 97.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  66092068k total, 33201216k used, 32890852k free,   166648k buffers
Swap: 68108280k total,        0k used, 68108280k free,  9813308k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                       
 9195 root      20   0 2929m 2.4g 4720 S 19.3  3.8 847:20.03 libvirtd   

pgrep qemu | wc -l
62


ps -o etime `pgrep libvirt`
    ELAPSED
 4-19:53:37

After a restart, memory is reduced to 23 MB:
40916 root      20   0  692m  23m 5364 S 35.8  0.0   6:40.15 libvirtd                         


Version-Release number of selected component (if applicable):
libvirt-0.8.7-18.el6.x86_64

How reproducible:
Happens on many servers.

Steps to Reproduce:
1. Run ~90 VMs on a host.
2. Leave them running for 4-5 days.
3. Watch libvirtd's resident memory in top (a monitoring sketch follows below).
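A hands-off way to do step 3 is batch-mode top (a sketch; it prints one sample every 30 seconds and assumes a single libvirtd process):

# top -b -d 30 -p $(pgrep -x libvirtd) | grep libvirtd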
Actual results:
libvirtd's resident memory grows steadily, reaching 2.4-2.9 GB after 4-5 days.

Expected results:
libvirtd's memory usage stays roughly constant regardless of uptime.

Additional info:
Logs are extensive for such a long period; in any case, the hosts are available for debugging.

Comment 3 Michal Privoznik 2011-06-07 10:37:24 UTC
Moran,

I need to get a picture of libvirt as a whole for those 5 days, especially which APIs were called. So could you please set this in /etc/libvirt/libvirtd.conf:
log_level = 3
log_filters="1:libvirt"
log_outputs="1:file:/var/log/libvirtd_debug.log"

run libvirtd for the desired time, and then attach /var/log/libvirtd_debug.log?

Thanks.
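Note that libvirtd must be restarted for the new logging settings to take effect. On RHEL 6, assuming the standard init script, that would be something like:

# service libvirtd restart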

Comment 8 Dave Allan 2011-07-14 02:25:37 UTC
Michal proposed a fix upstream today which received some feedback requiring v2, but no major objections.

https://www.redhat.com/archives/libvir-list/2011-July/msg00752.html

Comment 9 Michal Privoznik 2011-07-14 10:59:35 UTC
sent v2:

https://www.redhat.com/archives/libvir-list/2011-July/msg00812.html

Comment 10 Michal Privoznik 2011-07-14 14:42:17 UTC
Pushed upstream:

commit 85aa40e26d00a64453653c32dc08d25b65e851d5
Author: Michal Privoznik <mprivozn>
Date:   Thu Jul 14 12:53:45 2011 +0200

    storage: Avoid memory leak on metadata fetching
    
    Getting metadata on storage allocates memory (path) which needs to
    be freed after use, otherwise it gets leaked. This means that after
    using virStorageFileGetMetadataFromFD or virStorageFileGetMetadata,
    one must call virStorageFileFreeMetadata to free it. This function
    frees the structure's internals and the structure itself.

v0.9.3-138-g85aa40e
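For reference, "v0.9.3-138-g85aa40e" is git-describe notation: 138 commits past the v0.9.3 tag, at abbreviated commit 85aa40e. In a clone of upstream libvirt it can be regenerated with:

# git describe 85aa40e26d00a64453653c32dc08d25b65e851d5
v0.9.3-138-g85aa40e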

Comment 13 Vivian Bian 2011-07-24 00:46:19 UTC
Memory consumption results:
1. With libvirt-client-0.9.3-5.el6.x86_64.rpm, kept 62 domains running for 4 days; did not encounter this bug.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 17500 root     20   0  829m  35m 4092 S 0.0  0.0   739:12.03 libvirtd  

2. With libvirt-0.8.7-18.el6.x86_64.rpm, kept 62 domains running for 3 days; did not reproduce this bug either.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 3897 root      20   0  672m  12m 4092 S  0.0  0.0   0:04.22 libvirtd  

Questions:
1. Is 5 days the threshold for reproducing this bug?
2. Is 62 domains the standard for this bug? Can we run more domains to accelerate reproduction?
3. Since we did not reproduce this bug with the new build within 4 days, may we set this bug to VERIFIED?
4. Can you help us try with the vdsm test environment?

Comment 14 Michal Privoznik 2011-07-25 09:43:53 UTC
Vivian,

Just running domains does not reproduce this bug. To reproduce it, you need to create a volume with a backing store and query its vol-info. I think vdsm would be helpful here, as then you don't need to write any reproducer yourself.

To answer your questions:
1. No. This bug shows up from the very first start of vdsm and consumes memory quickly. The more domains are running, the more vol-info commands vdsm issues, and the more memory is leaked.

2. You don't need to run that many domains. I'd suggest running ~10 domains for an hour (with vdsm); you should see the bug immediately: run 'top -p $(pgrep libvirtd)' and watch the memory usage grow. (A reproducer sketch without vdsm follows below.)

Since I think a little help from vdsm is needed here (at least to set up vdsm), I am not clearing the needinfo flag.
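For anyone without a vdsm setup, a minimal reproducer along these lines might look like this (a sketch; the image paths and loop count are illustrative, and it assumes the volumes live in a defined storage pool so virsh can resolve the path):

# qemu-img create -f qcow2 -o backing_file=/var/lib/libvirt/images/base.qcow2 /var/lib/libvirt/images/test.qcow2
# for i in $(seq 1 1000); do virsh vol-info /var/lib/libvirt/images/test.qcow2 >/dev/null; done
# top -p $(pgrep -x libvirtd)

Each vol-info call makes the daemon read the volume metadata; on a leaky build, libvirtd's RES figure climbs with the iteration count.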

Comment 16 dyuan 2011-07-27 05:51:13 UTC
Hi, mprivozn

I checked for the memory leak with the following steps: there is no leak with libvirt-0.9.3-5/6/8.el6, and there is a leak with libvirt-0.9.3-2.el6.
Is this the expected result for your fix in this bug, or should more leak checking be conducted?

# qemu-img create -f qcow2 test.qcow2 -o backing_file=foo.qcow2 +10M

# qemu-img info /var/lib/libvirt/images/test.qcow2 
image: /var/lib/libvirt/images/test.qcow2
file format: qcow2
virtual size: 10M (10485760 bytes)
disk size: 140K
cluster_size: 65536
backing file: foo.qcow2 (actual path: /var/lib/libvirt/images/foo.qcow2)

# valgrind -v --leak-check=full virsh vol-info /var/lib/libvirt/images/test.qcow2

Thanks
dyuan

Comment 17 Michal Privoznik 2011-07-27 08:10:12 UTC
The memory leak is in the daemon, not in virsh, so valgrind needs to examine libvirtd itself (see the sketch below).

Moran, can you please help dyuan set up vdsm so he can verify this bug?
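To check the daemon under valgrind, one approach is to stop the service and run the binary in the foreground (a sketch; assumes a test machine where libvirtd can be stopped and the daemon binary is /usr/sbin/libvirtd):

# service libvirtd stop
# valgrind --leak-check=full --log-file=/tmp/libvirtd-valgrind.log /usr/sbin/libvirtd

Then drive the vol-info path from another shell, terminate the daemon, and inspect the log.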

Comment 18 Moran Goldboim 2011-07-27 12:46:31 UTC
Just tested with libvirt-0.9.3-8.el6.x86_64: there still seems to be an average leak of 144 KB every 30 seconds with 40 VMs running, i.e. about 4 KB per second, or roughly 102 B per VM per second; see the attached ods.
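For measurements like these, one simple approach is to log the daemon's resident set at a fixed interval (a sketch; assumes a single libvirtd process, RSS is reported in KB):

# while true; do ps -o rss= -p $(pgrep -x libvirtd) >> /tmp/libvirtd-rss.log; sleep 30; done

Successive differences between samples then give the leak rate per 30-second window.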

Comment 19 Moran Goldboim 2011-07-27 12:47:17 UTC
Created attachment 515513 [details]
leak ods

Comment 20 Dave Allan 2011-07-27 16:24:08 UTC
(In reply to comment #18)
> just tested it with libvirt-0.9.3-8.el6.x86_64 seems to be a 144KB avg leak
> every 30 sec running 40 vms - 4KB a second 102B/vm/second- see attached ods

Didn't this fix pass testing previously?  Are we looking at a new leak?

Comment 21 Moran Goldboim 2011-07-28 06:31:19 UTC
Michal and I took a look at the patch, which seemed to work; however, on this build we are seeing a leak whose rate is comparable to the original one.

Comment 22 Michal Privoznik 2011-07-28 14:02:47 UTC
Moving to POST:

commit 09d7eba99d95b887bd284b3418ea21438f6af277
Author: Michal Privoznik <mprivozn>
Date:   Thu Jul 28 15:42:57 2011 +0200

    qemu: Fix memory leak on metadata fetching
    
    As written in the virStorageFileGetMetadataFromFD description, the
    caller must free the metadata after use. The qemu driver missed
    this and therefore leaked the metadata, which can grow into a huge
    memory leak if somebody queries blockInfo a lot.


v0.9.4-rc1-36-g09d7eba

Comment 24 yuping zhang 2011-08-01 10:54:15 UTC
Tested this issue with: 

libvirt-0.9.4-0rc2.el6.x86_64
vdsm-4.9-86.el6.x86_64
qemu-kvm-0.12.1.2-2.172.el6.x86_64

Running 75 domains nearly 3 days:

# top -p $(pgrep libvirt)
top - 18:47:24 up 2 days, 21:23,  1 user,  load average: 10.78, 10.71, 10.71
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 13.6%us,  7.6%sy,  0.0%ni, 78.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  1041251268k total, 46389468k used, 994861800k free,   220356k buffers
Swap:  1048568k total,        0k used,  1048568k free,  1418964k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                               
86707 root      20   0  593m  15m 6676 S 15.5  0.0 918:17.33 libvirtd 

The memory usage stays at 15 MB, so I am changing the status to VERIFIED.

Comment 25 errata-xmlrpc 2011-12-06 11:09:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html