Bug 891324

| Field | Value |
|---|---|
| Summary | libvirtd memory leak, process grows to 10 GB |
| Product | Red Hat Enterprise Linux 7 |
| Component | libvirt |
| Version | 7.0 |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Reporter | Dave Allan <dallan> |
| Assignee | John Ferlan <jferlan> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | acathrow, ajia, cwei, dyuan, gren, mzhan, rjones, ydu |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | libvirt-1.0.2-1.el7 |
| Doc Type | Bug Fix |
| Type | Bug |
| Clone Of | 890039 |
| Clones | 903203, 903280 (view as bug list) |
| Bug Depends On | 890039, 903203, 903280 |
| Last Closed | 2014-06-13 12:28:09 UTC |
| Attachments | 683423 (make check log), 685857 (excerpt from valgrind.log) |
Description (Dave Allan, 2013-01-02 15:45:27 UTC)
I'm trying to reproduce this bug on a RHEL 6.4 host, but can't trigger
the memory leak.

$ rpm -q libvirt
libvirt-0.10.2-14.el6.x86_64

I start a session libvirtd and start/destroy 4 VMs in a loop. After
running for a whole night, the result is the same as the initial
status:

PID   USER  PR  NI  VIRT   RES  SHR   S  %CPU  %MEM  TIME+     COMMAND
2471  ydu   20   0  1003m  12m  4624  S   0.0   0.2  36:18.89  libvirtd

(In reply to comment #1)
That is a RHEL6 test result; I will try it on RHEL7 later.

On the RHEL7 host:

$ rpm -q libvirt
libvirt-1.0.1-1.el7.x86_64

PID    USER  PR  NI  VIRT  RES  SHR   S  %CPU  %MEM  TIME+     COMMAND
21812  ydu   20   0  920m  13m  6188  S   0.0   0.2   0:00.46  libvirtd
------
21812  ydu   20   0  920m  15m  6216  S   1.3   0.2   2:46.92  libvirtd
------
21812  ydu   20   0  920m  22m  6216  S   2.3   0.3  26:34.30  libvirtd

(In reply to comment #3)
Hi Richard, I can't reproduce this bug on the RHEL7 host. Is there
anything I can do to trigger it? Please help, thanks!

(In reply to comment #4)
The fact that it's growing (slightly) even in your test is
interesting. I am usually running the libguestfs test suite over and
over again, and that is likely what caused the memory leak. However, I
do not have any specific reproducer.

Just from the core dump file it is hard to analyze the memory leak via
gdb. I am going to use valgrind to examine the memory leak as a next
step.

It seems to be a leak of a very specific type of string (SELinux
contexts). Have you looked at the code in libvirt or dependent libs
which could allocate this kind of string?

I have another libvirtd process (same machine) which could do with
going on a diet:

PID    USER    PR  NI  VIRT   RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
17542  rjones  20   0  5983m  5.1g  4.1g  S   0.0  33.1  5:21.45  libvirtd

I'm going to assume this is the same problem. The only thing this
machine has been doing involving libvirt, AFAIK, is running the
libguestfs test suite, so this is probably your reproducer:

cd libguestfs
make && make check

Created attachment 683423 [details]
make check log
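
(For reference, the start/destroy loop described in comment 1 might
look like the following sketch; the session URI and the guest names
vm1..vm4 are assumptions, not details from the report.)

# Hypothetical sketch of the start/destroy loop from comment 1.
export LIBVIRT_DEFAULT_URI=qemu:///session
while true; do
    for vm in vm1 vm2 vm3 vm4; do
        virsh start "$vm"
        virsh destroy "$vm"
    done
done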
I collected the following statistics after running the libguestfs test
suite once on F18. Curiously, the virtual size (VIRT) of the libvirtd
process didn't grow beyond roughly 1051m. I attached the "make check"
log; perhaps Richard could help review it. In the meantime, the
longevity testing is ongoing.

libguestfs is upstream git head; libvirtd is libvirt-0.10.2.2-3.

$ time make check > check.log 2>&1
real    19m23.452s
user    1m53.100s
sys     1m11.198s

PID    USER  PR  NI  VIRT   RES  SHR   S  %CPU  %MEM  TIME+    COMMAND
32273  gren  20   0  1051m  55m  9756  S   0.0   1.4  0:27.81  libvirtd

The 'make check' log looks fine from what I can tell.
It doesn't look as if you're hitting the bug at all. On
my machine I see the 'RES' size MUCH larger than 55m. For
example after freshly starting libvirtd:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27566 rjones 20 0 702m 23m 16m S 0.0 0.1 0:00.31 libvirtd
Then after running the test suite once (libguestfs.git):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27566 rjones 20 0 2334m 1.3g 1.0g S 0.0 8.2 0:37.25 libvirtd
Let's assume this is not libvirt, but some dependent library.
What versions do you have of the following?
audit-libs-2.2.2-3.fc18.x86_64
avahi-libs-0.6.31-6.fc18.x86_64
cyrus-sasl-lib-2.1.25-2.fc18.x86_64
dbus-libs-1.6.8-2.fc18.x86_64
device-mapper-libs-1.02.77-4.fc18.x86_64
glibc-2.16-28.fc18.x86_64
gnutls-2.12.22-1.fc18.x86_64
keyutils-libs-1.5.5-3.fc18.x86_64
krb5-libs-1.10.3-5.fc18.x86_64
libcap-ng-0.7.3-1.fc18.x86_64
libcom_err-1.42.5-1.fc18.x86_64
libcurl-7.27.0-5.fc18.x86_64
libgcc-4.7.2-8.fc18.x86_64
libgcrypt-1.5.0-8.fc18.x86_64
libgpg-error-1.10-3.fc18.x86_64
libidn-1.26-1.fc18.x86_64
libnl3-3.2.14-1.fc18.x86_64
libselinux-2.1.12-7.fc18.x86_64
libsepol-2.1.8-2.fc18.x86_64
libssh2-1.4.3-1.fc18.x86_64
libtasn1-2.14-1.fc18.x86_64
libvirt-client-0.10.2.2-3.fc18.x86_64
libwsman1-2.3.6-1.fc18.x86_64
libxml2-2.9.0-3.fc18.x86_64
nspr-4.9.4-1.fc18.x86_64
nss-3.14.1-3.fc18.x86_64
nss-softokn-freebl-3.14.1-5.fc18.x86_64
nss-util-3.14.1-2.fc18.x86_64
numactl-libs-2.0.7-7.fc18.x86_64
openldap-2.4.33-3.fc18.x86_64
openssl-libs-1.0.1c-7.fc18.x86_64
p11-kit-0.14-1.fc18.x86_64
pcre-8.31-4.fc18.x86_64
systemd-libs-197-1.fc18.1.x86_64
xz-libs-5.1.2-2alpha.fc18.x86_64
yajl-2.0.4-1.fc18.x86_64
zlib-1.2.7-9.fc18.x86_64
(The output of:
ldd /usr/sbin/libvirtd | grep '=>' | awk '{print $3}' | \
xargs -n 1 rpm -qf | sort -u
)
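
Saving that pipeline's output to a file on each machine makes the two
lists easy to compare; a small sketch, using the myrpm/yourrpm names
that appear in the diff below:

# Save the local library-package list, then diff against the other
# machine's list (copied over as "yourrpm").
ldd /usr/sbin/libvirtd | grep '=>' | awk '{print $3}' | \
    xargs -n 1 rpm -qf | sort -u > myrpm
diff myrpm yourrpm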
$ diff myrpm yourrpm
1c1
< audit-libs-2.2.2-2.fc18.x86_64
---
> audit-libs-2.2.2-3.fc18.x86_64
3c3
< cyrus-sasl-lib-2.1.23-36.fc18.x86_64
---
> cyrus-sasl-lib-2.1.25-2.fc18.x86_64
7c7
< gnutls-2.12.21-1.fc18.x86_64
---
> gnutls-2.12.22-1.fc18.x86_64
12c12
< libcurl-7.27.0-4.fc18.x86_64
---
> libcurl-7.27.0-5.fc18.x86_64
16c16
< libidn-1.25-3.fc18.x86_64
---
> libidn-1.26-1.fc18.x86_64
33,34c33,34
< pcre-8.31-3.fc18.x86_64
< systemd-libs-195-15.fc18.x86_64
---
> pcre-8.31-4.fc18.x86_64
> systemd-libs-197-1.fc18.1.x86_64

Let me upgrade these rpms, then try "make check" again. BTW, the
longevity testing (24h) was fine:

32273  gren  20  0  1059m  99m  8804  S  0.0  2.6  40:37.10  libvirtd

Unfortunately, the memory leak still can't be reproduced on my testing
machine, even after upgrading the libraries that libvirtd loads. I
used both libguestfs.git and libguestfs-1.21.3 in turn. One thing I
should mention: I disable compiling the GObject bindings with
--disable-gobject:

./autogen.sh \
    --prefix=/usr \
    --with-default-attach-method=libvirt \
    --enable-gcc-warnings \
    --enable-gtk-doc \
    --disable-gobject \
    -C

gobject shouldn't be involved at all. I wonder if this is a kernel
issue? Or a configuration issue? Because I haven't rebooted for a
while, my running kernel is:

Linux choo.home.annexia.org 3.6.9-4.fc18.x86_64 #1 SMP Tue Dec 4 14:12:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

I have no ~/.cache/libvirt/libvirtd.conf file at all, and my
/etc/libvirt/libvirtd.conf file is unchanged. Prelink says:

prelink: /usr/sbin/libvirtd: at least one of file's dependencies has changed since prelinking
S.?......    /usr/sbin/libvirtd

whatever that is supposed to mean.

31360  rjones  20  0  25.2g  24g  19g  S  0.0  159.0  100:07.64  libvirtd
This was caused by just running the following script overnight:
#!/bin/bash -
set -e
#set -x
cd /tmp
rm -f test1.img
while true; do
    echo -n .
    output=$(
        guestfish <<EOF
sparse test1.img 1G
run
mkfs ext2 /dev/sda
vfs-type /dev/sda
EOF
    )
    if [ "$output" != "ext2" ]; then
        echo "error: output is not 'ext2'"
        echo "$output"
        exit 1
    fi
done
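
A minimal sketch (not from the original report) for watching the
growth while such a reproducer runs, assuming a single libvirtd
process is running:

#!/bin/bash -
# Sample libvirtd's resident set size and mapping count once a minute.
pid=$(pidof libvirtd)
while true; do
    rss=$(awk '/^VmRSS:/ {print $2, $3}' /proc/$pid/status)
    maps=$(wc -l < /proc/$pid/maps)
    echo "$(date +%T) rss=$rss maps=$maps"
    sleep 60
done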
gren: Is your SELinux Enforcing? Mine is.

(In reply to comment #15)
Yes, I ran "make check" with SELinux enforcing for sure, as I saw your
earlier suspicion about SELinux. Okay, let me try the shell script.

Rebooting did NOT fix this issue. After a fresh boot libvirtd is still
leaking memory like crazy:

5526  rjones  20  0  2648m  1.4g  1.1g  S  0.0  9.0  0:41.74  libvirtd

Rich, there's a fix for a leak discovered by Coverity that is on-list
if not already committed upstream; can you confirm that the leak is
still there with git HEAD? John, since you're deep in the weeds with
the Coverity-reported stuff, can you work with Rich on this?

Sorry chaps, it DOES occur with libvirt.git, although qualitatively I
suspect the leak may be different(!). It certainly leaks a little less
than before, but libvirtd is still growing substantially after a
single run of the test suite:

PID    USER    PR  NI  VIRT   RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
31717  rjones  20   0  2522m  1.4g  1.1g  S   0.0   9.0  0:09.49  lt-libvirtd

John, here is a link to some documentation about how to run the
libguestfs test suite:
http://oirase.annexia.org/running-libguestfs-test-suite.txt

Created attachment 685857 [details]
excerpt from valgrind.log
I just ran a single run of libguestfs-test-tool against
libvirtd from git and I can see loads of memory leaks, some
in obvious SELinux-related code.
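
One plausible way to extract such an excerpt from a large valgrind log
is a context grep for SELinux-related symbol names; the pattern below
is an assumption, not the exact command used:

# Show leak records mentioning SELinux-related names, with context.
grep -E -B 2 -A 15 'selabel|selinux|matchpathcon' /tmp/valgrind.log | less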
Maybe checking /proc/$(libvirtd-PID)/maps is a way of looking at the
memory areas of a libvirtd process.

Methodology: I'm testing libvirt.git (at the time of writing, commit
bf62e9953c3dde35551a0c2a91d30a294516609a). I have applied my patch to
libselinux to fix bug 903203. I use the following command (from the
libvirt.git directory) to run libvirtd under valgrind:

killall lt-libvirtd libvirtd
./run valgrind \
    --trace-children=no \
    --child-silent-after-fork=yes \
    --log-file=/tmp/valgrind.log \
    --leak-check=full \
    --suppressions=./tests/.valgrind.supp \
    ./daemon/libvirtd --timeout 30

while at the same time running the following command from the
libguestfs.git directory:

while true; do echo -n .; ../libvirt/run ./run ./fish/guestfish -N fs exit; done

This creates lots of transient domains, serially. After some time I ^C
the while loop and wait 30s for libvirtd to exit, then I examine the
/tmp/valgrind.log file. I also did the same but without running
libvirtd under valgrind.

Observations: the libvirtd process still grows unbounded (without
valgrind). Valgrind shows some reachable blocks, but no significant
unreachable blocks, i.e. no ordinary heap leak. However, examining
/proc/$pid/maps shows that the number of memory mappings is growing
like crazy:

$ wc -l /proc/589/maps
822 /proc/589/maps
$ wc -l /proc/589/maps
837 /proc/589/maps
$ wc -l /proc/589/maps
852 /proc/589/maps
$ wc -l /proc/589/maps
867 /proc/589/maps
$ wc -l /proc/589/maps
942 /proc/589/maps
$ wc -l /proc/589/maps
1032 /proc/589/maps

Examination of /proc/$pid/maps appears to point to libselinux again:

$ awk '{print $6}' /proc/589/maps | sort | uniq -c | sort -nr | head
    340 /etc/selinux/targeted/contexts/files/file_contexts.local.bin
    340 /etc/selinux/targeted/contexts/files/file_contexts.homedirs.bin
    340 /etc/selinux/targeted/contexts/files/file_contexts.bin
     67
      4 /usr/lib64/sasl2/libsasldb.so.2.0.25
      4 /usr/lib64/sasl2/libplain.so.2.0.25
      4 /usr/lib64/sasl2/liblogin.so.2.0.25
      4 /usr/lib64/sasl2/libgssapiv2.so.2.0.25
      4 /usr/lib64/sasl2/libdigestmd5.so.2.0.25
      4 /usr/lib64/sasl2/libcrammd5.so.2.0.25

I've now fixed all of the memory leaks, so that I can run the whole
libguestfs test suite without any noticeable growth in the size of
libvirtd.

There is one patch waiting to go upstream in libvirt:
https://www.redhat.com/archives/libvir-list/2013-January/msg01730.html

There is one patch upstream in libvirt which probably needs to be
backported to F18:
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=05cc03518987fa0f8399930d14c1d635591ca49b

There are two bugs that need to be fixed in libselinux: bug 903203 and
bug 903280.

I checked the libselinux-2.1.12-7 source code; it mmaps three binary
files:

/etc/selinux/targeted/contexts/files/file_contexts.bin
/etc/selinux/targeted/contexts/files/file_contexts.homedirs.bin
/etc/selinux/targeted/contexts/files/file_contexts.local.bin

but I don't find a place where it munmaps them, so it is probably an
mmap leak. Those .bin files don't exist on my testing machine, which
is possibly why I can't reproduce the issue. After recompiling
selinux-policy-targeted, the .bin files were generated, and my testing
machine can now reproduce the bug too :).

I'll put this bug in POST so Cole knows that we need the patches in
Fedora 18 and RHEL 7.
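
A sketch of how the maps-based diagnosis above can be made mechanical:
snapshot the per-file mapping counts before and after a single
guestfish run, then diff them (PID 589 is the libvirtd PID from the
observations above; substitute the real one):

# Count mappings per file before and after one transient-domain run,
# then compare the counts.
awk '{print $6}' /proc/589/maps | sort | uniq -c > /tmp/maps.before
../libvirt/run ./run ./fish/guestfish -N fs exit
awk '{print $6}' /proc/589/maps | sort | uniq -c > /tmp/maps.after
diff /tmp/maps.before /tmp/maps.after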
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=05cc03518987fa0f8399930d14c1d635591ca49b
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=6159710ca1eecefa7c81335612c8141c88fc35a9

Thanks to guannan's help, I can now reproduce this bug on RHEL7:

# uname -r
3.7.0-0.31.el7.x86_64
# rpm -q libvirt
libvirt-1.0.1-1.el7.x86_64

PID    USER  PR  NI  VIRT     RES     SHR     S  %CPU   %MEM   TIME+    COMMAND
15057  root  20   0  2695284  1.868g  1.518g  S  2.659  24.96  0:15.48  libvirtd

(In reply to comment #28)
Also look at /proc/15057/maps. Does it map the same /etc/selinux/...
files over and over again? (See also comment 23.)

(In reply to comment #29)
Yes, it does.

Hi yanbing,

We need to verify that the bug is actually fixed in RHEL7, not just
that we can reproduce it, so we probably need to wait for a libselinux
build with the fix on RHEL7; I see no such release for RHEL yet. For
Fedora, libselinux-2.1.12-7.1.fc18 with the fix is already released:
http://koji.fedoraproject.org/koji/buildinfo?buildID=380977

(In reply to comment #31)
Yes, and both bug 903203 and bug 903280 are against Fedora 18, so
should we clone a new one against RHEL7 and backport the fix patches?

Tested with libvirt-1.0.5-2.el7.x86_64; the bug can no longer be
reproduced:
# ps axu|grep libvirtd
root 21581 0.0 0.0 1072100 21280 ? Ssl Jun03 0:01 /usr/sbin/libvirtd
# wc -l /proc/21581/maps
443 /proc/21581/maps
# wc -l /proc/21581/maps
443 /proc/21581/maps
# awk '{print $6}' /proc/21581/maps | sort | uniq -c | sort -nr | head
65
4 /usr/lib64/sasl2/libsasldb.so.3.0.0
4 /usr/lib64/sasl2/libplain.so.3.0.0
4 /usr/lib64/sasl2/liblogin.so.3.0.0
4 /usr/lib64/sasl2/libdigestmd5.so.3.0.0
4 /usr/lib64/sasl2/libcrammd5.so.3.0.0
4 /usr/lib64/sasl2/libanonymous.so.3.0.0
4 /usr/lib64/pkcs11/p11-kit-trust.so
4 /usr/lib64/pkcs11/gnome-keyring-pkcs11.so
4 /usr/lib64/libz.so.1.2.7
# ll /etc/selinux/targeted/contexts/files/*.bin
-rw-------. 1 root root 1171276 May 21 13:08 /etc/selinux/targeted/contexts/files/file_contexts.bin
-rw-------. 1 root root 36550 May 21 13:08 /etc/selinux/targeted/contexts/files/file_contexts.homedirs.bin
-rw-------. 1 root root 16 May 21 13:08 /etc/selinux/targeted/contexts/files/file_contexts.local.bin
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21581 root 20 0 1072100 22128 10656 S 41.81 0.017 0:25.56 libvirtd
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21581 root 20 0 1072100 21744 10656 S 42.48 0.017 0:31.97 libvirtd
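
(A hypothetical extra check, not part of the original verification:
drive the daemon with the earlier guestfish reproducer and confirm the
mapping count stays flat at 443.)

# Hypothetical re-check: apply load, then re-count the mappings.
for i in $(seq 1 20); do guestfish -N fs exit >/dev/null; done
wc -l /proc/21581/maps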
After discussing with gren, the bug can be moved to VERIFIED.
This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.