+++ This bug was initially created as a clone of Bug #722648 +++

Description of problem:
Changes to the remote driver have introduced a file descriptor leak on client libvirt socket connections. Repeated opening and closing of libvirt connections will cause the client program to run out of file descriptors.

Version-Release number of selected component (if applicable):
v0.9.3

How reproducible:
Using ruby-libvirt (although you get the same results with the python library).

Steps to Reproduce:
1. Running irb, require 'libvirt'
2. conn=Libvirt.open('')
3. conn.close
4. conn=Libvirt.open('')

Actual results:
Checking the process fd list, the socket and pipe set opened for a connection are not closed, nor are they reused by the second open (which creates another socket and pipe set).

Expected results:
On v0.9.2 and below the socket and pipe set disappear when closed.

Additional info:
Debugging this as far as I can, it looks like the 'localUses' reference counter in remote_driver.c is getting out of sync somehow. The lack of VIR_DEBUG statements in the close sequence makes tracking it a little difficult. Certainly virReleaseConnect is being called correctly, so it's in the driver somewhere.

--- Additional comment from ajia on 2011-07-19 02:37:20 EDT ---

Hi Neil,
I can't reproduce this bug on libvirt-0.9.1-1.el6.x86_64. The following are my steps; please correct me if I'm wrong:

1. check the current open fds of libvirtd

# ll /proc/$(pidof libvirtd)/fd
total 0
lr-x------. 1 root root 64 Jul 19 13:50 0 -> /dev/null
l-wx------. 1 root root 64 Jul 19 13:50 1 -> /dev/null
lrwx------. 1 root root 64 Jul 19 13:50 10 -> socket:[17482635]
lrwx------. 1 root root 64 Jul 19 13:50 11 -> socket:[17482636]
lrwx------. 1 root root 64 Jul 19 13:50 12 -> socket:[17482638]
lrwx------. 1 root root 64 Jul 19 13:50 13 -> socket:[17482641]
lrwx------. 1 root root 64 Jul 19 13:50 14 -> socket:[17482640]
l-wx------. 1 root root 64 Jul 19 13:50 15 -> /proc/mtrr
lrwx------. 1 root root 64 Jul 19 13:50 16 -> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/config
lrwx------. 1 root root 64 Jul 19 13:50 17 -> socket:[17482796]
lrwx------. 1 root root 64 Jul 19 13:50 18 -> socket:[17503915]
lrwx------. 1 root root 64 Jul 19 13:50 19 -> socket:[17503916]
l-wx------. 1 root root 64 Jul 19 13:50 2 -> /dev/null
lrwx------. 1 root root 64 Jul 19 13:50 20 -> socket:[17503917]
lr-x------. 1 root root 64 Jul 19 13:50 3 -> /dev/urandom
lrwx------. 1 root root 64 Jul 19 13:50 4 -> socket:[17482648]
lr-x------. 1 root root 64 Jul 19 13:50 5 -> pipe:[17482633]
l-wx------. 1 root root 64 Jul 19 13:50 6 -> pipe:[17482633]
lr-x------. 1 root root 64 Jul 19 13:50 7 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 8 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 9 -> /var/log/libvirt/libvirtd.log

Note: socket total: 10, pipe: 4

2. open another terminal, run interactive python, and close the connection to libvirtd:

# python
Python 2.6.6 (r266:84292, Apr 11 2011, 15:50:32)
[GCC 4.4.4 20100726 (Red Hat 4.4.4-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> con = libvirt.open('')
>>> con.close()
0

3. repeat step 1:

# ll /proc/$(pidof libvirtd)/fd
total 0
lr-x------. 1 root root 64 Jul 19 13:50 0 -> /dev/null
l-wx------. 1 root root 64 Jul 19 13:50 1 -> /dev/null
lrwx------. 1 root root 64 Jul 19 13:50 10 -> socket:[17482635]
lrwx------. 1 root root 64 Jul 19 13:50 11 -> socket:[17482636]
lrwx------. 1 root root 64 Jul 19 13:50 12 -> socket:[17482638]
lrwx------. 1 root root 64 Jul 19 13:50 13 -> socket:[17482641]
lrwx------. 1 root root 64 Jul 19 13:50 14 -> socket:[17482640]
l-wx------. 1 root root 64 Jul 19 13:50 15 -> /proc/mtrr
lrwx------. 1 root root 64 Jul 19 13:50 16 -> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/config
lrwx------. 1 root root 64 Jul 19 13:50 17 -> socket:[17482796]
l-wx------. 1 root root 64 Jul 19 13:50 2 -> /dev/null
lr-x------. 1 root root 64 Jul 19 13:50 3 -> /dev/urandom
lrwx------. 1 root root 64 Jul 19 13:50 4 -> socket:[17482648]
lr-x------. 1 root root 64 Jul 19 13:50 5 -> pipe:[17482633]
l-wx------. 1 root root 64 Jul 19 13:50 6 -> pipe:[17482633]
lr-x------. 1 root root 64 Jul 19 13:50 7 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 8 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 9 -> /var/log/libvirt/libvirtd.log

Note: socket total: 7, pipe: 4

On v0.9.1 the socket and pipe set still exist after the connection is closed, so I haven't reproduced this issue. In addition, libvirtd is a daemon, so while it is running I think its sockets may not disappear. However, an fd leak does exist; I can catch it using valgrind:

==1254==
==1254== FILE DESCRIPTORS: 6 open at exit.
==1254== Open file descriptor 5:
==1254==    at 0x30736D9D47: pipe2 (syscall-template.S:82)
==1254==    by 0x4DD6267: rpl_pipe2 (pipe2.c:61)
==1254==    by 0x4C4C1C5: virEventPollInit (event_poll.c:648)
==1254==    by 0x4C4AA94: virEventRegisterDefaultImpl (event.c:208)
==1254==    by 0x42150C: main (virsh.c:13790)
==1254==
==1254== Open file descriptor 4:
==1254==    at 0x30736D9D47: pipe2 (syscall-template.S:82)
==1254==    by 0x4DD6267: rpl_pipe2 (pipe2.c:61)
==1254==    by 0x4C4C1C5: virEventPollInit (event_poll.c:648)
==1254==    by 0x4C4AA94: virEventRegisterDefaultImpl (event.c:208)
==1254==    by 0x42150C: main (virsh.c:13790)
==1254==

I'm trying to figure out the cause of this leak, which comes from the virEventPollInit function in ./src/util/event_poll.c:

640 int virEventPollInit(void)
641 {
642     if (virMutexInit(&eventLoop.lock) < 0) {
643         virReportSystemError(errno, "%s",
644                              _("Unable to initialize mutex"));
645         return -1;
646     }
647
648     if (pipe2(eventLoop.wakeupfd, O_CLOEXEC | O_NONBLOCK) < 0) {
649         virReportSystemError(errno, "%s",
650                              _("Unable to setup wakeup pipe"));
651         return -1;
652     }

The pipe created in the code above needs to be closed, whether there is a failure or not. I will attach a patch here and wait for developer review, but it seems this is another issue. Anyway, developers, please help us confirm this.

Alex

--- Additional comment from ajia on 2011-07-19 02:46:23 EDT ---

Created attachment 513716 [details]
util-fix-fds-leak.patch

--- Additional comment from neil.uk on 2011-07-19 03:03:44 EDT ---

No, you won't be able to. It appears as a regression in v0.9.3 only, due to reorganisation in the remote driver code (or something); 0.9.2 and below behave correctly.

And it is not libvirtd that is leaking, it is the client program (the process running either the python interpreter, or the ruby interpreter in the case of the example above). I perhaps didn't make the latter clear in my report, for which I apologise.

--- Additional comment from neil.uk on 2011-07-19 03:07:28 EDT ---

And just to be absolutely clear, the leak is not in the interpreter but in the libvirt client shared libraries pulled in by 'require libvirt' or 'import libvirt'. I run my interpreter with LIBVIRT_DEBUG=1 irb, so you can see when the library gets activated.
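To make the point above concrete (the leak is in the client process, not in libvirtd), here is a minimal client-side sketch that counts the client's own descriptors across repeated open/close cycles. It assumes only the libvirt Python bindings and a reachable libvirtd; the helper name client_fd_count is invented for illustration and is not part of the original report.

import os
import libvirt

def client_fd_count():
    # Count file descriptors held by *this* process (the libvirt client),
    # which is where the leak described above shows up.
    return len(os.listdir('/proc/self/fd'))

print("fds before: %d" % client_fd_count())
for i in range(5):
    conn = libvirt.open('')   # default URI, same as the irb/python steps above
    conn.close()
    # On an affected 0.9.3 client this count keeps growing; on 0.9.2 and
    # earlier (and on fixed builds) it settles back to a constant value.
    print("fds after cycle %d: %d" % (i, client_fd_count()))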
--- Additional comment from ajia on 2011-07-19 03:21:45 EDT ---

Hi Neil,
Thanks for your rapid reply and nice comments; I clearly understand what you mean now. I have run into this issue before using valgrind, and I think it's the same issue:
https://bugzilla.redhat.com/attachment.cgi?id=487500&action=edit:

==13864== 10 bytes in 1 blocks are definitely lost in loss record 10 of 34
==13864==    at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==13864==    by 0x308587FD91: strdup (in /lib64/libc-2.12.so)
==13864==    by 0x3BB68BA0A3: doRemoteOpen (remote_driver.c:451)
==13864==    by 0x3BB68BCFCF: remoteOpen (remote_driver.c:1111)
==13864==    by 0x3BB689A0FF: do_open (libvirt.c:1290)
==13864==    by 0x3BB689ADD5: virConnectOpenAuth (libvirt.c:1545)
==13864==    by 0x41F659: main (virsh.c:11613)
==13864==
==13864== 16 bytes in 1 blocks are definitely lost in loss record 11 of 34
==13864==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==13864==    by 0x3BB6839B8D: virAlloc (memory.c:102)
==13864==    by 0x3BB68BA803: doRemoteOpen (remote_driver.c:912)
==13864==    by 0x3BB68BCFCF: remoteOpen (remote_driver.c:1111)
==13864==    by 0x3BB689A0FF: do_open (libvirt.c:1290)
==13864==    by 0x3BB689ADD5: virConnectOpenAuth (libvirt.c:1545)
==13864==    by 0x41F659: main (virsh.c:11613)
==13864==
==13864== 16 bytes in 1 blocks are definitely lost in loss record 12 of 34
==13864==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==13864==    by 0x3BB6839B8D: virAlloc (memory.c:102)
==13864==    by 0x3BB68BA821: doRemoteOpen (remote_driver.c:917)
==13864==    by 0x3BB68BCFCF: remoteOpen (remote_driver.c:1111)
==13864==    by 0x3BB689A0FF: do_open (libvirt.c:1290)
==13864==    by 0x3BB689ADD5: virConnectOpenAuth (libvirt.c:1545)
==13864==    by 0x41F659: main (virsh.c:11613)

Could you try libvirt-0.9.3-5.el6.x86_64 or upstream? It should be okay now.

Thanks,
Alex

--- Additional comment from jyang on 2011-07-19 05:26:28 EDT ---

(In reply to comment #5)
> Hi Neil,
> Thanks for your rapid reply and nice comments; I clearly understand what you mean now.
> I have run into this issue before using valgrind, and I think it's the same issue:
> https://bugzilla.redhat.com/attachment.cgi?id=487500&action=edit:

Are you two really talking about the same thing? As far as I understand, Neil is talking about an fd leak in libvirtd, and the following comments from Alex are about a memory leak.

I can't see the problem Neil described; the number of fds opened by the libvirtd process doesn't increase when creating connections again and again.

And the patch from Alex doesn't fix Neil's problem, if it exists.

--- Additional comment from neil.uk on 2011-07-19 05:30:32 EDT ---

Osier,

As I said above, the problem is *not* in libvirtd. It is in the libvirt client libraries.

You are looking at the wrong process.

--- Additional comment from ajia on 2011-07-19 06:37:15 EDT ---

(In reply to comment #6)
> (In reply to comment #5)
> > Hi Neil,
> > Thanks for your rapid reply and nice comments; I clearly understand what you mean now.
> > I have run into this issue before using valgrind, and I think it's the same issue:
> > https://bugzilla.redhat.com/attachment.cgi?id=487500&action=edit:
>
> Are you two really talking about the same thing? As far as I understand, Neil
> is talking about an fd leak in libvirtd, and the following comments from Alex
> are about a memory leak.

Yeah, it's different.

>
> I can't see the problem Neil described; the number of fds opened by the libvirtd
> process doesn't increase when creating connections again and again.
>
> And the patch from Alex doesn't fix Neil's problem, if it exists.

Yeah, it's another issue.
I will attach the latest information here; please help me check it.

Osier, thanks for your comment.

Alex

--- Additional comment from ajia on 2011-07-19 06:38:14 EDT ---

(In reply to comment #7)
> Osier,
>
> As I said above, the problem is *not* in libvirtd. It is in the libvirt client
> libraries.
>
> You are looking at the wrong process.

I see, I will attach the latest results here.

Thanks,
Alex

--- Additional comment from ajia on 2011-07-19 06:42:40 EDT ---

Created attachment 513759 [details]
libvirt-0.9.1-1.el6.x86_64

The remote driver is unchanged in this version. I haven't found any fds left open, so this version is okay, as Neil said.

--- Additional comment from ajia on 2011-07-19 06:49:38 EDT ---

Created attachment 513762 [details]
libvirt-0.9.3-1.el6.x86_64

The remote driver changed in this version. I found that an fd is opened on the first con = libvirt.open(''), for instance:

virNetSocketNew:107 : localAddr=0x7fffd09b6990 remoteAddr=0x7fffd09b6900 fd=3 errfd=-1 pid=0

and when I disconnect using con.close(), I don't see any operation that closes this fd (fd=3):

18:34:35.177: 27014: debug : virConnectClose:1320 : conn=0x9c0aa0
18:34:35.177: 27014: debug : virUnrefConnect:145 : unref connection 0x9c0aa0 1
18:34:35.177: 27014: debug : virReleaseConnect:94 : release connection 0x9c0aa0
18:34:35.177: 27014: debug : virNetMessageNew:44 : msg=0xa177d0
18:34:35.177: 27014: debug : virNetMessageEncodePayload:255 : Encode length as 28
18:34:35.177: 27014: debug : virNetClientIO:1006 : program=536903814 version=1 serial=3 proc=2 type=0 length=28 dispatch=(nil)
18:34:35.177: 27014: debug : virNetClientIO:1074 : We have the buck 0xa048d0 0xa048d0
18:34:35.177: 27014: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0xa02620
18:34:35.177: 27014: debug : virNetClientIOEventLoop:950 : Giving up the buck due to I/O error 0xa048d0 (nil)
18:34:35.177: 27014: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0xa02620
18:34:35.177: 27014: debug : virNetClientIO:1101 : All done with our call (nil) 0xa048d0 -1

When I open the connection a second time:

virNetSocketNew:107 : localAddr=0x7fffd09b6990 remoteAddr=0x7fffd09b6900 fd=6 errfd=-1 pid=0

Note: a new fd (fd=6) is opened above. Disconnecting a second time looks the same as above. So the client fds are indeed not closed on disconnect in this version.
--- Additional comment from ajia on 2011-07-19 06:52:32 EDT ---

Created attachment 513763 [details]
libvirt-0.9.3-5.el6.x86_64

libvirt-0.9.3-5.el6.x86_64 is okay; when I disconnect the libvirt client, I can see this:

>>> con.close()
18:11:37.250: 26351: debug : virConnectClose:1320 : conn=0x11bcaa0
18:11:37.250: 26351: debug : virUnrefConnect:145 : unref connection 0x11bcaa0 1
18:11:37.250: 26351: debug : virReleaseConnect:94 : release connection 0x11bcaa0
18:11:37.250: 26351: debug : virNetMessageNew:44 : msg=0x12137d0
18:11:37.250: 26351: debug : virNetMessageEncodePayload:255 : Encode length as 28
18:11:37.250: 26351: debug : virNetClientIO:1006 : program=536903814 version=1 serial=3 proc=2 type=0 length=28 dispatch=(nil)
18:11:37.250: 26351: debug : virNetClientIO:1074 : We have the buck 0x11bce90 0x11bce90
18:11:37.250: 26351: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0x11fe810
18:11:37.250: 26351: debug : virNetMessageDecodeLength:125 : Got length, now need 28 total (24 more)
18:11:37.250: 26351: debug : virNetClientCallDispatch:631 : Incoming message prog 536903814 vers 1 proc 2 type 1 status 0 serial 3
18:11:37.250: 26351: debug : virNetClientIOEventLoop:929 : Giving up the buck 0x11bce90 (nil)
18:11:37.250: 26351: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0x11fe810
18:11:37.250: 26351: debug : virNetClientIO:1101 : All done with our call (nil) 0x11bce90 0
18:11:37.250: 26351: debug : virNetSocketRemoveIOCallback:1093 : Watch not registered on socket 0x11fe810
18:11:37.250: 26351: debug : virNetSocketFree:627 : sock=0x11fe810 fd=3

virNetSocketFree closes this fd, and the fd is always 3 when I open and close the connection twice.
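Building on the observation above that a fixed client frees the socket and reuses the same fd on every cycle, a hedged pass/fail check could look like the sketch below. The helper name leaked_client_fds is invented, and the warm-up cycle is an assumption to exclude descriptors that are legitimately created lazily on the first open (such as an event-loop wakeup pipe).

import os
import libvirt

def leaked_client_fds(cycles=10):
    # Warm-up cycle: descriptors created lazily on the very first open may
    # legitimately persist, so they are excluded from the baseline.
    conn = libvirt.open('')
    conn.close()
    baseline = set(os.listdir('/proc/self/fd'))
    for _ in range(cycles):
        conn = libvirt.open('')
        conn.close()
    # Anything beyond the baseline after the loop is a leaked descriptor.
    return set(os.listdir('/proc/self/fd')) - baseline

print("leaked fds: %s" % sorted(leaked_client_fds()))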
I have confirmed that this bug is not present in libvirt-0.9.4-11.el6. While it probably did exist in 0.9.3, we have had a lot of RPC fixes in the meantime, such that this does not affect RHEL.

Given the comments that this appeared to be fixed as soon as libvirt-0.9.3-5.el6, I'm assuming that this commit may have been the one that made the difference:

commit 155cfcf5a39ea186312725acc416e0065b65c56f
Author: Daniel P. Berrange <berrange>
Date:   Fri Jul 8 12:54:29 2011 +0100

    Fix sending of reply to final RPC message

    The dispatch for the CLOSE RPC call was invoking the method
    virNetServerClientClose(). This caused the client connection to be
    immediately terminated. This meant the reply to the final RPC message
    was never sent. Prior to the RPC rewrite we merely flagged the
    connection for closing, and actually closed it when the next RPC call
    dispatch had completed.
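The ordering problem that commit message describes, closing the connection inside the CLOSE dispatch so the final reply is never sent versus flagging it and closing after dispatch, can be illustrated with a small sketch. None of these names come from the libvirt sources; they are invented purely to show the deferred-close pattern.

class FakeClientConnection(object):
    # Illustration only: not libvirt code.
    def __init__(self, send_reply):
        self.send_reply = send_reply
        self.close_requested = False

    def dispatch_close(self):
        # Pre-fix behaviour (as described above) was effectively:
        #     self.shutdown()        # connection gone, the reply below is lost
        #     self.send_reply("OK")
        # Post-fix behaviour: send the reply first, only flag the connection,
        # and tear it down once dispatch has completed.
        self.send_reply("OK")
        self.close_requested = True

    def finish_dispatch(self):
        if self.close_requested:
            self.shutdown()

    def shutdown(self):
        print("connection closed")

def send_reply(msg):
    print("reply sent: %s" % msg)

conn = FakeClientConnection(send_reply)
conn.dispatch_close()
conn.finish_dispatch()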
tested with libvirt-0.9.4-12.el6.x86_64

Steps:

[terminal 1]
# service libvirtd stop
# python
>>> import libvirt
>>> con=libvirt.open('')
>>> con.close()
0
>>> con=libvirt.open('')

[terminal 2]
[root@dhcp-93-106 ~]# ll /proc/$(pidof libvirtd)/fd
total 0
lrwx------. 1 root root 64 Sep 22 09:22 0 -> /dev/null
lrwx------. 1 root root 64 Sep 22 09:22 1 -> /dev/null
lr-x------. 1 root root 64 Sep 22 09:21 10 -> socket:[21308]
lrwx------. 1 root root 64 Sep 22 09:22 11 -> socket:[21310]
lrwx------. 1 root root 64 Sep 22 09:22 12 -> socket:[21312]
lrwx------. 1 root root 64 Sep 22 09:22 13 -> socket:[21317]
l-wx------. 1 root root 64 Sep 22 09:22 14 -> /proc/mtrr
lrwx------. 1 root root 64 Sep 22 09:22 15 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:03:00.0/config
lrwx------. 1 root root 64 Sep 22 09:22 16 -> socket:[21451]
lrwx------. 1 root root 64 Sep 22 09:22 17 -> socket:[21791]
lrwx------. 1 root root 64 Sep 22 09:22 18 -> socket:[21792]
lrwx------. 1 root root 64 Sep 22 09:22 19 -> socket:[21793]
lrwx------. 1 root root 64 Sep 22 09:22 2 -> /dev/null
lr-x------. 1 root root 64 Sep 22 09:22 3 -> /dev/urandom
lrwx------. 1 root root 64 Sep 22 09:22 4 -> socket:[21304]
lr-x------. 1 root root 64 Sep 22 09:22 5 -> pipe:[21305]
l-wx------. 1 root root 64 Sep 22 09:22 6 -> pipe:[21305]
lr-x------. 1 root root 64 Sep 22 09:22 7 -> pipe:[21306]
l-wx------. 1 root root 64 Sep 22 09:22 8 -> pipe:[21306]
lrwx------. 1 root root 64 Sep 22 09:22 9 -> socket:[21307]

10 socket files and 4 pipe files

[root@dhcp-93-106 ~]# ll /proc/$(pidof libvirtd)/fd
total 0
lrwx------. 1 root root 64 Sep 22 09:22 0 -> /dev/null
lrwx------. 1 root root 64 Sep 22 09:22 1 -> /dev/null
lr-x------. 1 root root 64 Sep 22 09:21 10 -> socket:[21308]
lrwx------. 1 root root 64 Sep 22 09:22 11 -> socket:[21310]
lrwx------. 1 root root 64 Sep 22 09:22 12 -> socket:[21312]
lrwx------. 1 root root 64 Sep 22 09:22 13 -> socket:[21317]
l-wx------. 1 root root 64 Sep 22 09:22 14 -> /proc/mtrr
lrwx------. 1 root root 64 Sep 22 09:22 15 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:03:00.0/config
lrwx------. 1 root root 64 Sep 22 09:22 16 -> socket:[21451]
lrwx------. 1 root root 64 Sep 22 09:22 2 -> /dev/null
lr-x------. 1 root root 64 Sep 22 09:22 3 -> /dev/urandom
lrwx------. 1 root root 64 Sep 22 09:22 4 -> socket:[21304]
lr-x------. 1 root root 64 Sep 22 09:22 5 -> pipe:[21305]
l-wx------. 1 root root 64 Sep 22 09:22 6 -> pipe:[21305]
lr-x------. 1 root root 64 Sep 22 09:22 7 -> pipe:[21306]
l-wx------. 1 root root 64 Sep 22 09:22 8 -> pipe:[21306]
lrwx------. 1 root root 64 Sep 22 09:22 9 -> socket:[21307]

7 socket files and 4 pipe files

[root@dhcp-93-106 ~]# ll /proc/$(pidof libvirtd)/fd
total 0
lrwx------. 1 root root 64 Sep 22 09:22 0 -> /dev/null
lrwx------. 1 root root 64 Sep 22 09:22 1 -> /dev/null
lr-x------. 1 root root 64 Sep 22 09:21 10 -> socket:[21308]
lrwx------. 1 root root 64 Sep 22 09:22 11 -> socket:[21310]
lrwx------. 1 root root 64 Sep 22 09:22 12 -> socket:[21312]
lrwx------. 1 root root 64 Sep 22 09:22 13 -> socket:[21317]
l-wx------. 1 root root 64 Sep 22 09:22 14 -> /proc/mtrr
lrwx------. 1 root root 64 Sep 22 09:22 15 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:03:00.0/config
lrwx------. 1 root root 64 Sep 22 09:22 16 -> socket:[21451]
lrwx------. 1 root root 64 Sep 22 09:22 17 -> socket:[22138]
lrwx------. 1 root root 64 Sep 22 09:22 18 -> socket:[22139]
lrwx------. 1 root root 64 Sep 22 09:22 19 -> socket:[22140]
lrwx------. 1 root root 64 Sep 22 09:22 2 -> /dev/null
lr-x------. 1 root root 64 Sep 22 09:22 3 -> /dev/urandom
lrwx------. 1 root root 64 Sep 22 09:22 4 -> socket:[21304]
lr-x------. 1 root root 64 Sep 22 09:22 5 -> pipe:[21305]
l-wx------. 1 root root 64 Sep 22 09:22 6 -> pipe:[21305]
lr-x------. 1 root root 64 Sep 22 09:22 7 -> pipe:[21306]
l-wx------. 1 root root 64 Sep 22 09:22 8 -> pipe:[21306]
lrwx------. 1 root root 64 Sep 22 09:22 9 -> socket:[21307]

[root@dhcp-93-106 ~]# rpm -q libvirt qemu-kvm kernel

10 socket files and 4 pipe files; the old socket files are reused.

So set bug status to VERIFIED.
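A small helper in the same spirit as the "ll /proc/$(pidof libvirtd)/fd" checks above, counting another process's open descriptors before, during, and after the client cycles. The fd_count helper is invented, and locating libvirtd via pidof is an assumption about the test machine.

import os

def fd_count(pid):
    # Count the open descriptors of another process, mirroring the
    # "ll /proc/<pid>/fd" checks used in the verification above.
    return len(os.listdir('/proc/%d/fd' % pid))

libvirtd_pid = int(os.popen('pidof libvirtd').read().split()[0])
print("libvirtd open fds: %d" % fd_count(libvirtd_pid))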
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1513.html