Description of problem:
Changes to the remote driver have introduced a file descriptor leak on client libvirt socket connections. Repeatedly opening and closing libvirt connections will cause the client program to run out of file descriptors.

Version-Release number of selected component (if applicable):
v0.9.3

How reproducible:
Using ruby-libvirt (although you get the same results with the python library).

Steps to Reproduce:
1. In irb, require 'libvirt'
2. conn = Libvirt.open('')
3. conn.close
4. conn = Libvirt.open('')

Actual results:
Checking the process fd list, the socket and pipe set opened for a connection are not closed, nor are they reused by the second open (which creates another socket and pipe set).

Expected results:
On v0.9.2 and below the socket and pipe set disappear when the connection is closed.

Additional info:
Debugging this as far as I can, it looks like the 'localUses' reference counter in remote_driver.c is getting out of sync somehow. The lack of VIR_DEBUG statements in the close sequence makes tracking it down a little difficult. virReleaseConnect is certainly being called correctly, so the problem is in the driver somewhere.
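For anyone reproducing from the plain C API rather than the bindings, a minimal sketch (my own illustration, assuming a Linux /proc filesystem; virConnectOpen(NULL) opens the default URI, equivalent to Libvirt.open('')):

#include <stdio.h>
#include <dirent.h>
#include <libvirt/libvirt.h>

/* Count entries in /proc/self/fd to observe the leak from inside
 * the client process (Linux-specific). */
static int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *ent;
    int n = 0;

    if (!dir)
        return -1;
    while ((ent = readdir(dir)) != NULL)
        n++;
    closedir(dir);
    return n - 2; /* ignore "." and ".." */
}

int main(void)
{
    int i;

    for (i = 0; i < 5; i++) {
        virConnectPtr conn = virConnectOpen(NULL); /* default URI */
        if (!conn)
            return 1;
        virConnectClose(conn);
        /* On v0.9.3 this count grows on every iteration;
         * on v0.9.2 and below it stays flat. */
        printf("iteration %d: %d fds open\n", i, count_open_fds());
    }
    return 0;
}

Build with gcc repro.c -lvirt and watch the printed count.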
Hi Neil,

I can't reproduce this bug on libvirt-0.9.1-1.el6.x86_64. The following are my steps, please correct me if I am wrong:

1. Check the currently open fds of libvirtd:

# ll /proc/$(pidof libvirtd)/fd
total 0
lr-x------. 1 root root 64 Jul 19 13:50 0 -> /dev/null
l-wx------. 1 root root 64 Jul 19 13:50 1 -> /dev/null
lrwx------. 1 root root 64 Jul 19 13:50 10 -> socket:[17482635]
lrwx------. 1 root root 64 Jul 19 13:50 11 -> socket:[17482636]
lrwx------. 1 root root 64 Jul 19 13:50 12 -> socket:[17482638]
lrwx------. 1 root root 64 Jul 19 13:50 13 -> socket:[17482641]
lrwx------. 1 root root 64 Jul 19 13:50 14 -> socket:[17482640]
l-wx------. 1 root root 64 Jul 19 13:50 15 -> /proc/mtrr
lrwx------. 1 root root 64 Jul 19 13:50 16 -> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/config
lrwx------. 1 root root 64 Jul 19 13:50 17 -> socket:[17482796]
lrwx------. 1 root root 64 Jul 19 13:50 18 -> socket:[17503915]
lrwx------. 1 root root 64 Jul 19 13:50 19 -> socket:[17503916]
l-wx------. 1 root root 64 Jul 19 13:50 2 -> /dev/null
lrwx------. 1 root root 64 Jul 19 13:50 20 -> socket:[17503917]
lr-x------. 1 root root 64 Jul 19 13:50 3 -> /dev/urandom
lrwx------. 1 root root 64 Jul 19 13:50 4 -> socket:[17482648]
lr-x------. 1 root root 64 Jul 19 13:50 5 -> pipe:[17482633]
l-wx------. 1 root root 64 Jul 19 13:50 6 -> pipe:[17482633]
lr-x------. 1 root root 64 Jul 19 13:50 7 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 8 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 9 -> /var/log/libvirt/libvirtd.log

Note: socket total: 10, pipe: 4

2. In another terminal, run interactive python, then open and close a connection to libvirtd:

# python
Python 2.6.6 (r266:84292, Apr 11 2011, 15:50:32)
[GCC 4.4.4 20100726 (Red Hat 4.4.4-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> con = libvirt.open('')
>>> con.close()
0

3. Repeat step 1:

# ll /proc/$(pidof libvirtd)/fd
total 0
lr-x------. 1 root root 64 Jul 19 13:50 0 -> /dev/null
l-wx------. 1 root root 64 Jul 19 13:50 1 -> /dev/null
lrwx------. 1 root root 64 Jul 19 13:50 10 -> socket:[17482635]
lrwx------. 1 root root 64 Jul 19 13:50 11 -> socket:[17482636]
lrwx------. 1 root root 64 Jul 19 13:50 12 -> socket:[17482638]
lrwx------. 1 root root 64 Jul 19 13:50 13 -> socket:[17482641]
lrwx------. 1 root root 64 Jul 19 13:50 14 -> socket:[17482640]
l-wx------. 1 root root 64 Jul 19 13:50 15 -> /proc/mtrr
lrwx------. 1 root root 64 Jul 19 13:50 16 -> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/config
lrwx------. 1 root root 64 Jul 19 13:50 17 -> socket:[17482796]
l-wx------. 1 root root 64 Jul 19 13:50 2 -> /dev/null
lr-x------. 1 root root 64 Jul 19 13:50 3 -> /dev/urandom
lrwx------. 1 root root 64 Jul 19 13:50 4 -> socket:[17482648]
lr-x------. 1 root root 64 Jul 19 13:50 5 -> pipe:[17482633]
l-wx------. 1 root root 64 Jul 19 13:50 6 -> pipe:[17482633]
lr-x------. 1 root root 64 Jul 19 13:50 7 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 8 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 9 -> /var/log/libvirt/libvirtd.log

Note: socket total: 7, pipe: 4

On v0.9.1 the daemon's own socket and pipe set still exists after the connection is closed (the per-connection sockets went away: 10 sockets down to 7), so I haven't reproduced this issue. In addition, libvirtd is a daemon, so while it is running I think its own sockets would not disappear anyway. However, an fd leak does indeed exist; I can catch it using valgrind:

==1254==
==1254== FILE DESCRIPTORS: 6 open at exit.
==1254== Open file descriptor 5:
==1254==    at 0x30736D9D47: pipe2 (syscall-template.S:82)
==1254==    by 0x4DD6267: rpl_pipe2 (pipe2.c:61)
==1254==    by 0x4C4C1C5: virEventPollInit (event_poll.c:648)
==1254==    by 0x4C4AA94: virEventRegisterDefaultImpl (event.c:208)
==1254==    by 0x42150C: main (virsh.c:13790)
==1254==
==1254== Open file descriptor 4:
==1254==    at 0x30736D9D47: pipe2 (syscall-template.S:82)
==1254==    by 0x4DD6267: rpl_pipe2 (pipe2.c:61)
==1254==    by 0x4C4C1C5: virEventPollInit (event_poll.c:648)
==1254==    by 0x4C4AA94: virEventRegisterDefaultImpl (event.c:208)
==1254==    by 0x42150C: main (virsh.c:13790)
==1254==

I'm trying to figure out the reason for this leak, which comes from the virEventPollInit function in ./src/util/event_poll.c:

640 int virEventPollInit(void)
641 {
642     if (virMutexInit(&eventLoop.lock) < 0) {
643         virReportSystemError(errno, "%s",
644                              _("Unable to initialize mutex"));
645         return -1;
646     }
647
648     if (pipe2(eventLoop.wakeupfd, O_CLOEXEC | O_NONBLOCK) < 0) {
649         virReportSystemError(errno, "%s",
650                              _("Unable to setup wakeup pipe"));
651         return -1;
652     }

The pipe created here needs to be closed again, whether a later step fails or the event loop is torn down. I will attach a patch here and wait for developer review, but it seems this is a separate issue; anyway, could the developers please help us confirm this.

Alex
Created attachment 513716 [details]
util-fix-fds-leak.patch
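For reference, the shape of cleanup a fix like this presumably takes is sketched below. This is my own illustration, not the attached patch: the virEventPollCleanupWakeupPipe name and the cut-down eventLoop struct are hypothetical, only mirroring the snippet quoted in the previous comment.

#include <unistd.h>

/* Minimal stand-in for the real eventLoop state in event_poll.c. */
static struct {
    int wakeupfd[2];
} eventLoop = { { -1, -1 } };

/* Close both ends of the wakeup pipe so the fds do not outlive the
 * event loop; callable from an error path in virEventPollInit or
 * from a deinit routine. */
static void virEventPollCleanupWakeupPipe(void)
{
    int i;

    for (i = 0; i < 2; i++) {
        if (eventLoop.wakeupfd[i] >= 0) {
            close(eventLoop.wakeupfd[i]);
            eventLoop.wakeupfd[i] = -1;
        }
    }
}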
No, you won't be able to. It appears as a regression in v0.9.3 only, due to reorganisation in the remote driver code (or something); 0.9.2 and below behave correctly. And it is not libvirtd that is leaking, it is the client program (the process running either the python interpreter or the ruby interpreter in the example above). I perhaps didn't make the latter clear in my report, for which I apologise.
And just to be absolutely clear, the leak is not in the interpreter itself but in the libvirt client shared libraries pulled in by require 'libvirt' or import libvirt. I run my interpreter with LIBVIRT_DEBUG=1 irb so you can see when the library gets activated.
Hi Neil,

Thanks for your rapid reply and nice comments; I clearly understand what you mean now. I have met this issue before using valgrind, and I think it's the same issue:
https://bugzilla.redhat.com/attachment.cgi?id=487500&action=edit

==13864== 10 bytes in 1 blocks are definitely lost in loss record 10 of 34
==13864==    at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==13864==    by 0x308587FD91: strdup (in /lib64/libc-2.12.so)
==13864==    by 0x3BB68BA0A3: doRemoteOpen (remote_driver.c:451)
==13864==    by 0x3BB68BCFCF: remoteOpen (remote_driver.c:1111)
==13864==    by 0x3BB689A0FF: do_open (libvirt.c:1290)
==13864==    by 0x3BB689ADD5: virConnectOpenAuth (libvirt.c:1545)
==13864==    by 0x41F659: main (virsh.c:11613)
==13864==
==13864== 16 bytes in 1 blocks are definitely lost in loss record 11 of 34
==13864==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==13864==    by 0x3BB6839B8D: virAlloc (memory.c:102)
==13864==    by 0x3BB68BA803: doRemoteOpen (remote_driver.c:912)
==13864==    by 0x3BB68BCFCF: remoteOpen (remote_driver.c:1111)
==13864==    by 0x3BB689A0FF: do_open (libvirt.c:1290)
==13864==    by 0x3BB689ADD5: virConnectOpenAuth (libvirt.c:1545)
==13864==    by 0x41F659: main (virsh.c:11613)
==13864==
==13864== 16 bytes in 1 blocks are definitely lost in loss record 12 of 34
==13864==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==13864==    by 0x3BB6839B8D: virAlloc (memory.c:102)
==13864==    by 0x3BB68BA821: doRemoteOpen (remote_driver.c:917)
==13864==    by 0x3BB68BCFCF: remoteOpen (remote_driver.c:1111)
==13864==    by 0x3BB689A0FF: do_open (libvirt.c:1290)
==13864==    by 0x3BB689ADD5: virConnectOpenAuth (libvirt.c:1545)
==13864==    by 0x41F659: main (virsh.c:11613)

Could you try libvirt-0.9.3-5.el6.x86_64 or upstream? It should be okay now.

Thanks,
Alex
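To illustrate what those three records mean: valgrind is flagging the classic pattern of an allocation on the open path with no matching free on the close/error path. A generic sketch of that pattern (hypothetical demo_* names, not the actual remote_driver.c code):

#include <stdlib.h>
#include <string.h>

struct demo_private {
    char *hostname; /* strdup'd during open */
};

/* The allocation on the open path... */
static int demo_open(struct demo_private *priv, const char *host)
{
    priv->hostname = strdup(host); /* valgrind: "definitely lost" if never freed */
    return priv->hostname ? 0 : -1;
}

/* ...must have a matching free on the close path. */
static void demo_close(struct demo_private *priv)
{
    free(priv->hostname);
    priv->hostname = NULL;
}

int main(void)
{
    struct demo_private priv = { NULL };

    if (demo_open(&priv, "example") == 0)
        demo_close(&priv); /* without this, valgrind reports the strdup leak */
    return 0;
}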
(In reply to comment #5)
> Hi Neil,
> Thanks for your rapid reply and nice comments; I clearly understand what
> you mean now. I have met this issue before using valgrind, and I think
> it's the same issue:
> https://bugzilla.redhat.com/attachment.cgi?id=487500&action=edit

Are you two really talking about the same thing? As far as I understand it, Neil is talking about an fd leak in libvirtd, while Alex's following comments are about a memory leak.

I can't see the problem Neil describes: the number of fds opened by the libvirtd process doesn't increase when creating connections again and again.

And the patch by Alex doesn't fix Neil's problem, if it exists.
Osier,

As I said above, the problem is *not* in libvirtd. It is in the libvirt client libraries.

You are looking at the wrong process.
(In reply to comment #6)
> (In reply to comment #5)
> > Hi Neil,
> > Thanks for your rapid reply and nice comments; I clearly understand what
> > you mean now. I have met this issue before using valgrind, and I think
> > it's the same issue:
> > https://bugzilla.redhat.com/attachment.cgi?id=487500&action=edit
>
> Are you two really talking about the same thing? As far as I understand it,
> Neil is talking about an fd leak in libvirtd, while Alex's following
> comments are about a memory leak.

Yeah, it's different.

> I can't see the problem Neil describes: the number of fds opened by the
> libvirtd process doesn't increase when creating connections again and again.
>
> And the patch by Alex doesn't fix Neil's problem, if it exists.

Yeah, it's another issue. I will attach the latest information here; please help me check it.

Osier, thanks for your comment.

Alex
(In reply to comment #7)
> Osier,
>
> As I said above, the problem is *not* in libvirtd. It is in the libvirt
> client libraries.
>
> You are looking at the wrong process.

I see; I will attach the latest results here.

Thanks,
Alex
Created attachment 513759 [details]
libvirt-0.9.1-1.el6.x86_64

The remote driver is unchanged in this version. I haven't found any leaked fds, so this version is okay, as Neil said.
Created attachment 513762 [details]
libvirt-0.9.3-1.el6.x86_64

The remote driver changed in this version. An fd is opened by the first con = libvirt.open(''), for instance:

virNetSocketNew:107 : localAddr=0x7fffd09b6990 remoteAddr=0x7fffd09b6900 fd=3 errfd=-1 pid=0

and when I disconnect using con.close(), I don't see any operation that closes this fd (fd=3):

18:34:35.177: 27014: debug : virConnectClose:1320 : conn=0x9c0aa0
18:34:35.177: 27014: debug : virUnrefConnect:145 : unref connection 0x9c0aa0 1
18:34:35.177: 27014: debug : virReleaseConnect:94 : release connection 0x9c0aa0
18:34:35.177: 27014: debug : virNetMessageNew:44 : msg=0xa177d0
18:34:35.177: 27014: debug : virNetMessageEncodePayload:255 : Encode length as 28
18:34:35.177: 27014: debug : virNetClientIO:1006 : program=536903814 version=1 serial=3 proc=2 type=0 length=28 dispatch=(nil)
18:34:35.177: 27014: debug : virNetClientIO:1074 : We have the buck 0xa048d0 0xa048d0
18:34:35.177: 27014: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0xa02620
18:34:35.177: 27014: debug : virNetClientIOEventLoop:950 : Giving up the buck due to I/O error 0xa048d0 (nil)
18:34:35.177: 27014: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0xa02620
18:34:35.177: 27014: debug : virNetClientIO:1101 : All done with our call (nil) 0xa048d0 -1

When I open a connection a second time:

virNetSocketNew:107 : localAddr=0x7fffd09b6990 remoteAddr=0x7fffd09b6900 fd=6 errfd=-1 pid=0

Note that a new fd (fd=6) is opened above. Disconnecting the second time looks the same as the first. So the client indeed does not close its fds on disconnect in this version.
Created attachment 513763 [details]
libvirt-0.9.3-5.el6.x86_64

libvirt-0.9.3-5.el6.x86_64 is okay. When I disconnect the libvirt client, I can see this:

>>> con.close()
18:11:37.250: 26351: debug : virConnectClose:1320 : conn=0x11bcaa0
18:11:37.250: 26351: debug : virUnrefConnect:145 : unref connection 0x11bcaa0 1
18:11:37.250: 26351: debug : virReleaseConnect:94 : release connection 0x11bcaa0
18:11:37.250: 26351: debug : virNetMessageNew:44 : msg=0x12137d0
18:11:37.250: 26351: debug : virNetMessageEncodePayload:255 : Encode length as 28
18:11:37.250: 26351: debug : virNetClientIO:1006 : program=536903814 version=1 serial=3 proc=2 type=0 length=28 dispatch=(nil)
18:11:37.250: 26351: debug : virNetClientIO:1074 : We have the buck 0x11bce90 0x11bce90
18:11:37.250: 26351: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0x11fe810
18:11:37.250: 26351: debug : virNetMessageDecodeLength:125 : Got length, now need 28 total (24 more)
18:11:37.250: 26351: debug : virNetClientCallDispatch:631 : Incoming message prog 536903814 vers 1 proc 2 type 1 status 0 serial 3
18:11:37.250: 26351: debug : virNetClientIOEventLoop:929 : Giving up the buck 0x11bce90 (nil)
18:11:37.250: 26351: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0x11fe810
18:11:37.250: 26351: debug : virNetClientIO:1101 : All done with our call (nil) 0x11bce90 0
18:11:37.250: 26351: debug : virNetSocketRemoveIOCallback:1093 : Watch not registered on socket 0x11fe810
18:11:37.250: 26351: debug : virNetSocketFree:627 : sock=0x11fe810 fd=3

virNetSocketFree closes this fd, and the fd is always 3 when I open and close the connection twice.
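That behaviour matches the usual reference-counting pattern: the last unref releases the socket object and only then closes the underlying fd, which is the step the 0.9.3-1 client was missing. A generic sketch of the pattern (illustrative demo_socket names, not the actual virNetSocket code):

#include <stdlib.h>
#include <unistd.h>

struct demo_socket {
    int refs;
    int fd;
};

static void demo_socket_ref(struct demo_socket *sock)
{
    sock->refs++;
}

/* Only when the last reference is dropped is the object freed and
 * the fd finally closed. */
static void demo_socket_unref(struct demo_socket *sock)
{
    if (--sock->refs > 0)
        return;
    if (sock->fd >= 0)
        close(sock->fd);
    free(sock);
}

int main(void)
{
    struct demo_socket *sock = calloc(1, sizeof(*sock));

    if (!sock)
        return 1;
    sock->refs = 1;
    sock->fd = dup(0);       /* stand-in for the connection fd */
    demo_socket_ref(sock);   /* e.g. an I/O callback holds a second ref */
    demo_socket_unref(sock); /* callback removed */
    demo_socket_unref(sock); /* close: fd is closed here */
    return 0;
}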
Given comment 12, the root cause of the problem has been fixed by patches backported into 0.9.3-5.el6 from upstream, so I'm closing this as completed upstream. At any rate, I could not reproduce it with 0.9.4.