Bug 722648 - Libvirt connections leak File Descriptors in v0.9.3
Summary: Libvirt connections leak File Descriptors in v0.9.3
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: All
OS: All
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Osier Yang
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 731157
 
Reported: 2011-07-16 06:57 UTC by Neil Wilson
Modified: 2014-03-27 01:02 UTC (History)
CC: 6 users

Fixed In Version:
Clone Of:
Cloned As: 731157 (view as bug list)
Environment:
Last Closed: 2011-09-20 22:09:49 UTC
Embargoed:


Attachments
util-fix-fds-leak.patch (2.03 KB, text/plain), 2011-07-19 06:46 UTC, Alex Jia
libvirt-0.9.1-1.el6.x86_64 (11.72 KB, text/plain), 2011-07-19 10:42 UTC, Alex Jia
libvirt-0.9.3-1.el6.x86_64 (16.90 KB, text/plain), 2011-07-19 10:49 UTC, Alex Jia
libvirt-0.9.3-5.el6.x86_64 (17.73 KB, text/plain), 2011-07-19 10:52 UTC, Alex Jia

Description Neil Wilson 2011-07-16 06:57:05 UTC
Description of problem:

Changes to the remote driver have introduced a file descriptor leak on client libvirt socket connections. Repeated opening and closing of libvirt connections will cause the client program to run out of file descriptors. 

Version-Release number of selected component (if applicable):

v0.9.3


How reproducible:

Using ruby-libvirt (although you get the same results with the Python library).


Steps to Reproduce:
1. Run irb and require 'libvirt'
2. conn=Libvirt.open('')
3. conn.close
4. conn=Libvirt.open('')
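
For reference, an equivalent minimal C sketch (illustrative only; it assumes libvirt-devel is installed and builds with something like gcc repro.c -lvirt). It drives the libvirt client library directly, which is where the leak lives, and counts the calling process's own fds via /proc/self/fd:

/* Hypothetical repro sketch: repeatedly open and close libvirt client
 * connections and watch this process's own fd count grow on affected
 * versions (v0.9.3). */
#include <stdio.h>
#include <dirent.h>
#include <libvirt/libvirt.h>

static int count_open_fds(void)
{
    DIR *d = opendir("/proc/self/fd");
    int n = 0;

    if (!d)
        return -1;
    while (readdir(d))
        n++;            /* includes ".", ".." and the directory fd itself */
    closedir(d);
    return n;
}

int main(void)
{
    int i;

    for (i = 1; i <= 5; i++) {
        virConnectPtr conn = virConnectOpen(NULL);   /* like Libvirt.open('') */
        if (conn == NULL)
            return 1;
        virConnectClose(conn);
        printf("after close #%d: %d fds open\n", i, count_open_fds());
    }
    return 0;
}

On an affected build the printed count keeps rising; on v0.9.2 and earlier it stays flat.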
  
Actual results:

Checking the process fd list, the socket and pipe set opened for a connection are not closed, nor are they reused by the second open (which creates another socket and pipe set).


Expected results:

On v0.9.2 and below, the socket and pipe set disappear when the connection is closed.


Additional info:

Debugging this as far as I can, it looks like the 'localUses' reference counter in remote_driver.c is getting out of sync somehow. The lack of VIR_DEBUG statements in the close sequence makes tracking it down a little difficult.

virReleaseConnect is certainly being called correctly, so the problem is somewhere in the driver.
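
To illustrate the suspected mechanism, a generic sketch (this is not libvirt's actual remote driver code): if a per-connection use counter such as 'localUses' is left out of sync and never reaches zero on close, the branch that actually closes the socket fd is simply never taken.

#include <unistd.h>

/* Generic illustration of a refcount-style close: the fd leaks whenever
 * the counter fails to drop to zero. */
struct fake_conn {
    int fd;       /* the client socket */
    int uses;     /* analogous to the 'localUses' counter */
};

static void fake_conn_close(struct fake_conn *c)
{
    if (--c->uses > 0)
        return;         /* counter out of sync: we bail out and leak c->fd */
    close(c->fd);       /* only reached when the counter really hits zero */
    c->fd = -1;
}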

Comment 1 Alex Jia 2011-07-19 06:37:20 UTC
Hi Neil,

I can't reproduce this bug on libvirt-0.9.1-1.el6.x86_64; the following are my steps, please correct me if I'm wrong:

1. check current open fd of libvirtd
# ll /proc/$(pidof libvirtd)/fd
total 0
lr-x------. 1 root root 64 Jul 19 13:50 0 -> /dev/null
l-wx------. 1 root root 64 Jul 19 13:50 1 -> /dev/null
lrwx------. 1 root root 64 Jul 19 13:50 10 -> socket:[17482635]
lrwx------. 1 root root 64 Jul 19 13:50 11 -> socket:[17482636]
lrwx------. 1 root root 64 Jul 19 13:50 12 -> socket:[17482638]
lrwx------. 1 root root 64 Jul 19 13:50 13 -> socket:[17482641]
lrwx------. 1 root root 64 Jul 19 13:50 14 -> socket:[17482640]
l-wx------. 1 root root 64 Jul 19 13:50 15 -> /proc/mtrr
lrwx------. 1 root root 64 Jul 19 13:50 16 -> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/config
lrwx------. 1 root root 64 Jul 19 13:50 17 -> socket:[17482796]
lrwx------. 1 root root 64 Jul 19 13:50 18 -> socket:[17503915]
lrwx------. 1 root root 64 Jul 19 13:50 19 -> socket:[17503916]
l-wx------. 1 root root 64 Jul 19 13:50 2 -> /dev/null
lrwx------. 1 root root 64 Jul 19 13:50 20 -> socket:[17503917]
lr-x------. 1 root root 64 Jul 19 13:50 3 -> /dev/urandom
lrwx------. 1 root root 64 Jul 19 13:50 4 -> socket:[17482648]
lr-x------. 1 root root 64 Jul 19 13:50 5 -> pipe:[17482633]
l-wx------. 1 root root 64 Jul 19 13:50 6 -> pipe:[17482633]
lr-x------. 1 root root 64 Jul 19 13:50 7 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 8 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 9 -> /var/log/libvirt/libvirtd.log

Note: socket total: 10  pipe: 4

2. Open another terminal, run interactive Python, and open and then close a connection to libvirtd:
# python
Python 2.6.6 (r266:84292, Apr 11 2011, 15:50:32) 
[GCC 4.4.4 20100726 (Red Hat 4.4.4-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> con = libvirt.open('')
>>> con.close()
0

3. Repeat step 1:
# ll /proc/$(pidof libvirtd)/fd
total 0
lr-x------. 1 root root 64 Jul 19 13:50 0 -> /dev/null
l-wx------. 1 root root 64 Jul 19 13:50 1 -> /dev/null
lrwx------. 1 root root 64 Jul 19 13:50 10 -> socket:[17482635]
lrwx------. 1 root root 64 Jul 19 13:50 11 -> socket:[17482636]
lrwx------. 1 root root 64 Jul 19 13:50 12 -> socket:[17482638]
lrwx------. 1 root root 64 Jul 19 13:50 13 -> socket:[17482641]
lrwx------. 1 root root 64 Jul 19 13:50 14 -> socket:[17482640]
l-wx------. 1 root root 64 Jul 19 13:50 15 -> /proc/mtrr
lrwx------. 1 root root 64 Jul 19 13:50 16 -> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/config
lrwx------. 1 root root 64 Jul 19 13:50 17 -> socket:[17482796]
l-wx------. 1 root root 64 Jul 19 13:50 2 -> /dev/null
lr-x------. 1 root root 64 Jul 19 13:50 3 -> /dev/urandom
lrwx------. 1 root root 64 Jul 19 13:50 4 -> socket:[17482648]
lr-x------. 1 root root 64 Jul 19 13:50 5 -> pipe:[17482633]
l-wx------. 1 root root 64 Jul 19 13:50 6 -> pipe:[17482633]
lr-x------. 1 root root 64 Jul 19 13:50 7 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 8 -> pipe:[17482634]
l-wx------. 1 root root 64 Jul 19 13:50 9 -> /var/log/libvirt/libvirtd.log

Note: socket total: 7  pipe: 4

On v0.9.1 the daemon's socket and pipe set still exists after the connection is closed, so I haven't reproduced this issue.

In addition, libvirtd is a daemon service; while it is running, I think its sockets may not disappear.

However, an fd leak does indeed exist, and I can catch it with valgrind (presumably run with --track-fds=yes):
==1254==
==1254== FILE DESCRIPTORS: 6 open at exit.
==1254== Open file descriptor 5:
==1254==    at 0x30736D9D47: pipe2 (syscall-template.S:82)
==1254==    by 0x4DD6267: rpl_pipe2 (pipe2.c:61)
==1254==    by 0x4C4C1C5: virEventPollInit (event_poll.c:648)
==1254==    by 0x4C4AA94: virEventRegisterDefaultImpl (event.c:208)
==1254==    by 0x42150C: main (virsh.c:13790)
==1254==
==1254== Open file descriptor 4:
==1254==    at 0x30736D9D47: pipe2 (syscall-template.S:82)
==1254==    by 0x4DD6267: rpl_pipe2 (pipe2.c:61)
==1254==    by 0x4C4C1C5: virEventPollInit (event_poll.c:648)
==1254==    by 0x4C4AA94: virEventRegisterDefaultImpl (event.c:208)
==1254==    by 0x42150C: main (virsh.c:13790)
==1254==

I'm trying to figure out the reason for the leak, which comes from the virEventPollInit function in ./src/util/event_poll.c:

640 int virEventPollInit(void)
641 {
642     if (virMutexInit(&eventLoop.lock) < 0) {
643         virReportSystemError(errno, "%s",
644                              _("Unable to initialize mutex"));
645         return -1;
646     }
647 
648     if (pipe2(eventLoop.wakeupfd, O_CLOEXEC | O_NONBLOCK) < 0) {
649         virReportSystemError(errno, "%s",
650                              _("Unable to setup wakeup pipe"));
651         return -1;
652     }

The pipe created in the above code needs to be closed whether or not a later step fails. I will attach a patch here and wait for developer review, but it seems this is a separate issue; in any case, could the developers please help confirm this?
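
For illustration only (this is not the attached patch), a minimal sketch of the kind of cleanup meant here: close both ends of a pipe2()-created fd pair, such as eventLoop.wakeupfd, from a cleanup or error path so that the fds are not left behind.

#include <unistd.h>

/* Hypothetical helper: safely close both ends of a wakeup pipe. */
static void closeWakeupPipe(int fds[2])
{
    if (fds[0] >= 0) {
        close(fds[0]);
        fds[0] = -1;
    }
    if (fds[1] >= 0) {
        close(fds[1]);
        fds[1] = -1;
    }
}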


Alex

Comment 2 Alex Jia 2011-07-19 06:46:23 UTC
Created attachment 513716 [details]
util-fix-fds-leak.patch

Comment 3 Neil Wilson 2011-07-19 07:03:44 UTC
No, you won't be able to.

It appears as a regression only in v0.9.3, due to a reorganisation of the remote driver code (or something).

0.9.2 and below behave correctly.

And it is not 'libvirtd' that is leaking, it is the client program (the process running either the python interpreter, or the ruby interpreter in the case of the example above). 

I perhaps didn't make the latter clear in my report, for which I apologise.

Comment 4 Neil Wilson 2011-07-19 07:07:28 UTC
And just to be absolutely clear, the leak is not in the interpreter but in the libvirt client shared libraries pulled in by 'require libvirt' or 'import libvirt'.

I run my interpreter with

LIBVIRT_DEBUG=1 irb

so you can see when the library gets activated.

Comment 5 Alex Jia 2011-07-19 07:21:45 UTC
Hi Neil,
Thanks for your rapid reply and helpful comments; I clearly understand what you mean now.
I have encountered this issue before using valgrind, and I think it's the same issue:
https://bugzilla.redhat.com/attachment.cgi?id=487500&action=edit:

==13864== 10 bytes in 1 blocks are definitely lost in loss record 10 of 34
==13864==    at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==13864==    by 0x308587FD91: strdup (in /lib64/libc-2.12.so)
==13864==    by 0x3BB68BA0A3: doRemoteOpen (remote_driver.c:451)
==13864==    by 0x3BB68BCFCF: remoteOpen (remote_driver.c:1111)
==13864==    by 0x3BB689A0FF: do_open (libvirt.c:1290)
==13864==    by 0x3BB689ADD5: virConnectOpenAuth (libvirt.c:1545)
==13864==    by 0x41F659: main (virsh.c:11613)
==13864== 
==13864== 16 bytes in 1 blocks are definitely lost in loss record 11 of 34
==13864==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==13864==    by 0x3BB6839B8D: virAlloc (memory.c:102)
==13864==    by 0x3BB68BA803: doRemoteOpen (remote_driver.c:912)
==13864==    by 0x3BB68BCFCF: remoteOpen (remote_driver.c:1111)
==13864==    by 0x3BB689A0FF: do_open (libvirt.c:1290)
==13864==    by 0x3BB689ADD5: virConnectOpenAuth (libvirt.c:1545)
==13864==    by 0x41F659: main (virsh.c:11613)
==13864== 
==13864== 16 bytes in 1 blocks are definitely lost in loss record 12 of 34
==13864==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==13864==    by 0x3BB6839B8D: virAlloc (memory.c:102)
==13864==    by 0x3BB68BA821: doRemoteOpen (remote_driver.c:917)
==13864==    by 0x3BB68BCFCF: remoteOpen (remote_driver.c:1111)
==13864==    by 0x3BB689A0FF: do_open (libvirt.c:1290)
==13864==    by 0x3BB689ADD5: virConnectOpenAuth (libvirt.c:1545)
==13864==    by 0x41F659: main (virsh.c:11613)


Could you try libvirt-0.9.3-5.el6.x86_64 or upstream? It should be okay now.

Thanks,
Alex

Comment 6 Osier Yang 2011-07-19 09:26:28 UTC
(In reply to comment #5)
> Hi Neil,
> Thanks for your rapid reply and helpful comments; I clearly understand what you mean now.
> I have encountered this issue before using valgrind, and I think it's the same issue:
> https://bugzilla.redhat.com/attachment.cgi?id=487500&action=edit:

Are you two really talking about the same thing? As far as I understand, Neil is talking about the fd leak of libvirtd, and Alex's following comments are talking about a memory leak.

I can't see the problem Neil describes; the number of fds opened by the libvirtd process doesn't increase when connections are created again and again.

And the patch from Alex doesn't fix Neil's problem, if it exists.

Comment 7 Neil Wilson 2011-07-19 09:30:32 UTC
Osier,

As I said above. The problem is *not* in libvirtd. It is in the libvirt client libraries.

You are looking at the wrong process.

Comment 8 Alex Jia 2011-07-19 10:37:15 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Hi Neil,
> > Thanks for your rapid reply and helpful comments; I clearly understand what you mean now.
> > I have encountered this issue before using valgrind, and I think it's the same issue:
> > https://bugzilla.redhat.com/attachment.cgi?id=487500&action=edit:
> 
> Are you two really talking about the same thing? As far as I understand, Neil
> is talking about the fd leak of libvirtd, and Alex's following comments are
> talking about a memory leak.
Yeah, it's different.
> 
> I can't see the problem Neil describes; the number of fds opened by the
> libvirtd process doesn't increase when connections are created again and again.
> 
> And the patch from Alex doesn't fix Neil's problem, if it exists.

Yeah, it's another issue.

I will attach the latest information here; please help me check it.

Osier, Thanks for your comment.

Alex

Comment 9 Alex Jia 2011-07-19 10:38:14 UTC
(In reply to comment #7)
> Osier,
> 
> As I said above. The problem is *not* in libvirtd. It is in the libvirt client
> libraries.
> 
> You are looking at the wrong process.

I see; I will attach the latest results here.

Thanks,
Alex

Comment 10 Alex Jia 2011-07-19 10:42:40 UTC
Created attachment 513759 [details]
libvirt-0.9.1-1.el6.x86_64

The remote driver is unchanged in this version. I haven't found any fds left open after closing, so this version is okay, as Neil said.

Comment 11 Alex Jia 2011-07-19 10:49:38 UTC
Created attachment 513762 [details]
libvirt-0.9.3-1.el6.x86_64

The remote driver has changed in this version. I found that an fd is opened on the first con = libvirt.open(''), for instance:
virNetSocketNew:107 : localAddr=0x7fffd09b6990 remoteAddr=0x7fffd09b6900 fd=3 errfd=-1 pid=0

and when I disconnect using con.close(), I can't find any operation that closes this fd (fd=3):

18:34:35.177: 27014: debug : virConnectClose:1320 : conn=0x9c0aa0
18:34:35.177: 27014: debug : virUnrefConnect:145 : unref connection 0x9c0aa0 1
18:34:35.177: 27014: debug : virReleaseConnect:94 : release connection 0x9c0aa0
18:34:35.177: 27014: debug : virNetMessageNew:44 : msg=0xa177d0
18:34:35.177: 27014: debug : virNetMessageEncodePayload:255 : Encode length as 28
18:34:35.177: 27014: debug : virNetClientIO:1006 : program=536903814 version=1 serial=3 proc=2 type=0 length=28 dispatch=(nil)
18:34:35.177: 27014: debug : virNetClientIO:1074 : We have the buck 0xa048d0 0xa048d0
18:34:35.177: 27014: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0xa02620
18:34:35.177: 27014: debug : virNetClientIOEventLoop:950 : Giving up the buck due to I/O error 0xa048d0 (nil)
18:34:35.177: 27014: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0xa02620
18:34:35.177: 27014: debug : virNetClientIO:1101 : All done with our call (nil) 0xa048d0 -1

When I open a connection a second time:
virNetSocketNew:107 : localAddr=0x7fffd09b6990 remoteAddr=0x7fffd09b6900 fd=6 errfd=-1 pid=0

Note: a new fd (fd=6) is opened, as shown above.

Disconnecting a second time: same as above.

So in this version we indeed do not close these fds when the libvirt client disconnects.

Comment 12 Alex Jia 2011-07-19 10:52:32 UTC
Created attachment 513763 [details]
libvirt-0.9.3-5.el6.x86_64

libvirt-0.9.3-5.el6.x86_64 is okay; when I disconnect the libvirt client, I can see this:

>>> con.close()
18:11:37.250: 26351: debug : virConnectClose:1320 : conn=0x11bcaa0
18:11:37.250: 26351: debug : virUnrefConnect:145 : unref connection 0x11bcaa0 1
18:11:37.250: 26351: debug : virReleaseConnect:94 : release connection 0x11bcaa0
18:11:37.250: 26351: debug : virNetMessageNew:44 : msg=0x12137d0
18:11:37.250: 26351: debug : virNetMessageEncodePayload:255 : Encode length as 28
18:11:37.250: 26351: debug : virNetClientIO:1006 : program=536903814 version=1 serial=3 proc=2 type=0 length=28 dispatch=(nil)
18:11:37.250: 26351: debug : virNetClientIO:1074 : We have the buck 0x11bce90 0x11bce90
18:11:37.250: 26351: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0x11fe810
18:11:37.250: 26351: debug : virNetMessageDecodeLength:125 : Got length, now need 28 total (24 more)
18:11:37.250: 26351: debug : virNetClientCallDispatch:631 : Incoming message prog 536903814 vers 1 proc 2 type 1 status 0 serial 3
18:11:37.250: 26351: debug : virNetClientIOEventLoop:929 : Giving up the buck 0x11bce90 (nil)
18:11:37.250: 26351: debug : virNetSocketUpdateIOCallback:1083 : Watch not registered on socket 0x11fe810
18:11:37.250: 26351: debug : virNetClientIO:1101 : All done with our call (nil) 0x11bce90 0
18:11:37.250: 26351: debug : virNetSocketRemoveIOCallback:1093 : Watch not registered on socket 0x11fe810
18:11:37.250: 26351: debug : virNetSocketFree:627 : sock=0x11fe810 fd=3

virNetSocketFree closes this fd, and the fd is always 3 when I open and close the connection twice.

Comment 13 Eric Blake 2011-09-20 22:09:49 UTC
Given comment 12, the root cause of the problem has been fixed by patches backported into 0.9.3-5.el6 from upstream, so I'm closing this as completed upstream.  At any rate, I could not reproduce it with 0.9.4.

