Bug 761906 (GLUSTER-174)

Summary: booster: fd_ts, they are a leakin
Product: [Community] GlusterFS Reporter: Shehjar Tikoo <shehjart>
Component: booster Assignee: Shehjar Tikoo <shehjart>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: low    
Version: mainline CC: gluster-bugs, lakshmipathi
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: RTNR Mount Type: ---

Description Shehjar Tikoo 2009-07-29 10:18:40 UTC
See inline comments:


(In reply to comment #0)
> =================
> root@indus:statcache# gdb ./fstatcache 
> GNU gdb 6.8-debian
> Copyright (C) 2008 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu"...
> (gdb) source ldpreload-booster_lib.gdb 
> Function "booster_init" not defined.
> Breakpoint 1 (booster_init) pending.
> [Thread debugging using libthread_db enabled]
> [New Thread 0x2baa2bfa2dd0 (LWP 24639)]
> [Switching to Thread 0x2baa2bfa2dd0 (LWP 24639)]
> 
> Breakpoint 1, booster_init () at booster.c:993
> 993            char    *booster_conf_path = NULL;
> Breakpoint 2 at 0x2baa2b3f5330
> 
> Breakpoint 2, 0x00002baa2b3f5330 in getenv () from /lib/libc.so.6
> Single stepping until exit from function getenv, 
> which has no line number information.
> booster_init () at booster.c:1016
> 1016            if (booster_conf_path != NULL) {
> (gdb) b traverse_dir 
> Breakpoint 3 at 0x400a7e: file fstatcache.c, line 22.
> (gdb) c
> Continuing.
> 
> Breakpoint 2, 0x00002baa2b3f5330 in getenv () from /lib/libc.so.6
> (gdb)  d d2
> warning: bad breakpoint number at or near 'd2'
> (gdb)  d 2
> (gdb) c
> Continuing.
> [New Thread 0x2baa2d28f950 (LWP 24642)]
> [New Thread 0x2baa2da90950 (LWP 24643)]
> Test: path /testpath/, delay 0: 
> Breakpoint 3, traverse_dir (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:22
> 22            struct dirent           *dire = NULL;
> (gdb) n
> 23            DIR                     *dh = NULL;
> (gdb) 
> 24            int                     fd = 0;
> (gdb) 
> 25            int                     ret = 0;
> (gdb) 
> 28            if ((dh = opendir (path)) == NULL) {
> (gdb) 
> 34            while ((dire = readdir (dh)) != NULL) {
> (gdb) 
> 35                    if (strcmp (dire->d_name, ".") == 0)
> (gdb) 
> 38                    if (strcmp (dire->d_name, "..") == 0)
> (gdb) 
> 40                    sprintf (statpath, "%s/%s",path, dire->d_name);
> (gdb) 
> 42                    fprintf (stderr, "Stat'ing %s\n", statpath);
> (gdb) 
> Stat'ing /testpath//mbptestfile1.1.0
> 
> ###############################################################
> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> This is where I start breaking at fd_refs and unrefs to check whether the
> unrefs are called as many times as the refs between an open-fstat-close
> sequence.
> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> ###############################################################
> 
> 44                    if ((fd = open (statpath, O_RDONLY)) < 0) {
> (gdb) b _fd_ref
> Breakpoint 4 at 0x2baa2b74a4c8: file fd.c, line 346.
> (gdb) b _fd_unref
> Breakpoint 5 at 0x2baa2b74a594: file fd.c, line 371.
> (gdb) c
> Continuing.
> 
> Breakpoint 4, _fd_ref (fd=0x60b2c0) at fd.c:346
> 346        ++fd->refcount;
> (gdb) bt
> #0  _fd_ref (fd=0x60b2c0) at fd.c:346
> #1  0x00002baa2b74ab66 in fd_create (inode=0x60b220, pid=0) at fd.c:494
> #2  0x00002baa2b96da7f in glusterfs_glh_open (handle=0x603080, path=0x602ee0
> "//mbptestfile1.1.0", 
>     flags=0) at libglusterfsclient.c:2559
> #3  0x00002baa2b96dfbe in glusterfs_open (path=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", 
>     flags=0) at libglusterfsclient.c:2645
> #4  0x00002baa2b1a1410 in vmp_open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", flags=0)
>     at booster.c:367
> #5  0x00002baa2b1a1641 in booster_open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", 
>     use64=0, flags=0) at booster.c:423
> #6  0x00002baa2b1a196a in *open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", flags=0)
>     at booster.c:490
> #7  0x0000000000400b77 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:44
> #8  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:74
> #9  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at
> fstatcache.c:114
> (gdb) c
> Continuing.
> 
> Breakpoint 4, _fd_ref (fd=0x60b2c0) at fd.c:346
> 346        ++fd->refcount;
> (gdb) bt
> #0  _fd_ref (fd=0x60b2c0) at fd.c:346
> #1  0x00002baa2b74a565 in fd_ref (fd=0x60b2c0) at fd.c:362
> #2  0x00002baa2b73fc55 in fop_open_cbk_stub (frame=0x60ab10, fn=0, op_ret=0,
> op_errno=0, 
>     fd=0x60b2c0) at call-stub.c:1044
> #3  0x00002baa2b96cb2f in libgf_client_open_cbk (frame=0x60ab10,
> cookie=0x609300, this=0x604400, 
>     op_ret=0, op_errno=0, fd=0x60b2c0) at libglusterfsclient.c:2373
> #4  0x00002baa2c81293d in posix_open (frame=0x609300, this=0x6086e0,
> loc=0x7fff7fb29650, flags=0, 
>     fd=0x60b2c0) at posix.c:1492
> #5  0x00002baa2b96cd7f in libgf_client_open (ctx=0x603080, loc=0x7fff7fb29650,
> fd=0x60b2c0, flags=0)
>     at libglusterfsclient.c:2391
> #6  0x00002baa2b96dc00 in glusterfs_glh_open (handle=0x603080, path=0x602ee0
> "//mbptestfile1.1.0", 
>     flags=0) at libglusterfsclient.c:2581
> #7  0x00002baa2b96dfbe in glusterfs_open (path=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", 
>     flags=0) at libglusterfsclient.c:2645
> #8  0x00002baa2b1a1410 in vmp_open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", flags=0)
>     at booster.c:367
> #9  0x00002baa2b1a1641 in booster_open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", 
>     use64=0, flags=0) at booster.c:423
> #10 0x00002baa2b1a196a in *open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", flags=0)
>     at booster.c:490
> #11 0x0000000000400b77 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:44
> #12 0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:74
> #13 0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at
> fstatcache.c:114
> (gdb) c
> Continuing.
> 
> Breakpoint 5, _fd_unref (fd=0x60b2c0) at fd.c:371
> 371        assert (fd->refcount);
> (gdb) bt
> #0  _fd_unref (fd=0x60b2c0) at fd.c:371
> #1  0x00002baa2b74a92e in fd_unref (fd=0x60b2c0) at fd.c:441
> #2  0x00002baa2b748e36 in call_stub_destroy_unwind (stub=0x60b580) at
> call-stub.c:3909
> #3  0x00002baa2b749283 in call_stub_destroy (stub=0x60b580) at call-stub.c:4152
> #4  0x00002baa2b96ce38 in libgf_client_open (ctx=0x603080, loc=0x7fff7fb29650,
> fd=0x60b2c0, flags=0)
>     at libglusterfsclient.c:2396
> #5  0x00002baa2b96dc00 in glusterfs_glh_open (handle=0x603080, path=0x602ee0
> "//mbptestfile1.1.0", 
>     flags=0) at libglusterfsclient.c:2581
> #6  0x00002baa2b96dfbe in glusterfs_open (path=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", 
>     flags=0) at libglusterfsclient.c:2645
> #7  0x00002baa2b1a1410 in vmp_open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", flags=0)
>     at booster.c:367
> #8  0x00002baa2b1a1641 in booster_open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", 
>     use64=0, flags=0) at booster.c:423
> #9  0x00002baa2b1a196a in *open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", flags=0)
>     at booster.c:490
> #10 0x0000000000400b77 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:44
> #11 0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:74
> #12 0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at
> fstatcache.c:114
> (gdb) c
> Continuing.
> 
> Breakpoint 4, _fd_ref (fd=0x60b2c0) at fd.c:346
> 346        ++fd->refcount;
> (gdb) bt
> #0  _fd_ref (fd=0x60b2c0) at fd.c:346
> #1  0x00002baa2b74a565 in fd_ref (fd=0x60b2c0) at fd.c:362
> #2  0x00002baa2b1a692a in booster_fd_unused_get (fdtable=0x602040,
> fdptr=0x60b2c0, fd=13)
>     at booster-fd.c:221
> #3  0x00002baa2b1a1453 in vmp_open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", flags=0)
>     at booster.c:376
> #4  0x00002baa2b1a1641 in booster_open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", 
>     use64=0, flags=0) at booster.c:423
> #5  0x00002baa2b1a196a in *open (pathname=0x7fff7fb29c70
> "/testpath//mbptestfile1.1.0", flags=0)
>     at booster.c:490
> #6  0x0000000000400b77 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:44
> #7  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:74
> #8  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at
> fstatcache.c:114
> (gdb) c
> Continuing.
> 
> Breakpoint 4, _fd_ref (fd=0x60b2c0) at fd.c:346
> 346        ++fd->refcount;
> (gdb) bt
> #0  _fd_ref (fd=0x60b2c0) at fd.c:346
> #1  0x00002baa2b74a565 in fd_ref (fd=0x60b2c0) at fd.c:362
> #2  0x00002baa2b1a6bc8 in booster_fdptr_get (fdtable=0x602040, fd=13) at
> booster-fd.c:279
> #3  0x00002baa2b1a3548 in booster_fxstat (ver=1, fd=13, buf=0x7fff7fb29be0) at
> booster.c:1586
> #4  0x00002baa2b1a5039 in __fxstat (ver=1, fd=13, buf=0x7fff7fb29be0) at
> booster_stat.c:103
> #5  0x0000000000400bc3 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:50
> #6  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:74
> #7  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at
> fstatcache.c:114
> (gdb) c
> Continuing.
> 
> Breakpoint 4, _fd_ref (fd=0x60b2c0) at fd.c:346
> 346        ++fd->refcount;
> (gdb) bt
> #0  _fd_ref (fd=0x60b2c0) at fd.c:346
> #1  0x00002baa2b74a565 in fd_ref (fd=0x60b2c0) at fd.c:362
> #2  0x00002baa2b1a6bc8 in booster_fdptr_get (fdtable=0x602040, fd=13) at
> booster-fd.c:279
> #3  0x00002baa2b1a2125 in close (fd=13) at booster.c:763
> #4  0x0000000000400c0a in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:58
> #5  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:74
> #6  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at
> fstatcache.c:114
> (gdb) c
> Continuing.

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
If you look at the two backtraces above, there is a ref on the fd on entering booster_fxstat, and without any unref on this ref'd fd we jump directly to the fd_ref call in close. Yet the source of booster_fxstat does call booster_fdptr_put right after glusterfs_fstat. So where did this call to booster_fdptr_put->fd_unref go?
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

> 
> Breakpoint 5, _fd_unref (fd=0x60b2c0) at fd.c:371
> 371        assert (fd->refcount);
> (gdb) p fd->refcount
> $1 = 4
> (gdb) bt
> #0  _fd_unref (fd=0x60b2c0) at fd.c:371
> #1  0x00002baa2b74a92e in fd_unref (fd=0x60b2c0) at fd.c:441
> #2  0x00002baa2b1a6ab0 in booster_fd_put (fdtable=0x602040, fd=13) at
> booster-fd.c:255
> #3  0x00002baa2b1a213f in close (fd=13) at booster.c:766
> #4  0x0000000000400c0a in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:58
> #5  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:74
> #6  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at
> fstatcache.c:114
> (gdb) c
> Continuing.
> 
> Breakpoint 5, _fd_unref (fd=0x60b2c0) at fd.c:371
> 371        assert (fd->refcount);
> (gdb) bt
> #0  _fd_unref (fd=0x60b2c0) at fd.c:371
> #1  0x00002baa2b74a92e in fd_unref (fd=0x60b2c0) at fd.c:441
> #2  0x00002baa2b96e492 in glusterfs_close (fd=0x60b2c0) at
> libglusterfsclient.c:2737
> #3  0x00002baa2b1a2148 in close (fd=13) at booster.c:767
> #4  0x0000000000400c0a in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:58
> #5  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:74
> #6  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at
> fstatcache.c:114
> (gdb) c
> Continuing.
> Stat'ing /testpath//testd
> 
> Breakpoint 4, _fd_ref (fd=0x60b410) at fd.c:346
> 346        ++fd->refcount;


@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Similarly, in the previous two breakpoints, right after the glusterfs_close->fd_unref we move straight on to fd_ref'ing the next fd, the one being opened for /testpath//testd. Again, the question is: in booster.c:close(), why isn't booster_fdptr_put->fd_unref being called after glusterfs_close?

These two missing fd_unrefs explain why we have an outstanding refcount of 2 at the end of the sequence.

The unrefs are missing because of the following bug:
void
booster_fdptr_put (fd_t *booster_fd)
{
        if (!booster_fd)        /* BUG: should be if (booster_fd) */
                fd_unref (booster_fd);
}
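
To make the effect of the inverted check concrete, here is a minimal sketch using a mock fd type. mock_fd_t, mock_fd_ref, mock_fd_unref, fdptr_put_buggy and fdptr_put_fixed are hypothetical names invented for illustration; they are not the real fd_t or booster code and they leave out locking and destruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for fd_t; only the refcount matters here. */
typedef struct mock_fd {
        int refcount;
} mock_fd_t;

/* Simplified stand-in for fd_ref: no locking. */
mock_fd_t *
mock_fd_ref (mock_fd_t *fd)
{
        ++fd->refcount;
        return fd;
}

/* Simplified stand-in for fd_unref: no locking, no destroy at zero. */
void
mock_fd_unref (mock_fd_t *fd)
{
        assert (fd->refcount);
        --fd->refcount;
}

/* Same shape as the buggy check: a valid pointer is never unref'd. */
void
fdptr_put_buggy (mock_fd_t *fd)
{
        if (!fd)
                mock_fd_unref (fd);
}

/* Corrected check: unref only when the pointer is valid. */
void
fdptr_put_fixed (mock_fd_t *fd)
{
        if (fd)
                mock_fd_unref (fd);
}
```

With a valid fd, the buggy variant leaves the refcount untouched, so every get/put pair leaks one reference; the fixed variant drops it and is a no-op for NULL.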

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


> (gdb) bt
> #0  _fd_ref (fd=0x60b410) at fd.c:346
> #1  0x00002baa2b74ab66 in fd_create (inode=0x60aa60, pid=0) at fd.c:494
> #2  0x00002baa2b96da7f in glusterfs_glh_open (handle=0x603080, path=0x60b340
> "//testd", flags=0)
>     at libglusterfsclient.c:2559
> #3  0x00002baa2b96dfbe in glusterfs_open (path=0x7fff7fb29c70
> "/testpath//testd", flags=0)
>     at libglusterfsclient.c:2645
> #4  0x00002baa2b1a1410 in vmp_open (pathname=0x7fff7fb29c70 "/testpath//testd",
> flags=0)
>     at booster.c:367
> #5  0x00002baa2b1a1641 in booster_open (pathname=0x7fff7fb29c70
> "/testpath//testd", use64=0, 
>     flags=0) at booster.c:423
> #6  0x00002baa2b1a196a in *open (pathname=0x7fff7fb29c70 "/testpath//testd",
> flags=0)
>     at booster.c:490
> #7  0x0000000000400b77 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:44
> #8  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at
> fstatcache.c:74
> #9  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at
> fstatcache.c:114
> 
> ====================================
> 
> If you notice carefully, the fd ref/unref sequence and the corresponding
> refcount is something like below:
> 
> ref - 1
> ref - 2
> unref - 1
> ref - 2
> ref - 3
> ref - 4
> unref - 3
> unref - 2
> 
> That still leaves two uncalled fd_unrefs, resulting in a leak.

Comment 1 Shehjar Tikoo 2009-07-29 13:02:32 UTC
Avati reported that there is an fd leak in a booster/libglusterfsclient setup, so I set out to find it. The gdb trace below is from a custom tool I have that goes through a directory using readdir, opening each file/dir and calling fstat on the fd.
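
For readers without the tool, the loop it runs can be approximated as below. This is a reconstruction from the source lines visible in the gdb session, not the actual fstatcache.c: the buffer size, the return value and the error handling are assumptions, and snprintf is used where the trace shows sprintf.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Walk one directory level and do open-fstat-close on every entry.
 * Returns the number of entries fstat'ed, or -1 if the directory
 * cannot be opened. */
int
traverse_dir (const char *path)
{
        struct dirent   *dire = NULL;
        DIR             *dh = NULL;
        struct stat      stbuf;
        char             statpath[4096];   /* assumed size */
        int              fd = -1;
        int              count = 0;

        assert (path != NULL);

        if ((dh = opendir (path)) == NULL)
                return -1;

        while ((dire = readdir (dh)) != NULL) {
                if (strcmp (dire->d_name, ".") == 0)
                        continue;
                if (strcmp (dire->d_name, "..") == 0)
                        continue;

                snprintf (statpath, sizeof (statpath), "%s/%s", path,
                          dire->d_name);
                fprintf (stderr, "Stat'ing %s\n", statpath);

                if ((fd = open (statpath, O_RDONLY)) < 0)
                        continue;

                if (fstat (fd, &stbuf) == 0)
                        count++;

                close (fd);
        }

        closedir (dh);
        return count;
}
```

When run with booster LD_PRELOADed, the open, fstat and close calls in this loop are the ones intercepted in the backtraces below.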

=================
root@indus:statcache# gdb ./fstatcache 
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) source ldpreload-booster_lib.gdb 
Function "booster_init" not defined.
Breakpoint 1 (booster_init) pending.
[Thread debugging using libthread_db enabled]
[New Thread 0x2baa2bfa2dd0 (LWP 24639)]
[Switching to Thread 0x2baa2bfa2dd0 (LWP 24639)]

Breakpoint 1, booster_init () at booster.c:993
993	        char    *booster_conf_path = NULL;
Breakpoint 2 at 0x2baa2b3f5330

Breakpoint 2, 0x00002baa2b3f5330 in getenv () from /lib/libc.so.6
Single stepping until exit from function getenv, 
which has no line number information.
booster_init () at booster.c:1016
1016	        if (booster_conf_path != NULL) {
(gdb) b traverse_dir 
Breakpoint 3 at 0x400a7e: file fstatcache.c, line 22.
(gdb) c
Continuing.

Breakpoint 2, 0x00002baa2b3f5330 in getenv () from /lib/libc.so.6
(gdb)  d d2
warning: bad breakpoint number at or near 'd2'
(gdb)  d 2
(gdb) c
Continuing.
[New Thread 0x2baa2d28f950 (LWP 24642)]
[New Thread 0x2baa2da90950 (LWP 24643)]
Test: path /testpath/, delay 0: 
Breakpoint 3, traverse_dir (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:22
22	        struct dirent           *dire = NULL;
(gdb) n
23	        DIR                     *dh = NULL;
(gdb) 
24	        int                     fd = 0;
(gdb) 
25	        int                     ret = 0;
(gdb) 
28	        if ((dh = opendir (path)) == NULL) {
(gdb) 
34	        while ((dire = readdir (dh)) != NULL) {
(gdb) 
35	                if (strcmp (dire->d_name, ".") == 0)
(gdb) 
38	                if (strcmp (dire->d_name, "..") == 0)
(gdb) 
40	                sprintf (statpath, "%s/%s",path, dire->d_name);
(gdb) 
42	                fprintf (stderr, "Stat'ing %s\n", statpath);
(gdb) 
Stat'ing /testpath//mbptestfile1.1.0

###############################################################
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
This is where I start breaking at fd_refs and unrefs to check whether the unrefs are called as many times as the refs between an open-fstat-close sequence.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
###############################################################

44	                if ((fd = open (statpath, O_RDONLY)) < 0) {
(gdb) b _fd_ref
Breakpoint 4 at 0x2baa2b74a4c8: file fd.c, line 346.
(gdb) b _fd_unref
Breakpoint 5 at 0x2baa2b74a594: file fd.c, line 371.
(gdb) c
Continuing.

Breakpoint 4, _fd_ref (fd=0x60b2c0) at fd.c:346
346		++fd->refcount;
(gdb) bt
#0  _fd_ref (fd=0x60b2c0) at fd.c:346
#1  0x00002baa2b74ab66 in fd_create (inode=0x60b220, pid=0) at fd.c:494
#2  0x00002baa2b96da7f in glusterfs_glh_open (handle=0x603080, path=0x602ee0 "//mbptestfile1.1.0", 
    flags=0) at libglusterfsclient.c:2559
#3  0x00002baa2b96dfbe in glusterfs_open (path=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", 
    flags=0) at libglusterfsclient.c:2645
#4  0x00002baa2b1a1410 in vmp_open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", flags=0)
    at booster.c:367
#5  0x00002baa2b1a1641 in booster_open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", 
    use64=0, flags=0) at booster.c:423
#6  0x00002baa2b1a196a in *open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", flags=0)
    at booster.c:490
#7  0x0000000000400b77 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:44
#8  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:74
#9  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at fstatcache.c:114
(gdb) c
Continuing.

Breakpoint 4, _fd_ref (fd=0x60b2c0) at fd.c:346
346		++fd->refcount;
(gdb) bt
#0  _fd_ref (fd=0x60b2c0) at fd.c:346
#1  0x00002baa2b74a565 in fd_ref (fd=0x60b2c0) at fd.c:362
#2  0x00002baa2b73fc55 in fop_open_cbk_stub (frame=0x60ab10, fn=0, op_ret=0, op_errno=0, 
    fd=0x60b2c0) at call-stub.c:1044
#3  0x00002baa2b96cb2f in libgf_client_open_cbk (frame=0x60ab10, cookie=0x609300, this=0x604400, 
    op_ret=0, op_errno=0, fd=0x60b2c0) at libglusterfsclient.c:2373
#4  0x00002baa2c81293d in posix_open (frame=0x609300, this=0x6086e0, loc=0x7fff7fb29650, flags=0, 
    fd=0x60b2c0) at posix.c:1492
#5  0x00002baa2b96cd7f in libgf_client_open (ctx=0x603080, loc=0x7fff7fb29650, fd=0x60b2c0, flags=0)
    at libglusterfsclient.c:2391
#6  0x00002baa2b96dc00 in glusterfs_glh_open (handle=0x603080, path=0x602ee0 "//mbptestfile1.1.0", 
    flags=0) at libglusterfsclient.c:2581
#7  0x00002baa2b96dfbe in glusterfs_open (path=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", 
    flags=0) at libglusterfsclient.c:2645
#8  0x00002baa2b1a1410 in vmp_open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", flags=0)
    at booster.c:367
#9  0x00002baa2b1a1641 in booster_open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", 
    use64=0, flags=0) at booster.c:423
#10 0x00002baa2b1a196a in *open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", flags=0)
    at booster.c:490
#11 0x0000000000400b77 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:44
#12 0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:74
#13 0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at fstatcache.c:114
(gdb) c
Continuing.

Breakpoint 5, _fd_unref (fd=0x60b2c0) at fd.c:371
371		assert (fd->refcount);
(gdb) bt
#0  _fd_unref (fd=0x60b2c0) at fd.c:371
#1  0x00002baa2b74a92e in fd_unref (fd=0x60b2c0) at fd.c:441
#2  0x00002baa2b748e36 in call_stub_destroy_unwind (stub=0x60b580) at call-stub.c:3909
#3  0x00002baa2b749283 in call_stub_destroy (stub=0x60b580) at call-stub.c:4152
#4  0x00002baa2b96ce38 in libgf_client_open (ctx=0x603080, loc=0x7fff7fb29650, fd=0x60b2c0, flags=0)
    at libglusterfsclient.c:2396
#5  0x00002baa2b96dc00 in glusterfs_glh_open (handle=0x603080, path=0x602ee0 "//mbptestfile1.1.0", 
    flags=0) at libglusterfsclient.c:2581
#6  0x00002baa2b96dfbe in glusterfs_open (path=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", 
    flags=0) at libglusterfsclient.c:2645
#7  0x00002baa2b1a1410 in vmp_open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", flags=0)
    at booster.c:367
#8  0x00002baa2b1a1641 in booster_open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", 
    use64=0, flags=0) at booster.c:423
#9  0x00002baa2b1a196a in *open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", flags=0)
    at booster.c:490
#10 0x0000000000400b77 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:44
#11 0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:74
#12 0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at fstatcache.c:114
(gdb) c
Continuing.

Breakpoint 4, _fd_ref (fd=0x60b2c0) at fd.c:346
346		++fd->refcount;
(gdb) bt
#0  _fd_ref (fd=0x60b2c0) at fd.c:346
#1  0x00002baa2b74a565 in fd_ref (fd=0x60b2c0) at fd.c:362
#2  0x00002baa2b1a692a in booster_fd_unused_get (fdtable=0x602040, fdptr=0x60b2c0, fd=13)
    at booster-fd.c:221
#3  0x00002baa2b1a1453 in vmp_open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", flags=0)
    at booster.c:376
#4  0x00002baa2b1a1641 in booster_open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", 
    use64=0, flags=0) at booster.c:423
#5  0x00002baa2b1a196a in *open (pathname=0x7fff7fb29c70 "/testpath//mbptestfile1.1.0", flags=0)
    at booster.c:490
#6  0x0000000000400b77 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:44
#7  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:74
#8  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at fstatcache.c:114
(gdb) c
Continuing.

Breakpoint 4, _fd_ref (fd=0x60b2c0) at fd.c:346
346		++fd->refcount;
(gdb) bt
#0  _fd_ref (fd=0x60b2c0) at fd.c:346
#1  0x00002baa2b74a565 in fd_ref (fd=0x60b2c0) at fd.c:362
#2  0x00002baa2b1a6bc8 in booster_fdptr_get (fdtable=0x602040, fd=13) at booster-fd.c:279
#3  0x00002baa2b1a3548 in booster_fxstat (ver=1, fd=13, buf=0x7fff7fb29be0) at booster.c:1586
#4  0x00002baa2b1a5039 in __fxstat (ver=1, fd=13, buf=0x7fff7fb29be0) at booster_stat.c:103
#5  0x0000000000400bc3 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:50
#6  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:74
#7  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at fstatcache.c:114
(gdb) c
Continuing.

Breakpoint 4, _fd_ref (fd=0x60b2c0) at fd.c:346
346		++fd->refcount;
(gdb) bt
#0  _fd_ref (fd=0x60b2c0) at fd.c:346
#1  0x00002baa2b74a565 in fd_ref (fd=0x60b2c0) at fd.c:362
#2  0x00002baa2b1a6bc8 in booster_fdptr_get (fdtable=0x602040, fd=13) at booster-fd.c:279
#3  0x00002baa2b1a2125 in close (fd=13) at booster.c:763
#4  0x0000000000400c0a in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:58
#5  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:74
#6  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at fstatcache.c:114
(gdb) c
Continuing.

Breakpoint 5, _fd_unref (fd=0x60b2c0) at fd.c:371
371		assert (fd->refcount);
(gdb) p fd->refcount
$1 = 4
(gdb) bt
#0  _fd_unref (fd=0x60b2c0) at fd.c:371
#1  0x00002baa2b74a92e in fd_unref (fd=0x60b2c0) at fd.c:441
#2  0x00002baa2b1a6ab0 in booster_fd_put (fdtable=0x602040, fd=13) at booster-fd.c:255
#3  0x00002baa2b1a213f in close (fd=13) at booster.c:766
#4  0x0000000000400c0a in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:58
#5  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:74
#6  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at fstatcache.c:114
(gdb) c
Continuing.

Breakpoint 5, _fd_unref (fd=0x60b2c0) at fd.c:371
371		assert (fd->refcount);
(gdb) bt
#0  _fd_unref (fd=0x60b2c0) at fd.c:371
#1  0x00002baa2b74a92e in fd_unref (fd=0x60b2c0) at fd.c:441
#2  0x00002baa2b96e492 in glusterfs_close (fd=0x60b2c0) at libglusterfsclient.c:2737
#3  0x00002baa2b1a2148 in close (fd=13) at booster.c:767
#4  0x0000000000400c0a in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:58
#5  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:74
#6  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at fstatcache.c:114
(gdb) c
Continuing.
Stat'ing /testpath//testd

Breakpoint 4, _fd_ref (fd=0x60b410) at fd.c:346
346		++fd->refcount;
(gdb) bt
#0  _fd_ref (fd=0x60b410) at fd.c:346
#1  0x00002baa2b74ab66 in fd_create (inode=0x60aa60, pid=0) at fd.c:494
#2  0x00002baa2b96da7f in glusterfs_glh_open (handle=0x603080, path=0x60b340 "//testd", flags=0)
    at libglusterfsclient.c:2559
#3  0x00002baa2b96dfbe in glusterfs_open (path=0x7fff7fb29c70 "/testpath//testd", flags=0)
    at libglusterfsclient.c:2645
#4  0x00002baa2b1a1410 in vmp_open (pathname=0x7fff7fb29c70 "/testpath//testd", flags=0)
    at booster.c:367
#5  0x00002baa2b1a1641 in booster_open (pathname=0x7fff7fb29c70 "/testpath//testd", use64=0, 
    flags=0) at booster.c:423
#6  0x00002baa2b1a196a in *open (pathname=0x7fff7fb29c70 "/testpath//testd", flags=0)
    at booster.c:490
#7  0x0000000000400b77 in traverse_dir (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:44
#8  0x0000000000400c60 in stat_cache_test (path=0x7fff7fb2a935 "/testpath/") at fstatcache.c:74
#9  0x0000000000400dde in main (argc=5, argv=0x7fff7fb2a208) at fstatcache.c:114

====================================

If you notice carefully, the fd ref/unref sequence and the corresponding refcount is something like below:

ref - 1
ref - 2
unref - 1
ref - 2
ref - 3
ref - 4
unref - 3
unref - 2

That still leaves two uncalled fd_unrefs, resulting in a leak.
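
The arithmetic can be checked by replaying the events in order. The sketch below is illustrative only: the enum, the arrays and replay_refcount are hypothetical names, and the second array assumes the two booster_fdptr_put unrefs sit where the description of this bug places them (after glusterfs_fstat and after glusterfs_close).

```c
#include <assert.h>
#include <stddef.h>

enum fd_event { FD_UNREF = -1, FD_REF = 1 };

/* Events seen in the gdb session for fd 0x60b2c0, in order. */
const enum fd_event observed[] = {
        FD_REF,    /* fd_create                     -> 1 */
        FD_REF,    /* fop_open_cbk_stub             -> 2 */
        FD_UNREF,  /* call_stub_destroy             -> 1 */
        FD_REF,    /* booster_fd_unused_get (open)  -> 2 */
        FD_REF,    /* booster_fdptr_get (fxstat)    -> 3 */
        FD_REF,    /* booster_fdptr_get (close)     -> 4 */
        FD_UNREF,  /* booster_fd_put (close)        -> 3 */
        FD_UNREF,  /* glusterfs_close               -> 2 */
};

/* Same sequence with the two expected booster_fdptr_put unrefs added. */
const enum fd_event with_puts[] = {
        FD_REF,    /* fd_create                     -> 1 */
        FD_REF,    /* fop_open_cbk_stub             -> 2 */
        FD_UNREF,  /* call_stub_destroy             -> 1 */
        FD_REF,    /* booster_fd_unused_get (open)  -> 2 */
        FD_REF,    /* booster_fdptr_get (fxstat)    -> 3 */
        FD_UNREF,  /* booster_fdptr_put (fxstat)    -> 2 */
        FD_REF,    /* booster_fdptr_get (close)     -> 3 */
        FD_UNREF,  /* booster_fd_put (close)        -> 2 */
        FD_UNREF,  /* glusterfs_close               -> 1 */
        FD_UNREF,  /* booster_fdptr_put (close)     -> 0 */
};

/* Apply the events to a counter starting at zero; return the final count. */
int
replay_refcount (const enum fd_event *events, size_t nevents)
{
        int     refcount = 0;
        size_t  i = 0;

        for (i = 0; i < nevents; i++) {
                if (events[i] == FD_UNREF)
                        assert (refcount > 0);
                refcount += events[i];
        }

        return refcount;
}
```

Replaying observed ends at a refcount of 2, matching the leak; replaying with_puts ends at 0.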

Comment 2 Shehjar Tikoo 2009-07-29 13:40:29 UTC
Here is a sample of the change in the number of fds held by a storage/posix
translator loaded by libglusterfsclient:

The tool used for the test goes through about 100000 files, doing an open-fstat-close
sequence on each. At the end of the test, the process running with booster LD_PRELOADed should have only a minimal number of fds open. storage/posix is the only translator used by libglusterfsclient for this test.

Before this fix:
root@indus:glusterfs# ls /proc/17267/fd|wc -l
100013

After:
root@indus:glusterfs# ls /proc/26182/fd|wc -l
7
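
The same check as ls /proc/<pid>/fd | wc -l can be done from C, for instance at the end of a test run. This is a Linux-specific sketch; count_dir_entries is a hypothetical helper, not part of booster or libglusterfsclient.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count the entries of a directory, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
int
count_dir_entries (const char *dir_path)
{
        DIR             *dh = NULL;
        struct dirent   *dire = NULL;
        int              count = 0;

        assert (dir_path != NULL);

        if ((dh = opendir (dir_path)) == NULL) {
                perror ("opendir");
                return -1;
        }

        while ((dire = readdir (dh)) != NULL) {
                if (strcmp (dire->d_name, ".") == 0)
                        continue;
                if (strcmp (dire->d_name, "..") == 0)
                        continue;
                count++;
        }

        closedir (dh);
        return count;
}
```

Calling count_dir_entries ("/proc/self/fd") reports the calling process's own fds, including the one opendir itself holds while the directory is being read, so comparing two readings is more reliable than relying on the absolute number.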

Comment 3 Anand Avati 2009-07-29 19:23:01 UTC
PATCH: http://patches.gluster.com/patch/833 in master (booster: Fix fd leak due to incorrect NULL check)

Comment 4 Anand Avati 2009-07-29 19:23:05 UTC
PATCH: http://patches.gluster.com/patch/834 in release-2.0 (booster: Fix fd leak due to incorrect NULL check)

Comment 6 Anand Avati 2009-09-09 10:39:00 UTC
PATCH: http://patches.gluster.com/patch/1283 in master (booster: Fix fd_t leak in pread64)

Comment 7 Anand Avati 2009-09-09 10:39:25 UTC
PATCH: http://patches.gluster.com/patch/1288 in release-2.0 (booster: Fix fd_t leak in pread64)