Description of problem:
Some of the files go into the .glusterfs/unlink folder after execution of the pynfs test suite.

Version-Release number of selected component (if applicable):
Seen on the hotfix build provided:
[root@dhcp47-149 ~]# rpm -qa | grep glusterfs
glusterfs-cli-3.7.9-12.el7rhgs.1.HOTFIX.Case01698221.BZ1374166.x86_64
glusterfs-client-xlators-3.7.9-12.el7rhgs.1.HOTFIX.Case01698221.BZ1374166.x86_64
glusterfs-debuginfo-3.7.9-12.el7rhgs.1.HOTFIX.Case01698221.BZ1374166.x86_64
glusterfs-api-3.7.9-12.el7rhgs.1.HOTFIX.Case01698221.BZ1374166.x86_64
glusterfs-ganesha-3.7.9-12.el7rhgs.1.HOTFIX.Case01698221.BZ1374166.x86_64
glusterfs-libs-3.7.9-12.el7rhgs.1.HOTFIX.Case01698221.BZ1374166.x86_64
glusterfs-fuse-3.7.9-12.el7rhgs.1.HOTFIX.Case01698221.BZ1374166.x86_64
glusterfs-geo-replication-3.7.9-12.el7rhgs.1.HOTFIX.Case01698221.BZ1374166.x86_64
glusterfs-3.7.9-12.el7rhgs.1.HOTFIX.Case01698221.BZ1374166.x86_64
glusterfs-rdma-3.7.9-12.el7rhgs.1.HOTFIX.Case01698221.BZ1374166.x86_64
glusterfs-server-3.7.9-12.el7rhgs.1.HOTFIX.Case01698221.BZ1374166.x86_64

How reproducible:
Twice

Steps to Reproduce:
1. Create an nfs-ganesha cluster.
2. Create a volume, start it, and mount it on a client.
3. Start executing the pynfs test suite from the client.
4. Observe that some of the tests related to WRT18 and WRT19 fail:
   WRT18 st_write.testChangeGranularityWrite : FAILURE
         consecutive SETATTR(mode)'s don't all change change attribute
   WRT19 st_write.testStolenStateid : FAILURE
         WRITE with stolen stateid should return NFS4ERR_ACCESS or NFS4ERR_PERM, instead got NFS4_OK
5. Once the pynfs test suite finishes, the following data remains on the mount point:
[root@Client2 nfs1]# ls
tmp  tree
[root@Client2 nfs1]# ls -lR
.:
total 8
drwxr-xr-x. 2 nobody nobody 4096 Sep 26 16:52 tmp
drwxr-xr-x. 3 nobody nobody 4096 Sep 26 16:33 tree

./tmp:
total 0

./tree:
total 5
brwxr-xr-x. 1 nobody nobody 1, 2 Sep 26 16:33 block
crwxr-xr-x. 1 nobody nobody 1, 2 Sep 26 16:33 char
drwxr-xr-x. 2 nobody nobody 4096 Sep 26 16:33 dir
prwxr-xr-x. 1 nobody nobody    0 Sep 26 16:33 fifo
-rwxr-xr-x. 1 nobody nobody   27 Sep 26 16:33 file
lrwxrwxrwx. 1 nobody nobody    8 Sep 26 16:33 link -> /etc/X11
srwxr-xr-x. 1 nobody nobody    0 Sep 26 16:33 socket

./tree/dir:
total 0
6. Check the brick path under .glusterfs/unlink on all nodes.
7. Observe that some files are present under the .glusterfs/unlink folder once the pynfs test suite finishes.
[root@dhcp47-149 unlink]# cd /bricks/brick2/b2/.glusterfs/unlink/
[root@dhcp47-149 unlink]# ls
aec616bd-f438-4e06-bf79-f79ba616ba14  f9db2469-dd7e-47d6-8c4c-e69ab9c447c3
[root@dhcp47-149 unlink]#

[root@dhcp47-139 unlink]# cd /bricks/brick2/b2/.glusterfs/unlink/
[root@dhcp47-139 unlink]# ls
6b802efc-2ea4-4f2b-a5ed-538d53c800b2
8. Even after removing the data from the mount point, these files remain in the same path.
[root@Client2 nfs1]# ls
tmp  tree
[root@Client2 nfs1]# rm -rf *
[root@Client2 nfs1]# ls
[root@Client2 nfs1]#

[root@dhcp47-149 unlink]# cd /bricks/brick2/b2/.glusterfs/unlink/
[root@dhcp47-149 unlink]# ls
2c75c428-a0a5-4d02-93b2-5829d8689c3c  ac856802-9640-4a59-bbd8-988fa275bdff

[root@dhcp47-139 unlink]# cd /bricks/brick2/b2/.glusterfs/unlink/
[root@dhcp47-139 unlink]# ls
6b1ac1e2-da55-4b76-b223-a1b36f54ffa1

Actual results:
Some of the files go into the .glusterfs/unlink folder after execution of the pynfs test suite.

Expected results:
Files created during the pynfs test suite should not go under the .glusterfs/unlink folder.

Additional info:
The same issue is seen with 3.1.3 builds as well, without the hotfix.
Following errors are seen in ganesha.log:

26/09/2016 07:08:05 : epoch 9c020000 : dhcp47-149.lab.eng.blr.redhat.com : ganesha.nfsd-22172[work-2] nfs4_Compound :NFS4 :MAJ :An empty COMPOUND (no operation in it) was received
26/09/2016 07:08:05 : epoch 9c020000 : dhcp47-149.lab.eng.blr.redhat.com : ganesha.nfsd-22172[work-6] nfs4_Compound :NFS4 :CRIT :Bad Minor Version 50
26/09/2016 07:08:05 : epoch 9c020000 : dhcp47-149.lab.eng.blr.redhat.com : ganesha.nfsd-22172[work-14] nfs4_Compound :NFS4 :MAJ :A COMPOUND with too many operations (150) was received
26/09/2016 07:12:46 : epoch 9c020000 : dhcp47-149.lab.eng.blr.redhat.com : ganesha.nfsd-22172[work-6] nfs4_op_open_downgrade :STATE :EVENT :Failed to open downgrade: (share access or deny never seen before)
26/09/2016 07:12:46 : epoch 9c020000 : dhcp47-149.lab.eng.blr.redhat.com : ganesha.nfsd-22172[work-13] nfs4_op_open_downgrade :STATE :EVENT :Failed to open downgrade: (share access or deny never seen before)
With 3.1.3, the entries inside .glusterfs/unlink were present for almost all the files even after removal, but with 3.1.3 + hotfix the entries are seen for only 3 files. Also, when running the write tests individually, which include the failed cases WRT18 and WRT19, I don't see this issue. This means the failed cases are not the cause of the entries inside .glusterfs/unlink.
Maybe there is still some ref leak happening in a certain code path. @Shashank, to narrow this down further, could you please run the suite for specific codes and flags, if possible, and check which specific test leads to this leak? To list the possible codes and flags, you could try the commands below (maybe in an automated test script):

./testserver.py --showflags
./testserver.py --showcodes
The tests LOCKCHGD and LOCKMRG under lock are causing this issue.

********************************************************************
./testserver.py 10.70.44.92:/testvolume -v --outfile ~/pynfs.run --maketree --showomit --rundep LOCKCHGD > /tmp/pynfs-hotfix-lock-specific.log

Results:
INIT     st_setclientid.testValid : RUNNING
INIT     st_setclientid.testValid : PASS
MKFILE   st_open.testOpen         : RUNNING
MKFILE   st_open.testOpen         : PASS
LOCK1    st_lock.testFile         : RUNNING
LOCK1    st_lock.testFile         : PASS
LOCKCHGD st_lock.testDowngrade    : RUNNING
LOCKCHGD st_lock.testDowngrade    : PASS
**************************************************
INIT     st_setclientid.testValid : PASS
LOCK1    st_lock.testFile         : PASS
LOCKCHGD st_lock.testDowngrade    : PASS
MKFILE   st_open.testOpen         : PASS
**************************************************
Command line asked for 4 of 665 tests

Files present in unlink path:
[root@dhcp47-149 unlink]# cd /bricks/brick2/b2/.glusterfs/unlink/
[root@dhcp47-149 unlink]# ls
caa03fa0-e855-4375-98fc-74e361a07cce
*******************************************************************

*******************************************************************
./testserver.py 10.70.44.92:/testvolume -v --outfile ~/pynfs.run --maketree --showomit --rundep LOCKMRG > /tmp/pynfs-hotfix-lock-specific.log

Results:
INIT     st_setclientid.testValid : RUNNING
INIT     st_setclientid.testValid : PASS
MKFILE   st_open.testOpen         : RUNNING
MKFILE   st_open.testOpen         : PASS
LOCK1    st_lock.testFile         : RUNNING
LOCK1    st_lock.testFile         : PASS
LKTOVER  st_lockt.testOverlap     : RUNNING
LKTOVER  st_lockt.testOverlap     : PASS
LOCKMRG  st_lock.testOverlap      : RUNNING
LOCKMRG  st_lock.testOverlap      : PASS
**************************************************
INIT     st_setclientid.testValid : PASS
LKTOVER  st_lockt.testOverlap     : PASS
LOCK1    st_lock.testFile         : PASS
LOCKMRG  st_lock.testOverlap      : PASS
MKFILE   st_open.testOpen         : PASS
**************************************************
Command line asked for 5 of 665 tests

Files present in unlink path:
[root@dhcp47-149 unlink]# ls
2e76b1b5-da65-4225-936b-5e943ec2f45e  caa03fa0-e855-4375-98fc-74e361a07cce
******************************************************
[root@dhcp47-149 ~]# ls -ltr /bricks/brick2/b2/.glusterfs/unlink/bd1d4afa-6ecf-4e39-9b18-ccea0101c81a
-rwxr-xr-x. 1 root root 0 Sep 26 09:36 /bricks/brick2/b2/.glusterfs/unlink/bd1d4afa-6ecf-4e39-9b18-ccea0101c81a
[root@dhcp47-149 ~]#
[root@dhcp47-149 ~]# ps aux | grep brick2
root     22966  0.6  0.9 1893280 78536 ?       Ssl  Sep22  41:47 /usr/sbin/glusterfsd -s 10.70.47.149 --volfile-id testvolume.10.70.47.149.bricks-brick2-b2 -p /var/lib/glusterd/vols/testvolume/run/10.70.47.149-bricks-brick2-b2.pid -S /var/run/gluster/1701b36d8c9fed2c219d4cfac9e70b32.socket --brick-name /bricks/brick2/b2 -l /var/log/glusterfs/bricks/bricks-brick2-b2.log --xlator-option *-posix.glusterd-uuid=767feb78-378b-48c5-8f6b-a08aaae053af --brick-port 49159 --xlator-option testvolume-server.listen-port=49159
root     23585  0.0  0.0 112648   976 pts/1    S+   03:39   0:00 grep --color=auto brick2
[root@dhcp47-149 ~]#
[root@dhcp47-149 ~]# ls -ltr /proc/22966/fd | grep bd1d4afa-6ecf-4e39-9b18-ccea0101c81a
lrwx------. 1 root root 64 Sep 27 02:50 32 -> /bricks/brick2/b2/.glusterfs/unlink/bd1d4afa-6ecf-4e39-9b18-ccea0101c81a
[root@dhcp47-149 ~]#

There is still one open fd for each of these files under the unlink directory.
When checked via gdb (for the test LOCKCHGD), found that the file "LOCKCHGD" is
* opened (the open reaches FSAL_GLUSTER, which makes a gfapi call returning a glfd)
* write-locked
* read-locked for the same offset
* removed.
But no close operation ever reaches FSAL_GLUSTER, which results in a glfd leak for that file.
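For reference, below is a minimal client-side sketch of that sequence, not taken from the report: the mount path /mnt/nfs1, the file name, and the byte ranges are illustrative assumptions. It simply exercises the same overlapping-lock-then-unlink pattern that the pynfs LOCKMRG/LOCKCHGD tests drive over an NFSv4 mount exported by nfs-ganesha.

/* Hypothetical repro sketch; /mnt/nfs1 and the ranges are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static void take_lock(int fd, short type, off_t start, off_t len)
{
        struct flock fl = {
                .l_type   = type,       /* F_RDLCK / F_WRLCK */
                .l_whence = SEEK_SET,
                .l_start  = start,
                .l_len    = len,
        };

        if (fcntl(fd, F_SETLK, &fl) == -1) {
                perror("fcntl(F_SETLK)");
                exit(EXIT_FAILURE);
        }
}

int main(void)
{
        const char *path = "/mnt/nfs1/lockmrg-repro";
        int fd = open(path, O_RDWR | O_CREAT, 0644);

        if (fd == -1) {
                perror("open");
                return EXIT_FAILURE;
        }

        /* First lock: read lock on bytes 0..99. */
        take_lock(fd, F_RDLCK, 0, 100);

        /* Second, overlapping read lock from the same owner: on the
         * server this overlaps the first lock and is merged by
         * merge_lock_entry(), where the pin ref is leaked. */
        take_lock(fd, F_RDLCK, 25, 75);

        /* Remove the file while it is still open; with the leaked ref
         * the brick keeps an open fd and the gfid stays under
         * .glusterfs/unlink even after close. */
        unlink(path);
        close(fd);
        return EXIT_SUCCESS;
}

Note that a kernel NFS client may coalesce overlapping POSIX locks locally, so the second fcntl() may not always reach the server as a separate LOCK operation; pynfs constructs the NFSv4 requests directly, which is likely why it reproduces the leak reliably.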
On further debugging, found that:

Breakpoint 11, state_lock (entry=0x7fa5ac079e60, owner=owner@entry=0x7fa5cc11a1c0,
    state=0x7fa5cc1169c0, blocking=blocking@entry=STATE_NON_BLOCKING,
    block_data=block_data@entry=0x0, lock=lock@entry=0x7fa5f7ffd010,
    holder=holder@entry=0x7fa5f7ffcfb0, conflict=conflict@entry=0x7fa5f7ffd030)
    at /usr/src/debug/nfs-ganesha-2.3.1/src/SAL/state_lock.c:2495
2495    {
(gdb) p *lock
$38 = {lock_sle_type = FSAL_POSIX_LOCK, lock_type = FSAL_LOCK_R, lock_start = 25,
  lock_length = 75, lock_reclaim = false}
(gdb) c
Continuing.

Breakpoint 15, do_lock_op (entry=entry@entry=0x7fa5ac079e60, lock_op=lock_op@entry=FSAL_OP_LOCK,
    owner=owner@entry=0x7fa5cc11a1c0, lock=lock@entry=0x7fa5f7ffd010, holder=0x7fa5f7ffcfb0,
    conflict=0x7fa5f7ffd030, overlap=0, sle_type=sle_type@entry=FSAL_POSIX_LOCK)
    at /usr/src/debug/nfs-ganesha-2.3.1/src/SAL/state_lock.c:2280
2280    {
(gdb)
Continuing.

Breakpoint 14, merge_lock_entry (entry=entry@entry=0x7fa5ac079e60, lock_entry=lock_entry@entry=0x7fa5dc06d480)
    at /usr/src/debug/nfs-ganesha-2.3.1/src/SAL/state_lock.c:795
795     {
(gdb) p *lock_entry
$39 = {sle_list = {next = 0x0, prev = 0x0}, sle_owner_locks = {next = 0x7fa5cc11a1c8, prev = 0x7fa5cc11a430},
  sle_client_locks = {next = 0x0, prev = 0x0}, sle_state_locks = {next = 0x7fa5cc116a30, prev = 0x7fa5cc11a450},
  sle_export_locks = {next = 0x7fa627c632b0, prev = 0x7fa5cc11a460}, sle_export = 0x7fa627c63268,
  sle_entry = 0x7fa5ac079e60, sle_block_data = 0x0, sle_owner = 0x7fa5cc11a1c0, sle_state = 0x7fa5cc1169c0,
  sle_blocked = STATE_NON_BLOCKING, sle_ref_count = 1, sle_lock = {lock_sle_type = FSAL_POSIX_LOCK,
    lock_type = FSAL_LOCK_R, lock_start = 25, lock_length = 75, lock_reclaim = false},
  sle_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0,
      __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}
(gdb) p entry->object.file.lock_list
$40 = {next = 0x7fa5cc11a420, prev = 0x7fa5cc11a420}
(gdb) p &entry->object.file.lock_list
$41 = (struct glist_head *) 0x7fa5ac079fd8
(gdb) n
805             glist_for_each_safe(glist, glistn, &entry->object.file.lock_list) {
(gdb)
795     {
(gdb)
805             glist_for_each_safe(glist, glistn, &entry->object.file.lock_list) {
(gdb)
809                     if (check_entry == lock_entry)
(gdb)
813                         (check_entry->sle_owner, lock_entry->sle_owner))
(gdb)
812                     if (different_owners
(gdb)
817                     if (check_entry->sle_blocked != STATE_NON_BLOCKING)
(gdb)
820                     check_entry_end = lock_end(&check_entry->sle_lock);
(gdb)
821                     lock_entry_end = lock_end(&lock_entry->sle_lock);
(gdb)
823                     if ((check_entry_end + 1) < lock_entry->sle_lock.lock_start)
(gdb)
827                     if ((lock_entry_end + 1) < check_entry->sle_lock.lock_start)
(gdb)
836                     if ((check_entry->sle_lock.lock_type !=
(gdb)
838                         && ((lock_entry_end < check_entry_end)
(gdb)
839                             || (check_entry->sle_lock.lock_start <
(gdb)
908                             lock_entry_end - lock_entry->sle_lock.lock_start + 1;
(gdb)
911             LogEntry("Merged", lock_entry);
(gdb)
912             LogEntry("Merging removing", check_entry);
(gdb)
913             remove_from_locklist(check_entry);
(gdb)

>>> In state_lock(), we take a pin ref on the cache_entry, and this needs to be unref'ed at the end unless it is the first lock entry for that file.

2512 state_status_t state_lock(cache_entry_t *entry,
2513                           state_owner_t *owner,
2514                           state_t *state,
2515                           state_blocking_t blocking,
2516                           state_block_data_t *block_data,
2517                           fsal_lock_param_t *lock,
2518                           state_owner_t **holder,
2519                           fsal_lock_param_t *conflict)
2520 {
2521         bool allow = true, overlap = false;
2522         struct glist_head *glist;
....
....
...
2542         cache_status = cache_inode_inc_pin_ref(entry);
2543
2544
....
....
...
2804         if (allow
2805             || fsal_export->exp_ops.fs_supports(fsal_export,
2806                                                 fso_lock_support_async_block)) {
2807                 /* Prepare to make call to FSAL for this lock */
2808                 status = do_lock_op(entry,
2809                                     lock_op,
2810                                     owner,
2811                                     lock,
2812                                     allow ? holder : NULL,
2813                                     allow ? conflict : NULL,
2814                                     overlap,
2815                                     FSAL_POSIX_LOCK);
2816         } else
2817                 status = STATE_LOCK_BLOCKED;
2818
2819         if (status == STATE_SUCCESS) {
2820                 /* Merge any touching or overlapping locks into this one */
2821                 LogEntry("FSAL lock acquired, merging locks for",
2822                          found_entry);
2823
2824                 merge_lock_entry(entry, found_entry);
2825
2826                 /* Insert entry into lock list */
2827                 LogEntry("New lock", found_entry);
2828
2829                 if (glist_empty(&entry->object.file.lock_list)) {
2830                         /* List was empty so we must retain the pin reference
2831                          */
2832                         unpin = false;
2833                 }

>>>> i.e., only if this is the first lock entry should we set unpin to false.

...
...
...
2884  out:
2885
2886         if (unpin) {
2887                 /* We don't need to retain the pin and LRU ref we took at the
2888                  * top because the file was already pinned for locks or we
2889                  * did not add locks to the file.
2890                  */
2891                 cache_inode_dec_pin_ref(entry, false);
2892                 cache_inode_lru_unref(entry, LRU_FLAG_NONE);
2893         }

>> So for a second lock on the same file, unpin should ideally remain true. But in the pynfs test case it gets reset to false. The reason is that in merge_lock_entry() (called at line 2824), we merge the overlapping locks and remove the old lock entry. Since the lock list has thus become empty, unpin gets set to false. This needs to be fixed: while removing the last lock entry we need to decrement pin_refcnt as well.
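To make the reference-count accounting easier to follow, here is a small self-contained model of the logic described above. It is not the real nfs-ganesha code; struct entry, pin_refcnt, lock_count and state_lock_model are illustrative names standing in for the actual structures. It shows how the second, overlapping lock leaves the cache entry with an extra pin reference:

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the cache entry: a pin counter and a lock count
 * standing in for object.file.lock_list. */
struct entry {
        int pin_refcnt;
        int lock_count;
};

/* Mimics state_lock(): pin up front, merge overlapping locks, and
 * unpin at the end unless the list was empty when the lock was added. */
static void state_lock_model(struct entry *e, bool overlaps_existing)
{
        bool unpin = true;

        e->pin_refcnt++;                 /* cache_inode_inc_pin_ref()     */

        if (overlaps_existing)
                e->lock_count--;         /* merge_lock_entry() removes the
                                          * old entry; the buggy code does
                                          * NOT drop the pin ref here     */

        if (e->lock_count == 0)          /* glist_empty(&...lock_list)    */
                unpin = false;           /* "first" entry: keep the pin   */

        e->lock_count++;                 /* insert the new lock entry     */

        if (unpin)
                e->pin_refcnt--;         /* cache_inode_dec_pin_ref()     */
}

int main(void)
{
        struct entry e = { 0, 0 };

        state_lock_model(&e, false);     /* LOCK1: first lock             */
        state_lock_model(&e, true);      /* LOCKMRG: overlapping lock     */

        /* One pin ref per pinned file is expected, but we end up with 2. */
        printf("pin_refcnt = %d (expected 1)\n", e.pin_refcnt);
        return 0;
}

Running this model prints pin_refcnt = 2 (expected 1), mirroring the leaked pin/LRU reference that keeps the glfd open and the gfid under .glusterfs/unlink.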
From 57b7c69cac74957b4751e99d03362c126f85cfc3 Mon Sep 17 00:00:00 2001
From: Soumya Koduri <skoduri>
Date: Tue, 27 Sep 2016 14:45:41 +0530
Subject: [PATCH] SAL: Ref leak in case of merge of lock entries

In state_lock(), we pin the cache_entry before processing the
lock operation. This shall be unref'ed at the end of the operation
unless it is the first lock entry for that cache entry.

But in case of overlapping locks, in merge_lock_entry(), once we
merge the locks, we remove the older one. This could lead to empty
lock list for that cache_entry, but with the earlier ref not taken
out resulting in a leak.

The fix is to unpin the cache entry if the lock entry removed was
the last one in its lock list.

Change-Id: I9ed9e9ad8115d1e3c5dcc28bcc8e733b94b82de2
Signed-off-by: Soumya Koduri <skoduri>
---
 src/SAL/state_lock.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/SAL/state_lock.c b/src/SAL/state_lock.c
index bba2bf7..cb628a5 100644
--- a/src/SAL/state_lock.c
+++ b/src/SAL/state_lock.c
@@ -924,6 +924,13 @@ static void merge_lock_entry(cache_entry_t *entry,
 		LogEntry("Merged", lock_entry);
 		LogEntry("Merging removing", check_entry);
 		remove_from_locklist(check_entry);
+
+		/* if check_entry was the last lock entry, unpin
+		 * the cache entry */
+		if (glist_empty(&entry->object.file.lock_list)) {
+			cache_inode_dec_pin_ref(entry, false);
+			cache_inode_lru_unref(entry, LRU_FLAG_NONE);
+		}
 	}
 }
-- 
2.5.0

The above patch seems to have fixed the issue, but we need to check whether it is applicable to the latest 2.4 nfs-ganesha code path as well and get it reviewed upstream.
This fix doesn't seem to be applicable to the RHGS 3.2 / nfs-ganesha 2.4 codebase.
The fix is available with the RHGS 3.2 nfs-ganesha rpms.
Verified this bug on RHGS 3.2 with the nfs-ganesha-2.4.1-1.el7rhgs.x86_64 build. Observed that some files are present under the .glusterfs/unlink folder once the pynfs test suite finishes, but on removing the data from the mount point, these files get deleted from the .glusterfs/unlink path.
Edited the doc text for the errata.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0493.html