Bug 1379329
Summary: | Some of the files go into the .glusterfs/unlink folder after execution of the pynfs test suite. | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Shashank Raj <sraj> |
Component: | nfs-ganesha | Assignee: | Soumya Koduri <skoduri> |
Status: | CLOSED ERRATA | QA Contact: | Arthy Loganathan <aloganat> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | rhgs-3.1 | CC: | aloganat, bmohanra, jthottan, kkeithle, mzywusko, ndevos, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | RHGS 3.2.0 | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | nfs-ganesha-2.4.0-2 | Doc Type: | Bug Fix |
Doc Text: |
Previously, there was an fd leak on files on which lock operations had been performed from an NFS mount of a volume exported via the NFS-Ganesha server. When such a file was deleted, it was not removed from the .glusterfs/unlink folder on the backend, consuming memory. With this fix, all files that are removed from the mount point are removed completely from the backend as well.
|
Story Points: | --- | | |
Clone Of: | | Environment: | |
Last Closed: | 2017-03-23 06:23:09 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1351528 | | |
Description
Shashank Raj
2016-09-26 11:45:48 UTC
With 3.1.3, the entries inside .glusterfs/unlink were present for almost all the files even after removal, but with 3.1.3 + hotfix the entries are seen for only 3 files. Also, when running the write tests individually (including the failed cases WRT18 and WRT19), I don't see this issue, which means those failed cases are not the cause of the entries inside .glusterfs/unlink. Maybe there is still some ref leak happening in a certain code path.

@Shashank, to narrow this down further, could you please run specific codes and flags and check which specific test leads to this leak? To list the possible codes and flags, you could try the commands below (perhaps in an automated test script):

./testserver.py --showflags
./testserver.py --showcodes

The tests LOCKCHGD and LOCKMRG under lock are causing this issue.

********************************************************************
./testserver.py 10.70.44.92:/testvolume -v --outfile ~/pynfs.run --maketree --showomit --rundep LOCKCHGD > /tmp/pynfs-hotfix-lock-specific.log

Results:
INIT     st_setclientid.testValid : RUNNING
INIT     st_setclientid.testValid : PASS
MKFILE   st_open.testOpen         : RUNNING
MKFILE   st_open.testOpen         : PASS
LOCK1    st_lock.testFile         : RUNNING
LOCK1    st_lock.testFile         : PASS
LOCKCHGD st_lock.testDowngrade    : RUNNING
LOCKCHGD st_lock.testDowngrade    : PASS
**************************************************
INIT     st_setclientid.testValid : PASS
LOCK1    st_lock.testFile         : PASS
LOCKCHGD st_lock.testDowngrade    : PASS
MKFILE   st_open.testOpen         : PASS
**************************************************
Command line asked for 4 of 665 tests

Files present in unlink path:
[root@dhcp47-149 unlink]# cd /bricks/brick2/b2/.glusterfs/unlink/
[root@dhcp47-149 unlink]# ls
caa03fa0-e855-4375-98fc-74e361a07cce
*******************************************************************

*******************************************************************
./testserver.py 10.70.44.92:/testvolume -v --outfile ~/pynfs.run --maketree --showomit --rundep LOCKMRG > /tmp/pynfs-hotfix-lock-specific.log

Results:
INIT    st_setclientid.testValid : RUNNING
INIT    st_setclientid.testValid : PASS
MKFILE  st_open.testOpen         : RUNNING
MKFILE  st_open.testOpen         : PASS
LOCK1   st_lock.testFile         : RUNNING
LOCK1   st_lock.testFile         : PASS
LKTOVER st_lockt.testOverlap     : RUNNING
LKTOVER st_lockt.testOverlap     : PASS
LOCKMRG st_lock.testOverlap      : RUNNING
LOCKMRG st_lock.testOverlap      : PASS
**************************************************
INIT    st_setclientid.testValid : PASS
LKTOVER st_lockt.testOverlap     : PASS
LOCK1   st_lock.testFile         : PASS
LOCKMRG st_lock.testOverlap      : PASS
MKFILE  st_open.testOpen         : PASS
**************************************************
Command line asked for 5 of 665 tests

Files present in unlink path:
[root@dhcp47-149 unlink]# ls
2e76b1b5-da65-4225-936b-5e943ec2f45e  caa03fa0-e855-4375-98fc-74e361a07cce
******************************************************************

[root@dhcp47-149 ~]# ls -ltr /bricks/brick2/b2/.glusterfs/unlink/bd1d4afa-6ecf-4e39-9b18-ccea0101c81a
-rwxr-xr-x. 1 root root 0 Sep 26 09:36 /bricks/brick2/b2/.glusterfs/unlink/bd1d4afa-6ecf-4e39-9b18-ccea0101c81a

[root@dhcp47-149 ~]# ps aux | grep brick2
root     22966  0.6  0.9 1893280 78536 ?   Ssl  Sep22  41:47 /usr/sbin/glusterfsd -s 10.70.47.149 --volfile-id testvolume.10.70.47.149.bricks-brick2-b2 -p /var/lib/glusterd/vols/testvolume/run/10.70.47.149-bricks-brick2-b2.pid -S /var/run/gluster/1701b36d8c9fed2c219d4cfac9e70b32.socket --brick-name /bricks/brick2/b2 -l /var/log/glusterfs/bricks/bricks-brick2-b2.log --xlator-option *-posix.glusterd-uuid=767feb78-378b-48c5-8f6b-a08aaae053af --brick-port 49159 --xlator-option testvolume-server.listen-port=49159
root     23585  0.0  0.0  112648   976 pts/1 S+   03:39   0:00 grep --color=auto brick2

[root@dhcp47-149 ~]# ls -ltr /proc/22966/fd | grep bd1d4afa-6ecf-4e39-9b18-ccea0101c81a
lrwx------. 1 root root 64 Sep 27 02:50 32 -> /bricks/brick2/b2/.glusterfs/unlink/bd1d4afa-6ecf-4e39-9b18-ccea0101c81a

There is still one open fd for each of these files under the unlink directory.

When checked via gdb (for test LOCKCHGD), found that the file "LOCKCHGD" is:
* opened (comes to FSAL_GLUSTER, which makes the gfapi call returning a glfd)
* a Write Lock is taken on that file
* a Read Lock is issued for the same offset
* the file is removed

But no close operation reached FSAL_GLUSTER, which resulted in a glfd leak for that file. On further debugging, found that:

Breakpoint 11, state_lock (entry=0x7fa5ac079e60, owner=owner@entry=0x7fa5cc11a1c0, state=0x7fa5cc1169c0,
    blocking=blocking@entry=STATE_NON_BLOCKING, block_data=block_data@entry=0x0, lock=lock@entry=0x7fa5f7ffd010,
    holder=holder@entry=0x7fa5f7ffcfb0, conflict=conflict@entry=0x7fa5f7ffd030)
    at /usr/src/debug/nfs-ganesha-2.3.1/src/SAL/state_lock.c:2495
2495    {
(gdb) p *lock
$38 = {lock_sle_type = FSAL_POSIX_LOCK, lock_type = FSAL_LOCK_R, lock_start = 25, lock_length = 75, lock_reclaim = false}
(gdb) c
Continuing.

Breakpoint 15, do_lock_op (entry=entry@entry=0x7fa5ac079e60, lock_op=lock_op@entry=FSAL_OP_LOCK,
    owner=owner@entry=0x7fa5cc11a1c0, lock=lock@entry=0x7fa5f7ffd010, holder=0x7fa5f7ffcfb0,
    conflict=0x7fa5f7ffd030, overlap=0, sle_type=sle_type@entry=FSAL_POSIX_LOCK)
    at /usr/src/debug/nfs-ganesha-2.3.1/src/SAL/state_lock.c:2280
2280    {
(gdb) c
Continuing.
Breakpoint 14, merge_lock_entry (entry=entry@entry=0x7fa5ac079e60, lock_entry=lock_entry@entry=0x7fa5dc06d480)
    at /usr/src/debug/nfs-ganesha-2.3.1/src/SAL/state_lock.c:795
795     {
(gdb) p *lock_entry
$39 = {sle_list = {next = 0x0, prev = 0x0}, sle_owner_locks = {next = 0x7fa5cc11a1c8, prev = 0x7fa5cc11a430},
  sle_client_locks = {next = 0x0, prev = 0x0}, sle_state_locks = {next = 0x7fa5cc116a30, prev = 0x7fa5cc11a450},
  sle_export_locks = {next = 0x7fa627c632b0, prev = 0x7fa5cc11a460}, sle_export = 0x7fa627c63268,
  sle_entry = 0x7fa5ac079e60, sle_block_data = 0x0, sle_owner = 0x7fa5cc11a1c0, sle_state = 0x7fa5cc1169c0,
  sle_blocked = STATE_NON_BLOCKING, sle_ref_count = 1,
  sle_lock = {lock_sle_type = FSAL_POSIX_LOCK, lock_type = FSAL_LOCK_R, lock_start = 25, lock_length = 75,
    lock_reclaim = false},
  sle_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0,
      __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}
(gdb) p entry->object.file.lock_list
$40 = {next = 0x7fa5cc11a420, prev = 0x7fa5cc11a420}
(gdb) p &entry->object.file.lock_list
$41 = (struct glist_head *) 0x7fa5ac079fd8
(gdb) n
805             glist_for_each_safe(glist, glistn, &entry->object.file.lock_list) {
(gdb)
809             if (check_entry == lock_entry)
(gdb)
813                     (check_entry->sle_owner, lock_entry->sle_owner))
(gdb)
812             if (different_owners
(gdb)
817             if (check_entry->sle_blocked != STATE_NON_BLOCKING)
(gdb)
820             check_entry_end = lock_end(&check_entry->sle_lock);
(gdb)
821             lock_entry_end = lock_end(&lock_entry->sle_lock);
(gdb)
823             if ((check_entry_end + 1) < lock_entry->sle_lock.lock_start)
(gdb)
827             if ((lock_entry_end + 1) < check_entry->sle_lock.lock_start)
(gdb)
836             if ((check_entry->sle_lock.lock_type !=
(gdb)
838                 && ((lock_entry_end < check_entry_end)
(gdb)
839                     || (check_entry->sle_lock.lock_start <
(gdb)
908                     lock_entry_end - lock_entry->sle_lock.lock_start + 1;
(gdb)
911             LogEntry("Merged", lock_entry);
(gdb)
912             LogEntry("Merging removing", check_entry);
(gdb)
913             remove_from_locklist(check_entry);
(gdb)

>>> In state_lock(), we take a pin_ref for the cache_entry, and this needs to be unref'ed at the end unless it is the first lock entry for that file:

2512 state_status_t state_lock(cache_entry_t *entry,
2513                           state_owner_t *owner,
2514                           state_t *state,
2515                           state_blocking_t blocking,
2516                           state_block_data_t *block_data,
2517                           fsal_lock_param_t *lock,
2518                           state_owner_t **holder,
2519                           fsal_lock_param_t *conflict)
2520 {
2521         bool allow = true, overlap = false;
2522         struct glist_head *glist;
....
2542         cache_status = cache_inode_inc_pin_ref(entry);
....
2804         if (allow
2805             || fsal_export->exp_ops.fs_supports(fsal_export,
2806                                                 fso_lock_support_async_block)) {
2807                 /* Prepare to make call to FSAL for this lock */
2808                 status = do_lock_op(entry,
2809                                     lock_op,
2810                                     owner,
2811                                     lock,
2812                                     allow ? holder : NULL,
2813                                     allow ? conflict : NULL,
2814                                     overlap,
2815                                     FSAL_POSIX_LOCK);
2816         } else
2817                 status = STATE_LOCK_BLOCKED;
2818
2819         if (status == STATE_SUCCESS) {
2820                 /* Merge any touching or overlapping locks into this one */
2821                 LogEntry("FSAL lock acquired, merging locks for",
2822                          found_entry);
2823
2824                 merge_lock_entry(entry, found_entry);
2825
2826                 /* Insert entry into lock list */
2827                 LogEntry("New lock", found_entry);
2828
2829                 if (glist_empty(&entry->object.file.lock_list)) {
2830                         /* List was empty so we must retain the pin reference
2831                          */
2832                         unpin = false;
2833                 }

>>>> i.e., only if it is the first lock entry should we set unpin to false.

....
2884  out:
2885
2886         if (unpin) {
2887                 /* We don't need to retain the pin and LRU ref we took at the
2888                  * top because the file was already pinned for locks or we
2889                  * did not add locks to the file.
2890                  */
2891                 cache_inode_dec_pin_ref(entry, false);
2892                 cache_inode_lru_unref(entry, LRU_FLAG_NONE);
2893         }

>> So for the second lock on the same file, unpin should ideally be set to true. But in the pynfs test case it looks like it got reset to false. The reason is that in merge_lock_entry() (called at line 2824), we merge the overlapping locks and remove the old lock entry. Since the list has thus become empty, unpin got set to false. This needs to be fixed: while removing the last lock entry we need to decrement pin_refcnt as well.

From 57b7c69cac74957b4751e99d03362c126f85cfc3 Mon Sep 17 00:00:00 2001
From: Soumya Koduri <skoduri>
Date: Tue, 27 Sep 2016 14:45:41 +0530
Subject: [PATCH] SAL: Ref leak in case of merge of lock entries

In state_lock(), we pin the cache_entry before processing the lock
operation. This shall be unref'ed at the end of the operation unless
it is the first lock entry for that cache entry.

But in case of overlapping locks, in merge_lock_entry(), once we merge
the locks, we remove the older one. This could lead to an empty lock
list for that cache_entry, but with the earlier ref not released,
resulting in a leak. The fix is to unpin the cache entry if the lock
entry removed was the last one in its lock list.
Change-Id: I9ed9e9ad8115d1e3c5dcc28bcc8e733b94b82de2
Signed-off-by: Soumya Koduri <skoduri>
---
 src/SAL/state_lock.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/SAL/state_lock.c b/src/SAL/state_lock.c
index bba2bf7..cb628a5 100644
--- a/src/SAL/state_lock.c
+++ b/src/SAL/state_lock.c
@@ -924,6 +924,13 @@ static void merge_lock_entry(cache_entry_t *entry,
 			LogEntry("Merged", lock_entry);
 			LogEntry("Merging removing", check_entry);
 			remove_from_locklist(check_entry);
+
+			/* if check_entry was the last lock entry, unpin
+			 * the cache entry */
+			if (glist_empty(&entry->object.file.lock_list)) {
+				cache_inode_dec_pin_ref(entry, false);
+				cache_inode_lru_unref(entry, LRU_FLAG_NONE);
+			}
 		}
 	}
--
2.5.0
[root@dhcp35-197 nfs-ganesha]#

The above patch seems to have fixed the issue, but we need to check whether it is applicable to the latest 2.4 nfs-ganesha code path as well, and get it reviewed upstream.

This fix doesn't seem to be applicable to the RHGS 3.2/nfs-ganesha 2.4 codebase.

The fix is available with the RHGS 3.2 nfs-ganesha rpms.

Verified this bug on RHGS 3.2 with the nfs-ganesha-2.4.1-1.el7rhgs.x86_64 build. Observed that some files are present under the .glusterfs/unlink folder once the pynfs test suite finishes, but when the data is removed from the mount point, these files are deleted from the .glusterfs/unlink path.

Edited the doc text for the errata.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html
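The pin-reference accounting behind this bug can be condensed into a small standalone model. This is a hypothetical sketch, not the real SAL code: plain counters stand in for the cache_inode pin refcount and for the length of entry->object.file.lock_list, and the `fixed` flag toggles the one change the patch above makes (dropping the pin when a merge empties the lock list).

```c
#include <stdbool.h>

/* Toy stand-in for a cache_entry: pin_refs models the cache_inode pin
 * refcount, lock_count models the length of the file's lock list. */
typedef struct {
    int pin_refs;
    int lock_count;
} cache_entry;

/* Model of merge_lock_entry(): the new lock absorbs an overlapping old
 * entry, which is removed from the lock list.  With fixed == true we
 * also release the pin the old list was holding when the list becomes
 * empty, mirroring the upstream patch. */
static void merge_lock_entry(cache_entry *e, bool fixed)
{
    e->lock_count--;                 /* remove_from_locklist(check_entry) */
    if (fixed && e->lock_count == 0)
        e->pin_refs--;               /* the patch: unpin on empty list */
}

/* Model of state_lock(): take a pin up front, merge if the new lock
 * overlaps an existing one, retain the pin only when the lock list is
 * empty at the glist_empty() check, then insert the entry. */
static void state_lock(cache_entry *e, bool overlaps, bool fixed)
{
    bool unpin = true;
    e->pin_refs++;                   /* cache_inode_inc_pin_ref(entry) */
    if (overlaps)
        merge_lock_entry(e, fixed);
    if (e->lock_count == 0)
        unpin = false;               /* list empty: retain the pin ref */
    e->lock_count++;                 /* insert the (merged) lock entry */
    if (unpin)
        e->pin_refs--;
}

/* Model of releasing the last lock: the list empties and the retained
 * pin is dropped, which is what lets the glfd finally close. */
static void state_unlock(cache_entry *e)
{
    e->lock_count--;
    if (e->lock_count == 0)
        e->pin_refs--;
}
```

Without the fix, the LOCKCHGD/LOCKMRG sequence (lock, overlapping lock that merges away the old entry, unlock) ends with one pin reference still held, which corresponds to the dangling open fd observed under .glusterfs/unlink; with the fix the counts balance back to zero.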