+++ This bug was initially created as a clone of Bug #1311002 +++
+++ This bug was initially created as a clone of Bug #1306194 +++

On a 16-node setup with an EC (disperse) volume, I started I/Os from 3 different clients. While the I/Os were in progress I attached a tier to the volume, and the I/Os hung. I tried this twice and both times the I/Os hung. In 3.7.5-17 there used to be a temporary pause (about 5 minutes) when attach-tier was issued, but in this build (3.7.5-19) the I/Os have been hung for more than 2 hours.

volinfo before and after attach tier:

gluster v create npcvol disperse 12 disperse-data 8 10.70.37.202:/bricks/brick1/npcvol 10.70.37.195:/bricks/brick1/npcvol 10.70.35.133:/bricks/brick1/npcvol 10.70.35.239:/bricks/brick1/npcvol 10.70.35.225:/bricks/brick1/npcvol 10.70.35.11:/bricks/brick1/npcvol 10.70.35.10:/bricks/brick1/npcvol 10.70.35.231:/bricks/brick1/npcvol 10.70.35.176:/bricks/brick1/npcvol 10.70.35.232:/bricks/brick1/npcvol 10.70.35.173:/bricks/brick1/npcvol 10.70.35.163:/bricks/brick1/npcvol 10.70.37.101:/bricks/brick1/npcvol 10.70.37.69:/bricks/brick1/npcvol 10.70.37.60:/bricks/brick1/npcvol 10.70.37.120:/bricks/brick1/npcvol 10.70.37.202:/bricks/brick2/npcvol 10.70.37.195:/bricks/brick2/npcvol 10.70.35.133:/bricks/brick2/npcvol 10.70.35.239:/bricks/brick2/npcvol 10.70.35.225:/bricks/brick2/npcvol 10.70.35.11:/bricks/brick2/npcvol 10.70.35.10:/bricks/brick2/npcvol 10.70.35.231:/bricks/brick2/npcvol

gluster volume tier npcvol attach rep 2 10.70.35.176:/bricks/brick7/npcvol_hot 10.70.35.232:/bricks/brick7/npcvol_hot 10.70.35.173:/bricks/brick7/npcvol_hot 10.70.35.163:/bricks/brick7/npcvol_hot 10.70.37.101:/bricks/brick7/npcvol_hot 10.70.37.69:/bricks/brick7/npcvol_hot 10.70.37.60:/bricks/brick7/npcvol_hot 10.70.37.120:/bricks/brick7/npcvol_hot 10.70.37.195:/bricks/brick7/npcvol_hot 10.70.37.202:/bricks/brick7/npcvol_hot 10.70.35.133:/bricks/brick7/npcvol_hot 10.70.35.239:/bricks/brick7/npcvol_hot

##### before attach tier

[root@dhcp37-202 ~]# gluster v status npcvol
Status of volume: npcvol
Gluster process  TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.202:/bricks/brick1/npcvol  49161  0  Y  628
Brick 10.70.37.195:/bricks/brick1/npcvol  49161  0  Y  30704
Brick 10.70.35.133:/bricks/brick1/npcvol  49158  0  Y  24148
Brick 10.70.35.239:/bricks/brick1/npcvol  49158  0  Y  24128
Brick 10.70.35.225:/bricks/brick1/npcvol  49157  0  Y  24467
Brick 10.70.35.11:/bricks/brick1/npcvol  49157  0  Y  24272
Brick 10.70.35.10:/bricks/brick1/npcvol  49160  0  Y  24369
Brick 10.70.35.231:/bricks/brick1/npcvol  49160  0  Y  32189
Brick 10.70.35.176:/bricks/brick1/npcvol  49161  0  Y  1392
Brick 10.70.35.232:/bricks/brick1/npcvol  49161  0  Y  26630
Brick 10.70.35.173:/bricks/brick1/npcvol  49161  0  Y  28493
Brick 10.70.35.163:/bricks/brick1/npcvol  49161  0  Y  28592
Brick 10.70.37.101:/bricks/brick1/npcvol  49161  0  Y  28410
Brick 10.70.37.69:/bricks/brick1/npcvol  49161  0  Y  357
Brick 10.70.37.60:/bricks/brick1/npcvol  49161  0  Y  31071
Brick 10.70.37.120:/bricks/brick1/npcvol  49176  0  Y  1311
Brick 10.70.37.202:/bricks/brick2/npcvol  49162  0  Y  651
Brick 10.70.37.195:/bricks/brick2/npcvol  49162  0  Y  30723
Brick 10.70.35.133:/bricks/brick2/npcvol  49159  0  Y  24167
Brick 10.70.35.239:/bricks/brick2/npcvol  49159  0  Y  24148
Brick 10.70.35.225:/bricks/brick2/npcvol  49158  0  Y  24486
Brick 10.70.35.11:/bricks/brick2/npcvol  49158  0  Y  24291
Brick 10.70.35.10:/bricks/brick2/npcvol  49161  0  Y  24388
Brick 10.70.35.231:/bricks/brick2/npcvol  49161  0  Y  32208
Snapshot Daemon on localhost  49163  0  Y  810
NFS Server on localhost  2049  0  Y  818
Self-heal Daemon on localhost  N/A  N/A  Y  686
Quota Daemon on localhost  N/A  N/A  Y  859
Snapshot Daemon on 10.70.37.101  49162  0  Y  28538
NFS Server on 10.70.37.101  2049  0  Y  28546
Self-heal Daemon on 10.70.37.101  N/A  N/A  Y  28439
Quota Daemon on 10.70.37.101  N/A  N/A  Y  28576
Snapshot Daemon on 10.70.37.195  49163  0  Y  30851
NFS Server on 10.70.37.195  2049  0  Y  30859
Self-heal Daemon on 10.70.37.195  N/A  N/A  Y  30751
Quota Daemon on 10.70.37.195  N/A  N/A  Y  30889
Snapshot Daemon on 10.70.37.120  49177  0  Y  1438
NFS Server on 10.70.37.120  2049  0  Y  1446
Self-heal Daemon on 10.70.37.120  N/A  N/A  Y  1339
Quota Daemon on 10.70.37.120  N/A  N/A  Y  1477
Snapshot Daemon on 10.70.37.69  49162  0  Y  492
NFS Server on 10.70.37.69  2049  0  Y  500
Self-heal Daemon on 10.70.37.69  N/A  N/A  Y  385
Quota Daemon on 10.70.37.69  N/A  N/A  Y  542
Snapshot Daemon on 10.70.37.60  49162  0  Y  31197
NFS Server on 10.70.37.60  2049  0  Y  31205
Self-heal Daemon on 10.70.37.60  N/A  N/A  Y  31099
Quota Daemon on 10.70.37.60  N/A  N/A  Y  31235
Snapshot Daemon on 10.70.35.239  49160  0  Y  24287
NFS Server on 10.70.35.239  2049  0  Y  24295
Self-heal Daemon on 10.70.35.239  N/A  N/A  Y  24176
Quota Daemon on 10.70.35.239  N/A  N/A  Y  24325
Snapshot Daemon on 10.70.35.231  49162  0  Y  32340
NFS Server on 10.70.35.231  2049  0  Y  32348
Self-heal Daemon on 10.70.35.231  N/A  N/A  Y  32236
Quota Daemon on 10.70.35.231  N/A  N/A  Y  32389
Snapshot Daemon on 10.70.35.176  49162  0  Y  1535
NFS Server on 10.70.35.176  2049  0  Y  1545
Self-heal Daemon on 10.70.35.176  N/A  N/A  Y  1420
Quota Daemon on 10.70.35.176  N/A  N/A  Y  1589
Snapshot Daemon on dhcp35-225.lab.eng.blr.redhat.com  49159  0  Y  24623
NFS Server on dhcp35-225.lab.eng.blr.redhat.com  2049  0  Y  24631
Self-heal Daemon on dhcp35-225.lab.eng.blr.redhat.com  N/A  N/A  Y  24514
Quota Daemon on dhcp35-225.lab.eng.blr.redhat.com  N/A  N/A  Y  24661
Snapshot Daemon on 10.70.35.232  49162  0  Y  26759
NFS Server on 10.70.35.232  2049  0  Y  26767
Self-heal Daemon on 10.70.35.232  N/A  N/A  Y  26658
Quota Daemon on 10.70.35.232  N/A  N/A  Y  26805
Snapshot Daemon on 10.70.35.163  49162  0  Y  28721
NFS Server on 10.70.35.163  2049  0  Y  28729
Self-heal Daemon on 10.70.35.163  N/A  N/A  Y  28620
Quota Daemon on 10.70.35.163  N/A  N/A  Y  28760
Snapshot Daemon on 10.70.35.11  49159  0  Y  24427
NFS Server on 10.70.35.11  2049  0  Y  24435
Self-heal Daemon on 10.70.35.11  N/A  N/A  Y  24319
Quota Daemon on 10.70.35.11  N/A  N/A  Y  24465
Snapshot Daemon on 10.70.35.10  49162  0  Y  24521
NFS Server on 10.70.35.10  2049  0  Y  24529
Self-heal Daemon on 10.70.35.10  N/A  N/A  Y  24416
Quota Daemon on 10.70.35.10  N/A  N/A  Y  24560
Snapshot Daemon on 10.70.35.133  49160  0  Y  24314
NFS Server on 10.70.35.133  2049  0  Y  24322
Self-heal Daemon on 10.70.35.133  N/A  N/A  Y  24203
Quota Daemon on 10.70.35.133  N/A  N/A  Y  24352
Snapshot Daemon on 10.70.35.173  49162  0  Y  28625
NFS Server on 10.70.35.173  2049  0  Y  28633
Self-heal Daemon on 10.70.35.173  N/A  N/A  Y  28521
Quota Daemon on 10.70.35.173  N/A  N/A  Y  28671

Task Status of Volume npcvol
------------------------------------------------------------------------------
There are no active volume tasks

##### after attach tier

[root@dhcp37-202 ~]# gluster v status npcvol
Status of volume: npcvol
Gluster process  TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.35.239:/bricks/brick7/npcvol_hot  49161  0  Y  25252
Brick 10.70.35.133:/bricks/brick7/npcvol_hot  49161  0  Y  25276
Brick 10.70.37.202:/bricks/brick7/npcvol_hot  49164  0  Y  2028
Brick 10.70.37.195:/bricks/brick7/npcvol_hot  49164  0  Y  31793
Brick 10.70.37.120:/bricks/brick7/npcvol_hot  49178  0  Y  2504
Brick 10.70.37.60:/bricks/brick7/npcvol_hot  49163  0  Y  32188
Brick 10.70.37.69:/bricks/brick7/npcvol_hot  49163  0  Y  1548
Brick 10.70.37.101:/bricks/brick7/npcvol_hot  49163  0  Y  29535
Brick 10.70.35.163:/bricks/brick7/npcvol_hot  49163  0  Y  29799
Brick 10.70.35.173:/bricks/brick7/npcvol_hot  49163  0  Y  29669
Brick 10.70.35.232:/bricks/brick7/npcvol_hot  49163  0  Y  27813
Brick 10.70.35.176:/bricks/brick7/npcvol_hot  49163  0  Y  2607
Cold Bricks:
Brick 10.70.37.202:/bricks/brick1/npcvol  49161  0  Y  628
Brick 10.70.37.195:/bricks/brick1/npcvol  49161  0  Y  30704
Brick 10.70.35.133:/bricks/brick1/npcvol  49158  0  Y  24148
Brick 10.70.35.239:/bricks/brick1/npcvol  49158  0  Y  24128
Brick 10.70.35.225:/bricks/brick1/npcvol  49157  0  Y  24467
Brick 10.70.35.11:/bricks/brick1/npcvol  49157  0  Y  24272
Brick 10.70.35.10:/bricks/brick1/npcvol  49160  0  Y  24369
Brick 10.70.35.231:/bricks/brick1/npcvol  49160  0  Y  32189
Brick 10.70.35.176:/bricks/brick1/npcvol  49161  0  Y  1392
Brick 10.70.35.232:/bricks/brick1/npcvol  49161  0  Y  26630
Brick 10.70.35.173:/bricks/brick1/npcvol  49161  0  Y  28493
Brick 10.70.35.163:/bricks/brick1/npcvol  49161  0  Y  28592
Brick 10.70.37.101:/bricks/brick1/npcvol  49161  0  Y  28410
Brick 10.70.37.69:/bricks/brick1/npcvol  49161  0  Y  357
Brick 10.70.37.60:/bricks/brick1/npcvol  49161  0  Y  31071
Brick 10.70.37.120:/bricks/brick1/npcvol  49176  0  Y  1311
Brick 10.70.37.202:/bricks/brick2/npcvol  49162  0  Y  651
Brick 10.70.37.195:/bricks/brick2/npcvol  49162  0  Y  30723
Brick 10.70.35.133:/bricks/brick2/npcvol  49159  0  Y  24167
Brick 10.70.35.239:/bricks/brick2/npcvol  49159  0  Y  24148
Brick 10.70.35.225:/bricks/brick2/npcvol  49158  0  Y  24486
Brick 10.70.35.11:/bricks/brick2/npcvol  49158  0  Y  24291
Brick 10.70.35.10:/bricks/brick2/npcvol  49161  0  Y  24388
Brick 10.70.35.231:/bricks/brick2/npcvol  49161  0  Y  32208
Snapshot Daemon on localhost  49163  0  Y  810
NFS Server on localhost  2049  0  Y  2048
Self-heal Daemon on localhost  N/A  N/A  Y  2056
Quota Daemon on localhost  N/A  N/A  Y  2064
Snapshot Daemon on 10.70.37.60  49162  0  Y  31197
NFS Server on 10.70.37.60  2049  0  Y  32208
Self-heal Daemon on 10.70.37.60  N/A  N/A  Y  32216
Quota Daemon on 10.70.37.60  N/A  N/A  Y  32224
Snapshot Daemon on 10.70.37.195  49163  0  Y  30851
NFS Server on 10.70.37.195  2049  0  Y  31813
Self-heal Daemon on 10.70.37.195  N/A  N/A  Y  31821
Quota Daemon on 10.70.37.195  N/A  N/A  Y  31829
Snapshot Daemon on 10.70.37.120  49177  0  Y  1438
NFS Server on 10.70.37.120  2049  0  Y  2524
Self-heal Daemon on 10.70.37.120  N/A  N/A  Y  2532
Quota Daemon on 10.70.37.120  N/A  N/A  Y  2540
Snapshot Daemon on 10.70.37.101  49162  0  Y  28538
NFS Server on 10.70.37.101  2049  0  Y  29555
Self-heal Daemon on 10.70.37.101  N/A  N/A  Y  29563
Quota Daemon on 10.70.37.101  N/A  N/A  Y  29571
Snapshot Daemon on 10.70.37.69  49162  0  Y  492
NFS Server on 10.70.37.69  2049  0  Y  1574
Self-heal Daemon on 10.70.37.69  N/A  N/A  Y  1582
Quota Daemon on 10.70.37.69  N/A  N/A  Y  1590
Snapshot Daemon on 10.70.35.173  49162  0  Y  28625
NFS Server on 10.70.35.173  2049  0  Y  29690
Self-heal Daemon on 10.70.35.173  N/A  N/A  Y  29698
Quota Daemon on 10.70.35.173  N/A  N/A  Y  29713
Snapshot Daemon on 10.70.35.231  49162  0  Y  32340
NFS Server on 10.70.35.231  2049  0  Y  1022
Self-heal Daemon on 10.70.35.231  N/A  N/A  Y  1033
Quota Daemon on 10.70.35.231  N/A  N/A  Y  1043
Snapshot Daemon on 10.70.35.176  49162  0  Y  1535
NFS Server on 10.70.35.176  2049  0  Y  2627
Self-heal Daemon on 10.70.35.176  N/A  N/A  Y  2635
Quota Daemon on 10.70.35.176  N/A  N/A  Y  2659
Snapshot Daemon on 10.70.35.239  49160  0  Y  24287
NFS Server on 10.70.35.239  2049  0  Y  25272
Self-heal Daemon on 10.70.35.239  N/A  N/A  Y  25280
Quota Daemon on 10.70.35.239  N/A  N/A  Y  25288
Snapshot Daemon on dhcp35-225.lab.eng.blr.redhat.com  49159  0  Y  24623
NFS Server on dhcp35-225.lab.eng.blr.redhat.com  2049  0  Y  25622
Self-heal Daemon on dhcp35-225.lab.eng.blr.redhat.com  N/A  N/A  Y  25630
Quota Daemon on dhcp35-225.lab.eng.blr.redhat.com  N/A  N/A  Y  25638
Snapshot Daemon on 10.70.35.11  49159  0  Y  24427
NFS Server on 10.70.35.11  2049  0  Y  25455
Self-heal Daemon on 10.70.35.11  N/A  N/A  Y  25463
Quota Daemon on 10.70.35.11  N/A  N/A  Y  25471
Snapshot Daemon on 10.70.35.133  49160  0  Y  24314
NFS Server on 10.70.35.133  2049  0  Y  25296
Self-heal Daemon on 10.70.35.133  N/A  N/A  Y  25304
Quota Daemon on 10.70.35.133  N/A  N/A  Y  25312
Snapshot Daemon on 10.70.35.10  49162  0  Y  24521
NFS Server on 10.70.35.10  2049  0  Y  25578
Self-heal Daemon on 10.70.35.10  N/A  N/A  Y  25586
Quota Daemon on 10.70.35.10  N/A  N/A  Y  25594
Snapshot Daemon on 10.70.35.232  49162  0  Y  26759
NFS Server on 10.70.35.232  2049  0  Y  27833
Self-heal Daemon on 10.70.35.232  N/A  N/A  Y  27841
Quota Daemon on 10.70.35.232  N/A  N/A  Y  27866
Snapshot Daemon on 10.70.35.163  49162  0  Y  28721
NFS Server on 10.70.35.163  2049  0  Y  29819
Self-heal Daemon on 10.70.35.163  N/A  N/A  Y  29827
Quota Daemon on 10.70.35.163  N/A  N/A  Y  29852

Task Status of Volume npcvol
------------------------------------------------------------------------------
Task    : Tier migration
ID      : 524ad8fe-a743-47df-a4e9-edd2db05c60b
Status  : in progress

Following are the I/Os triggered before attach-tier and still running while the tier was being attached:

1) Client 1: created a 300 MB file and started copying it to new files:
   for i in {2..50};do cp hlfile.1 hlfile.$i;done
2) Client 2: created a 50 MB file and initiated continuous renames of the file:
   for i in {2..1000};do cp rename.1 rename.$i;done
3) Client 3: Linux untar.
4) Copying a 3 GB file to create new files in a loop:
   for i in {1..10};do cp File.mkv cheema$i.mkv;done
5) Client 4: created 10000 zero-byte files and then triggered removal of 5000 of them so that it would still be running during attach-tier:
   [root@rhs-client30 zerobyte]# rm -rf zb{5000..10000}

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-02-10 04:45:45 EST ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'. If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from nchilaka on 2016-02-10 05:18:43 EST ---

sosreports of both clients and servers are available at:
[nchilaka@rhsqe-repo nchilaka]$ chmod -R 0777 bug.1306194
[nchilaka@rhsqe-repo nchilaka]$ pwd
/home/repo/sosreports/nchilaka

--- Additional comment from Mohammed Rafi KC on 2016-02-10 11:13:31 EST ---

There is a blocking lock held on one of the bricks which is not released, and all of the other clients are waiting on this lock. We couldn't identify the owner of the lock because, by the time we looked, the ping timer had expired and the lock had been released; after that the I/Os resumed. We need to find out which client acquired the lock and why it did not release it.

--- Additional comment from Soumya Koduri on 2016-02-11 09:00:05 EST ---

When we tried to reproduce the issue, we saw "Stale File Handle" errors after attach-tier. When we did RCA using gdb, we found that ESTALE is returned via svc_client (which is enabled by USS). So we disabled USS and re-tried the test. Now we see the mount points hang.

On the server side, the volume got unexported:

[skoduri@skoduri ~]$ showmount -e 10.70.35.225
Export list for 10.70.35.225:
[skoduri@skoduri ~]$

Tracing back from the logs and the code:

[2016-02-11 13:26:02.540565] E [MSGID: 112070] [nfs3.c:896:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: finalvol
[2016-02-11 13:28:02.600425] E [MSGID: 112070] [nfs3.c:896:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: finalvol
[2016-02-11 13:28:02.600546] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully

This message is logged when the volume is not present in the nfs->initedxl[] list. That list is updated by "nfs_startup_subvolume()", which is invoked on notify of "GF_EVENT_CHILD_UP". So we suspect that the nfs xlator did not receive this event, which left the volume unexported. Attaching the nfs log for further debugging.
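To make the unexported-volume symptom easier to follow, here is a minimal, self-contained model of the check described above. It is only an illustration: the names initedxl and nfs_startup_subvolume() are taken from the comment, while the structure and logic below are simplified assumptions, not the actual gluster nfs xlator code.

#include <stdio.h>
#include <string.h>

#define MAX_VOLS 8

/* Simplified stand-in for the nfs xlator state: the list of subvolumes
 * that have been initialised (exported) so far. */
struct nfs_state {
    const char *initedxl[MAX_VOLS];
    int count;
};

/* Conceptually what happens when GF_EVENT_CHILD_UP is delivered for a
 * volume: the volume is added to initedxl[] and becomes exportable. */
static void nfs_startup_subvolume(struct nfs_state *nfs, const char *vol) {
    nfs->initedxl[nfs->count++] = vol;
}

/* A request for a volume that is not in initedxl[] is rejected, which is
 * what produces the "Volume is disabled: finalvol" log message above. */
static int nfs_volume_inited(const struct nfs_state *nfs, const char *vol) {
    for (int i = 0; i < nfs->count; i++)
        if (strcmp(nfs->initedxl[i], vol) == 0)
            return 1;
    return 0;
}

int main(void) {
    struct nfs_state nfs = { {0}, 0 };

    /* CHILD_UP never delivered for "finalvol": every request is refused
     * and the volume stays unexported (empty showmount output). */
    printf("finalvol exported: %s\n",
           nfs_volume_inited(&nfs, "finalvol") ? "yes" : "no");

    /* What should have happened on GF_EVENT_CHILD_UP. */
    nfs_startup_subvolume(&nfs, "finalvol");
    printf("finalvol exported: %s\n",
           nfs_volume_inited(&nfs, "finalvol") ? "yes" : "no");
    return 0;
}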
--- Additional comment from Mohammed Rafi KC on 2016-02-11 09:55:09 EST ---

During the nfs graph initialization, we do a lookup on the root. It looks like this lookup is blocked on a lock held by another NFS process. We need to figure out why the NFS server that acquired the lock failed to unlock it.

--- Additional comment from Raghavendra G on 2016-02-12 01:43:36 EST ---

Rafi reported that stale locks or unlock failures are seen even while the first lookup on root is happening. Here is the most likely RCA. I am assuming "tier-dht" has two dht subvols, "hot-dht" and "cold-dht". Also, the stale lock is found on one of the bricks corresponding to hot-dht.

1. Lookup on / on tier-dht.
2. Lookup is wound to the hashed subvol - cold-dht - and is successful.
3. tier-dht figures out / is a directory and does a lookup on both hot-dht and cold-dht.
4. On hot-dht, some subvols - say c1, c2 - are down. But the lookup is still successful because some other subvols (say c3, c4) are up.
5. Lookup on / is successful on cold-dht.
6. tier-dht decides it needs to heal the layout of "/". From here I am skipping events on cold-dht as they are irrelevant for this RCA.
7. tier-dht winds inodelk on hot-dht. hot-dht winds it to the first subvol in the layout list (say c1 in this case). Note that subvols with 0 ranges are stored at the beginning of the list. All the subvols where lookup failed (say because of ENOTCONN) end up with 0 ranges. The relative order of subvols with 0 ranges is undefined and depends on whose lookup failed first.
8. c1 comes up.
9. hot-dht acquires the lock on c1.
10. tier-dht tries to refresh its layout of /. It winds lookup on hot and cold dhts again.
11. hot-dht sees that the layout's generation number is lagging behind the current generation number (as c1 came up after the lookup on / completed). It issues a fresh lookup and reconstructs the layout for /. Since c2 is still down, it is pushed to the beginning of the subvol list of the layout.
12. tier-dht is done with healing. It issues unlock on hot-dht.
13. hot-dht winds the unlock call to the first subvol in the layout of /, which is now c2.
14. The unlock fails with ENOTCONN and a stale lock is left on c1.

--- Additional comment from Raghavendra G on 2016-02-12 01:46:00 EST ---

Steps 7 and 8 can be swapped for more clarity; the RCA is still valid.
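A minimal, self-contained sketch of the race described in the RCA above. The subvolume names c1/c2 come from the comment; everything else (the array standing in for the layout's subvol list, the lock/unlock helpers) is an illustrative assumption, not the actual dht/tier code.

#include <stdio.h>
#include <string.h>

/* The layout's subvol list for "/" on hot-dht, with zero-ranged (down)
 * subvols kept at the head of the list. */
static const char *layout[2];

/* Lock state per subvolume: index 0 = c1, index 1 = c2. */
static int locked[2];

static int idx(const char *subvol) { return strcmp(subvol, "c1") == 0 ? 0 : 1; }

/* inodelk is always wound to the *current* first subvol in the layout. */
static void wind_inodelk_lock(void)   { locked[idx(layout[0])] = 1; printf("lock   -> %s\n", layout[0]); }
static void wind_inodelk_unlock(void) { locked[idx(layout[0])] = 0; printf("unlock -> %s\n", layout[0]); }

int main(void) {
    /* Steps 7-9: c1 and c2 were down during the first lookup, so both have
     * zero ranges; c1 happens to be first, and the lock lands on c1. */
    layout[0] = "c1"; layout[1] = "c2";
    wind_inodelk_lock();

    /* Steps 10-11: the layout of "/" is refreshed while c2 is still down,
     * and the refreshed layout now has c2 at the head of the list. */
    layout[0] = "c2"; layout[1] = "c1";

    /* Steps 12-14: unlock is wound to the current first subvol, i.e. c2,
     * where it fails, so c1 is never unlocked. */
    wind_inodelk_unlock();

    printf("stale lock left on c1: %s\n", locked[0] ? "yes" : "no");
    return 0;
}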
--- Additional comment from Mohammed Rafi KC on 2016-02-12 05:04:36 EST ---

Based on comment 6, it could be an intrusive fix that requires testing for pure dht and for tier as well. To recover from this hang without any interruption to application continuity, restart the NFS server, which can be done with volume start force. This will restart only the NFS server, provided no other process requires a restart.

--- Additional comment from Laura Bailey on 2016-02-14 21:45:57 EST ---

Rafi, based on https://bugzilla.redhat.com/show_bug.cgi?id=1303045#c3, I shouldn't document this as a known issue, right?

--- Additional comment from Mohammed Rafi KC on 2016-02-15 01:15:28 EST ---

Yes, I have included this as part of bug 1303045.

--- Additional comment from Laura Bailey on 2016-02-15 20:07:44 EST ---

Thanks Rafi, removing this from the tracker bug.

--- Additional comment from nchilaka on 2016-02-17 01:17:25 EST ---

Workaround testing:
I tested the workaround by restarting the volume using force. The I/Os resumed, which means the workaround is fine, but there is a small problem that has already been discussed, for which bz#1309186 - file creates fail with "failed to open '<filename>': Too many levels of symbolic links" for file create/write when restarting NFS using vol start force - has been raised.

--- Additional comment from Vijay Bellur on 2016-02-23 02:36:21 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#1) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-02-27 13:42:49 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#2) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-03-04 00:56:10 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#3) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-03-04 11:45:18 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#4) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-03-08 16:49:15 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#5) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-03-09 01:56:50 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#6) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-03-15 02:15:00 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#7) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-03-16 08:13:52 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#8) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-05-03 07:01:09 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#9) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-05-04 08:42:37 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#10) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-05-05 06:54:34 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#11) for review on master by mohammed rafi kc (rkavunga)
--- Additional comment from Vijay Bellur on 2016-05-05 09:34:58 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#12) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-05-05 13:33:55 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#13) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-05-06 01:16:30 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send unlock to the same) posted (#14) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/14236 (dht:remember locked subvol and send unlock to the same) posted (#1) for review on release-3.7 by mohammed rafi kc (rkavunga)
COMMIT: http://review.gluster.org/14236 committed in release-3.7 by Raghavendra G (rgowdapp)
------
commit fd8921b9eb03af69815bb2d7cff07b63048c2d5a
Author: Mohammed Rafi KC <rkavunga>
Date:   Tue May 3 14:43:20 2016 +0530

    dht:remember locked subvol and send unlock to the same

    During locking we send lock request to cached subvol, and normally we
    unlock to the cached subvol.

    But with parallel fresh lookup on a directory, there is a race window
    where the cached subvol can change and the unlock can go into a
    different subvol from which we took lock.

    This will result in a stale lock held on one of the subvol.

    So we will store the details of subvol which we took the lock and will
    unlock from the same subvol.

    back port of>
    >Change-Id: I47df99491671b10624eb37d1d17e40bacf0b15eb
    >BUG: 1311002
    >Signed-off-by: Mohammed Rafi KC <rkavunga>
    >Reviewed-on: http://review.gluster.org/13492
    >Reviewed-by: N Balachandran <nbalacha>
    >Smoke: Gluster Build System <jenkins.com>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >Reviewed-by: Raghavendra G <rgowdapp>
    >CentOS-regression: Gluster Build System <jenkins.com>

    Change-Id: Ia847e7115d2296ae9811b14a956f3b6bf39bd86d
    BUG: 1333645
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/14236
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
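A minimal sketch of the approach described in the commit message: remember which subvolume the lock was actually wound to and send the unlock there, instead of re-deriving the target from the cached subvol/layout at unlock time. The names and structure below are assumptions for illustration, not the actual GlusterFS dht code.

#include <stdio.h>

/* Stand-in for whatever the layout/cached-subvol lookup would return at
 * the moment a call is wound; it can change between lock and unlock. */
static const char *current_first_subvol = "c1";

/* Lock context that remembers where the lock was actually taken. */
struct lock_ctx {
    const char *locked_on;
};

static void take_lock(struct lock_ctx *lk) {
    lk->locked_on = current_first_subvol;   /* remember the exact subvol */
    printf("lock   -> %s\n", lk->locked_on);
}

static void release_lock(const struct lock_ctx *lk) {
    /* Unlock goes to the remembered subvol, not to whatever the layout or
     * cached subvol points at now, so no stale lock is left behind. */
    printf("unlock -> %s\n", lk->locked_on);
}

int main(void) {
    struct lock_ctx lk = { 0 };

    take_lock(&lk);                  /* lock lands on c1 */
    current_first_subvol = "c2";     /* layout refresh / cached subvol changes */
    release_lock(&lk);               /* still unlocks c1 */
    return 0;
}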
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user