Bug 1306194
| Summary: | NFS+attach tier: IOs hang while attach tier is issued |
| --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | tier |
| Assignee: | Mohammed Rafi KC <rkavunga> |
| Status: | CLOSED ERRATA |
| QA Contact: | krishnaram Karthick <kramdoss> |
| Severity: | urgent |
| Docs Contact: | |
| Priority: | urgent |
| Version: | rhgs-3.1 |
| CC: | asrivast, byarlaga, nbalacha, rcyriac, rgowdapp, rhinduja, rhs-bugs, rkavunga, sankarshan, skoduri, smohan, storage-qa-internal |
| Target Milestone: | --- |
| Keywords: | ZStream |
| Target Release: | RHGS 3.1.3 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | tier-fuse-nfs-samba |
| Fixed In Version: | glusterfs-3.7.9-4 |
| Doc Type: | Bug Fix |
| Doc Text: | When attach tier occurred in parallel with I/O, it was possible for the cached subvolume to change. This meant that if an I/O lock had been set on the cached volume just before the cached subvolume changed, unlock operations were sent to the wrong brick, and the lock on the original brick was never released. The location of the last lock is now recorded so that this issue no longer occurs even if the cached subvolume does change during these simultaneous operations. |
| Story Points: | --- |
| Clone Of: | |
| : | 1311002 (view as bug list) |
| Environment: | |
| Last Closed: | 2016-06-23 05:07:24 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | |
| Bug Blocks: | 1299184, 1306930, 1311002, 1333645, 1347524 |
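
The Doc Text above summarizes the fix at a high level: record where the lock was taken so the unlock is sent to the same brick even if the cached subvolume changes mid-flight. The C sketch below only illustrates that idea under assumed, hypothetical names; it is not the actual GlusterFS tier/dht code.

```c
#include <stdio.h>

/* Hypothetical stand-in for a brick/subvolume. */
typedef struct subvol {
    const char *name;
} subvol_t;

/* Hypothetical per-inode context: alongside the cached subvolume, remember
 * the subvolume the lock was actually taken on. */
typedef struct inode_ctx {
    subvol_t *cached_subvol;  /* can change while attach-tier migrates data  */
    subvol_t *lock_subvol;    /* recorded at lock time, used for the unlock  */
} inode_ctx_t;

static void take_lock(inode_ctx_t *ctx)
{
    /* The lock is wound to the cached subvolume of the moment... */
    ctx->lock_subvol = ctx->cached_subvol;
    printf("lock   -> %s\n", ctx->lock_subvol->name);
}

static void release_lock(inode_ctx_t *ctx)
{
    /* ...and the unlock reuses the recorded target instead of re-reading
     * ctx->cached_subvol, which may point at a different brick by now. */
    printf("unlock -> %s\n", ctx->lock_subvol->name);
    ctx->lock_subvol = NULL;
}

int main(void)
{
    subvol_t cold = { "cold-subvol" };
    subvol_t hot  = { "hot-subvol"  };
    inode_ctx_t ctx = { .cached_subvol = &cold, .lock_subvol = NULL };

    take_lock(&ctx);            /* lock lands on cold-subvol                 */
    ctx.cached_subvol = &hot;   /* attach-tier changes the cached subvolume  */
    release_lock(&ctx);         /* still unlocks cold-subvol, no stale lock  */
    return 0;
}
```

Without the recorded lock target, the unlock in this sketch would be wound to hot-subvol and the lock on cold-subvol would never be released, which is the hang described in the report below.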
Description
Nag Pavan Chilakam
2016-02-10 09:45:43 UTC
sosreports of both clients and servers are available at the location below:

    [nchilaka@rhsqe-repo nchilaka]$ chmod -R 0777 bug.1306194
    [nchilaka@rhsqe-repo nchilaka]$ pwd
    /home/repo/sosreports/nchilaka

There is a blocking lock held on one of the bricks which is not released, and all of the other clients are waiting on this lock. We could not investigate the owner of the lock because, by the time we looked, the ping timer had expired and the lock had been released; after that the I/Os resumed. We need to find out which client acquired the lock and why it was not releasing it.

When we tried to reproduce the issue, we saw "Stale File Handle" errors after attach-tier. While doing RCA using gdb, we found that ESTALE is returned via svc_client (which is enabled by USS). So we disabled USS and re-tried the test. Now the mount points hang. On the server side, the volume got unexported:

    [skoduri@skoduri ~]$ showmount -e 10.70.35.225
    Export list for 10.70.35.225:
    [skoduri@skoduri ~]$

Tracing back from the logs and the code:

    [2016-02-11 13:26:02.540565] E [MSGID: 112070] [nfs3.c:896:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: finalvol
    [2016-02-11 13:28:02.600425] E [MSGID: 112070] [nfs3.c:896:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: finalvol
    [2016-02-11 13:28:02.600546] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully

This message is logged when the volume is not in the nfs->initedxl[] list. That list is updated as part of "nfs_startup_subvolume()", which is invoked during notify of "GF_EVENT_CHILD_UP". So the suspicion is that the nfs xlator has not received this event, which resulted in the volume being left unexported. Attaching the nfs log for further debugging.

During NFS graph initialization, we do a lookup on the root. It looks like this lookup is blocked on a lock held by another NFS process. We need to figure out why the NFS server that acquired the lock failed to unlock it.

Rafi reported that stale lock or unlock failures are seen even while the first lookup on root is happening. Here is the most likely RCA. I am assuming "tier-dht" has two dht subvols, "hot-dht" and "cold-dht", and that the stale lock is found on one of the bricks corresponding to hot-dht. (A minimal sketch of the resulting unlock mismatch follows the steps below.)

1. Lookup on / on tier-dht.
2. Lookup is wound to the hashed subvol, cold-dht, and is successful.
3. tier-dht figures out that / is a directory and does a lookup on both hot-dht and cold-dht.
4. On hot-dht, some subvols, say c1 and c2, are down. But the lookup is still successful because some other subvols (say c3 and c4) are up.
5. Lookup on / is successful on cold-dht.
6. tier-dht decides it needs to heal the layout of "/". From here on, events on cold-dht are skipped as they are irrelevant to this RCA.
7. tier-dht winds inodelk on hot-dht. hot-dht winds it to the first subvol in the layout list (say c1 in this case). Note that subvols with 0 ranges are stored at the beginning of the list, and all the subvols on which lookup failed (say because of ENOTCONN) end up with 0 ranges. The relative order of subvols with 0 ranges is undefined and depends on whose lookup failed first.
8. c1 comes up.
9. hot-dht acquires the lock on c1.
10. tier-dht tries to refresh its layout of /. It winds lookup on hot-dht and cold-dht again.
11. hot-dht sees that the layout's generation number is lagging behind the current generation number (as c1 came up after the lookup on / completed). It issues a fresh lookup and reconstructs the layout for /. Since c2 is still down, it is pushed to the beginning of the layout's subvol list.
12. tier-dht is done with healing. It issues unlock on hot-dht.
13. hot-dht winds the unlock call to the first subvol in the layout of /, which is now c2.
14. The unlock fails with ENOTCONN and a stale lock is left on c1.

Steps 7 and 8 can be swapped for more clarity; the RCA is still valid.
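To make steps 7 to 14 concrete, here is a minimal, self-contained C sketch of the unlock mismatch. The structures and the reordering rule are simplified stand-ins, not the real dht layout code: down subvols (zero ranges) sit at the head of the layout list, the lock is wound to whichever subvol is first at lock time, the layout is then refreshed and reordered, and the unlock is wound to the new head, which is down.

```c
#include <stdio.h>

#define NSUBVOLS 4

/* Hypothetical, simplified model of a dht layout list: subvols whose lookup
 * failed (brick down) get range 0 and sit at the head of the list. */
typedef struct {
    const char *name;
    int         up;      /* 1 if the brick is reachable                   */
    int         range;   /* 0 if lookup failed when this layout was built */
} subvol_t;

typedef struct {
    subvol_t list[NSUBVOLS];
} layout_t;

/* Rebuild the layout: down subvols get range 0 and move to the front. */
static void layout_refresh(layout_t *l)
{
    layout_t fresh;
    int pos = 0;

    for (int i = 0; i < NSUBVOLS; i++)          /* zero-range subvols first */
        if (!l->list[i].up) {
            fresh.list[pos] = l->list[i];
            fresh.list[pos].range = 0;
            pos++;
        }
    for (int i = 0; i < NSUBVOLS; i++)          /* then the healthy ones */
        if (l->list[i].up) {
            fresh.list[pos] = l->list[i];
            fresh.list[pos].range = 1;
            pos++;
        }
    *l = fresh;
}

/* Lock and unlock are both wound to the first subvol of the current layout. */
static subvol_t *first_subvol(layout_t *l)
{
    return &l->list[0];
}

int main(void)
{
    /* Step 4: c1 and c2 were down when / was first looked up on hot-dht. */
    layout_t hot = { .list = {
        { "c1", 0, 0 }, { "c2", 0, 0 }, { "c3", 1, 1 }, { "c4", 1, 1 } } };

    /* Steps 7-9: c1 comes up and the inodelk is wound to it. */
    hot.list[0].up = 1;
    printf("inodelk wound to %s\n", first_subvol(&hot)->name);

    /* Steps 10-11: layout refresh; c2 (still down) moves to the head. */
    layout_refresh(&hot);

    /* Steps 12-14: the unlock is wound to the *new* first subvol, c2. */
    subvol_t *target = first_subvol(&hot);
    printf("unlock wound to %s -> %s\n", target->name,
           target->up ? "ok" : "ENOTCONN, stale lock left on c1");
    return 0;
}
```

With the fix summarized in the Doc Text, the subvol the inodelk was actually wound to would be recorded and reused for the unlock, so reordering of the layout list would no longer matter.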
Yes, I have included this as part of bug 1303045.

Workaround testing: I tested the workaround by restarting the volume using force. The I/Os resumed, which means the workaround is fine, but there is a small problem that has been discussed, for which bz#1309186 ("file creates fail with 'failed to open <filename>: Too many levels of symbolic links' for file create/write when restarting NFS using vol start force") has been raised.

upstream patch: http://review.gluster.org/#/c/13492/
upstream master patch: http://review.gluster.org/#/c/13492/
upstream 3.7 patch: http://review.gluster.org/#/c/14236/
downstream patch: https://code.engineering.redhat.com/gerrit/73806

The IO hang during attach tier on an NFS mount has not been seen so far during the regression tests. Moving the bug to verified.

Looks perfect to me.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240