Bug 1140660

Summary: DHT + rename + rebalance :- after rename and rebalance is completed many Directories and data inside it is not accessible from mount
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rachana Patel <racpatel>
Component: distribute
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED DUPLICATE
QA Contact: amainkar
Severity: urgent
Docs Contact:
Priority: unspecified
Version: rhgs-3.0
CC: nsathyan, rgowdapp, ssaha, ssamanta
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-20 06:19:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
gfid_of_dir none

Description Rachana Patel 2014-09-11 12:59:14 UTC
Description of problem:
=======================
DHT + rename + rebalance: after the renames and rebalance have completed, many directories and the data inside them are not accessible from the mount.


Version-Release number of selected component (if applicable):
=============================================================
3.6.0.28-1.el6rhs.x86_64



How reproducible:
=================
intermittent

Steps to Reproduce:
===================
1. Create and start a distributed volume.
2. Create 100 directories on the mount: dir{1..100}.
3. Run add-brick, then start rebalance.
4. While rebalance is in progress, move each directory into the next one, as below:
[root@dht19 screw]# for i in {1..100}; do mv dir$i dir`expr $i + 1`; done

5. Once rebalance and the renames have completed, verify the data.
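
As a reference point for step 5, the rename loop from step 4 can be replayed on a local scratch directory, with no Gluster volume involved, to see the layout a healthy mount should end up with. This is only a sketch: nest_dirs and expected_path are hypothetical helper names, and the loop is written in POSIX sh instead of using {1..100} and expr. Because dir101 does not exist when the last mv runs, dir100 is renamed to dir101 rather than moved into it, so the expected chain is dir101/dir99/.../dir1.

```shell
# nest_dirs ROOT N: create dir1..dirN under ROOT, then run
# "mv dir$i dir$((i+1))" for i = 1..N, as in step 4.
nest_dirs() {
    root=$1 n=$2
    (
        cd "$root" || exit 1
        i=1
        while [ "$i" -le "$n" ]; do mkdir "dir$i"; i=$((i + 1)); done
        i=1
        while [ "$i" -le "$n" ]; do mv "dir$i" "dir$((i + 1))"; i=$((i + 1)); done
    )
}

# expected_path N: print the deepest path that should exist afterwards,
# relative to ROOT: dir(N+1)/dir(N-1)/.../dir1.
expected_path() {
    n=$1
    p="dir$((n + 1))"
    i=$((n - 1))
    while [ "$i" -ge 1 ]; do p="$p/dir$i"; i=$((i - 1)); done
    printf '%s\n' "$p"
}

# Local check with 100 directories, mirroring the reproduction steps.
scratch=$(mktemp -d)
nest_dirs "$scratch" 100
[ -d "$scratch/$(expected_path 100)" ] && echo "layout OK"
rm -rf "$scratch"
```

On the affected mount, the same expected_path output could be compared against what ls actually returns to find the first level that is missing.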

Actual results:
===============
1. Unable to access dir35 or anything below it. All directories below dir35 (34 directories) are inaccessible, as is the data inside them.

/dir35: No such file or directory
ls: cannot open directory ./dir101/dir99/dir98/dir97/dir96/dir95/dir94/dir93/dir92/dir91/dir90/dir89/dir88/dir87/dir86/dir85/dir84/dir83/dir82/dir81/dir80/dir79/dir78/dir77/dir76/dir75/dir74/dir73/dir72/dir71/dir70/dir69/dir68/dir67/dir66/dir65/dir64/dir63/dir62/dir61/dir60/dir59/dir58/dir57/dir56/dir55/dir54/dir53/dir52/dir51/dir50/dir49/dir48/dir47/dir46/dir45/dir44/dir43/dir42/dir41/dir40/dir39/dir38/dir37/dir36/dir35: No such file or directory

2. Sometimes the mount point shows dir101, and sometimes the entire directory structure is missing from the listing:
[root@dht17 screw]# ls
count   dir37   f27-71  f45-96  f54-96  f6-68   f68-64   f7-70   f89-15  f95-82  in1  new1
count1  f1-101  f35-6   f47-70  f60-8   f67-89  f72-101  f83-13  f93-11  f99-64  new  newm1
[root@dht17 screw]# ls
count   dir101  dir37   f27-71  f45-96  f54-96  f6-68   f68-64   f7-70   f89-15  f95-82  in1  new1
count1  dir36   f1-101  f35-6   f47-70  f60-8   f67-89  f72-101  f83-13  f93-11  f99-64  new  newm1

Expected results:
=================
All files and directories should be accessible from the mount point.








Comment 2 Rachana Patel 2014-09-11 13:03:50 UTC
Created attachment 936537
gfid_of_dir

Comment 4 Sayan Saha 2014-09-12 20:00:06 UTC
This is a good catch, but since it is intermittent and is a case of data unavailability rather than data loss, we'll target this for 3.0.2.

Comment 5 Raghavendra G 2014-10-15 17:16:24 UTC
This seems to be the same problem of lookup healing the src and dest of a rename. From the ls output we see dir101, dir36 and dir37 on root. This is most likely because the src was healed during the mv. As for the observation that dir36 and dir37 have different gfids, the comparison should be between:
1. /dir36 and /dir101/dir99/.../dir38/dir37
2. /dir37 and /dir101/dir99/.../dir39/dir38

From the data attached to this bug, we don't have gfids for /dir36 and /dir37. Do we still have that data?
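
If the bricks are still available, the missing gfids could be read directly from the backend. The sketch below assumes the usual invocation getfattr -n trusted.gfid -e hex <path> on a brick, which prints a line of the form trusted.gfid=0x followed by 32 hex digits. hex_to_gfid is a hypothetical helper that turns that value into the dashed form used under .glusterfs, and the brick path in the comment is only an example.

```shell
# hex_to_gfid HEX: convert a 0x-prefixed 32-digit hex value (as printed by
# getfattr -e hex) into the dashed 8-4-4-4-12 form.
hex_to_gfid() {
    printf '%s\n' "$1" | sed -e 's/^0x//' \
        -e 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/'
}

# On a brick node (example path, run as root):
#   getfattr -n trusted.gfid -e hex /brick0/n13/dir36
# Then format the value it reports, for example:
hex_to_gfid 0xbc0357fd89c2451ebb3b5f64ce30cdf6
```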

Also from the gfids attached we can find:

lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n13/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n14/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 21:07 /brick0/n15/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 21:07 /brick0/n16/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n3/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n8/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/screw3/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36

As can be seen from the gfid namespace, dir36 has two different parents on different bricks: root and /dir101/dir99/.../dir38/dir37. This is most likely the result of a heal during the mv of dir36 into /dir101/.../dir37.
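
The listing above follows the brick's .glusterfs layout: a directory's gfid entry sits at .glusterfs/<first two hex digits>/<next two>/<gfid> and is a symlink to ../../<aa>/<bb>/<parent gfid>/<name>. As a rough sketch (gfid_backend_path is a hypothetical helper, not a Gluster tool), the entry to inspect on a given brick can be built like this:

```shell
# gfid_backend_path BRICK GFID: print the path of the gfid entry for GFID
# under BRICK's .glusterfs directory.
gfid_backend_path() {
    brick=$1 gfid=$2
    a=$(printf '%s' "$gfid" | cut -c1-2)   # first two hex digits
    b=$(printf '%s' "$gfid" | cut -c3-4)   # next two hex digits
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$a" "$b" "$gfid"
}

# On a brick node, readlink on this path shows the parent gfid and name:
#   readlink "$(gfid_backend_path /brick0/n13 bc0357fd-89c2-451e-bb3b-5f64ce30cdf6)"
gfid_backend_path /brick0/n13 bc0357fd-89c2-451e-bb3b-5f64ce30cdf6
```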

Another similar case is dir37, which also has two parents, root and /dir101/dir99/.../dir38, as can be seen below:

lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n13/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n14/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 21:05 /brick0/n15/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 21:05 /brick0/n16/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n3/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../59/1e/591e6ab3-5832-4a8b-aa2f-1e576bf16415/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n8/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../59/1e/591e6ab3-5832-4a8b-aa2f-1e576bf16415/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/screw3/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../59/1e/591e6ab3-5832-4a8b-aa2f-1e576bf16415/dir37
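
The comparison done by eye on the two listings above could also be scripted. The following is a sketch only: it assumes input in the same ls -l format as the lines quoted in this comment, and gfids_with_multiple_parents is a hypothetical helper name. It prints each gfid whose symlink points at more than one distinct parent gfid across bricks, followed by the number of distinct parents.

```shell
# Read "ls -l" lines for .glusterfs symlinks on stdin and report every gfid
# that resolves to more than one parent gfid.
gfids_with_multiple_parents() {
    awk '
        NF >= 3 && $(NF-1) == "->" {
            n = split($(NF-2), link, "/");  gfid = link[n]          # last path component
            m = split($NF, target, "/");    parent = target[m-1]    # component before the name
            if (!((gfid, parent) in seen)) {
                seen[gfid, parent] = 1
                parents[gfid]++
            }
        }
        END { for (g in parents) if (parents[g] > 1) print g, parents[g] }
    ' | sort
}

# Two of the dir36 lines quoted above, one from n13 and one from n3:
gfids_with_multiple_parents <<'EOF'
lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n13/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n3/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36
EOF
```

Only the parent gfid is compared here; a gfid that keeps the same parent but appears under different names would not be flagged.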

Comment 6 Raghavendra G 2014-10-15 17:21:24 UTC
*** Bug 1140167 has been marked as a duplicate of this bug. ***

Comment 7 Raghavendra G 2014-10-20 06:19:15 UTC

*** This bug has been marked as a duplicate of bug 1139676 ***