Bug 1140660 - DHT + rename + rebalance: after rename and rebalance complete, many directories and the data inside them are not accessible from the mount
Status: CLOSED DUPLICATE of bug 1139676
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: amainkar
Duplicates: 1140167
 
Reported: 2014-09-11 12:59 UTC by Rachana Patel
Modified: 2015-05-13 17:37 UTC (History)

Doc Type: Bug Fix
Last Closed: 2014-10-20 06:19:15 UTC


Attachments
gfid_of_dir (55.71 KB, text/plain)
2014-09-11 13:03 UTC, Rachana Patel

Description Rachana Patel 2014-09-11 12:59:14 UTC
Description of problem:
=======================
DHT + rename + rebalance: after the renames and the rebalance have completed, many directories and the data inside them are not accessible from the mount.


Version-Release number of selected component (if applicable):
=============================================================
3.6.0.28-1.el6rhs.x86_64



How reproducible:
=================
intermittent

Steps to Reproduce:
===================
1. Create and start a distributed volume.
2. Create 100 directories on the mount: dir{1..100}.
3. Run add-brick, then start rebalance.
4. While the rebalance is in progress, start moving the directories inside each other as below:
[root@dht19 screw]# for i in {1..100}; do mv dir$i dir`expr $i + 1`; done

5. Once the rebalance and the renames have completed, verify the data.
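
For repeated runs, steps 2 and 4 can be wrapped in small helpers. This is only a sketch: the function names, VOLNAME, NEW_BRICK and the mount path are placeholders that do not appear in the report, and the gluster commands are shown as comments because the exact brick layout used here is not known.

```shell
# Hypothetical helpers for steps 2 and 4; names and arguments are illustrative.

# Step 2: create dir1..dirN under the given mount point.
create_dirs() {
    local mnt="$1" n="$2" i
    i=1
    while [ "$i" -le "$n" ]; do
        mkdir "$mnt/dir$i" || return 1
        i=$((i + 1))
    done
}

# Step 4: the same loop as in the report. While dir(i+1) exists, mv nests
# dir(i) inside it; the last mv renames dirN to dir(N+1) because that
# target does not exist yet.
nest_dirs() {
    local mnt="$1" n="$2" i
    i=1
    while [ "$i" -le "$n" ]; do
        mv "$mnt/dir$i" "$mnt/dir$((i + 1))" || return 1
        i=$((i + 1))
    done
}

# Step 3 and the race itself (assumed invocation, placeholders in caps):
#   gluster volume add-brick VOLNAME NEW_BRICK
#   gluster volume rebalance VOLNAME start
#   nest_dirs /mnt/VOLNAME 100
#   gluster volume rebalance VOLNAME status
```

On a healthy volume the loop leaves a single chain dir101/dir99/dir98/.../dir1 at the top of the mount, which is the path that appears in the error output under Actual results.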

Actual results:
===============
1. Unable to access dir35 onwards. All directories below dir35 (34 directories) are inaccessible, as is the data inside them.

/dir35: No such file or directory
ls: cannot open directory ./dir101/dir99/dir98/dir97/dir96/dir95/dir94/dir93/dir92/dir91/dir90/dir89/dir88/dir87/dir86/dir85/dir84/dir83/dir82/dir81/dir80/dir79/dir78/dir77/dir76/dir75/dir74/dir73/dir72/dir71/dir70/dir69/dir68/dir67/dir66/dir65/dir64/dir63/dir62/dir61/dir60/dir59/dir58/dir57/dir56/dir55/dir54/dir53/dir52/dir51/dir50/dir49/dir48/dir47/dir46/dir45/dir44/dir43/dir42/dir41/dir40/dir39/dir38/dir37/dir36/dir35: No such file or directory

2. Sometimes the mount point lists dir101 and sometimes it does not show the entire directory structure. In the two consecutive listings below, dir101 and dir36 are missing from the first one:
[root@dht17 screw]# ls
count   dir37   f27-71  f45-96  f54-96  f6-68   f68-64   f7-70   f89-15  f95-82  in1  new1
count1  f1-101  f35-6   f47-70  f60-8   f67-89  f72-101  f83-13  f93-11  f99-64  new  newm1
[root@dht17 screw]# ls
count   dir101  dir37   f27-71  f45-96  f54-96  f6-68   f68-64   f7-70   f89-15  f95-82  in1  new1
count1  dir36   f1-101  f35-6   f47-70  f60-8   f67-89  f72-101  f83-13  f93-11  f99-64  new  newm1
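
The difference between the two listings above can be checked for mechanically by listing the directory several times and comparing each result with the first. A minimal sketch; the function name and the default of five runs are arbitrary choices, not something taken from the report.

```shell
# Hypothetical check: list a directory several times and report whether
# every listing matches the first one.
listing_is_stable() {
    local dir="$1" runs="${2:-5}" first current i
    first=$(ls -1 "$dir") || return 2
    i=2
    while [ "$i" -le "$runs" ]; do
        current=$(ls -1 "$dir") || return 2
        if [ "$current" != "$first" ]; then
            echo "listing changed on run $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "listing stable over $runs runs"
}
```

Since the problem is intermittent, a stable result over a few runs does not prove the directory is healthy; it only fails to reproduce the symptom.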

Expected results:
=================
All files and directories should be accessible from the mount point.
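
One way to check this expectation is to walk the nested chain from the top and report the first level that cannot be opened. The sketch below assumes the chain produced by the rename loop in the steps above (dirN is renamed to dir(N+1), so the chain is dir(N+1)/dir(N-1)/.../dir1); the function names are hypothetical.

```shell
# Print the path expected after the rename loop for dir1..dirN:
# dir(N+1)/dir(N-1)/.../dir1 (dirN itself was renamed to dir(N+1)).
expected_chain() {
    local n="$1" i path
    path="dir$((n + 1))"
    i=$((n - 1))
    while [ "$i" -ge 1 ]; do
        path="$path/dir$i"
        i=$((i - 1))
    done
    echo "$path"
}

# Walk the chain from the top and print the first prefix that cannot be
# listed; prints nothing and returns 0 if the whole chain is reachable.
first_inaccessible() {
    local mnt="$1" n="$2" prefix="" part rest
    rest=$(expected_chain "$n")
    while [ -n "$rest" ]; do
        part=${rest%%/*}
        case "$rest" in
            */*) rest=${rest#*/} ;;
            *)   rest="" ;;
        esac
        prefix="${prefix:+$prefix/}$part"
        if ! ls "$mnt/$prefix" >/dev/null 2>&1; then
            echo "$prefix"
            return 1
        fi
    done
    return 0
}
```

For the case reported here, a call such as first_inaccessible on the mount point with n=100 would be expected to stop at the prefix ending in dir35.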







Comment 2 Rachana Patel 2014-09-11 13:03:50 UTC
Created attachment 936537
gfid_of_dir

Comment 4 Sayan Saha 2014-09-12 20:00:06 UTC
This is a good catch, but since it is intermittent and a case of data unavailability rather than data loss, we'll target this for 3.0.2.

Comment 5 Raghavendra G 2014-10-15 17:16:24 UTC
This seems to be the same problem as lookup healing the src and dest of a rename. From the ls output we see dir101, dir36 and dir37 on root. This is most likely because the src was healed during the mv. As for the observation that dir36 and dir37 have different gfids, the comparison should be between:
1. /dir36 and /dir101/dir99/.../dir38/dir37
2. /dir37 and /dir101/dir99/.../dir39/dir38

From data attached with this bug, we don't have gfids for /dir36 and /dir37. Do we still have that data?
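
If the bricks are still available, the gfid of a directory can be read on each brick with getfattr -n trusted.gfid -e hex and mapped to its handle under .glusterfs. The helpers below are a sketch with hypothetical names; they only convert the hex value and build the handle path, and take the getfattr value as an argument.

```shell
# Convert the hex value printed by
#   getfattr -n trusted.gfid -e hex <brick>/<path>
# (for example 0xbc0357fd89c2451ebb3b5f64ce30cdf6) to canonical UUID form.
gfid_hex_to_uuid() {
    local hex="${1#0x}"
    echo "$hex" | sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

# Path of the gfid handle relative to the brick root:
# .glusterfs/<first two hex digits>/<next two hex digits>/<uuid>
gfid_handle_path() {
    local uuid="$1" a b
    a=$(echo "$uuid" | cut -c1-2)
    b=$(echo "$uuid" | cut -c3-4)
    echo ".glusterfs/$a/$b/$uuid"
}
```

Reading trusted.* attributes requires root on the brick server, and the path passed to getfattr must be the directory's path on the brick, not on the mount.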

Also from the gfids attached we can find:

lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n13/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n14/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 21:07 /brick0/n15/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 21:07 /brick0/n16/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n3/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n8/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/screw3/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36

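As can be seen from the gfid namespace, dir36 has two different parents on different bricks: root (00000000-0000-0000-0000-000000000001) and /dir101/dir99/.../dir38/dir37 (01cf21d8-e12e-41e0-a968-d2389f1aa344). This is most likely the result of a heal during the mv of dir36 into /dir101/.../dir37.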
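
The same check can be run over a whole set of handle listings. The sketch below reads ls -l lines like the ones above on stdin and prints every directory gfid whose handle points at more than one parent gfid. The function name is hypothetical, and it assumes paths without spaces.

```shell
# Read `ls -l` lines of gfid handles on stdin and print each directory
# gfid that has more than one distinct parent gfid across bricks.
find_multi_parent() {
    awk '
    {
        src = ""; dst = ""
        for (i = 2; i < NF; i++)
            if ($i == "->") { src = $(i - 1); dst = $(i + 1) }
        if (src == "") next            # not a symlink line
        n = split(src, s, "/"); gfid = s[n]
        m = split(dst, d, "/"); parent = d[m - 1]
        if (!((gfid, parent) in seen)) {
            seen[gfid, parent] = 1
            count[gfid]++
            parents[gfid] = parents[gfid] " " parent
        }
    }
    END {
        for (g in count)
            if (count[g] > 1) print g ":" parents[g]
    }'
}
```

Each output line has the form "gfid: parent1 parent2", with the parents in the order they were first seen in the input.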

A similar case is dir37, which also has two parents, root and /dir101/dir99/.../dir38, as can be seen below:

lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n13/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n14/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 21:05 /brick0/n15/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 21:05 /brick0/n16/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n3/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../59/1e/591e6ab3-5832-4a8b-aa2f-1e576bf16415/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n8/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../59/1e/591e6ab3-5832-4a8b-aa2f-1e576bf16415/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/screw3/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../59/1e/591e6ab3-5832-4a8b-aa2f-1e576bf16415/dir37

Comment 6 Raghavendra G 2014-10-15 17:21:24 UTC
*** Bug 1140167 has been marked as a duplicate of this bug. ***

Comment 7 Raghavendra G 2014-10-20 06:19:15 UTC

*** This bug has been marked as a duplicate of bug 1139676 ***

