Description of problem:
=======================
DHT + rename + rebalance: after rename and rebalance are completed, many directories and the data inside them are not accessible from the mount.

Version-Release number of selected component (if applicable):
=============================================================
3.6.0.28-1.el6rhs.x86_64

How reproducible:
=================
Intermittent

Steps to Reproduce:
===================
1. Create and start a distributed volume.
2. Create 100 directories on the mount: dir{1..100}
3. Run add-brick and start rebalance.
4. While rebalance is in progress, start moving the directories inside each other as below:
[root@dht19 screw]# for i in {1..100}; do mv dir$i dir`expr $i + 1`; done
5. Once rebalance and rename are completed, verify the data.

Actual results:
===============
1. Unable to access dir35 onwards. All directories below dir35 (34 directories) are not accessible; even the data inside them is not accessible.

/dir35: No such file or directory
ls: cannot open directory ./dir101/dir99/dir98/dir97/dir96/dir95/dir94/dir93/dir92/dir91/dir90/dir89/dir88/dir87/dir86/dir85/dir84/dir83/dir82/dir81/dir80/dir79/dir78/dir77/dir76/dir75/dir74/dir73/dir72/dir71/dir70/dir69/dir68/dir67/dir66/dir65/dir64/dir63/dir62/dir61/dir60/dir59/dir58/dir57/dir56/dir55/dir54/dir53/dir52/dir51/dir50/dir49/dir48/dir47/dir46/dir45/dir44/dir43/dir42/dir41/dir40/dir39/dir38/dir37/dir36/dir35: No such file or directory

2. Sometimes the mount point shows dir101, and sometimes it doesn't show the entire directory structure.

[root@dht17 screw]# ls
count   dir37   f27-71  f45-96  f54-96  f6-68   f68-64   f7-70   f89-15  f95-82  in1  new1
count1  f1-101  f35-6   f47-70  f60-8   f67-89  f72-101  f83-13  f93-11  f99-64  new  newm1

[root@dht17 screw]# ls
count   dir101  dir37   f27-71  f45-96  f54-96  f6-68   f68-64   f7-70   f89-15  f95-82  in1  new1
count1  dir36   f1-101  f35-6   f47-70  f60-8   f67-89  f72-101  f83-13  f93-11  f99-64  new  newm1

Expected results:
=================
All files and directories should be accessible from the mount point.
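For reference, the loop in step 4 is expected to nest the directories into a single chain, because mv moves the source inside the destination when the destination already exists and does a plain rename otherwise (only the last move, dir100 to dir101, is a plain rename). This explains why the failing path starts with dir101/dir99 and has no dir100. Below is a minimal sketch that replays the loop on a local temporary directory, not on a gluster mount; the function names are hypothetical, and shutil.move is used here only as a stand-in for mv.

```python
import os
import shutil
import tempfile


def simulate_rename_loop(root, count=100):
    """Create dir1..dir<count> under root, then replay the mv loop from step 4."""
    for i in range(1, count + 1):
        os.mkdir(os.path.join(root, f"dir{i}"))
    for i in range(1, count + 1):
        src = os.path.join(root, f"dir{i}")
        dst = os.path.join(root, f"dir{i + 1}")
        # Like mv: move src inside dst if dst is an existing directory,
        # otherwise rename src to dst.
        shutil.move(src, dst)


def expected_chain(count=100):
    """Nesting expected on a healthy filesystem, outermost directory first."""
    return [f"dir{count + 1}"] + [f"dir{i}" for i in range(count - 1, 0, -1)]


def actual_chain(root):
    """Walk down from root, expecting exactly one subdirectory per level."""
    chain = []
    current = root
    while True:
        children = sorted(
            name for name in os.listdir(current)
            if os.path.isdir(os.path.join(current, name))
        )
        if not children:
            return chain
        if len(children) != 1:
            raise RuntimeError(f"unexpected entries in {current}: {children}")
        chain.append(children[0])
        current = os.path.join(current, children[0])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        simulate_rename_loop(root)
        print("/".join(actual_chain(root)[:4]), "...")
```

On the affected volume, dir36 and dir37 also showing up directly under the mount root is what deviates from this chain.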
Created attachment 936537 [details] gfid_of_dir
This is a good catch, but since it is intermittent and is a case of data unavailability rather than data loss, we'll target this for 3.0.2.
This seems to be the same problem of lookup healing the src and dest of a rename. From the ls output we see dir101, dir36 and dir37 on root. This is most likely because src was healed during the mv.

As for the observation that dir36 and dir37 have different gfids, the comparison should be between:
1. /dir36 and /dir101/dir100/dir99/.../dir38/dir37
2. /dir37 and /dir101/dir100/dir99/.../dir39/dir38

From the data attached to this bug, we don't have gfids for /dir36 and /dir37. Do we still have that data?

Also, from the attached gfids we can find:

lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n13/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n14/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 21:07 /brick0/n15/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 21:07 /brick0/n16/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n3/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n8/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/screw3/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36

As can be seen, per the gfid namespace, dir36 has two different parents on different bricks: root and /dir101/dir99/.../dir38/dir37. This is most likely a heal during mv 36 /dir101/.../dir37

Another similar case is dir37, which also has two parents, root and /dir101/dir99/.../dir38, as can be seen below:

lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n13/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n14/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 21:05 /brick0/n15/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 21:05 /brick0/n16/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n3/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../59/1e/591e6ab3-5832-4a8b-aa2f-1e576bf16415/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n8/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../59/1e/591e6ab3-5832-4a8b-aa2f-1e576bf16415/dir37
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/screw3/.glusterfs/01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344 -> ../../59/1e/591e6ab3-5832-4a8b-aa2f-1e576bf16415/dir37
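
The parent mismatch in these listings can also be checked mechanically. The sketch below is only an illustration, not a gluster tool: it assumes the `ls -l` line layout shown in this comment (a `.glusterfs/<xx>/<yy>/<gfid>` symlink pointing at `../../<xx>/<yy>/<parent-gfid>/<name>`), and the function names are hypothetical. It groups bricks by the parent gfid each directory symlink points to and reports any gfid with more than one parent.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the listings in this comment:
#   <brick>/.glusterfs/xx/yy/<gfid> -> ../../xx/yy/<parent-gfid>/<name>
LINK_RE = re.compile(
    r"(?P<brick>\S+)/\.glusterfs/[0-9a-f]{2}/[0-9a-f]{2}/(?P<gfid>[0-9a-f-]{36})"
    r" -> \.\./\.\./[0-9a-f]{2}/[0-9a-f]{2}/(?P<parent>[0-9a-f-]{36})/(?P<name>\S+)"
)


def parents_by_gfid(listing):
    """Map each directory gfid to {parent gfid: [bricks pointing at that parent]}."""
    result = defaultdict(lambda: defaultdict(list))
    for line in listing.splitlines():
        match = LINK_RE.search(line)
        if match:
            result[match["gfid"]][match["parent"]].append(match["brick"])
    return result


def split_parents(listing):
    """Return only the gfids whose symlinks disagree on the parent across bricks."""
    return {
        gfid: {parent: sorted(bricks) for parent, bricks in parents.items()}
        for gfid, parents in parents_by_gfid(listing).items()
        if len(parents) > 1
    }


# A subset of the dir36 lines quoted above.
SAMPLE = """\
lrwxrwxrwx 1 root root 54 Sep 10 20:48 /brick0/n13/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 21:07 /brick0/n16/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../00/00/00000000-0000-0000-0000-000000000001/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/n3/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36
lrwxrwxrwx 1 root root 54 Sep 10 20:31 /brick0/screw3/.glusterfs/bc/03/bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 -> ../../01/cf/01cf21d8-e12e-41e0-a968-d2389f1aa344/dir36
"""

if __name__ == "__main__":
    for gfid, parents in split_parents(SAMPLE).items():
        print(gfid)
        for parent, bricks in parents.items():
            print("  parent", parent, "on", ", ".join(bricks))
```

Run against the full listings, this would flag both bc0357fd-89c2-451e-bb3b-5f64ce30cdf6 (dir36) and 01cf21d8-e12e-41e0-a968-d2389f1aa344 (dir37), each with the root gfid as parent on n13 to n16 and a different parent on n3, n8 and screw3.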
*** Bug 1140167 has been marked as a duplicate of this bug. ***
*** This bug has been marked as a duplicate of bug 1139676 ***