Bug 855856

Summary: .glusterfs is missing entries after self heal and lookup
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rahul Hinduja <rhinduja>
Component: glusterfs
Assignee: vsomyaju
Status: CLOSED NOTABUG
QA Contact: Sudhir D <sdharane>
Severity: high
Docs Contact:
Priority: unspecified
Version: 2.0
CC: nsathyan, rhs-bugs, vbellur, vsomyaju
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-28 09:23:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rahul Hinduja 2012-09-10 12:11:06 UTC
Description of problem:

.glusterfs is missing entries after volume heal and lookup 

Version-Release number of selected component (if applicable):

glusterfs 3.3.0rhs built on Jul 25 11:21:58

(glusterfs 3.3.0rhs-25.el6rhs.x86_64)

How reproducible:

1/1

Steps to Reproduce:
1. Create a distribute volume with 1 brick.
2. Populate a large amount of data on the volume. (We had a total of 85 directories containing 1200 files.)
3. Convert the volume to replicate (1x2) using "gluster volume add-brick <vol-name> replica 2 <brick>".
4. Use "gluster volume heal <vol-name> full" to perform a full self-heal.
5. Once the heal is done, "gluster volume heal <vol-name> info" reports "Number of entries: 0".
6. Use arequal-checksum to calculate the checksum of the bricks (brick1 and brick2). (A command sketch follows this list.)
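
A rough shell transcript of the steps above, using a hypothetical volume name (testvol), server (server1), brick paths (/bricks/brick1, /bricks/brick2) and mount point (/mnt/testvol); the exact arequal-checksum invocation may vary by build:
---------------------------------------------
# steps 1-2: single-brick distribute volume, populated with data
gluster volume create testvol server1:/bricks/brick1
gluster volume start testvol
mount -t glusterfs server1:/testvol /mnt/testvol
for d in $(seq 1 85); do mkdir -p /mnt/testvol/dir$d; done
# ... populate ~1200 files across the directories ...

# steps 3-5: convert to a 1x2 replicate volume and run a full self-heal
gluster volume add-brick testvol replica 2 server1:/bricks/brick2
gluster volume heal testvol full
gluster volume heal testvol info     # wait for "Number of entries: 0"

# step 6: checksum each brick directly
arequal-checksum -p /bricks/brick1
arequal-checksum -p /bricks/brick2
---------------------------------------------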
  
Actual results:

The arequal checksums do not match for brick1 and brick2. The newly added brick has fewer directories under ".glusterfs". Even after performing a lookup on the mount point using "find . | xargs stat", the ".glusterfs" directories of brick1 and brick2 still differ. (A sketch of the comparison is shown below.)
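
A minimal sketch of that comparison, assuming hypothetical brick paths /bricks/brick1 and /bricks/brick2 and a client mount at /mnt/testvol; diffing the listings produces output of the kind shown in comment 2:
---------------------------------------------
# trigger a lookup/self-heal of every entry from the client side
cd /mnt/testvol && find . | xargs stat > /dev/null

# list the directories under each brick's .glusterfs and compare
(cd /bricks/brick1 && find ./.glusterfs -type d | sort) > /tmp/brick1.dirs
(cd /bricks/brick2 && find ./.glusterfs -type d | sort) > /tmp/brick2.dirs
diff /tmp/brick1.dirs /tmp/brick2.dirs
---------------------------------------------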

Expected results:

The arequal checksums should match, and both brick1 and brick2 should report the same number of directories and files.

Additional info:

Comment 2 vsomyaju 2012-09-28 09:23:35 UTC
I have reproduced the problem and found that temporary files (editor swap files) can be created on the volume; when they are removed, only their gfid hard links under .glusterfs are removed, while the parent directories that were created for them remain.

For example, the diff of the .glusterfs directory listings of brick1 and brick2:
------------------------------------------
3747d3746
< ./.glusterfs/61/61
3970d3968
< ./.glusterfs/67/5d
9765,9766d9762
< ./.glusterfs/indices
< ./.glusterfs/indices/xattrop
------------------------------------------

From the log files it can be seen that some temporary files were created and then removed, so the gfid hard links were created and removed only for those swap files, leaving the parent directories shown above behind. (A small sketch of this behaviour follows.)
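
For context, the posix translator keeps each file's gfid as a hard link at .glusterfs/<first gfid byte>/<second gfid byte>/<full gfid>, as the paths in the logs below show. A minimal sketch of the behaviour, assuming a hypothetical brick at /bricks/brick1 and a client mount at /mnt/testvol:
---------------------------------------------
BRICK=/bricks/brick1
MNT=/mnt/testvol

touch "$MNT"/.sh1.sh.swp                    # temporary file created via the client

getfattr -n trusted.gfid -e hex "$BRICK"/.sh1.sh.swp
# -> trusted.gfid=0x<gfid>; posix also creates a hard link for it at
#    $BRICK/.glusterfs/<gfid[0:2]>/<gfid[2:4]>/<full-uuid>

rm "$MNT"/.sh1.sh.swp                       # unlink removes only that hard link;
                                            # the two parent directories remain
find "$BRICK"/.glusterfs -type d -empty     # shows empty gfid sub-directories,
                                            # e.g. the ones left behind above
---------------------------------------------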


for gfid:61612913-fd9b-4fcb-85cd-749d9aae49f3


creation of hard link:61612913-fd9b-4fcb-85cd-749d9aae49f3
--------------------------
[2012-09-28 07:50:43.032358] I [posix-handle.c:593:posix_handle_hard] (-->/usr/local/lib/glusterfs/3git/xlator/features/access-control.so(posix_acl_create+0x25d) [0x7f3b8323a75d] (-->/usr/local/lib/glusterfs/3git/xlator/storage/posix.so(posix_create+0x2a0) [0x7f3b8344b4d0] (-->/usr/local/lib/glusterfs/3git/xlator/storage/posix.so(posix_gfid_set+0x132) [0x7f3b83457202]))) 0-volume3-posix: /temp_disk/bricks/volume3/brick1/.glusterfs/61/61/61612913-fd9b-4fcb-85cd-749d9aae49f3 <--> /temp_disk/bricks/volume3/brick1/.sh1.sh.swp <--> hard

SWAP FILE:.sh1.sh.swp

unlink:61612913-fd9b-4fcb-85cd-749d9aae49f3
---------------------------------------------
[2012-09-28 07:51:43.727594] I [posix-handle.c:711:posix_handle_unset] (-->/usr/local/lib/libglusterfs.so.0(default_unlink+0x124) [0x7f3b8721cea4] (-->/usr/local/lib/glusterfs/3git/xlator/features/access-control.so(posix_acl_unlink+0x218) [0x7f3b83238b68] (-->/usr/local/lib/glusterfs/3git/xlator/storage/posix.so(posix_unlink+0x575) [0x7f3b8344f8d5]))) 0-volume3-posix: 61612913-fd9b-4fcb-85cd-749d9aae49f3 <--> unset



for gfid:675d18a5-2e71-4881-91b0-309a48e23ed7


creation of hard link:
--------------------------
[2012-09-28 07:50:43.027252] I [posix-handle.c:593:posix_handle_hard] (-->/usr/local/lib/glusterfs/3git/xlator/features/access-control.so(posix_acl_create+0x25d) [0x7f3b8323a75d] (-->/usr/local/lib/glusterfs/3git/xlator/storage/posix.so(posix_create+0x2a0) [0x7f3b8344b4d0] (-->/usr/local/lib/glusterfs/3git/xlator/storage/posix.so(posix_gfid_set+0x132) [0x7f3b83457202]))) 0-volume3-posix: /temp_disk/bricks/volume3/brick1/.glusterfs/67/5d/675d18a5-2e71-4881-91b0-309a48e23ed7 <--> /temp_disk/bricks/volume3/brick1/.sh1.sh.swp <--> hard

SWAP FILE:.sh1.sh.swp


unlink:
------------------------------
[2012-09-28 07:50:43.029697] I [posix-handle.c:711:posix_handle_unset] (-->/usr/local/lib/libglusterfs.so.0(default_unlink+0x124) [0x7f3b8721cea4] (-->/usr/local/lib/glusterfs/3git/xlator/features/access-control.so(posix_acl_unlink+0x218) [0x7f3b83238b68] (-->/usr/local/lib/glusterfs/3git/xlator/storage/posix.so(posix_unlink+0x575) [0x7f3b8344f8d5]))) 0-volume3-posix: 675d18a5-2e71-4881-91b0-309a48e23ed7 <--> unset

So this is not a bug, as long as the arequal checksums (excluding .glusterfs) are the same for both bricks of the replicate volume. (A sketch of such a check with standard tools follows.)
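
A minimal sketch of an equivalent check using only standard tools, assuming hypothetical brick paths /bricks/brick1 and /bricks/brick2; it approximates arequal by hashing file contents and counting entries while skipping the .glusterfs tree:
---------------------------------------------
for B in /bricks/brick1 /bricks/brick2; do
    echo "== $B =="
    # hash every regular file outside .glusterfs, then hash the combined list
    (cd "$B" && find . -path ./.glusterfs -prune -o -type f -print0 \
        | sort -z | xargs -0 md5sum | md5sum)
    # count directories and regular files outside .glusterfs
    (cd "$B" && find . -path ./.glusterfs -prune -o -type d -print | wc -l)
    (cd "$B" && find . -path ./.glusterfs -prune -o -type f -print | wc -l)
done
---------------------------------------------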