Description of problem:
On a distributed volume, when gfid of a directory is removed from the hashed sub-vol and a lookup is performed, the hashed sub-vol gets a new gfid assigned instead of getting healed from other sub-vols.

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.9-1.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. create a distributed volume
2. create a directory from the mount point and identify the hashed sub-vol of the directory
3. From the backend brick of the hashed sub-vol remove gfid
4. stop and start the volume
5. check the gfid on all the sub-vols

Actual results:
gfid is different for hashed sub-vol and other sub-vols

Expected results:
gfid on all subvols should be same

Additional info:
[2016-04-20 04:44:28.152930] W [MSGID: 109009] [dht-common.c:638:dht_lookup_dir_cbk] 0-gfid-issue-dht: /testdir: gfid different on gfid-issue-client-3. gfid local = ed111c75-37be-4e74-9cc9-10a94cf86179, gfid subvol = fd282179-cfba-4656-a24d-9878bb048aa0
[2016-04-20 04:44:31.354575] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 0-gfid-issue-dht: Found anomalies in /testdir (gfid = ed111c75-37be-4e74-9cc9-10a94cf86179). Holes=2 overlaps=0
[2016-04-20 04:44:31.357904] W [MSGID: 109009] [dht-common.c:638:dht_lookup_dir_cbk] 0-gfid-issue-dht: /testdir: gfid different on gfid-issue-client-0. gfid local = ed111c75-37be-4e74-9cc9-10a94cf86179, gfid subvol = fd282179-cfba-4656-a24d-9878bb048aa0
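For step 5, the comparison across bricks can be scripted. The following is only a rough sketch, assuming the gfid is stored in the `trusted.gfid` extended attribute of the directory on each brick and that the script runs as root on the brick hosts; the helper names are made up for illustration and are not part of glusterfs.

```python
import os
import uuid

# Assumption: bricks keep the gfid in this xattr; trusted.* needs root to read.
GFID_XATTR = "trusted.gfid"


def format_gfid(raw):
    """Render the 16 raw xattr bytes in the UUID form printed in the logs."""
    return str(uuid.UUID(bytes=raw))


def read_gfid(brick_dir):
    """Return the gfid of a directory on a brick, or None if it cannot be read
    (for example because the xattr was removed)."""
    try:
        return format_gfid(os.getxattr(brick_dir, GFID_XATTR))
    except OSError:
        return None


def gfids_consistent(gfids):
    """True only when every brick reports a gfid and all of them are equal.

    gfids maps a brick label to the string from read_gfid() (or None).
    """
    values = set(gfids.values())
    return None not in values and len(values) == 1
```

Calling `read_gfid()` on the directory's path under each brick and passing the results to `gfids_consistent()` would report `False` for the state described under "Actual results".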
RCA:
The first lookup that dht sends goes to the hashed-subvol and carries the "gfid-req" set by fuse, which is a new gfid generated by fuse for this call. So, if the gfid is missing on the hashed-subvol, this new gfid is set on it and synced to any non-hashed subvols that don't have a gfid. Non-hashed subvols that still hold the original gfid keep it, which is why the gfids end up different.

Note that if the gfid is missing from non-hashed subvols, they all get the gfid of the directory stored on the hashed-subvol, because for lookup calls on them dht sets "gfid-req" to the gfid found on the hashed-subvol.

I think this issue can only be reproduced if someone removes the gfid from the backend directly. Other than backend corruption, this issue cannot be reproduced, for the following reasons:
1. An mkdir is successful on a brick only after the gfid is set.
2. Only after a successful mkdir on the hashed-subvol is mkdir attempted on non-hashed subvols, with the gfid that was set on the hashed-subvol.

So, I would consider this as NOT A BUG
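The lookup order described in the RCA can be illustrated with a small toy model. This is not the dht code; it is a simplified sketch that assumes each subvol either holds a gfid or holds none, and every name in it is hypothetical.

```python
def lookup_directory(subvols, hashed, fuse_gfid_req):
    """Toy model of the directory lookup described in the RCA.

    subvols maps a subvol name to its gfid (None when the gfid is missing)
    and is updated in place. Returns the names of the subvols whose gfid
    differs from the one on the hashed subvol.
    """
    # The lookup on the hashed subvol carries the gfid-req generated by fuse,
    # so a missing gfid there is filled with that brand-new value.
    if subvols[hashed] is None:
        subvols[hashed] = fuse_gfid_req
    hashed_gfid = subvols[hashed]

    # Lookups on the other subvols carry the hashed subvol's gfid as gfid-req,
    # so only subvols without a gfid pick it up.
    mismatched = []
    for name in subvols:
        if name == hashed:
            continue
        if subvols[name] is None:
            subvols[name] = hashed_gfid
        elif subvols[name] != hashed_gfid:
            mismatched.append(name)
    return mismatched


# gfid removed from the hashed subvol: it gets the new gfid, the rest differ.
vols = {"client-0": "old", "client-1": None, "client-3": "old"}
print(lookup_directory(vols, "client-1", "new"))

# gfid removed from a non-hashed subvol: it is healed from the hashed subvol.
vols = {"client-0": None, "client-1": "old", "client-3": "old"}
print(lookup_directory(vols, "client-1", "new"))
```

In the first case the model reports the non-hashed subvols as mismatched, which corresponds to the "gfid different on ..." warnings in the logs. In the second case nothing is mismatched, matching the note about non-hashed subvols being healed.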