Bug 1328699

Summary: [DHT]: Directory ends up with different gfid on different subvols, when gfid is removed on the hashed subvol and a lookup is performed
Product: Red Hat Gluster Storage
Reporter: krishnaram Karthick <kramdoss>
Component: distribute
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED NOTABUG
QA Contact: Anoop <annair>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: rhs-bugs, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-21 06:29:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description krishnaram Karthick 2016-04-20 05:57:18 UTC
Description of problem:
On a distributed volume, when the gfid of a directory is removed from the hashed sub-vol and a lookup is performed, the hashed sub-vol is assigned a new gfid instead of being healed from the other sub-vols.

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.9-1.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a distributed volume.
2. Create a directory from the mount point and identify the hashed sub-vol of the directory.
3. On the backend brick of the hashed sub-vol, remove the gfid of the directory.
4. Stop and start the volume.
5. Check the gfid of the directory on all the sub-vols.
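
Step 5 can be scripted on the brick hosts. The following is a minimal sketch, not an official tool. It assumes the gfid is stored on each brick in the `trusted.gfid` extended attribute as 16 raw bytes, and that the script runs as root on Linux (needed to read `trusted.*` attributes). The function names are hypothetical.

```python
import os
import uuid

GFID_XATTR = "trusted.gfid"  # assumed xattr in which a brick stores the gfid


def format_gfid(raw: bytes) -> str:
    """Render the 16 raw xattr bytes in the UUID form that appears in the logs."""
    return str(uuid.UUID(bytes=raw))


def read_gfid(brick_dir: str):
    """Return the gfid of a directory on a brick, or None if it cannot be read."""
    try:
        return format_gfid(os.getxattr(brick_dir, GFID_XATTR))
    except OSError:
        # Covers a missing xattr as well as a missing path or lack of permission.
        return None


def gfids_consistent(gfids: dict) -> bool:
    """True when every brick reports the same, non-missing gfid."""
    values = set(gfids.values())
    return len(values) == 1 and None not in values
```

To use it, call `read_gfid()` on the directory's path under each brick, collect the results in a dict keyed by brick, and pass that dict to `gfids_consistent()`.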

Actual results:
The gfid on the hashed sub-vol differs from the gfid on the other sub-vols.

Expected results:
The gfid should be the same on all sub-vols.

Additional info:
[2016-04-20 04:44:28.152930] W [MSGID: 109009] [dht-common.c:638:dht_lookup_dir_cbk] 0-gfid-issue-dht: /testdir: gfid different on gfid-issue-client-3. gfid local = ed111c75-37be-4e74-9cc9-10a94cf86179, gfid subvol = fd282179-cfba-4656-a24d-9878bb048aa0
[2016-04-20 04:44:31.354575] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 0-gfid-issue-dht: Found anomalies in /testdir (gfid = ed111c75-37be-4e74-9cc9-10a94cf86179). Holes=2 overlaps=0
[2016-04-20 04:44:31.357904] W [MSGID: 109009] [dht-common.c:638:dht_lookup_dir_cbk] 0-gfid-issue-dht: /testdir: gfid different on gfid-issue-client-0. gfid local = ed111c75-37be-4e74-9cc9-10a94cf86179, gfid subvol = fd282179-cfba-4656-a24d-9878bb048aa0
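
To find affected directories in a larger client log, the mismatch warnings can be extracted with a small parser. This is a sketch only: the pattern is modelled on the MSGID 109009 lines quoted above and may not match other GlusterFS versions, and the function name is hypothetical.

```python
import re

# Pattern modelled on the "gfid different on ..." warnings quoted above.
MISMATCH_RE = re.compile(
    r"(?P<path>\S+): gfid different on (?P<subvol>\S+)\. "
    r"gfid local = (?P<local>[0-9a-f-]{36}), "
    r"gfid subvol = (?P<remote>[0-9a-f-]{36})"
)


def find_gfid_mismatches(lines):
    """Return one dict (path, subvol, local, remote) per mismatch warning."""
    results = []
    for line in lines:
        if "MSGID: 109009" not in line:
            continue  # not a gfid-mismatch warning
        match = MISMATCH_RE.search(line)
        if match:
            results.append(match.groupdict())
    return results
```

Passing the lines of the client log to `find_gfid_mismatches()` gives, for each warning, the directory path, the subvol that disagrees, and both gfids.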

Comment 2 Raghavendra G 2016-04-20 06:39:11 UTC
RCA: The first lookup, which dht sends to the hashed subvol, has "gfid-req" set by fuse, and its value is a new gfid generated by fuse for this call. So, if the gfid is missing on the hashed subvol, this new gfid is set on it and synced to the non-hashed subvols that don't have a gfid. Note that if the gfid is missing from non-hashed subvols, they all get the gfid of the directory stored on the hashed subvol, because for lookup calls on them dht sets "gfid-req" to the gfid on the hashed subvol.
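
The behaviour described in the RCA can be illustrated with a toy model. This is not the DHT code. It is a simplified sketch in which each subvol is a dict entry holding a gfid or None, and the function and parameter names are hypothetical.

```python
import uuid


def lookup_directory(subvol_gfids, hashed, fresh_gfid=None):
    """Toy model of a directory lookup; returns the gfids after the lookup.

    subvol_gfids maps subvol name -> gfid string, or None if the gfid is missing.
    hashed is the name of the hashed subvol for the directory.
    """
    result = dict(subvol_gfids)

    # The first lookup goes to the hashed subvol and carries a freshly
    # generated gfid as "gfid-req"; it is applied only if no gfid is present.
    gfid_req = fresh_gfid or str(uuid.uuid4())
    if result[hashed] is None:
        result[hashed] = gfid_req

    # Lookups on the other subvols carry the hashed subvol's gfid as
    # "gfid-req", so only subvols with a missing gfid pick it up.
    for name in result:
        if name != hashed and result[name] is None:
            result[name] = result[hashed]
    return result
```

In this model, a gfid missing on the hashed subvol leaves it with the fresh gfid while the other subvols keep the old one, which is the mismatch reported here. A gfid missing on a non-hashed subvol is healed from the hashed subvol.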

I think this issue can only be reproduced if someone removes the gfid from the backend directly. Other than through backend corruption, this issue cannot be reproduced, for the following reasons:
1. An mkdir is successful on a brick only after the gfid is set.
2. mkdir is attempted on the non-hashed subvols only after a successful mkdir on the hashed subvol, and it uses the gfid set on the hashed subvol.

So, I would consider this NOT A BUG.