Bug 1318427 - gfid-reset of a directory in distributed replicate volume doesn't set gfid on 2nd till last subvolumes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Pranith Kumar K
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On: 1312816
Blocks: 1311817
 
Reported: 2016-03-16 20:41 UTC by Dustin Black
Modified: 2019-11-14 07:36 UTC (History)
7 users

Fixed In Version: glusterfs-3.7.9-2
Doc Type: Bug Fix
Doc Text:
When a GFID was cleared from all of the backend bricks of a distributed replicate volume, only the first replica pair received the new GFID. This update ensures all replicas receive new GFIDs.
Clone Of: 1312816
Environment:
Last Closed: 2016-06-23 05:03:48 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1240 0 normal SHIPPED_LIVE Red Hat Gluster Storage 3.1 Update 3 2016-06-23 08:51:28 UTC

Description Dustin Black 2016-03-16 20:41:19 UTC
+++ This bug was initially created as a clone of Bug #1312816 +++

Description of problem:
    AFR takes a dict_ref on the xattr_req that comes to it and deletes the
    "gfid-req" key. DHT uses the same dict to send lookups to the other
    subvolumes. So, for directories on a volume with more than one DHT
    subvolume, the second through last subvolumes never receive a lookup
    request carrying "gfid-req", and the gfid reset therefore never happens
    on those directories on the 2nd through last subvolumes of a distributed
    replicate volume.
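The aliasing at the heart of the bug can be sketched with a small Python simulation (a stand-in for the C dict_t API, not glusterfs code; function names and the gfid value are illustrative — only the "gfid-req" key comes from the report):

```python
def afr_lookup_buggy(xattr_req, subvol):
    """Models afr_lookup(): reads 'gfid-req', then deletes it from the
    SHARED request dict -- the dict_del that causes the bug."""
    gfid = xattr_req.pop("gfid-req", None)
    return subvol, gfid  # None means no gfid reset happens on this subvolume

# DHT hands the SAME xattr_req dict to every subvolume's lookup.
xattr_req = {"gfid-req": "new-gfid"}
results = [afr_lookup_buggy(xattr_req, s)
           for s in ("subvol-0", "subvol-1", "subvol-2")]
print(results)  # only subvol-0 sees the gfid; the rest get None
```

Because the first call mutates the shared dict, every later subvolume looks up without "gfid-req" and skips the gfid reset.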



--- Additional comment from Vijay Bellur on 2016-02-29 05:26:17 EST ---

REVIEW: http://review.gluster.org/13545 (cluster/afr: Don't delete gfid-req from lookup request) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-03-01 03:58:38 EST ---

REVIEW: http://review.gluster.org/13545 (cluster/afr: Don't delete gfid-req from lookup request) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-03-01 21:36:25 EST ---

REVIEW: http://review.gluster.org/13545 (cluster/afr: Don't delete gfid-req from lookup request) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-03-02 03:55:01 EST ---

COMMIT: http://review.gluster.org/13545 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 9b022c3a3f2f774904b5b458ae065425b46cc15d
Author: Pranith Kumar K <pkarampu>
Date:   Sat Feb 27 23:08:06 2016 +0530

    cluster/afr: Don't delete gfid-req from lookup request
    
    Problem:
    Afr does dict_ref of the xattr_req that comes to it and deletes "gfid-req" key.
    Dht uses same dict to send lookup to other subvolumes. So in case of
    directories and more than 1 dht subvolumes, second subvolume till the last
    subvolume won't get a lookup request with "gfid-req". So gfid reset never
    happens on the directories in distributed replicate subvolume for 2nd till last
    subvolumes.
    
    Fix:
    Make a copy of lookup xattr request.
    
    Also fixed replies_wipe possibly resetting gfid to NULL gfid
    
    BUG: 1312816
    Change-Id: Ic16260e5a4664837d069c1dc05b9e96ca05bda88
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/13545
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Krutika Dhananjay <kdhananj>
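Continuing the simplified Python model above (not glusterfs code), the fix — "make a copy of the lookup xattr request" — amounts to AFR deleting the key only from its own private copy, so the caller's dict stays intact for the remaining subvolumes:

```python
import copy

def afr_lookup_fixed(xattr_req, subvol):
    """Fixed behaviour: AFR works on a copy of the request (modelling the
    dict-copy approach in the patch) and deletes 'gfid-req' only from it."""
    local_req = copy.copy(xattr_req)
    gfid = local_req.pop("gfid-req", None)  # caller's dict is untouched
    return subvol, gfid

xattr_req = {"gfid-req": "new-gfid"}
results = [afr_lookup_fixed(xattr_req, s) for s in ("subvol-0", "subvol-1")]
print(results)  # every subvolume now sees "gfid-req"
```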

--- Additional comment from Vijay Bellur on 2016-03-16 12:28:37 EDT ---

REVIEW: http://review.gluster.org/13754 (cluster/afr: Enhance the test to be more robust) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-03-16 12:40:59 EDT ---

REVIEW: http://review.gluster.org/13754 (cluster/afr: Enhance the test to be more robust) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Dustin Black 2016-03-18 15:23:45 UTC
The user-side problem description from support case 01581565 is below. This should help clarify the impact of the bug.

The issue is:
    After clearing the GFID using a script, we perform a named lookup and expect a new GFID to be created on all of the subvolumes.
    In reality, the GFID gets created on only one subvolume; on the other subvolumes, the GFID is missing.

Comment 8 Nag Pavan Chilakam 2016-05-20 10:41:43 UTC
QATP:
====
1) create a dist-rep volume and start it
2) mount the volume and create a directory on the mount
3) check the backend bricks; the dir should be created on all bricks of all subvols
4) get the gfid from these backend bricks; the gfid should be the same on all of them
5) now, from the backend, simultaneously create a new dir directly on the bricks.
This means the new dir will not have got any gfid.
6) now do a lookup from the mount
Expected result: the lookup must cause a gfid to be assigned on all the bricks in all subvols;
check the backend bricks, and all bricks of all subvols must have the same gfid
(previously only the first subvol got the gfid)


Rerun on x3 and on both FUSE and NFS clients

Comment 9 Pranith Kumar K 2016-05-20 10:49:41 UTC
We should also check that the softlink with the new gfid is present in .glusterfs/ab/cd/abcd....

Pranith
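For reference, the on-brick location of that entry follows the layout named above: .glusterfs/<first two hex chars of the gfid>/<next two>/<full gfid>. A small helper (illustrative Python, not part of gluster) computes it:

```python
import posixpath

def glusterfs_gfid_path(brick_root, gfid):
    """Return the .glusterfs entry path for a GFID on a brick: entries are
    filed under the first two and next two hex characters of the GFID."""
    return posixpath.join(brick_root, ".glusterfs", gfid[:2], gfid[2:4], gfid)

print(glusterfs_gfid_path("/bricks/brick1",
                          "abcd1234-0000-0000-0000-000000000000"))
# -> /bricks/brick1/.glusterfs/ab/cd/abcd1234-0000-0000-0000-000000000000
```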

Comment 10 Nag Pavan Chilakam 2016-05-20 12:48:33 UTC
Ran the QATP on x2 and x3 volumes on glusterfs-server-3.7.9-5.el7rhgs.x86_64.
The case passed, and the softlinks are available in .glusterfs.
Also, I tested with softlinks for the dirs and it worked well.

Hence moving to verified

Comment 12 Pranith Kumar K 2016-06-15 09:02:08 UTC
Laura, I don't think users understand gfid-reset. Maybe we should explicitly say that it means 'gfid was cleared from the backend bricks'.

Comment 13 Pranith Kumar K 2016-06-15 11:18:26 UTC
Laura, Please note the changes between '*'
     When a GFID was cleared from *all the backend bricks* of a distributed replicate volume, only the first replica pair received the new GFID. This update ensures all replicas receive new GFIDs.

Pranith

Comment 14 Pranith Kumar K 2016-06-15 11:48:59 UTC
Looks good to me.

Comment 16 errata-xmlrpc 2016-06-23 05:03:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

