Bug 1114501

Summary: Dist-geo-rep : deletion of files on master, geo-rep fails to propagate to slaves.
Product: [Community] GlusterFS Reporter: Pranith Kumar K <pkarampu>
Component: core Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: mainline CC: aavati, avishwan, csaba, gluster-bugs, nlevinki, nsathyan, pkarampu, sharne, vkoppad
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.5.2beta1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1111587 Environment:
Last Closed: 2014-07-31 11:43:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Bug Depends On: 1111587    
Bug Blocks: 1104511, 1112531    

Comment 1 Pranith Kumar K 2014-06-30 10:24:59 UTC
Without the fix:
AFR compares the lockees using the virtual gfid: e992cc46-7761-4311-a4cf-8fdde34636a5
[2014-06-30 10:08:23.727572] I [afr-lk-common.c:73:afr_entry_lockee_cmp] 0-CMP: e992cc46-7761-4311-a4cf-8fdde34636a5 - fc5d9f18-eac9-4f49-92ea-9bf27df31a30, -1
[2014-06-30 10:08:23.727631] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-0: 5980de94-0f80-4eba-b2d4-52a1abf9056a(b)
[2014-06-30 10:08:23.727735] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-1: 5980de94-0f80-4eba-b2d4-52a1abf9056a(b)
[2014-06-30 10:08:23.727839] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-0: fc5d9f18-eac9-4f49-92ea-9bf27df31a30()
[2014-06-30 10:08:23.727912] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-1: fc5d9f18-eac9-4f49-92ea-9bf27df31a30()
[2014-06-30 10:08:23.728775] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-0: 5980de94-0f80-4eba-b2d4-52a1abf9056a(b)
[2014-06-30 10:08:23.728863] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-1: 5980de94-0f80-4eba-b2d4-52a1abf9056a(b)
[2014-06-30 10:08:23.728941] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-0: fc5d9f18-eac9-4f49-92ea-9bf27df31a30()
[2014-06-30 10:08:23.729006] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-1: fc5d9f18-eac9-4f49-92ea-9bf27df31a30()

With the fix:
Comparisons now use the real gfid.
[2014-06-30 10:14:58.884052] I [afr-lk-common.c:73:afr_entry_lockee_cmp] 0-CMP: 5ea076c7-0373-4914-9484-9423a265b519 - fe5362ee-e06c-4fd2-9108-f14058cdeb81, -1
[2014-06-30 10:14:58.884104] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-0: 5ea076c7-0373-4914-9484-9423a265b519(b)
[2014-06-30 10:14:58.884199] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-1: 5ea076c7-0373-4914-9484-9423a265b519(b)
[2014-06-30 10:14:58.884325] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-0: fe5362ee-e06c-4fd2-9108-f14058cdeb81()
[2014-06-30 10:14:58.884401] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-1: fe5362ee-e06c-4fd2-9108-f14058cdeb81()
[2014-06-30 10:14:58.885544] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-0: 5ea076c7-0373-4914-9484-9423a265b519(b)
[2014-06-30 10:14:58.885624] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-1: 5ea076c7-0373-4914-9484-9423a265b519(b)
[2014-06-30 10:14:58.885702] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-0: fe5362ee-e06c-4fd2-9108-f14058cdeb81()
[2014-06-30 10:14:58.885773] I [client-rpc-fops.c:5501:client3_3_entrylk] 0-r2-client-1: fe5362ee-e06c-4fd2-9108-f14058cdeb81()

I am writing some regression tests and will send the patch out soon.

Pranith

Comment 2 Anand Avati 2014-06-30 15:38:05 UTC
REVIEW: http://review.gluster.org/8204 (features/gfid-access: Fix entry operations) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Shalaka 2014-07-01 06:36:28 UTC
Please add doc text for this Known Issue.

Comment 4 Anand Avati 2014-07-03 01:33:09 UTC
REVIEW: http://review.gluster.org/8204 (features/gfid-access: Fix entry operations) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 5 Anand Avati 2014-07-03 04:54:50 UTC
REVIEW: http://review.gluster.org/8204 (features/gfid-access: Fix entry operations) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 6 Anand Avati 2014-07-07 04:01:26 UTC
COMMIT: http://review.gluster.org/8204 committed in master by Vijay Bellur (vbellur) 
------
commit 8202705f98d139ef7d691587b9f68cf1db2e397a
Author: Pranith Kumar K <pkarampu>
Date:   Thu Jul 3 06:50:56 2014 +0530

    features/gfid-access: Fix entry operations
    
    Problem:
    When more than one aux-mount performs rmdir .gfid/<pargfid>/dir
    simultaneously, a hang is sometimes observed. In the gfid-access xlator, when
    the virtual parent/inode are replaced with the real parent/inode in loc, the
    virtual pargfid/gfid are not replaced with the real pargfid/gfid
    respectively. Afr uses the gfids in parent_loc to order the entry locks, but
    parent_loc->gfid contains the random/virtual gfid generated by the
    gfid-access xlator. Entrylk in the client xlator sends the entrylk using
    loc->inode->gfid, which holds the 'real' gfid. Because the ordering is based
    on random gfids, one mount orders the locks as (L1, L2) whereas the other
    orders them as (L2, L1), leading to a deadlock and thus a hang.
    
    Fix:
    Replace virtual pargfid/gfid with real pargfid/gfid when virtual-inodes are
    replaced with real-inodes in loc.
    
    BUG: 1114501
    Change-Id: Ie94e816122ef9e7aad51605adbf49291de60827e
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/8204
    Reviewed-by: Kotresh HR <khiremat>
    Reviewed-by: Vijay Bellur <vbellur>
    Tested-by: Vijay Bellur <vbellur>
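
The lock-ordering problem described in the commit message above can be illustrated with a small sketch. This is not GlusterFS code: the dictionary layout, the lock_order() helper, and the byte-wise UUID comparison are assumptions made for illustration, and the gfid used for the second mount is hypothetical. The other gfids are taken from the "Without the fix" log in comment 1.

```python
import uuid

# Real gfids seen in the entrylk requests of the "Without the fix" log.
REAL_PARENT = uuid.UUID("5980de94-0f80-4eba-b2d4-52a1abf9056a")
REAL_DIR = uuid.UUID("fc5d9f18-eac9-4f49-92ea-9bf27df31a30")


def make_lockees(virtual_parent):
    """Build the two lockees of one rmdir as a mount would see them.

    'virtual' is what the ordering code saw before the fix,
    'real' is what is actually sent in the entrylk.
    """
    return [
        {"name": "parent", "real": REAL_PARENT, "virtual": virtual_parent},
        {"name": "dir", "real": REAL_DIR, "virtual": REAL_DIR},
    ]


def lock_order(lockees, key):
    """Return the lock names sorted by the chosen gfid (assumed byte-wise order)."""
    return [l["name"] for l in sorted(lockees, key=lambda l: l[key].bytes)]


# Mount A: virtual parent gfid from the log.
mount_a = make_lockees(uuid.UUID("e992cc46-7761-4311-a4cf-8fdde34636a5"))
# Mount B: hypothetical virtual parent gfid, chosen to sort after REAL_DIR.
mount_b = make_lockees(uuid.UUID("ffffffff-0000-4000-8000-000000000001"))

if __name__ == "__main__":
    # Ordering by virtual gfids: the two mounts can disagree.
    print("virtual:", lock_order(mount_a, "virtual"), lock_order(mount_b, "virtual"))
    # Ordering by real gfids: both mounts agree.
    print("real:   ", lock_order(mount_a, "real"), lock_order(mount_b, "real"))
```

When the order is derived from per-mount random gfids, two mounts can take the same pair of locks in opposite orders, which is the classic condition for a deadlock. Ordering by the real gfids gives every mount the same sequence.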

Comment 7 Anand Avati 2014-07-07 08:30:17 UTC
REVIEW: http://review.gluster.org/8251 (features/gfid-access: Fix entry operations) posted (#1) for review on release-3.5 by Pranith Kumar Karampuri (pkarampu)

Comment 8 Anand Avati 2014-07-08 08:13:25 UTC
COMMIT: http://review.gluster.org/8251 committed in release-3.5 by Niels de Vos (ndevos) 
------
commit 828fe8068de0f1357e5c26097e45d752b3f7f6c4
Author: Pranith Kumar K <pkarampu>
Date:   Thu Jul 3 06:50:56 2014 +0530

    features/gfid-access: Fix entry operations
    
    Backport of http://review.gluster.org/8204
    
    Problem:
    When more than one aux-mount performs rmdir .gfid/<pargfid>/dir
    simultaneously, a hang is sometimes observed. In the gfid-access xlator, when
    the virtual parent/inode are replaced with the real parent/inode in loc, the
    virtual pargfid/gfid are not replaced with the real pargfid/gfid
    respectively. Afr uses the gfids in parent_loc to order the entry locks, but
    parent_loc->gfid contains the random/virtual gfid generated by the
    gfid-access xlator. Entrylk in the client xlator sends the entrylk using
    loc->inode->gfid, which holds the 'real' gfid. Because the ordering is based
    on random gfids, one mount orders the locks as (L1, L2) whereas the other
    orders them as (L2, L1), leading to a deadlock and thus a hang.
    
    Fix:
    Replace virtual pargfid/gfid with real pargfid/gfid when virtual-inodes are
    replaced with real-inodes in loc.
    
    BUG: 1114501
    Change-Id: I13016de1da11762e0697792d76e6e946d991c0a4
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/8251
    Reviewed-by: Kotresh HR <khiremat>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Niels de Vos <ndevos>

Comment 9 Niels de Vos 2014-07-21 15:41:59 UTC
The first (and last?) Beta for GlusterFS 3.5.2 has been released [1]. Please verify if the release solves this bug report for you. In case the glusterfs-3.5.2beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-devel/2014-July/041636.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 10 Niels de Vos 2014-07-31 11:43:34 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.5.2, please reopen this bug report.

glusterfs-3.5.2 has been announced on the Gluster Users mailing list [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-July/041217.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user