Bug 1443373 - mkdir/rmdir loop causes gfid-mismatch on a 6 brick distribute volume
Summary: mkdir/rmdir loop causes gfid-mismatch on a 6 brick distribute volume
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard: triaged, dht-gss, dht-gss-ask, dht-3....
Depends On: 915992 951195 981196 1094724 1115367 1286593
Blocks: 1089628
 
Reported: 2017-04-19 07:26 UTC by Kotresh HR
Modified: 2017-05-30 18:50 UTC
CC List: 13 users

Fixed In Version: glusterfs-3.11.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1286593
Environment:
Last Closed: 2017-05-30 18:50:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Kotresh HR 2017-04-26 07:07:12 UTC
The following script was used to reproduce the issue:

#!/bin/bash
echo "starting.."
while :; do
        mkdir -p foo/bar/goo
        mkdir -p foo/bar/gee
        mkdir -p foo/gue/gar
        rm -rf foo
done


The affected volume is a 6-brick distribute volume:

# gluster volume info bz922792_dht
 
Volume Name: bz922792_dht
Type: Distribute
Volume ID: 99301415-d889-4d25-8b55-bce17bfdfbce
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: rhs-1:/bricks/bz922792_dht_1
Brick2: rhs-2:/bricks/bz922792_dht_1
Brick3: rhs-1:/bricks/bz922792_dht_2
Brick4: rhs-2:/bricks/bz922792_dht_2
Brick5: rhs-1:/bricks/bz922792_dht_3
Brick6: rhs-2:/bricks/bz922792_dht_3


After running the reproducer script simultaneously on two glusterfs clients (mounted on the servers),
a gfid mismatch occurs relatively soon (usually within a minute):

rhs-1# getfattr -d -e hex -m trusted.gfid /bricks/bz922792_dht_?/foo 2> /dev/null 
# file: bricks/bz922792_dht_1/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_2/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_3/foo
trusted.gfid=0xcd99da3a04d549deb22fd44aef5fa340

rhs-2# getfattr -d -e hex -m trusted.gfid /bricks/bz922792_dht_?/foo 2> /dev/null 
# file: bricks/bz922792_dht_1/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_2/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_3/foo
trusted.gfid=0xcd99da3a04d549deb22fd44aef5fa340

0-bz922792_dht-client-0 to 0-bz922792_dht-client-3 have gfid:05dda1ef-a857-498e-bb98-9eae513ad811

0-bz922792_dht-client-4 = rhs-1:/bricks/bz922792_dht_3
0-bz922792_dht-client-5 = rhs-2:/bricks/bz922792_dht_3
                        -> gfid:cd99da3a-04d5-49de-b22f-d44aef5fa340


From the client log on rhs-1, I think this is the start of the problem:

  [2013-04-11 14:23:54.328021] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-bz922792_dht-client-4: remote operation failed: File exists. Path: /foo
  [2013-04-11 14:23:54.328789] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-bz922792_dht-client-5: remote operation failed: File exists. Path: /foo
  ...
  [2013-04-11 14:25:07.032185] W [client-rpc-fops.c:2604:client3_3_lookup_cbk] 0-bz922792_dht-client-5: remote operation failed: Stale NFS file handle. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)
  [2013-04-11 14:25:07.032220] W [client-rpc-fops.c:2604:client3_3_lookup_cbk] 0-bz922792_dht-client-4: remote operation failed: Stale NFS file handle. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)
  [2013-04-11 14:25:07.033762] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-1
  [2013-04-11 14:25:07.033798] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-0
  [2013-04-11 14:25:07.033823] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-3
  [2013-04-11 14:25:07.033855] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-2
  [2013-04-11 14:25:07.035677] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-2
  [2013-04-11 14:25:07.035721] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-1
  [2013-04-11 14:25:07.035756] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-0
  [2013-04-11 14:25:07.035779] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-3
  ...
  [2013-04-11 14:25:07.053041] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-1: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:25:07.053073] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-0: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:25:07.053102] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-3: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:25:07.053124] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-2: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)

From rhs-2, the first messages concerning the same gfids:

  [2013-04-11 14:23:54.357739] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-0: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:23:54.357804] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-1: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:23:54.357832] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-3: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:23:54.357868] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-2: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  ...
  [2013-04-11 14:25:57.053218] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-5: remote operation failed: No such file or directory. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)
  [2013-04-11 14:25:57.053254] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-4: remote operation failed: No such file or directory. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)

The first mkdir operation seems to have succeeded on 0-bz922792_dht-client-0
through 0-bz922792_dht-client-3, but failed on the two bricks that ended up
with a different gfid.

Comment 2 Worker Ant 2017-04-26 09:00:37 UTC
COMMIT: https://review.gluster.org/15472 committed in master by Raghavendra G (rgowdapp) 
------
commit 4076b73b2f4fb3cca0737974b124f33f76f9c9c1
Author: Kotresh HR <khiremat>
Date:   Tue Jan 3 02:35:06 2017 -0500

    feature/dht: Directory synchronization
    
    Design doc: https://review.gluster.org/16876
    
    Directory creation is now synchronized with blocking inodelk of the
    parent on the hashed subvolume followed by the entrylk on the hashed
    subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup
    selfheal mkdir.
    
    To maintain internal consistency of directories across all subvols of
    dht, we need locks. Specifically we are interested in:
    
     1. Consistency of layout of a directory. Only one writer should modify
        the layout at a time. A writer (layout setting during directory heal
        as part of lookup) shouldn't modify the layout while there are
        readers (all other fops like create, mkdir etc., which consume
        layout) and readers shouldn't read the layout while a writer is in
        progress. Readers can read the layout simultaneously. Writer takes
        a WRITE inodelk on the directory (whose layout is being modified)
        across ALL subvols. Reader takes a READ inodelk on the directory
        (whose layout is being read) on ANY subvol.
    
     2. Consistency of directory namespace across subvols. The path and
        associated gfid should be same on all subvols. A gfid should not be
        associated with more than one path on any subvol. All fops that can
        change directory names (mkdir, rmdir, renamedir, directory creation
        phase in lookup-heal) takes an entrylk on hashed subvol of the
        directory.
    
     NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a
            directory, the transaction itself is a consumer of layout on
            parent directory. So, the transaction is a reader of parent
            layout and does an inodelk on parent directory just like any
            other layout reader. So a mkdir (dir/subdir) would:
    
         > Acquire a READ inodelk on "dir" on any subvol.
         > Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir".
         > creates directory on hashed subvol and possibly on non-hashed subvols.
         > UNLOCK (entrylk)
         > UNLOCK (inodelk)
    
     NOTE2: mkdir fop while setting the layout of the directory being created
            is considered as a reader, but NOT a writer. The reason is for
            a fop which can consume the layout of a directory to come either
            of the following conditions has to be true:
    
         > mkdir syscall from application has to complete. In this case no
           need of synchronization.
         > A lookup issued on the directory racing with mkdir has to complete.
           Since layout setting by a lookup is considered as a writer, only
           one of either mkdir or lookup will set the layout.
    
    Code re-organization:
       All the lock related routines are moved to "dht-lock.c" file.
       New wrapper function is introduced to take blocking inodelk
       followed by entrylk 'dht_protect_namespace'
    
    Updates #191
    Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31
    Signed-off-by: Kotresh HR <khiremat>
    BUG: 1443373
    Reviewed-on: https://review.gluster.org/15472
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
    Smoke: Gluster Build System <jenkins.org>

Comment 3 Shyamsundar 2017-05-30 18:50:19 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

