Bug 951195 - mkdir/rmdir loop causes gfid-mismatch on a 6 brick distribute volume
Summary: mkdir/rmdir loop causes gfid-mismatch on a 6 brick distribute volume
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On: 915992
Blocks: 981196 986916 1094724 1121920 1286582 1286592 1286593 1338634 1338668 1338669 1443373
Reported: 2013-04-11 16:03 UTC by Niels de Vos
Modified: 2017-04-19 07:26 UTC (History)

Fixed In Version: glusterfs-3.5.2
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1094724 (view as bug list)
Environment:
Last Closed: 2014-09-16 19:44:13 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
client side logs: rhs-1_mnt-bz922792_dht.log.gz rhs-2_mnt-bz922792_dht.log.gz (3.72 MB, application/octet-stream)
2013-04-11 16:03 UTC, Niels de Vos

Description Niels de Vos 2013-04-11 16:03:02 UTC
Created attachment 734304
client side logs: rhs-1_mnt-bz922792_dht.log.gz rhs-2_mnt-bz922792_dht.log.gz

Affected version:
glusterfs master/HEAD build with this last commit:
  commit ce111f472796d027796b0cc3a4a6f78689f1172d
  Author: Anand Avati <avati@redhat.com>
  Date:   Fri Apr 5 02:18:06 2013 -0700


The following script is used to reproduce the problem:

#!/bin/bash
echo "starting.."
while :; do
        mkdir -p foo/bar/goo
        mkdir -p foo/bar/gee
        mkdir -p foo/gue/gar
        rm -rf foo
done


The affected volume is a 6-brick distribute:

# gluster volume info bz922792_dht
 
Volume Name: bz922792_dht
Type: Distribute
Volume ID: 99301415-d889-4d25-8b55-bce17bfdfbce
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: rhs-1:/bricks/bz922792_dht_1
Brick2: rhs-2:/bricks/bz922792_dht_1
Brick3: rhs-1:/bricks/bz922792_dht_2
Brick4: rhs-2:/bricks/bz922792_dht_2
Brick5: rhs-1:/bricks/bz922792_dht_3
Brick6: rhs-2:/bricks/bz922792_dht_3


After running the reproducer script on two glusterfs-clients (on the servers),
a gfid mismatch will occur relatively soon (mostly within a minute):

rhs-1# getfattr -d -e hex -m trusted.gfid /bricks/bz922792_dht_?/foo 2> /dev/null 
# file: bricks/bz922792_dht_1/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_2/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_3/foo
trusted.gfid=0xcd99da3a04d549deb22fd44aef5fa340

rhs-2# getfattr -d -e hex -m trusted.gfid /bricks/bz922792_dht_?/foo 2> /dev/null 
# file: bricks/bz922792_dht_1/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_2/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_3/foo
trusted.gfid=0xcd99da3a04d549deb22fd44aef5fa340

0-bz922792_dht-client-0 to 0-bz922792_dht-client-3 have gfid:05dda1ef-a857-498e-bb98-9eae513ad811

0-bz922792_dht-client-4 = rhs-1:/bricks/bz922792_dht_3
0-bz922792_dht-client-5 = rhs-2:/bricks/bz922792_dht_3
                        -> gfid:cd99da3a-04d5-49de-b22f-d44aef5fa340
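
The hex values printed by getfattr and the dashed gfids in the client logs are the same 16 bytes in two notations. A minimal sketch for converting one into the other (the function name gfid_hex_to_uuid is made up for this illustration and is not part of any gluster tool):

```bash
#!/bin/bash
# Convert a trusted.gfid value as printed by "getfattr -e hex"
# (0x followed by 32 hex digits) into the dashed form seen in the logs.
# Hypothetical helper, for illustration only.
gfid_hex_to_uuid() {
    local hex=${1#0x}    # drop the leading "0x" if present
    printf '%s-%s-%s-%s-%s\n' \
        "${hex:0:8}" "${hex:8:4}" "${hex:12:4}" "${hex:16:4}" "${hex:20:12}"
}

gfid_hex_to_uuid 0x05dda1efa857498ebb989eae513ad811
gfid_hex_to_uuid 0xcd99da3a04d549deb22fd44aef5fa340
```

The two calls print the gfids listed above for client-0 to client-3 and for client-4/client-5 respectively.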


From the client log of rhs-1, I think that this is the start of the problem:

  [2013-04-11 14:23:54.328021] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-bz922792_dht-client-4: remote operation failed: File exists. Path: /foo
  [2013-04-11 14:23:54.328789] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-bz922792_dht-client-5: remote operation failed: File exists. Path: /foo
  ...
  [2013-04-11 14:25:07.032185] W [client-rpc-fops.c:2604:client3_3_lookup_cbk] 0-bz922792_dht-client-5: remote operation failed: Stale NFS file handle. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)
  [2013-04-11 14:25:07.032220] W [client-rpc-fops.c:2604:client3_3_lookup_cbk] 0-bz922792_dht-client-4: remote operation failed: Stale NFS file handle. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)
  [2013-04-11 14:25:07.033762] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-1
  [2013-04-11 14:25:07.033798] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-0
  [2013-04-11 14:25:07.033823] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-3
  [2013-04-11 14:25:07.033855] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-2
  [2013-04-11 14:25:07.035677] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-2
  [2013-04-11 14:25:07.035721] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-1
  [2013-04-11 14:25:07.035756] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-0
  [2013-04-11 14:25:07.035779] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-3
  ...
  [2013-04-11 14:25:07.053041] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-1: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:25:07.053073] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-0: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:25:07.053102] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-3: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:25:07.053124] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-2: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)

From rhs-2, the first messages concerning the same gfids:

  [2013-04-11 14:23:54.357739] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-0: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:23:54.357804] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-1: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:23:54.357832] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-3: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  [2013-04-11 14:23:54.357868] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-2: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
  ...
  [2013-04-11 14:25:57.053218] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-5: remote operation failed: No such file or directory. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)
  [2013-04-11 14:25:57.053254] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-4: remote operation failed: No such file or directory. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)

The first mkdir operation seems to have succeeded for 0-bz922792_dht-client-0
to 0-bz922792_dht-client-3, but failed on the two bricks which have a
different gfid.

Comment 2 Anand Avati 2013-04-23 07:32:19 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: xattr on to prevent races in rmdir lookup_heal) posted (#3) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 3 Anand Avati 2013-04-23 08:54:34 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: xattr on to prevent races in rmdir lookup_heal) posted (#4) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 4 Anand Avati 2013-04-24 12:46:02 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir heal) posted (#5) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 5 Anand Avati 2013-04-26 06:54:28 UTC
REVIEW: http://review.gluster.org/4889 (locks: Added an xdata-based 'cmd' for inodelk count in a given domain) posted (#1) for review on master by Krishnan Parthasarathi (kparthas@redhat.com)

Comment 6 Anand Avati 2013-04-26 10:37:29 UTC
REVIEW: http://review.gluster.org/4889 (locks: Added an xdata-based 'cmd' for inodelk count in a given domain) posted (#2) for review on master by Krishnan Parthasarathi (kparthas@redhat.com)

Comment 7 Anand Avati 2013-06-05 10:49:33 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir  heal) posted (#6) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 8 Anand Avati 2013-06-05 10:49:57 UTC
REVIEW: http://review.gluster.org/4889 (locks: Added an xdata-based 'cmd' for inodelk count in a given domain) posted (#4) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 9 Niels de Vos 2013-06-13 09:06:37 UTC
These two patches don't fix this issue for me when I apply them on top of
master (last commit 328ea4b).

In my first attempt to verify these patches, after stopping the reproducer
scripts, the output looks like this:

[root@rhs-1 ~]# ls -li /mnt/bz922792_dht/foo/
total 0
12580817571139378177 d--------- 3 root root 80 Jun 13 08:57 bar
12580817571139378177 d--------- 3 root root 80 Jun 13 08:57 bar
10650833170816791630 d--------- 2 root root 76 Jun 13 08:57 gue
10650833170816791630 d--------- 2 root root 76 Jun 13 08:57 gue
[root@rhs-1 ~]# ls -li /mnt/bz922792_dht/foo/bar
total 0
9888433851475164314 drwxr-xr-x 2 root root 30 Jun 13 08:57 goo
9888433851475164314 drwxr-xr-x 2 root root 30 Jun 13 08:57 goo

Comment 10 Niels de Vos 2013-06-13 09:22:00 UTC
The GFIDs are inconsistent, which likely explains the double listing in 'ls'.

On rhs-1:                                       _
# file: bricks/bz922792_dht_1/foo                \
trusted.gfid=0xea63465236a440d095d0c7047482af7f   \
# file: bricks/bz922792_dht_2/foo                  |_ OK on both
trusted.gfid=0xea63465236a440d095d0c7047482af7f    |
# file: bricks/bz922792_dht_3/foo                 /
trusted.gfid=0xea63465236a440d095d0c7047482af7f _/
# file: bricks/bz922792_dht_1/foo/bar            \
trusted.gfid=0x4051087efb3f45dfae980aecc7c15c01   \
# file: bricks/bz922792_dht_2/foo/bar              |_ 1/6 wrong
trusted.gfid=0x4051087efb3f45dfae980aecc7c15c01    |
# file: bricks/bz922792_dht_3/foo/bar             /
trusted.gfid=0x7347f22b7d3d49d28eea7f635f94c7ed _/  <-- differs, unique
# file: bricks/bz922792_dht_1/foo/gue            \
trusted.gfid=0x2b182bc2d831432193cf5c569c7b744e   \ <-- match rhs-2 dht_3
# file: bricks/bz922792_dht_2/foo/gue              |_ 2/6 wrong
trusted.gfid=0x30fe1e3ca0e2416a9143b8fe66b2f032    |
# file: bricks/bz922792_dht_3/foo/gue             /
trusted.gfid=0x30fe1e3ca0e2416a9143b8fe66b2f032 _/

On rhs-2:                                       _
# file: bricks/bz922792_dht_1/foo                \
trusted.gfid=0xea63465236a440d095d0c7047482af7f   \
# file: bricks/bz922792_dht_2/foo                  |_ OK on both
trusted.gfid=0xea63465236a440d095d0c7047482af7f    |
# file: bricks/bz922792_dht_3/foo                 /
trusted.gfid=0xea63465236a440d095d0c7047482af7f _/
# file: bricks/bz922792_dht_1/foo/bar            \
trusted.gfid=0x4051087efb3f45dfae980aecc7c15c01   \
# file: bricks/bz922792_dht_2/foo/bar              |_ 1/6 wrong (on rhs-1)
trusted.gfid=0x4051087efb3f45dfae980aecc7c15c01    |
# file: bricks/bz922792_dht_3/foo/bar             /
trusted.gfid=0x4051087efb3f45dfae980aecc7c15c01 _/
# file: bricks/bz922792_dht_1/foo/gue            \
trusted.gfid=0x30fe1e3ca0e2416a9143b8fe66b2f032   \
# file: bricks/bz922792_dht_2/foo/gue              |_ 2/6 wrong
trusted.gfid=0x30fe1e3ca0e2416a9143b8fe66b2f032    |
# file: bricks/bz922792_dht_3/foo/gue             /
trusted.gfid=0x2b182bc2d831432193cf5c569c7b744e _/ <-- matches rhs-1 dht_1

Comment 11 Niels de Vos 2013-06-13 10:01:13 UTC
Ai, going through the logs, I notice that not all glusterfsd processes were
running (no idea how that happened). Re-running the tests now, will leave a
new update later.

Comment 12 Niels de Vos 2013-06-13 16:49:22 UTC
I have not seen the duplicate entries in 'ls' anymore, but the reproducers
nevertheless hang after a while. The gfid mismatches on the directories
look a little different:

On rhs-1:                                       _
# file: bricks/bz922792_dht_1/foo                \
trusted.gfid=0x9703ccec339a45708da0aa7a098b23ba   \
# file: bricks/bz922792_dht_2/foo                  |_ OK on both
trusted.gfid=0x9703ccec339a45708da0aa7a098b23ba    |
# file: bricks/bz922792_dht_3/foo                 /
trusted.gfid=0x9703ccec339a45708da0aa7a098b23ba _/
# file: bricks/bz922792_dht_1/foo/bar            \
trusted.gfid=0xee8578f0a69b43ec82889187186a30a3   \  <-- match rhs-2:dht_3
# file: bricks/bz922792_dht_2/foo/bar              |_ 2/6 wrong
trusted.gfid=0x852d1dd258c84bccaa7c8575e9c99dda    |
# file: bricks/bz922792_dht_3/foo/bar             /
trusted.gfid=0x852d1dd258c84bccaa7c8575e9c99dda _/
# file: bricks/bz922792_dht_1/foo/gue            \
trusted.gfid=0xd7d84b28dd524f10b76386b6f44be101   \
# file: bricks/bz922792_dht_2/foo/gue              |_ 3/6 wrong
trusted.gfid=0xd7d84b28dd524f10b76386b6f44be101    |
# file: bricks/bz922792_dht_3/foo/gue             /
trusted.gfid=0x2516d664966748bc956a54f3a356ad3b _/ <-- match rhs-2:dht_2+3

On rhs-2:                                       _
# file: bricks/bz922792_dht_1/foo                \
trusted.gfid=0x9703ccec339a45708da0aa7a098b23ba   \
# file: bricks/bz922792_dht_2/foo                  |_ OK on both
trusted.gfid=0x9703ccec339a45708da0aa7a098b23ba    |
# file: bricks/bz922792_dht_3/foo                 /
trusted.gfid=0x9703ccec339a45708da0aa7a098b23ba _/
# file: bricks/bz922792_dht_1/foo/bar            \
trusted.gfid=0x852d1dd258c84bccaa7c8575e9c99dda   \
# file: bricks/bz922792_dht_2/foo/bar              |_ 2/6 wrong
trusted.gfid=0x852d1dd258c84bccaa7c8575e9c99dda    |
# file: bricks/bz922792_dht_3/foo/bar             /
trusted.gfid=0xee8578f0a69b43ec82889187186a30a3 _/  <-- match rhs-1:dht_1
# file: bricks/bz922792_dht_1/foo/gue            \
trusted.gfid=0xd7d84b28dd524f10b76386b6f44be101   \ <-- match rhs-1:dht_1+2
# file: bricks/bz922792_dht_2/foo/gue              |_ 3/6 wrong
trusted.gfid=0x2516d664966748bc956a54f3a356ad3b    |
# file: bricks/bz922792_dht_3/foo/gue             /
trusted.gfid=0x2516d664966748bc956a54f3a356ad3b _/

The logs (mountpoint and the bricks from both servers) from the last test run
that resulted in these gfid mismatches are available from
http://people.redhat.com/ndevos/bz951195/bz951195_comment12.tar.bz2 (54MB).

I have not been able to make a useful diagnosis from these logs yet. Some
guidance and suggestions are much appreciated!
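
Comparing the getfattr listings by eye gets tedious. The following is a rough sketch, not an official tool, that reads getfattr output on stdin and prints every path for which the bricks report more than one distinct trusted.gfid. It assumes the brick layout used in this report (bricks/<brick-name>/<path>, so the first two path components are dropped) and paths without spaces; the function name find_gfid_mismatches is made up.

```bash
#!/bin/bash
# Print paths (relative to the brick root) that carry more than one distinct
# trusted.gfid across the bricks listed in the getfattr output on stdin.
# Assumption: every brick lives at bricks/<brick-name>/, as in this report.
find_gfid_mismatches() {
    awk '
        /^# file: / {
            path = $3
            # Drop "bricks/<brick-name>" so the same directory on
            # different bricks maps to the same key.
            sub("^[^/]+/[^/]+", "", path)
            next
        }
        /^trusted\.gfid=/ {
            gfid = substr($0, index($0, "=") + 1)
            if (!((path, gfid) in seen)) {
                seen[path, gfid] = 1
                count[path]++
            }
        }
        END {
            for (p in count)
                if (count[p] > 1)
                    print p
        }
    ' | sort
}

# Possible usage on one server (combine the output of all servers to catch
# mismatches that only show up between hosts):
#   getfattr -d -e hex -m trusted.gfid /bricks/bz922792_dht_?/foo/* 2>/dev/null \
#       | find_gfid_mismatches
```

Fed with the rhs-1 listing above, this would report /foo/bar and /foo/gue but not /foo.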

Comment 13 shishir gowda 2013-06-20 04:19:57 UTC
Looks like a race between mkdir and lookup, both setting gfids in the posix xlator.

We might have to revert this fix:

commit 97807e75956a2d240282bc64fab1b71762de0546
Author: Pranith K <pranithk@gluster.com>
Date:   Thu Jul 14 06:31:47 2011 +0000

    storage/posix: Remove the interim fix that handles the gfid race
    
    Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
    Signed-off-by: Anand Avati <avati@gluster.com>
    
    BUG: 2745 (failure to detect split brain)
    URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2745


Error logs: rhs-1 brick-3:
[2013-06-13 10:51:11.776493] W [posix-helpers.c:485:posix_gfid_set] 0-bz922792_dht-posix: setting GFID on /bricks/bz922792_dht_3/foo/gue/gar failed (File exists)
[2013-06-13 10:51:11.776515] E [posix.c:960:posix_mkdir] 0-bz922792_dht-posix: setting gfid on /bricks/bz922792_dht_3/foo/gue/gar failed

[2013-06-13 11:31:34.485813] W [posix-handle.c:624:posix_handle_soft] 0-bz922792_dht-posix: symlink ../../ee/85/ee8578f0-a69b-43ec-8288-9187186a30a3/goo -> /bricks/bz922792_dht_3/.glusterfs/7b/d2/7bd23cd6-b82b-498f-85f0-c08744b91295 failed (File exists)
[2013-06-13 11:31:34.485838] E [posix.c:960:posix_mkdir] 0-bz922792_dht-posix: setting gfid on /bricks/bz922792_dht_3/foo/bar/goo failed
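
The handle path in the posix_handle_soft message follows the pattern <brick>/.glusterfs/<first two hex digits>/<next two hex digits>/<gfid>. A small sketch for building that path from a gfid, so the existing handle can be inspected on the brick (the function name gfid_handle_path is made up for this illustration):

```bash
#!/bin/bash
# Build the .glusterfs handle path for a gfid on a given brick, following
# the layout visible in the log message above.
# Hypothetical helper, for illustration only.
gfid_handle_path() {
    local brick=$1 gfid=$2
    printf '%s/.glusterfs/%s/%s/%s\n' \
        "$brick" "${gfid:0:2}" "${gfid:2:2}" "$gfid"
}

gfid_handle_path /bricks/bz922792_dht_3 7bd23cd6-b82b-498f-85f0-c08744b91295

# On the brick, the existing handle could then be checked with e.g.:
#   readlink "$(gfid_handle_path /bricks/bz922792_dht_3 7bd23cd6-b82b-498f-85f0-c08744b91295)"
```

If the handle already exists and points to a different parent or name than the one in the log, that would fit the "File exists" failure.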

Comment 14 Anand Avati 2013-06-20 08:37:05 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir  heal) posted (#7) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 15 Anand Avati 2013-06-20 08:37:30 UTC
REVIEW: http://review.gluster.org/4889 (locks: Added an xdata-based 'cmd' for inodelk count in a given domain) posted (#5) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 16 Anand Avati 2013-06-20 08:37:54 UTC
REVIEW: http://review.gluster.org/5240 (Revert "storage/posix: Remove the interim fix that handles the gfid race") posted (#1) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 18 Anand Avati 2013-07-12 09:46:38 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir  heal) posted (#8) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 19 Anand Avati 2013-07-12 09:47:06 UTC
REVIEW: http://review.gluster.org/5240 (Revert "storage/posix: Remove the interim fix that handles the gfid race") posted (#2) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 20 Anand Avati 2013-07-12 09:47:31 UTC
REVIEW: http://review.gluster.org/4889 (locks: Added an xdata-based 'cmd' for inodelk count in a given domain) posted (#6) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 21 Anand Avati 2013-07-18 08:38:59 UTC
COMMIT: http://review.gluster.org/4889 committed in master by Vijay Bellur (vbellur@redhat.com) 
------
commit 15e11cfa1dec9cafd5a9039da7a43e9c02b19d98
Author: shishir gowda <sgowda@redhat.com>
Date:   Wed Jun 5 15:56:27 2013 +0530

    locks: Added an xdata-based 'cmd' for inodelk count in a given domain
    
    Following is the semantics of the 'cmd':
    1) If @domain is NULL - returns no. of locks blocked/granted in all domains
    2) If @domain is non-NULL- returns no. of locks blocked/granted in that
    domain
    3) If @domain is non-existent - returns '0'; This is important since
    locks xlator creates a domain in a lazy manner.
    
    where @domain - a string representing the domain.
    
    Change-Id: I5e609772343acc157ca650300618c1161efbe72d
    BUG: 951195
    Original-author: Krishnan Parthasarathi <kparthas@redhat.com>
    Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
    Signed-off-by: shishir gowda <sgowda@redhat.com>
    Reviewed-on: http://review.gluster.org/4889
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
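
To illustrate the three cases from the commit message, here is a toy model in bash. It is not the locks xlator code and does not talk to gluster; the table of domains, the domain names and the function inodelk_count are all invented for this sketch. It needs bash 4 or later for associative arrays.

```bash
#!/bin/bash
# Toy model of the inodelk count semantics described in the commit message.
# Hypothetical data: lock counts (blocked + granted) per domain on one inode.
declare -A LOCKS_PER_DOMAIN=( [example-domain-a]=2 [example-domain-b]=1 )

inodelk_count() {
    local domain=${1-}
    local total=0 n
    if [ -z "$domain" ]; then
        # Case 1: no domain given, count the locks in all domains.
        for n in "${LOCKS_PER_DOMAIN[@]}"; do
            total=$((total + n))
        done
        echo "$total"
    else
        # Case 2: count for that domain.
        # Case 3: a domain that was never created yields 0.
        echo "${LOCKS_PER_DOMAIN[$domain]:-0}"
    fi
}

inodelk_count                    # all domains
inodelk_count example-domain-a   # one existing domain
inodelk_count no-such-domain     # domain not created yet
```

Case 3 is the one the commit message calls out: because domains are created lazily, a caller asking about a domain nobody has locked in yet gets 0 rather than an error.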

Comment 22 Anand Avati 2013-07-31 06:57:29 UTC
COMMIT: http://review.gluster.org/5240 committed in master by Vijay Bellur (vbellur@redhat.com) 
------
commit acf8cfdf698aa3ebe42ed55bba8be4f85b751c29
Author: shishir gowda <sgowda@redhat.com>
Date:   Thu Jun 20 14:06:04 2013 +0530

    Revert "storage/posix: Remove the interim fix that handles the gfid race"
    
    This reverts commit 97807e75956a2d240282bc64fab1b71762de0546.
    
    In a distribute or distribute-replica volume, this fix is required to prevent
    gfid mis-match due to race issues.
    
    test script bug-767585-gfid.t  needs a sleep of 2, cause after setting backend
    gfid directly, we try to heal, and with this fix, we do not allow setxattr of
    gfid within creation of 1 second if not created by itself
    
    Change-Id: Ie3f4b385416889fd5de444638a64a7eaaf24cd60
    BUG: 951195
    Signed-off-by: shishir gowda <sgowda@redhat.com>
    Reviewed-on: http://review.gluster.org/5240
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
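
The "within 1 second of creation" rule mentioned in the commit message can be pictured roughly as below. This is only a sketch of the idea, not the posix xlator code: the function name is_fresh_entry and the exact window (ctime within the last second) are assumptions made for the illustration.

```bash
#!/bin/bash
# Sketch of a freshness check: an entry counts as "fresh" when its ctime
# lies within the last second. Both arguments are epoch seconds.
# Hypothetical helper; the real check lives in the posix xlator (C code).
is_fresh_entry() {
    local ctime=$1 now=$2
    (( ctime >= now - 1 && ctime <= now ))
}

# A lookup-triggered gfid heal would back off while the entry is fresh,
# leaving the mkdir that created it time to set its own gfid.
if is_fresh_entry 1371120671 1371120672; then
    echo "fresh: skip gfid heal for now"
else
    echo "old enough: gfid heal may set the xattr"
fi
```

This also shows why the test script needs the extra sleep: a heal attempted right after the directory is created falls inside the window and is refused.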

Comment 23 Anand Avati 2013-09-13 08:01:51 UTC
REVIEW: http://review.gluster.org/5908 (cluster/dht: inodelk on hashed to prevent races in rmdir deal) posted (#1) for review on master by Shishir Gowda (sgowda@redhat.com)

Comment 25 Anand Avati 2014-02-28 19:12:08 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir heal) posted (#11) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 26 Anand Avati 2014-02-28 20:29:21 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir heal) posted (#12) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 27 Anand Avati 2014-03-15 08:43:50 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir heal) posted (#13) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 28 Anand Avati 2014-03-21 02:27:52 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir heal) posted (#14) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 29 Anand Avati 2014-04-01 21:29:44 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir heal) posted (#15) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 30 Anand Avati 2014-04-23 07:16:56 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir heal) posted (#16) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 31 Anand Avati 2014-04-23 17:43:50 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir heal) posted (#17) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 32 Anand Avati 2014-04-23 21:27:26 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir heal) posted (#18) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 33 Anand Avati 2014-04-24 11:05:45 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir  heal) posted (#19) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 34 Anand Avati 2014-04-26 13:58:29 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir  heal) posted (#20) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 35 Anand Avati 2014-04-29 09:35:23 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir  heal) posted (#21) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 36 Anand Avati 2014-04-29 09:37:34 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir  heal) posted (#22) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 37 Anand Avati 2014-05-03 11:23:46 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir  heal) posted (#23) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 38 Anand Avati 2014-05-05 07:29:34 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir  heal) posted (#24) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 39 Anand Avati 2014-05-05 07:29:43 UTC
REVIEW: http://review.gluster.org/7662 (cluster/dht: fail rmdir if hashed subvolume is not found.) posted (#1) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 40 Anand Avati 2014-05-05 13:15:36 UTC
REVIEW: http://review.gluster.org/4846 (cluster/dht: inodelk on hashed to prevent races in rmdir  heal) posted (#25) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 41 Anand Avati 2014-05-05 13:15:42 UTC
REVIEW: http://review.gluster.org/7662 (cluster/dht: fail rmdir if hashed subvolume is not found.) posted (#2) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 42 Frank Lu 2014-05-06 11:06:27 UTC
I have tried to backport http://review.gluster.org/5240 to my glusterfs deployments (both 3.3 and 3.4.2), but I still see the gfid-mismatch issue.

My test script is:
#!/bin/bash

mkdir -p /mnt/gluster/test_volume/test_dir

for i in $(seq 1 100000); do
        echo "$i"
        # Spread the directories over a three-level tree derived from the md5 of $i
        md5=$(echo "$i" | md5sum | awk '{print $1}')
        dir=${md5:0:2}/${md5:2:2}/${md5:4:2}
        mkdir -p "/mnt/gluster/test_volume/test_dir/$dir/a$i"
        mkdir -p "/mnt/gluster/test_volume/test_dir/$dir/b$i"
        mkdir -p "/mnt/gluster/test_volume/test_dir/$dir/c$i"
done


I use 10 VMs, each with one client, to run the test script concurrently.
/mnt/gluster/test_volume is the mount point of the gluster volume.


I found one directory that has the gfid-mismatch issue:

clush -g bj-mig -b -q "getfattr -dm - -e hex /data/xfsd/test_volume/test_dir/7d/3e/3e/a46180 | grep gfid"

---------------
10.15.187.150,10.15.187.159,10.15.187.160,10.15.187.164,10.15.187.165,10.15.187.166
---------------
trusted.gfid=0x6f5984f9deee42ab96a1de7de0ac4533
---------------
10.15.187.161,10.15.187.162,10.15.187.163
---------------
trusted.gfid=0x6270d2c9a6de4de38d7890d67ee97536

The volume info is:
 gluster volume info test_volume

Volume Name: test_volume
Type: Distributed-Replicate
Volume ID: d28ade83-7394-45fb-bce8-56bdf252194d
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 10.15.187.150:/data/xfsd/test_volume
Brick2: 10.15.187.159:/data/xfsd/test_volume
Brick3: 10.15.187.160:/data/xfsd/test_volume
Brick4: 10.15.187.161:/data/xfsd/test_volume
Brick5: 10.15.187.162:/data/xfsd/test_volume
Brick6: 10.15.187.163:/data/xfsd/test_volume
Brick7: 10.15.187.164:/data/xfsd/test_volume
Brick8: 10.15.187.165:/data/xfsd/test_volume
Brick9: 10.15.187.166:/data/xfsd/test_volume

I can provide more information if needed.

Comment 43 Niels de Vos 2014-09-16 19:44:13 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.2, please reopen this bug report.

glusterfs-3.5.2 has been announced on the Gluster Users mailinglist [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-July/041217.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

