The following script is used to reproduce the issue:

#!/bin/bash
echo "starting.."
while :; do
    mkdir -p foo/bar/goo
    mkdir -p foo/bar/gee
    mkdir -p foo/gue/gar
    rm -rf foo
done

The affected volume is a 6-brick distribute volume:

# gluster volume info bz922792_dht

Volume Name: bz922792_dht
Type: Distribute
Volume ID: 99301415-d889-4d25-8b55-bce17bfdfbce
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: rhs-1:/bricks/bz922792_dht_1
Brick2: rhs-2:/bricks/bz922792_dht_1
Brick3: rhs-1:/bricks/bz922792_dht_2
Brick4: rhs-2:/bricks/bz922792_dht_2
Brick5: rhs-1:/bricks/bz922792_dht_3
Brick6: rhs-2:/bricks/bz922792_dht_3

After running the reproducer script on two glusterfs clients (on the servers), a gfid mismatch occurs relatively soon (usually within a minute):

rhs-1# getfattr -d -e hex -m trusted.gfid /bricks/bz922792_dht_?/foo 2> /dev/null
# file: bricks/bz922792_dht_1/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_2/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_3/foo
trusted.gfid=0xcd99da3a04d549deb22fd44aef5fa340

rhs-2# getfattr -d -e hex -m trusted.gfid /bricks/bz922792_dht_?/foo 2> /dev/null
# file: bricks/bz922792_dht_1/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_2/foo
trusted.gfid=0x05dda1efa857498ebb989eae513ad811

# file: bricks/bz922792_dht_3/foo
trusted.gfid=0xcd99da3a04d549deb22fd44aef5fa340

0-bz922792_dht-client-0 to 0-bz922792_dht-client-3 have gfid:05dda1ef-a857-498e-bb98-9eae513ad811
0-bz922792_dht-client-4 = rhs-1:/bricks/bz922792_dht_3
0-bz922792_dht-client-5 = rhs-2:/bricks/bz922792_dht_3
  -> gfid:cd99da3a-04d5-49de-b22f-d44aef5fa340

From the client log of rhs-1, I think this is the start of the problem:

[2013-04-11 14:23:54.328021] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-bz922792_dht-client-4: remote operation failed: File exists. Path: /foo
[2013-04-11 14:23:54.328789] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-bz922792_dht-client-5: remote operation failed: File exists. Path: /foo
...
[2013-04-11 14:25:07.032185] W [client-rpc-fops.c:2604:client3_3_lookup_cbk] 0-bz922792_dht-client-5: remote operation failed: Stale NFS file handle. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)
[2013-04-11 14:25:07.032220] W [client-rpc-fops.c:2604:client3_3_lookup_cbk] 0-bz922792_dht-client-4: remote operation failed: Stale NFS file handle. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)
[2013-04-11 14:25:07.033762] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-1
[2013-04-11 14:25:07.033798] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-0
[2013-04-11 14:25:07.033823] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-3
[2013-04-11 14:25:07.033855] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-2
[2013-04-11 14:25:07.035677] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-2
[2013-04-11 14:25:07.035721] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-1
[2013-04-11 14:25:07.035756] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-0
[2013-04-11 14:25:07.035779] W [dht-common.c:419:dht_lookup_dir_cbk] 0-bz922792_dht-dht: /foo: gfid different on bz922792_dht-client-3
...
[2013-04-11 14:25:07.053041] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-1: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
[2013-04-11 14:25:07.053073] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-0: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
[2013-04-11 14:25:07.053102] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-3: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
[2013-04-11 14:25:07.053124] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-2: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)

From rhs-2, the first messages concerning the same gfids:

[2013-04-11 14:23:54.357739] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-0: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
[2013-04-11 14:23:54.357804] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-1: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
[2013-04-11 14:23:54.357832] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-3: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
[2013-04-11 14:23:54.357868] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-2: remote operation failed: No such file or directory. Path: /foo (cd99da3a-04d5-49de-b22f-d44aef5fa340)
...
[2013-04-11 14:25:57.053218] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-5: remote operation failed: No such file or directory. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)
[2013-04-11 14:25:57.053254] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-bz922792_dht-client-4: remote operation failed: No such file or directory. Path: /foo (05dda1ef-a857-498e-bb98-9eae513ad811)

The first mkdir operation seems to have succeeded on 0-bz922792_dht-client-0 to 0-bz922792_dht-client-3, but failed with "File exists" on the two bricks (client-4 and client-5) that now have a different gfid.
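The trusted.gfid comparison done by eye in the getfattr output of this report can be scripted. The helper below is a minimal sketch, not part of GlusterFS: "check_gfid_consistency" is a hypothetical name, and it assumes only the "# file: ..." / "trusted.gfid=0x..." output format that getfattr -d -e hex produces here. It groups brick paths by gfid and fails when a directory carries more than one.

```shell
#!/bin/bash
# Hypothetical helper (not part of GlusterFS): reads the output of
#   getfattr -d -e hex -m trusted.gfid <brick paths>
# on stdin, prints each distinct gfid followed by the files that carry it,
# and returns 1 if more than one distinct gfid was seen, 0 otherwise.
check_gfid_consistency() {
    awk '
        # "# file: bricks/bz922792_dht_1/foo" -> remember the path
        /^# file: / { file = substr($0, 9) }
        # "trusted.gfid=0x..." -> group the current path under this gfid
        /^trusted\.gfid=/ {
            gfid = substr($0, 14)
            if (!(gfid in seen)) n++
            seen[gfid] = seen[gfid] " " file
        }
        END {
            for (g in seen) print g ":" seen[g]
            if (n > 1) exit 1
        }
    '
}
```

Usage on a server: getfattr -d -e hex -m trusted.gfid /bricks/bz922792_dht_?/foo 2> /dev/null | check_gfid_consistency || echo "gfid mismatch on /foo". Each server only sees its local bricks, so concatenate the getfattr output of all servers to check the whole volume. In this report, rhs-1 alone already shows the mismatch, because it hosts one of the bz922792_dht_3 bricks.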
COMMIT: https://review.gluster.org/15472 committed in master by Raghavendra G (rgowdapp)
------
commit 4076b73b2f4fb3cca0737974b124f33f76f9c9c1
Author: Kotresh HR <khiremat>
Date: Tue Jan 3 02:35:06 2017 -0500

feature/dht: Directory synchronization

Design doc: https://review.gluster.org/16876

Directory creation is now synchronized with a blocking inodelk of the parent on the hashed subvolume, followed by an entrylk on the hashed subvolume, between dht_mkdir, dht_rmdir, dht_rename_dir and lookup selfheal mkdir.

To maintain internal consistency of directories across all subvols of dht, we need locks. Specifically we are interested in:

1. Consistency of the layout of a directory. Only one writer should modify the layout at a time. A writer (layout setting during directory heal as part of lookup) shouldn't modify the layout while there are readers (all other fops like create, mkdir etc., which consume the layout), and readers shouldn't read the layout while a writer is in progress. Readers can read the layout simultaneously. A writer takes a WRITE inodelk on the directory (whose layout is being modified) across ALL subvols. A reader takes a READ inodelk on the directory (whose layout is being read) on ANY subvol.

2. Consistency of the directory namespace across subvols. The path and associated gfid should be the same on all subvols. A gfid should not be associated with more than one path on any subvol. All fops that can change directory names (mkdir, rmdir, renamedir, directory creation phase in lookup-heal) take an entrylk on the hashed subvol of the directory.

NOTE1: In point 2 above, since dht takes an entrylk on the hashed subvol of a directory, the transaction itself is a consumer of the layout of the parent directory. So the transaction is a reader of the parent layout and does an inodelk on the parent directory just like any other layout reader. So a mkdir (dir/subdir) would:

> Acquire a READ inodelk on "dir" on any subvol.
> Acquire an entrylk (dir, "subdir") on the hashed subvol of "subdir".
> Create the directory on the hashed subvol and possibly on non-hashed subvols.
> UNLOCK (entrylk)
> UNLOCK (inodelk)

NOTE2: While setting the layout of the directory being created, the mkdir fop is considered a reader, NOT a writer. The reason is that, before any fop that consumes the layout of a directory can arrive, one of the following conditions has to be true:

> The mkdir syscall from the application has to complete. In this case no synchronization is needed.
> A lookup issued on the directory racing with mkdir has to complete. Since layout setting by a lookup is considered a writer, only one of mkdir or lookup will set the layout.

Code re-organization:
All the lock-related routines are moved to the "dht-lock.c" file. A new wrapper function, 'dht_protect_namespace', is introduced to take a blocking inodelk followed by an entrylk.

Updates #191

Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31
Signed-off-by: Kotresh HR <khiremat>
BUG: 1443373
Reviewed-on: https://review.gluster.org/15472
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Raghavendra G <rgowdapp>
Smoke: Gluster Build System <jenkins.org>
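The mkdir lock ordering in NOTE1, and its interaction with the layout writer from point 1, can be illustrated with a small shell model. This is only an analogy under stated assumptions: flock(1) on local lock files stands in for inodelk/entrylk, a single lock directory stands in for the hashed subvolume (the real WRITE inodelk is taken on ALL subvols), and sketch_mkdir, sketch_layout_heal and LOCKDIR are hypothetical names that do not exist in GlusterFS. dht_protect_namespace is the real wrapper named in the commit; the nested subshells below merely mimic its inodelk-then-entrylk order.

```shell
#!/bin/bash
# Hypothetical model (not GlusterFS code) of the lock ordering described in
# the commit for mkdir(dir/subdir), using flock(1) on plain lock files:
#   READ inodelk on the parent  -> shared flock on an "inodelk" lock file
#   entrylk (parent, name)      -> exclusive flock on an "entrylk" lock file
# A single lock directory stands in for the hashed subvolume's lock table.
LOCKDIR=${LOCKDIR:-/tmp/dht-lock-sketch}

# sketch_mkdir PARENT NAME SUBVOL_ROOT... (hashed subvolume root first)
sketch_mkdir() {
    local parent=$1 name=$2
    shift 2
    local key=${parent//\//_}
    mkdir -p "$LOCKDIR" || return 1
    (
        # 1. READ inodelk on the parent: many mkdirs can hold it together,
        #    but it excludes a layout writer holding the WRITE lock.
        flock -s 8 || exit 1
        (
            # 2. entrylk (parent, name): serializes every namespace change
            #    of this one name.
            flock -x 9 || exit 1
            # 3. Create the directory on the hashed subvol, then the others;
            #    stop at the first failure (e.g. "File exists").
            for subvol in "$@"; do
                mkdir "$subvol/$parent/$name" || exit 1
            done
        ) 9>"$LOCKDIR/entrylk.$key.$name"
        # 4. fd 9 is closed when the inner subshell exits: entrylk released.
    ) 8>"$LOCKDIR/inodelk.$key"
    # 5. fd 8 is closed when the outer subshell exits: inodelk released.
}

# sketch_layout_heal DIR: a layout writer (lookup self-heal) takes the
# WRITE inodelk on DIR, so it waits for in-flight sketch_mkdir calls under DIR.
sketch_layout_heal() {
    local key=${1//\//_}
    mkdir -p "$LOCKDIR" || return 1
    (
        flock -x 8 || exit 1
        echo "layout of $1 rewritten"   # placeholder for the layout update
    ) 8>"$LOCKDIR/inodelk.$key"
}
```

The design point the model tries to show: because the entrylk for a name is held across the creation on the hashed subvol and on the non-hashed subvols, two clients running the reproducer can no longer interleave their mkdir and rm -rf of "foo" subvol by subvol. The loser of the race fails on the hashed subvol before touching the others, which is the kind of interleaving that appears to have left a different gfid on client-4 and client-5 in this report.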
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/