+++ This bug was initially created as a clone of Bug #1323040 +++
Description of problem:
After a rebalance changes the on-disk layouts of directories, a client's in-memory layout becomes stale. If a lookup is not sent on the directory (driven by higher layers), dht goes ahead and uses the stale layout. This has serious consequences for entry operations, which normally rely on the layout to determine the hashed subvolume. Some of the manifestations of this problem we have seen are:
1. A directory having different gfids on different subvolumes (resulting from parallel mkdir of the same path from different clients, some having an up-to-date layout and some a stale one).
2. A file whose data file is present on different subvols with different gfids (resulting from parallel create of the same file from different clients, some having an up-to-date layout and some a stale one).
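For context, the layout and the gfid that dht relies on are stored as extended attributes on each brick's copy of the directory, so the on-disk state that a stale in-memory layout diverges from can be inspected directly on a brick. A minimal sketch, assuming a hypothetical brick path; look for trusted.gfid and the trusted.glusterfs.dht layout ranges in the output:
[root@unused ~]# getfattr -d -m . -e hex /bricks/brick1/dir
# file: bricks/brick1/dir
trusted.gfid=0x<16-byte gfid>
trusted.glusterfs.dht=0x<layout type and hash range assigned to this brick>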
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Set up a dist-rep volume, maybe 6x2.
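For example (hostnames and brick paths below are hypothetical), a 6x2 volume named dist could be created as:
[root@unused ~]# gluster volume create dist replica 2 \
      server1:/bricks/b1 server2:/bricks/b1 \
      server1:/bricks/b2 server2:/bricks/b2 \
      server1:/bricks/b3 server2:/bricks/b3 \
      server1:/bricks/b4 server2:/bricks/b4 \
      server1:/bricks/b5 server2:/bricks/b5 \
      server1:/bricks/b6 server2:/bricks/b6
[root@unused ~]# gluster volume start dist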
1. Create a data set with a large number of directories - fairly deep, with several dirs at each level.
2. Add several bricks.
3. From multiple NFS clients, run the same script to create multiple dirs inside the ones already created. We want different clients to try creating the same dirs, so only one should succeed (a sketch of such a script is given below).
4. While the script is running, start a rebalance.
The issue we want to test is a mkdir issue during rebalance where different clients have different in-memory layouts for the parent dirs, so the same dir ends up with different gfids on different subvols.
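A minimal sketch of the kind of script step 3 refers to, run unchanged from every NFS client so that the clients race to create the same names; the mount point and directory names are assumptions:
MOUNT=/mnt/nfs-dist
# Try to create the same set of new dirs under every existing directory;
# for each name, only one client should win the mkdir.
for dir in $(find "$MOUNT" -type d); do
    for i in $(seq 1 5); do
        mkdir "$dir/newdir-$i" 2>/dev/null
    done
done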
To detect the issue once your test is complete, use the following steps.
1. Have a fuse mount with use-readdirp=no and disable attribute/entry caching.
[root@unused ~]# mount -t glusterfs -o entry-timeout=0,attribute-timeout=0,use-readdirp=no localhost:/dist /mnt/glusterfs
[root@unused ~]# ps ax | grep -i readdirp
30801 ? Ssl 0:00 /usr/local/sbin/glusterfs --use-readdirp=no --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-id=/dist /mnt/glusterfs
2. Turn off md-cache/stat-prefetch.
[root@unused ~]# gluster volume set dist performance.stat-prefetch off
volume set: success
3. Now crawl the entire glusterfs mount.
[root@unused ~]# find /mnt/glusterfs > /dev/null
4. Look in the mount log for MSGID: 109009.
[root@unused ~]# grep "MSGID: 109009" /var/log/glusterfs/mnt-glusterfs.log
[2016-03-30 06:00:18.762188] W [MSGID: 109009] [dht-common.c:571:dht_lookup_dir_cbk] 0-dist-dht: /dir: gfid different on dist-client-9. gfid local = cd4adbd2-823b-4feb-82eb-b0011d71cfec, gfid subvol = cafedbd2-823b-4feb-82eb-b0011d71babe
[2016-03-30 06:00:22.596947] W [MSGID: 109009] [dht-common.c:571:dht_lookup_dir_cbk] 0-dist-dht: /dir: gfid different on dist-client-9. gfid local = cd4adbd2-823b-4feb-82eb-b0011d71cfec, gfid subvol = cafedbd2-823b-4feb-82eb-b0011d71babe
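The mismatch can also be confirmed directly on the bricks by comparing the directory's trusted.gfid xattr across subvolumes; it should be identical everywhere. A sketch with hypothetical brick paths, where output along these lines (matching the gfids in the log above) would indicate the problem:
[root@unused ~]# getfattr -n trusted.gfid -e hex /bricks/brick*/dir
# file: bricks/brick1/dir
trusted.gfid=0xcd4adbd2823b4feb82ebb0011d71cfec
# file: bricks/brick2/dir
trusted.gfid=0xcafedbd2823b4feb82ebb0011d71babe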
Expected results:
1. At most one of the mkdirs issued on the same path from multiple clients should succeed.
2. No directory should have different gfids on different subvols.
Verified the bug in build glusterfs-3.7.9-5.
The following steps were used to verify the fix:
1) Create a pure distribute volume
2) Fuse mount the volume on a client, say client-1
3) gdb into the mount process and set a breakpoint on 'dht_mkdir' (a rough command sketch for the gdb and gluster steps is given after this list)
4) Create a directory from client-1
5) Print the value of hashed_subvol->name and note it down
6) Add more bricks
7) Run a fix-layout rebalance
8) Fuse mount the volume on a new client, say client-2
9) gdb into the mount process on client-2, set a breakpoint on 'dht_mkdir_hashed_cbk', and create the same directory from client-2
10) Print the value of hashed_subvol->name and ensure it is not the same as the value in step 5
11) Allow both processes to continue
12) Check that the directory is created and that its gfid is the same on all the sub-vols
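Roughly, the gdb and gluster pieces of the steps above map onto commands like the following; the volume name, pid lookup, brick paths and directory name are assumptions for illustration, and the gdb session and the gluster commands would run in separate terminals:
# Steps 3/9: attach to the glusterfs mount process and break in dht
[root@unused ~]# gdb -p $(pgrep -f 'volfile-id=/dist' | head -n1)
(gdb) break dht_mkdir          # on client-2, break dht_mkdir_hashed_cbk instead
(gdb) continue
# Steps 5/10: once the breakpoint hits, inspect the chosen hashed subvol
(gdb) print hashed_subvol->name
# Steps 6/7: expand the volume and rewrite the layouts
[root@unused ~]# gluster volume add-brick dist server1:/bricks/b7 server2:/bricks/b8
[root@unused ~]# gluster volume rebalance dist fix-layout start
# Step 12: the gfid of the new directory must match on every brick that has it
[root@unused ~]# getfattr -n trusted.gfid -e hex /bricks/b*/newdir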
Additionally, the test mentioned in the steps to reproduce was also tried; the issue is no longer seen. Marking the bug as verified.
Doc text is fine.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.