+++ This bug was initially created as a clone of Bug #1323040 +++

Description of problem:

After rebalance changes the layouts of directories on-disk, a client's in-memory layout becomes stale. If a lookup was not sent on the directory (driven by higher layers), dht goes ahead and uses the stale layout. This has serious consequences during entry operations, which rely on the layout to determine the hashed subvolume. Some of the manifestations of this problem we've seen are:

1. A directory having a different gfid on different subvolumes (resulting from parallel mkdirs of the same path from different clients, some having an up-to-date layout and some having a stale layout).
2. A file whose data file is present on different subvolumes with different gfids (resulting from parallel creates of the same file from different clients, some having an up-to-date layout and some having a stale layout).

Version-Release number of selected component (if applicable):

How reproducible:
Quite consistently.

Steps to Reproduce:
Set up a dist-rep volume, for example 6x2.
1. Create a data set with a large number of directories - fairly deep, with several directories at each level.
2. Add several bricks.
3. From multiple NFS clients, run the same script to create multiple directories inside the ones already created. We want different clients to try creating the same directories, so that only one should succeed.
4. While the script is running, start a rebalance. The issue we want to test is a mkdir race during rebalance, when different clients have different in-memory layouts for the parent directories.

Actual results:
The same directory has different gfids on different subvolumes. To find the issue once your test is complete, use the following steps.

1. Create a fuse mount with use-readdirp=no and attribute/entry caching disabled.

[root@unused ~]# mount -t glusterfs -o entry-timeout=0,attribute-timeout=0,use-readdirp=no localhost:/dist /mnt/glusterfs
[root@unused ~]# ps ax | grep -i readdirp
30801 ?  Ssl  0:00 /usr/local/sbin/glusterfs --use-readdirp=no --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-id=/dist /mnt/glusterfs

2. Turn off md-cache/stat-prefetch.

[root@unused ~]# gluster volume set dist performance.stat-prefetch off
volume set: success

3. Now crawl the entire glusterfs mount.

[root@unused ~]# find /mnt/glusterfs > /dev/null

4. Look in the mount log for MSGID: 109009.

[root@unused ~]# grep "MSGID: 109009" /var/log/glusterfs/mnt-glusterfs.log
[2016-03-30 06:00:18.762188] W [MSGID: 109009] [dht-common.c:571:dht_lookup_dir_cbk] 0-dist-dht: /dir: gfid different on dist-client-9. gfid local = cd4adbd2-823b-4feb-82eb-b0011d71cfec, gfid subvol = cafedbd2-823b-4feb-82eb-b0011d71babe
[2016-03-30 06:00:22.596947] W [MSGID: 109009] [dht-common.c:571:dht_lookup_dir_cbk] 0-dist-dht: /dir: gfid different on dist-client-9. gfid local = cd4adbd2-823b-4feb-82eb-b0011d71cfec, gfid subvol = cafedbd2-823b-4feb-82eb-b0011d71babe

Expected results:
1. No more than one of the mkdirs issued on the same path from multiple clients should succeed.
2. No directory should have a different gfid on different subvolumes.
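To make the failure mode above concrete, here is a minimal sketch (not GlusterFS source; struct layout_range, toy_hash and hashed_subvol are hypothetical stand-ins, and the toy hash replaces dht's Davies-Meyer-based name hash) of how a dht-style client picks the hashed subvolume for a new entry from its cached layout. Two clients holding different cached ranges for the same parent can route a mkdir of the same name to different subvolumes, each creating the directory with a fresh gfid.

#include <stdint.h>

/* Hypothetical, simplified view of one cached layout entry; real dht
 * reads per-subvolume ranges from the trusted.glusterfs.dht xattr of
 * the parent directory. */
struct layout_range {
    uint32_t start;   /* low end of the hash range owned by this subvol  */
    uint32_t stop;    /* high end of the hash range owned by this subvol */
    int      subvol;  /* index of the subvolume owning this range        */
};

/* Toy stand-in for dht's name hash. */
static uint32_t toy_hash(const char *name)
{
    uint32_t h = 5381;
    while (*name)
        h = h * 33u + (unsigned char)*name++;
    return h;
}

/* Pick the subvolume whose *cached* range covers hash(bname).  If the
 * cache is stale because rebalance rewrote the on-disk layout, this
 * answer can differ from what an up-to-date client computes -- which is
 * exactly how two clients end up creating the same directory on
 * different subvolumes with different gfids. */
int hashed_subvol(const struct layout_range *layout, int n, const char *bname)
{
    uint32_t h = toy_hash(bname);

    for (int i = 0; i < n; i++)
        if (h >= layout[i].start && h <= layout[i].stop)
            return layout[i].subvol;

    return -1; /* hole in the layout; caller must heal/refresh */
}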
Additional info:

--- Additional comments from Vijay Bellur between 2016-04-01 05:55:50 EDT and 2016-04-20 23:58:34 EDT ---

REVIEW: http://review.gluster.org/13885 (cluster/distribute: detect stale layouts in entry fops) posted (#1 through #14) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/14040 (cluster/distribute: detect stale layouts in entry fops) posted (#6 and #7) for review on release-3.7 by Raghavendra G (rgowdapp)
COMMIT: http://review.gluster.org/14040 committed in release-3.7 by Raghavendra G (rgowdapp)
------
commit 0ce1a038ab54a52a0c295e830abe035d4113ba83
Author: Raghavendra G <rgowdapp>
Date:   Fri Apr 1 15:16:23 2016 +0530

    cluster/distribute: detect stale layouts in entry fops

    dht_mkdir ()
    {
        first-hashed-subvol = hashed-subvol for "bname" in in-memory
                              layout of "parent";
        inodelk (SETLKW, parent, "LAYOUT_HEAL_DOMAIN", "can be any
                 subvol, but we choose first-hashed-subvol randomly");
        {
        begin:
            hashed-subvol = hashed-subvol for "bname" in in-memory
                            layout of "parent";
            hash-range = extract hash-range from layout of "parent";
            ret = mkdir (parent/bname, hashed-subvol, hash-range);
            if (ret == "hash-value doesn't fall into layout stored on
                        the brick (this error is returned by
                        posix-mkdir)") {
                refresh_parent_layout ();
                goto begin;
            }
        }
        inodelk (UNLCK, parent, "LAYOUT_HEAL_DOMAIN",
                 "first-hashed-subvol");
        proceed with other parts of dht_mkdir;
    }

    posix_mkdir (parent/bname, client-hash-range)
    {
        disk-hash-range = getxattr (parent, "dht-layout-key");
        if (disk-hash-range != client-hash-range) {
            fail-with-error ("hash-value doesn't fall into layout
                              stored on the brick");
            return 0;
        }
        continue-with-posix-mkdir;
    }

    Similar changes need to be done for dentry operations like create,
    symlink, link, unlink, rmdir and rename. Those will be addressed in
    subsequent patches; this patch addresses only the mkdir codepath.

    This change breaks the stripe tests: on some striped subvols the dht
    layout xattrs are not set for some reason, which makes mkdir fail.
    Since striped volumes are always created with dht, some tests
    associated with stripe also fail. So the following test changes are
    made (since stripe is out of maintenance):

    * modify ./tests/basic/rpc-coverage.t to not use striped volumes
    * mark all (2) tests in tests/bugs/stripe/ as bad tests

    Change-Id: Idd1ae879f24a48303dc743c1bb4d91f89a629e25
    BUG: 1329062
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/14040
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
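The following is a minimal sketch of the server-side check the commit describes, not the actual posix xlator code. check_client_layout(), struct dht_range and the two-field on-disk encoding are hypothetical simplifications; the real trusted.glusterfs.dht xattr value carries additional header fields in network byte order. The idea is the same: compare the hash range the client used against the range stored on the brick, and fail so the client refreshes and retries.

#include <errno.h>
#include <stdint.h>
#include <sys/xattr.h>

/* Hypothetical simplified encoding of a layout range. */
struct dht_range {
    uint32_t start;
    uint32_t stop;
};

/* Returns 0 when the client's layout matches the brick's on-disk layout,
 * -ESTALE when it does not.  The caller would translate -ESTALE into the
 * "hash-value doesn't fall into layout stored on the brick" error that
 * makes dht_mkdir refresh its parent layout and retry from "begin:". */
int check_client_layout(const char *parent_path, struct dht_range client)
{
    struct dht_range disk;

    if (getxattr(parent_path, "trusted.glusterfs.dht",
                 &disk, sizeof(disk)) != (ssize_t)sizeof(disk))
        return -EIO;                 /* layout xattr missing or malformed */

    if (disk.start != client.start || disk.stop != client.stop)
        return -ESTALE;              /* client's cached layout is stale */

    return 0;                        /* layouts agree; proceed with mkdir */
}

Putting the comparison on the brick, where the authoritative layout lives, turns a silent mkdir on the wrong subvolume into a retryable error, closing the window in which a stale client could create a duplicate directory with a new gfid.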
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user