+++ This bug was initially created as a clone of Bug #1323040 +++

Description of problem:

After rebalance changes the layouts of directories on-disk, a client's in-memory layout becomes stale. If a lookup was not sent on the directory (driven by higher layers), dht goes ahead and uses the stale layout. This has serious consequences during entry operations, which rely on the layout to determine the hashed subvolume. Some of the manifestations of this problem we've seen are:

1. A directory having a different gfid on different subvolumes (resulting from parallel mkdirs of the same path from different clients, some having an up-to-date layout and some having a stale layout).
2. A file whose data file is present on different subvolumes with different gfids (resulting from parallel creates of the same file from different clients, some having an up-to-date layout and some having a stale layout).

Version-Release number of selected component (if applicable):

How reproducible:
Quite consistently.

Steps to Reproduce:
Set up a dist-rep volume, for example 6x2.
1. Create a data set with a large number of directories - fairly deep, with several directories at each level.
2. Add several bricks.
3. From multiple NFS clients, run the same script to create multiple directories inside the ones already created. We want different clients to try creating the same directories, so that only one should succeed.
4. While the script is running, start a rebalance. The issue we want to test is a mkdir race during rebalance, when different clients have different in-memory layouts for the parent directories.

Actual results:
The same directory has different gfids on different subvolumes. To find the issue once your test is complete, use the following steps.

1. Create a fuse mount with use-readdirp=no and attribute/entry caching disabled.

[root@unused ~]# mount -t glusterfs -o entry-timeout=0,attribute-timeout=0,use-readdirp=no localhost:/dist /mnt/glusterfs
[root@unused ~]# ps ax | grep -i readdirp
30801 ?  Ssl  0:00 /usr/local/sbin/glusterfs --use-readdirp=no --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-id=/dist /mnt/glusterfs

2. Turn off md-cache/stat-prefetch.

[root@unused ~]# gluster volume set dist performance.stat-prefetch off
volume set: success

3. Now crawl the entire glusterfs mount.

[root@unused ~]# find /mnt/glusterfs > /dev/null

4. Look in the mount log for MSGID: 109009.

[root@unused ~]# grep "MSGID: 109009" /var/log/glusterfs/mnt-glusterfs.log
[2016-03-30 06:00:18.762188] W [MSGID: 109009] [dht-common.c:571:dht_lookup_dir_cbk] 0-dist-dht: /dir: gfid different on dist-client-9. gfid local = cd4adbd2-823b-4feb-82eb-b0011d71cfec, gfid subvol = cafedbd2-823b-4feb-82eb-b0011d71babe
[2016-03-30 06:00:22.596947] W [MSGID: 109009] [dht-common.c:571:dht_lookup_dir_cbk] 0-dist-dht: /dir: gfid different on dist-client-9. gfid local = cd4adbd2-823b-4feb-82eb-b0011d71cfec, gfid subvol = cafedbd2-823b-4feb-82eb-b0011d71babe

Expected results:
1. No more than one of the mkdirs issued on the same path from multiple clients should succeed.
2. No directory should have a different gfid on different subvolumes.
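To make the failure mode above concrete, here is a minimal sketch (not GlusterFS source; struct layout_range, toy_hash and hashed_subvol are hypothetical stand-ins, and the toy hash replaces dht's Davies-Meyer-based name hash) of how a dht-style client picks the hashed subvolume for a new entry from its cached layout. Two clients holding different cached ranges for the same parent can route a mkdir of the same name to different subvolumes, each creating the directory with a fresh gfid.

#include <stdint.h>

/* Hypothetical, simplified view of one cached layout entry; real dht
 * reads per-subvolume ranges from the trusted.glusterfs.dht xattr of
 * the parent directory. */
struct layout_range {
    uint32_t start;   /* low end of the hash range owned by this subvol  */
    uint32_t stop;    /* high end of the hash range owned by this subvol */
    int      subvol;  /* index of the subvolume owning this range        */
};

/* Toy stand-in for dht's name hash. */
static uint32_t toy_hash(const char *name)
{
    uint32_t h = 5381;
    while (*name)
        h = h * 33u + (unsigned char)*name++;
    return h;
}

/* Pick the subvolume whose *cached* range covers hash(bname).  If the
 * cache is stale because rebalance rewrote the on-disk layout, this
 * answer can differ from what an up-to-date client computes -- which is
 * exactly how two clients end up creating the same directory on
 * different subvolumes with different gfids. */
int hashed_subvol(const struct layout_range *layout, int n, const char *bname)
{
    uint32_t h = toy_hash(bname);

    for (int i = 0; i < n; i++)
        if (h >= layout[i].start && h <= layout[i].stop)
            return layout[i].subvol;

    return -1; /* hole in the layout; caller must heal/refresh */
}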
Additional info:

--- Additional comments from Vijay Bellur between 2016-04-01 05:55:50 EDT and 2016-04-20 23:58:34 EDT ---

REVIEW: http://review.gluster.org/13885 (cluster/distribute: detect stale layouts in entry fops) posted (#1 through #14) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/14040 (cluster/distribute: detect stale layouts in entry fops) posted (#6 and #7) for review on release-3.7 by Raghavendra G (rgowdapp)
COMMIT: http://review.gluster.org/14040 committed in release-3.7 by Raghavendra G (rgowdapp)
------
commit 0ce1a038ab54a52a0c295e830abe035d4113ba83
Author: Raghavendra G <rgowdapp>
Date:   Fri Apr 1 15:16:23 2016 +0530

    cluster/distribute: detect stale layouts in entry fops

    dht_mkdir ()
    {
        first-hashed-subvol = hashed-subvol for "bname" in in-memory
                              layout of "parent";
        inodelk (SETLKW, parent, "LAYOUT_HEAL_DOMAIN", "can be any
                 subvol, but we choose first-hashed-subvol randomly");
        {
        begin:
            hashed-subvol = hashed-subvol for "bname" in in-memory
                            layout of "parent";
            hash-range = extract hash-range from layout of "parent";
            ret = mkdir (parent/bname, hashed-subvol, hash-range);
            if (ret == "hash-value doesn't fall into layout stored on
                        the brick (this error is returned by
                        posix-mkdir)") {
                refresh_parent_layout ();
                goto begin;
            }
        }
        inodelk (UNLCK, parent, "LAYOUT_HEAL_DOMAIN",
                 "first-hashed-subvol");
        proceed with other parts of dht_mkdir;
    }

    posix_mkdir (parent/bname, client-hash-range)
    {
        disk-hash-range = getxattr (parent, "dht-layout-key");
        if (disk-hash-range != client-hash-range) {
            fail-with-error ("hash-value doesn't fall into layout
                              stored on the brick");
            return 0;
        }
        continue-with-posix-mkdir;
    }

    Similar changes need to be done for dentry operations like create,
    symlink, link, unlink, rmdir and rename. Those will be addressed in
    subsequent patches; this patch addresses only the mkdir codepath.

    This change breaks the stripe tests: on some striped subvols the dht
    layout xattrs are not set for some reason, which makes mkdir fail.
    Since striped volumes are always created with dht, some tests
    associated with stripe also fail. So the following test changes are
    made (since stripe is out of maintenance):

    * modify ./tests/basic/rpc-coverage.t to not use striped volumes
    * mark all (2) tests in tests/bugs/stripe/ as bad tests

    Change-Id: Idd1ae879f24a48303dc743c1bb4d91f89a629e25
    BUG: 1329062
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/14040
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
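The following is a minimal sketch of the server-side check the commit describes, not the actual posix xlator code. check_client_layout(), struct dht_range and the two-field on-disk encoding are hypothetical simplifications; the real trusted.glusterfs.dht xattr value carries additional header fields in network byte order. The idea is the same: compare the hash range the client used against the range stored on the brick, and fail so the client refreshes and retries.

#include <errno.h>
#include <stdint.h>
#include <sys/xattr.h>

/* Hypothetical simplified encoding of a layout range. */
struct dht_range {
    uint32_t start;
    uint32_t stop;
};

/* Returns 0 when the client's layout matches the brick's on-disk layout,
 * -ESTALE when it does not.  The caller would translate -ESTALE into the
 * "hash-value doesn't fall into layout stored on the brick" error that
 * makes dht_mkdir refresh its parent layout and retry from "begin:". */
int check_client_layout(const char *parent_path, struct dht_range client)
{
    struct dht_range disk;

    if (getxattr(parent_path, "trusted.glusterfs.dht",
                 &disk, sizeof(disk)) != (ssize_t)sizeof(disk))
        return -EIO;                 /* layout xattr missing or malformed */

    if (disk.start != client.start || disk.stop != client.stop)
        return -ESTALE;              /* client's cached layout is stale */

    return 0;                        /* layouts agree; proceed with mkdir */
}

Putting the comparison on the brick, where the authoritative layout lives, turns a silent mkdir on the wrong subvolume into a retryable error, closing the window in which a stale client could create a duplicate directory with a new gfid.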
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user