Description of problem:
Able to mount a subdir which is in gfid split-brain.

Version-Release number of selected component (if applicable):
Build used: glusterfs-3.12.2-14.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1) Create a 1 x 2 volume and start it
2) Create a gfid directory split-brain (say, dir1)
3) Mount the subdir which is in split-brain (dir1)

Actual results:
Mount is successful and data can be written into the split-brain dir.

Expected results:
Mounting a dir which is in split-brain should fail.

Additional info:

> Before mounting, below are the attributes

N1:
# getfattr -d -m . -e hex /bricks/brick4/b0/dir1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick4/b0/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sb12-client-1=0x000000000000000400000001
trusted.gfid=0xaa7ed3bba30c43b19e6aa3f184fb28f3
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
#

N2:
# getfattr -d -m . -e hex /bricks/brick4/b1/dir1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick4/b1/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sb12-client-0=0x000000000000000400000001
trusted.gfid=0x568f03de21754c8a812ca07277d18b0a
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
#

# gluster vol heal sb12 info
Brick 10.70.47.45:/bricks/brick4/b0
/dir1
/ - Is in split-brain

Status: Connected
Number of entries: 2

Brick 10.70.47.144:/bricks/brick4/b1
<gfid:568f03de-2175-4c8a-812c-a07277d18b0a>
/ - Is in split-brain

Status: Connected
Number of entries: 2

#

> subdir mount:

# mount -t glusterfs 10.70.47.45:/sb12/dir1 /mnt/subdir_sb12
[root@dhcp35-125 ~]# df -h | grep subdir_sb12
10.70.47.45:sb12/dir1   40G  441M   40G   2% /mnt/subdir_sb12
# cd /mnt/subdir_sb12/
# touch f{1..3}
# ls
f1  f2  f3
#

From the logs, it looks like the client performs healing and clears the afr attributes on the directory when the subdir mount is issued:

[2018-07-27 10:37:12.079828] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-sb12-replicate-0: performing metadata selfheal on 00000000-0000-0000-0000-000000000001
[2018-07-27 10:37:12.080325] W [MSGID: 108027] [afr-common.c:2841:afr_discover_done] 0-sb12-replicate-0: no read subvols for /
[2018-07-27 10:37:12.096556] I [MSGID: 108026] [afr-self-heal-common.c:1724:afr_log_selfheal] 0-sb12-replicate-0: Completed metadata selfheal on 00000000-0000-0000-0000-000000000001. sources=[0] sinks=1
[2018-07-27 10:37:12.104257] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-sb12-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2018-07-27 10:37:12.121502] I [MSGID: 108026] [afr-self-heal-common.c:1724:afr_log_selfheal] 0-sb12-replicate-0: Completed entry selfheal on 00000000-0000-0000-0000-000000000001. sources= sinks=0 1

> Attributes after the subdir mount (the afr xattrs are now cleared):

# getfattr -d -m . -e hex /bricks/brick4/b0/dir1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick4/b0/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sb12-client-1=0x000000000000000000000000
trusted.gfid=0xaa7ed3bba30c43b19e6aa3f184fb28f3
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
#

# getfattr -d -m . -e hex /bricks/brick4/b1/dir1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick4/b1/dir1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.sb12-client-0=0x000000000000000000000000
trusted.gfid=0x568f03de21754c8a812ca07277d18b0a
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
#

> Ran the gluster-health-report tool; there is one error, which is due to a script issue.

SOS Reports: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/vavuthu/fuse_sub_dir_split_brain/
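(For reference: the report does not record how the gfid directory split-brain in step 2 was produced. Below is a minimal sketch of one common way to induce it, assuming the sb12 volume above, a fuse mount of the whole volume at /mnt/sb12, and bricks b0/b1 as shown. This is illustrative only, not necessarily the reporter's exact steps.)

# Sketch only -- one assumed way to create a gfid directory split-brain.
# Relax quorum and stop the self-heal daemon so the bricks can diverge:
gluster volume set sb12 cluster.quorum-type none
gluster volume set sb12 cluster.self-heal-daemon off

# With brick b1's glusterfsd killed, create the dir; it gets one GFID on b0:
mkdir /mnt/sb12/dir1

# Restart b1 (gluster volume start sb12 force) and kill b0's glusterfsd
# instead; a fresh lookup now sees no dir1, so the same mkdir lands on b1
# with a different GFID:
mkdir /mnt/sb12/dir1

# Bring both bricks up again; dir1 now has mismatching GFIDs on b0 and b1:
gluster volume start sb12 force
gluster volume heal sb12 info   # should report the entries in split-brain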
Moving this bug out of 3.4.0 as this doesn't meet blocker criteria. This can be re-proposed for 3.4.0 if required.
Can this be tried with a replica 3 volume? The replica 2 volume type is not officially supported.
Hi,

I am able to recreate the issue on a 1 x 3 volume:

1. Create a 1 x 3 volume.
2. Change the quorum options so that a split-brain directory can be created in the replica 3 set.
3. The split-brain was created successfully:

# gluster v heal vol_36411cb4f9b145b165212aeaf0ca2588 info
Brick 10.70.47.7:/var/lib/heketi/mounts/vg_1a2cebdd439fca0eb9d5197d6a6ca504/brick_25836af43f1427d9fe24c06feebbb1c7/brick
/ - Is in split-brain

Status: Connected
Number of entries: 1

Brick 10.70.47.108:/var/lib/heketi/mounts/vg_690b7b8be089c66b07c1259811ef6dbc/brick_04c52fc4af911b40730665ef9203304a/brick
/ - Is in split-brain

Status: Connected
Number of entries: 1

Brick 10.70.46.206:/var/lib/heketi/mounts/vg_bb4c74213d62f197a5baed1abad3df73/brick_d2469a52e6c4f94556140cec1de5582e/brick
/ - Is in split-brain

Status: Connected
Number of entries: 1

4. Change the volume quorum option back to auto.
5. Mount the directory on a client:

# mount -t glusterfs 10.70.47.108:vol_36411cb4f9b145b165212aeaf0ca2588/dir1 /mnt/split/

The mount was successful.

6. Write files from the mount point:

# touch f{1..10}
# ls
f1  f10  f2  f3  f4  f5  f6  f7  f8  f9
# echo "Hi" >> f1
# cat f1
Hi

Expected result: mounting a dir which is in split-brain should fail.
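(The exact options toggled in steps 2 and 4 are not recorded; presumably something like the following, with the directory created while taking bricks down in turn as in the replica 2 sketch earlier. A sketch, not the verified commands:)

# Step 2 (assumed): relax quorum so the replica 3 set can diverge
gluster volume set vol_36411cb4f9b145b165212aeaf0ca2588 cluster.quorum-type none
gluster volume set vol_36411cb4f9b145b165212aeaf0ca2588 cluster.self-heal-daemon off
# ...create dir1 with different bricks down, as in the replica 2 sketch...
# Step 4: revert quorum to auto
gluster volume set vol_36411cb4f9b145b165212aeaf0ca2588 cluster.quorum-type auto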
Gluster fuse-subdir mount is a new feature (GA in 3.4), and the issue was discovered in a 3.4 build. The issue is reproducible in the latest build:

# rpm -qa | grep gluster
python2-gluster-3.12.2-29.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-29.el7rhgs.x86_64
glusterfs-libs-3.12.2-29.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-29.el7rhgs.x86_64
glusterfs-3.12.2-29.el7rhgs.x86_64
I was trying to understand what to do in this case. Technically, there is no way for the client to know that there is a GFID mismatch in this scenario, mainly because the client sees the GFID of this directory as 0x01 (i.e., root). The server would know the GFID, but it has no visibility into which GFID is the 'correct' one for the directory, as it doesn't have the cluster view.

I would like to understand from PM (or anyone else) what the right behavior in this scenario should be. If the right step is to fail the mount, stating that the file is not in a correct state, then we need to make sure we change the design etc.

The quick and dirty fix is to handle this scenario in the mount.glusterfs script: before continuing with the actual subdir mount, do a temporary mount of the whole volume and check the directory there. While this would be good enough to handle 99% of the cases, if one uses the `glusterfs` command directly to mount the subdirectory, this would still be an issue.

For now, I will wait for some discussion on this here, and then we can take a decision.
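(To make the quick-and-dirty option concrete, a sketch of the kind of pre-check mount.glusterfs could do before the subdir mount. Illustrative only, not an actual patch; the variable names are made up, and it assumes that a directory in gfid split-brain fails lookup with EIO when accessed through a whole-volume mount:)

# Sketch of the proposed pre-check (not the actual script change).
# VOLSERVER, VOL and SUBDIR stand for the parsed mount arguments.
tmpdir=$(mktemp -d)
mount -t glusterfs "$VOLSERVER:/$VOL" "$tmpdir" || exit 1
if ! stat "$tmpdir/$SUBDIR" >/dev/null 2>&1; then
    # A directory in gfid split-brain is expected to fail lookup (EIO)
    # when accessed through the whole-volume mount.
    echo "ERROR: $SUBDIR is inaccessible (possibly split-brain); refusing to mount" >&2
    umount "$tmpdir"; rmdir "$tmpdir"
    exit 1
fi
umount "$tmpdir"; rmdir "$tmpdir"
# ...continue with the actual subdir mount...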