Description of problem:
=======================
Georep session went to faulty with following errors in geo-rep logs:

[2015-12-24 10:57:45.463694] E [syncdutils(/rhs/brick2/ct-8):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 662, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1439, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 586, in crawlwrap
    '.', '.'.join([str(self.uuid), str(gconf.slave_id)]))
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 323, in ff
    return f(*a)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 489, in stime_mnt
    8)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 55, in lgetxattr
    return cls._query_xattr(path, siz, 'lgetxattr', attr)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 47, in _query_xattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 5] Input/output error

getfattr on slave mount logs: Input/Output error

[root@dhcp37-133 ~]# getfattr -d -m . -e hex /mnt/test/
getfattr: Removing leading '/' from absolute path names
# file: mnt/test/
security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000
/mnt/test/: trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime: Input/output error
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x3000
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root@dhcp37-133 ~]#
[root@dhcp37-133 ~]# mount | grep test
10.70.37.165:/tiervolume on /mnt/test type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@dhcp37-133 ~]#

Client log snippet:
===================
# less /var/log/glusterfs/mnt-test.log

[2015-12-24 10:28:50.791227] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-tiervolume-disperse-1: Heal failed [Input/output error]
The message "W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-tiervolume-disperse-1: Heal failed [Input/output error]" repeated 6 times between [2015-12-24 10:28:50.791227] and [2015-12-24 10:28:51.062863]
[2015-12-24 10:39:42.715503] W [MSGID: 122056] [ec-combine.c:866:ec_combine_check] 0-tiervolume-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2015-12-24 10:39:42.718869] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-tiervolume-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2015-12-24 10:39:42.727887] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-tiervolume-disperse-1: Heal failed [Input/output error]
[2015-12-24 10:39:42.919641] N [MSGID: 122031] [ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-12-24 10:39:42.919750] W [MSGID: 122040] [ec-common.c:907:ec_prepare_update_cbk] 0-tiervolume-disperse-1: Failed to get size and version [Input/output error]
[2015-12-24 10:39:42.926486] W [fuse-bridge.c:3355:fuse_xattr_cbk] 0-glusterfs-fuse: 15: GETXATTR(trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime) / => -1 (Input/output error)
[2015-12-24 10:39:42.925954] N [MSGID: 122031] [ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-12-24 10:39:42.926445] W [MSGID: 122040] [ec-common.c:907:ec_prepare_update_cbk] 0-tiervolume-disperse-1: Failed to get size and version [Input/output error]
[2015-12-24 10:58:58.908160] W [MSGID: 122056] [ec-combine.c:866:ec_combine_check] 0-tiervolume-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2015-12-24 10:58:58.909422] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-tiervolume-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2015-12-24 10:58:58.918637] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-tiervolume-disperse-1: Heal failed [Input/output error]
[2015-12-24 10:58:58.922502] N [MSGID: 122031] [ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-12-24 10:58:58.924043] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-tiervolume-disperse-1: Operation failed on some subvolumes (up=3F, mask=3E, remaining=0, good=3E, bad=1)
The message "N [MSGID: 122031] [ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'" repeated 2 times between [2015-12-24 10:58:58.922502] and [2015-12-24 10:58:58.972485]
[2015-12-24 10:58:58.973055] W [MSGID: 122040] [ec-common.c:907:ec_prepare_update_cbk] 0-tiervolume-disperse-1: Failed to get size and version [Input/output error]
[2015-12-24 10:58:58.973187] W [fuse-bridge.c:3355:fuse_xattr_cbk] 0-glusterfs-fuse: 19: GETXATTR(trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime) / => -1 (Input/output error)
[2015-12-24 10:58:58.989738] N [MSGID: 122031] [ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'

Following are the ec.version for disperse subvolume 1:
======================================================

[root@dhcp37-165 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-7/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-7/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.version=0x00000000000000000000000000000011
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34f00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7704
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ba2cafe3
trusted.tier.tier-dht.commithash=0x3330313736373533323400

[root@dhcp37-165 ~]#

[root@dhcp37-133 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-8/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-8/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34c00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b781e
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root@dhcp37-133 ~]#

[root@dhcp37-160 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-9/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-9/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34b00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b780d
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root@dhcp37-160 ~]#

[root@dhcp37-158 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-10/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-10/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34f00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7c51
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root@dhcp37-158 ~]#

[root@dhcp37-110 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-11/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-11/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34b00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7714
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root@dhcp37-110 ~]#

[root@dhcp37-155 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-12/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-12/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34f00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7989
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root@dhcp37-155 ~]#

[root@dhcp37-165 ~]# gluster volume info tiervolume

Volume Name: tiervolume
Type: Distributed-Disperse
Volume ID: 1dd75524-8cc9-4d93-9b14-518021c8df3f
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.165:/rhs/brick1/ct-1
Brick2: 10.70.37.133:/rhs/brick1/ct-2
Brick3: 10.70.37.160:/rhs/brick1/ct-3
Brick4: 10.70.37.158:/rhs/brick1/ct-4
Brick5: 10.70.37.110:/rhs/brick1/ct-5
Brick6: 10.70.37.155:/rhs/brick1/ct-6
Brick7: 10.70.37.165:/rhs/brick2/ct-7
Brick8: 10.70.37.133:/rhs/brick2/ct-8
Brick9: 10.70.37.160:/rhs/brick2/ct-9
Brick10: 10.70.37.158:/rhs/brick2/ct-10
Brick11: 10.70.37.110:/rhs/brick2/ct-11
Brick12: 10.70.37.155:/rhs/brick2/ct-12
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
[root@dhcp37-165 ~]#

<Note: It was a tier volume when the original issue has been seen. The output above is after detach>

Steps Carried:
==============
1. Create Master volume Tiered {HT: 3x2, CT: 2x(4+2)}
2. Create Slave volume (4x2)
3. Create geo-rep session
4. Start geo-rep session

Actual results:
===============
All the passive bricks went to faulty

Expected results:
=================
Geo-Rep should be ACTIVE
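
For readers cross-checking the client log against the brick xattrs, here is a minimal Python sketch (not part of the original report). It assumes that bit i of the hex masks printed by ec_check_status (up, mask, good, bad) corresponds to the i-th brick of the disperse subvolume in `gluster volume info` order; that mapping is an assumption, although it agrees with ct-7 being the only brick whose trusted.ec.version differs. The helper names are hypothetical.

```python
from collections import Counter

# Bricks of disperse subvolume 1, in `gluster volume info` order.
BRICKS = ["ct-7", "ct-8", "ct-9", "ct-10", "ct-11", "ct-12"]

# trusted.ec.version values copied from the getfattr output in this report.
EC_VERSION = {
    "ct-7": "0x00000000000000000000000000000011",
    "ct-8": "0x00000000000000000000000000000020",
    "ct-9": "0x00000000000000000000000000000020",
    "ct-10": "0x00000000000000000000000000000020",
    "ct-11": "0x00000000000000000000000000000020",
    "ct-12": "0x00000000000000000000000000000020",
}


def bricks_in_mask(mask_hex, bricks=BRICKS):
    """List the bricks whose bit is set in a hex mask.

    ASSUMPTION: bit i of the mask maps to the i-th brick in the list.
    """
    mask = int(mask_hex, 16)
    return [name for i, name in enumerate(bricks) if mask & (1 << i)]


def version_outliers(versions):
    """Return the bricks whose ec.version differs from the most common value."""
    majority, _count = Counter(versions.values()).most_common(1)[0]
    return sorted(b for b, v in versions.items() if v != majority)


if __name__ == "__main__":
    print("bad :", bricks_in_mask("1"))    # mask from "bad=1"
    print("good:", bricks_in_mask("3E"))   # mask from "good=3E"
    print("ec.version outliers:", version_outliers(EC_VERSION))
```

Under that assumption, the bad=1 mask and the odd trusted.ec.version both point at ct-7, which is also the only brick without trusted.ec.dirty.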
REVIEW: http://review.gluster.org/13242 (geo-rep: Mask xtime and stime xattrs) posted (#2) for review on master by Kotresh HR (khiremat)
COMMIT: http://review.gluster.org/13242 committed in master by Venky Shankar (vshankar)
------
commit bf2004bc1346890e69292c5177a5d8e002b696e2
Author: Kotresh HR <khiremat>
Date:   Thu Jan 14 17:14:25 2016 +0530

    geo-rep: Mask xtime and stime xattrs

    Allow access to xtime and stime xattrs only to gsyncd client and mask them for the rest. This is to prevent afr from performing self healing on marker xtime and geo-rep stime xattr which is not expected as each of which gets updated them from backend brick and should not be healed.

    Change-Id: I24c30f3cfac636a55fd55be989f8db9f8ca10856
    BUG: 1296496
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/13242
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Vijaikumar Mallikarjuna <vmallika>
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Venky Shankar <vshankar>
REVIEW: http://review.gluster.org/13678 (posix: Filter gsyncd stime xattr) posted (#1) for review on master by Kotresh HR (khiremat)
COMMIT: http://review.gluster.org/13678 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 8d8743ebf0eea7e87eef4cabb7ebcef4a602c471
Author: Kotresh HR <khiremat>
Date:   Fri Mar 11 15:07:48 2016 +0530

    posix: Filter gsyncd stime xattr

    Filter gsyncd stime xattr in lookup as well. The value of stime would be different among replica bricks and EC bricks. AFR and EC should not take any action on these as it could be different.

    Change-Id: If577f6115b36e036af2292ea0eaae93110f006ba
    BUG: 1296496
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/13678
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
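
To illustrate why the stime value cannot be expected to match across EC bricks, the following Python sketch (an illustration, not code from the patch) decodes the stime xattrs reported for bricks ct-7 through ct-12 in the bug description. It assumes the 8-byte value is two big-endian unsigned 32-bit integers (seconds, nanoseconds); the 8-byte size matches the `8)` argument in the stime_mnt traceback, but the field layout is an assumption here, as is treating the seconds field as a UTC epoch timestamp.

```python
import struct
from datetime import datetime, timezone

# stime xattr values copied from the getfattr output in the bug description.
STIME = {
    "ct-7": "0x567bc34f00000000",
    "ct-8": "0x567bc34c00000000",
    "ct-9": "0x567bc34b00000000",
    "ct-10": "0x567bc34f00000000",
    "ct-11": "0x567bc34b00000000",
    "ct-12": "0x567bc34f00000000",
}


def decode_stime(hex_value):
    """Decode an 8-byte hex xattr into (seconds, nanoseconds).

    ASSUMPTION: the value is two big-endian unsigned 32-bit integers.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    return struct.unpack("!II", bytes.fromhex(digits))


def distinct_stimes(values):
    """Return the set of distinct decoded stime tuples."""
    return {decode_stime(v) for v in values.values()}


if __name__ == "__main__":
    for brick, value in STIME.items():
        sec, nsec = decode_stime(value)
        when = datetime.fromtimestamp(sec, tz=timezone.utc).isoformat()
        print(f"{brick}: sec={sec} nsec={nsec} ({when})")
    print("distinct stime values:", len(distinct_stimes(STIME)))
```

With the values from this report, the six bricks carry three different stime values a few seconds apart, so a LOOKUP or GETXATTR that combines answers from all bricks sees mismatching data unless the xattr is filtered.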
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user