Description of problem:

Given three hosts with one brick each, combined into a replica 3 volume with arbiter, it can happen that the arbiter brick becomes a source for data heal, which should not happen.

How reproducible:

From time to time.

Steps to Reproduce:
1. Create a replica 3 volume with arbiter, keeping the bricks on three different hosts.
2. Start updating some file frequently.
3. Start rebooting nodes in a random order (breaking network connectivity works too); several of the reboots should affect two nodes in a random order.

Actual results:

Some files will not be healed.

[root@hc-lion ~]# gluster volume heal data full
Launching heal operation to perform full self heal on volume data has been successful
Use heal info commands to check status
[root@hc-lion ~]# gluster volume heal data info
Brick hc-lion:/rhgs/data
/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
Status: Connected
Number of entries: 1

Brick hc-tiger:/rhgs/data
/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
Status: Connected
Number of entries: 1

Brick hc-panther:/rhgs/data
/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
Status: Connected
Number of entries: 1

[root@hc-lion dom_md]# getfattr -d -m . -e hex ids
# file: ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.data-client-1=0x0000000e0000000000000000
trusted.afr.data-client-2=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x080000000000000058e6028e000829f0
trusted.gfid=0x405ab9b11adb4ced927294ef36272b44
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000

Expected results:

All files should be healed.

Additional info:

I do not have a reliable way to reproduce this bug, but I hope the logs from my nodes will be helpful. The bug was observed during the first half of the day on April 6th.
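For reference (not part of the original report): each trusted.afr.<volume>-client-<N> value is three big-endian 32-bit counters, the pending data, metadata and entry operations this brick has recorded against brick N. A minimal bash sketch of the decode, applied to the value shown above:

# Decode a trusted.afr.* changelog xattr into its data/metadata/entry counters.
# The value below is the trusted.afr.data-client-1 entry from the hc-lion output above.
xattr=0x0000000e0000000000000000
hex=${xattr#0x}
echo "data=$((16#${hex:0:8})) metadata=$((16#${hex:8:8})) entry=$((16#${hex:16:8}))"
# prints: data=14 metadata=0 entry=0

So hc-lion is recording 14 pending data operations against data-client-1 (the second brick) and none against the arbiter.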
Notes to self while Denis uploads the logs:

[root@hc-lion ~]# gluster v info data

Volume Name: data
Type: Replicate
Volume ID: 7070474d-14be-4cf3-96fa-3efb72a5458c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: hc-lion:/rhgs/data
Brick2: hc-tiger:/rhgs/data
Brick3: hc-panther:/rhgs/data (arbiter)
Options Reconfigured:
cluster.self-heal-daemon: enable
user.cifs: off
performance.strict-o-direct: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
features.shard-block-size: 512MB
network.ping-timeout: 30
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.data-self-heal-algorithm: full
features.shard: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet

Xattr info:

1st Node:

[root@hc-lion ~]# g /rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.data-client-1=0x0000000e0000000000000000
trusted.afr.data-client-2=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x080000000000000058e6028e000829f0
trusted.gfid=0x405ab9b11adb4ced927294ef36272b44
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000

[root@hc-lion ~]# stat /rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
  File: ‘/rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids’
  Size: 1048576   Blocks: 2048   IO Block: 4096   regular file
Device: fd07h/64775d   Inode: 67108931   Links: 2
Access: (0660/-rw-rw----)  Uid: (   36/    vdsm)   Gid: (   36/     kvm)
Context: system_u:object_r:unlabeled_t:s0
Access: 2017-04-06 12:30:08.330377133 +0300
Modify: 2017-04-06 12:05:06.547723917 +0300
Change: 2017-04-06 12:05:08.570709032 +0300

2nd Node:

[root@hc-tiger ~]# g /rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.data-client-0=0x000000050000000000000000
trusted.afr.data-client-2=0x000000000000000000000000
trusted.afr.dirty=0x000000010000000000000000
trusted.bit-rot.version=0x060000000000000058e5f9100009aaa1
trusted.gfid=0x405ab9b11adb4ced927294ef36272b44
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000

[root@hc-tiger ~]# stat /rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
  File: ‘/rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids’
  Size: 1048576   Blocks: 2048   IO Block: 4096   regular file
Device: fd09h/64777d   Inode: 67108931   Links: 2
Access: (0660/-rw-rw----)  Uid: (   36/    vdsm)   Gid: (   36/     kvm)
Context: system_u:object_r:unlabeled_t:s0
Access: 2017-04-06 14:03:20.028466007 +0300
Modify: 2017-04-06 11:59:28.291178965 +0300
Change: 2017-04-06 11:59:28.291178965 +0300
Birth: -

3rd Node:
[root@hc-panther ~]# g /rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.data-client-0=0x000000000000000000000000
trusted.afr.data-client-1=0x0000000e0000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x040000000000000058e5f8f7000d5127
trusted.gfid=0x405ab9b11adb4ced927294ef36272b44
trusted.glusterfs.shard.block-size=0x0000000020000000
trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000

[root@hc-panther ~]# stat /rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids
  File: ‘/rhgs/data/555425cf-e3e4-4665-ae82-6152896d8190/dom_md/ids’
  Size: 0   Blocks: 0   IO Block: 4096   regular empty file
Device: fd07h/64775d   Inode: 67108931   Links: 2
Access: (0660/-rw-rw----)  Uid: (   36/    vdsm)   Gid: (   36/     kvm)
Context: system_u:object_r:unlabeled_t:s0
Access: 2017-04-06 14:03:20.006835579 +0300
Modify: 2017-04-06 11:15:00.430926000 +0300
Change: 2017-04-06 12:05:08.572428152 +0300
Birth: -

The md5sums on the 3 nodes, respectively, are c6e665a63b15c4c2c6d66beff671834e, f84fd35dd9f09215e7710b7bed347a8a and d41d8cd98f00b204e9800998ecf8427e.
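Reading the changelog xattrs above with the brick order from the volume info (client-0 = hc-lion, client-1 = hc-tiger, client-2 = hc-panther): hc-lion and hc-panther both accuse hc-tiger of pending data operations, hc-tiger accuses hc-lion and carries a dirty flag, and no brick accuses the arbiter, which is presumably why it ends up being considered a heal source even though it holds no file data. The third md5sum is the checksum of empty input, which is expected for an arbiter brick (note the 0-byte size in the stat output above); a quick check, not from the original report:

md5sum < /dev/null
d41d8cd98f00b204e9800998ecf8427e  -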
This bug is being closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. If you are still facing this issue on a more current release, please open a new bug against a version that still receives bugfixes.