Description of problem: Files that need healing increment while gluster is healing (the number of entries goes up even though a heal has already started).

Version-Release number of selected component (if applicable):
glusterfs.x86_64 6.4-1.el7 installed
glusterfs-api.x86_64 6.4-1.el7 installed
glusterfs-cli.x86_64 6.4-1.el7 installed
glusterfs-client-xlators.x86_64 6.4-1.el7 installed
glusterfs-events.x86_64 6.4-1.el7 installed
glusterfs-fuse.x86_64 6.4-1.el7 installed
glusterfs-geo-replication.x86_64 6.4-1.el7 installed
glusterfs-libs.x86_64 6.4-1.el7 installed
glusterfs-rdma.x86_64 6.4-1.el7 installed
glusterfs-server.x86_64 6.4-1.el7 installed
libvirt-daemon-driver-storage-gluster.x86_64 4.5.0-10.el7_6.12 installed
python2-gluster.x86_64 6.4-1.el7 installed
vdsm-gluster.x86_64 4.30.24-1.el7 installed

How reproducible:
In our gluster cluster (replica 3 with 1 arbiter) it is sufficient to reboot a node of the cluster. When the node is back online and the heal has started, more files are added to the 'files that need healing' list, as reported by:

gluster volume heal ${volumeName} info | grep entries

Steps to Reproduce:
1. Reboot a node in the gluster cluster.
2. Check with 'gluster peer status' on all nodes that every node is connected (if not, stop firewalld, wait until every node is connected, then start firewalld again).
3. Wait 10 minutes or trigger the heal manually: 'gluster volume heal ${volumeName}'

Actual results:
The list of files that need healing grows, as shown by 'gluster volume heal ${volumeName} info | grep entries' (the watch loop sketched below makes this visible over time).

Expected results:
The list of files should decrease continuously, because the gluster fuse client should write to all members of the gluster cluster.
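A minimal watch loop along these lines makes the growing entry count easy to observe; the script name and the 30-second interval are arbitrary choices for illustration, not part of the reproduction:

```
#!/bin/bash
# Sketch: print a timestamped total of the per-brick "Number of entries:"
# lines every 30 seconds, so growth (or shrinkage) of the heal queue is
# visible over time. Usage: ./watch-heal.sh <volume>
volumeName="${1:?usage: $0 <volume>}"

while true; do
    total=$(gluster volume heal "${volumeName}" info \
            | awk '/^Number of entries:/ {sum += $NF} END {print sum}')
    echo "$(date '+%F %T') heal entries: ${total}"
    sleep 30
done
```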
Additional info:
Fix: To get out of this situation we execute the following steps on our oVirt cluster. The gluster volume has to be remounted; depending on how the storage domain is used in oVirt, this is done differently:

* data volume (the volume all the VMs are running on): one by one, put every host/hypervisor into maintenance mode and activate the host again. (This will unmount and remount the data volume on that host.)
* engine volume (the volume the hosted engine is running on):
  - On every host not running the engine, find the systemd scope the engine mount is running under and restart it.
  - Migrate the engine to another host and execute the same steps on the hypervisor the engine migrated from.
  - Commands for finding the correct scope and restarting it (a helper sketch follows after the volume info below):

```
[root@compute0103 ~]# volname='engine'
[root@compute0103 ~]# systemctl list-units | grep rhev | grep scope | grep ${volname}
run-17819.scope loaded active running /usr/bin/mount -t glusterfs -o backup-volfile-servers=compute0103.priv.domain.com:compute0104.priv.domain.com compute0102.priv.domain.com:/engine /rhev/data-center/mnt/glusterSD/compute0102.priv.domain.com:_engine
[root@compute0103 ~]# systemctl restart run-17819.scope
```

```
[root@compute0103 ~]# gluster volume info data

Volume Name: data
Type: Replicate
Volume ID: 404ec6b1-731c-4e65-a07f-4ca646054eb4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: compute0102.priv.vangenechten.com:/gluster_bricks/data/data
Brick2: compute0103.priv.vangenechten.com:/gluster_bricks/data/data
Brick3: compute0104.priv.vangenechten.com:/gluster_bricks/data/data (arbiter)
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
features.read-only: off
features.barrier: disable
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
transport.address-family: inet
performance.client-io-threads: on
nfs.disable: on
disperse.shd-wait-qlength: 1024
storage.build-pgfid: on
```
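The scope lookup and restart from the session above could be wrapped in a small helper; this is a hypothetical sketch mirroring those exact commands, not shipped tooling:

```
#!/bin/bash
# Sketch (hypothetical helper): restart the transient run-<pid>.scope
# unit that holds the glusterfs fuse mount of a given volume, forcing
# the client to reconnect to all bricks. Run only on a host that is
# NOT currently running the hosted engine.
volname="${1:?usage: $0 <volume>}"

# The mount command line (with its /rhev/... mount point) appears in
# the scope's description, so filter on it; take the first match.
scope=$(systemctl list-units --type=scope --no-legend \
        | grep rhev | grep "${volname}" | awk '{print $1}' | head -n 1)

if [ -z "${scope}" ]; then
    echo "no glusterfs mount scope found for volume '${volname}'" >&2
    exit 1
fi

echo "restarting ${scope}"
systemctl restart "${scope}"
```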
What I forgot to mention, and the reason I opened this as a bug report against gluster fuse: when this happens, we tested on a gluster mount to see where the files were written, and the data (metadata on the arbiter) was written on only 2 of the 3 members. The file was not found on 1 member of the gluster cluster. When we remounted our gluster mounts (as described in the opening post), the healing finished and no further files were added to the 'files that need healing' list.

Yesterday a gluster member node was rebooted, and today I repeated the test to see where the data is written. Now all data is written to all members, but the heal is still in progress. How long does it take for a gluster mount to notice a reappearing member and to start writing to a node that was down?
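One way to check whether a fuse client has actually re-established its connection to every brick, rather than inferring it from where files land, is to compare the per-brick client lists. A minimal sketch, assuming the 'gluster volume status <vol> clients' output of this release with per-brick 'Clients connected' lines (the exact layout may differ between versions):

```
#!/bin/bash
# Sketch: summarize how many clients each brick sees. After a rebooted
# node comes back, all bricks of a healthy replica-3 volume should
# report the same client count; a lower count on one brick points at a
# stale fuse mount that never reconnected to it.
volname="${1:?usage: $0 <volume>}"

gluster volume status "${volname}" clients \
    | grep -E '^(Brick|Clients connected)'
```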
The cluster has been reinstalled with the latest version (oVirt + HCI). Now the heal is fast and reliable. I won't be able to provide more info.
Apparently the issue reappeared when I rebooted a second node (the arbiter), so the issue is also present on a freshly installed system.
I tracked the issue down to the firewall. When the firewall is disabled, the gluster volume heals fast; when the firewall is enabled, the heal never finishes. Disabling the firewall causes other issues in oVirt, so it is not advisable (all VMs went offline and moved to 1 hypervisor, and migration couldn't be done). I'm closing this bug report, because it seems to be an oVirt-related issue.
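For anyone hitting the same symptom: instead of disabling firewalld entirely, the gluster ports can be opened permanently on every node. A minimal sketch, assuming the 'glusterfs' firewalld service definition shipped with glusterfs-server is present; the explicit port range is an illustrative fallback and would need to match the actual brick ports:

```
# Run on every gluster node. The 'glusterfs' service definition ships
# with glusterfs-server on EL7.
firewall-cmd --permanent --add-service=glusterfs
firewall-cmd --reload

# Explicit alternative if the service definition is absent
# (glusterd management ports plus a brick port range; adjust the
# range to the number of bricks on the node):
# firewall-cmd --permanent --add-port=24007-24008/tcp
# firewall-cmd --permanent --add-port=49152-49251/tcp
# firewall-cmd --reload
```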