Bug 1745026

Summary:	endless heal gluster volume; incrementing number of files to heal when all peers in volume are up
Product:	[Community] GlusterFS
Component:	fuse
Version:	4.1
Status:	CLOSED NOTABUG
Keywords:	Reopened
Severity:	high
Priority:	unspecified
Hardware:	x86_64
OS:	Linux
Type:	Bug
Reporter:	tvanberlo <tvanberlo>
Assignee:	bugs <bugs>
CC:	bugs
Last Closed:	2019-09-06 14:05:59 UTC

Description tvanberlo@vangenechten.com 2019-08-23 13:52:36 UTC

Description of problem:
The number of files that need healing increases while Gluster is healing (the number of entries goes up after a heal has already started).

Version-Release number of selected component (if applicable):
glusterfs.x86_64                              6.4-1.el7                installed
glusterfs-api.x86_64                          6.4-1.el7                installed
glusterfs-cli.x86_64                          6.4-1.el7                installed
glusterfs-client-xlators.x86_64               6.4-1.el7                installed
glusterfs-events.x86_64                       6.4-1.el7                installed
glusterfs-fuse.x86_64                         6.4-1.el7                installed
glusterfs-geo-replication.x86_64              6.4-1.el7                installed
glusterfs-libs.x86_64                         6.4-1.el7                installed
glusterfs-rdma.x86_64                         6.4-1.el7                installed
glusterfs-server.x86_64                       6.4-1.el7                installed
libvirt-daemon-driver-storage-gluster.x86_64  4.5.0-10.el7_6.12        installed
python2-gluster.x86_64                        6.4-1.el7                installed
vdsm-gluster.x86_64                           4.30.24-1.el7            installed

How reproducible:
In our Gluster cluster (replica 3 with 1 arbiter), rebooting a single node of the cluster is sufficient.
When the node is back online and the heal has started, more files are added to the 'files that need healing' list, as shown by 'gluster volume heal ${volumeName} info | grep entries'.
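To see whether the backlog is really growing, the per-brick counts can be summed and logged over time. A minimal sketch, assuming the heal info output contains one 'Number of entries: N' line per brick (which is what the grep above picks up); `sum_heal_entries` and `watch_heal_entries` are hypothetical helper names, not part of the Gluster CLI:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for watching the heal backlog of a volume.

# Sum the per-brick "Number of entries: N" lines read from stdin.
# Non-numeric values (e.g. "-" for an unreachable brick) are skipped.
sum_heal_entries() {
  awk -F': ' '/^Number of entries:/ { if ($2 ~ /^[0-9]+$/) sum += $2 } END { print sum + 0 }'
}

# Print a timestamped total every N seconds (default 60) until interrupted.
# Assumes the gluster CLI is installed and the caller may run it (usually root).
watch_heal_entries() {
  local vol="$1" interval="${2:-60}"
  while true; do
    printf '%s %s\n' "$(date '+%F %T')" \
      "$(gluster volume heal "$vol" info | sum_heal_entries)"
    sleep "$interval"
  done
}
```

For example, `watch_heal_entries data 60` prints one total per minute; after the rebooted node rejoins, a healthy volume should show that total trending down rather than up.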


Steps to Reproduce:
1. Reboot a node in the Gluster cluster.
2. Check with 'gluster peer status' on all nodes whether all nodes are connected (if not, stop firewalld, wait until every node is connected, then start firewalld again).
3. Wait 10 minutes or trigger the heal manually: 'gluster volume heal ${volumeName}'
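Step 2 can be checked from a script instead of by eye. A minimal sketch, assuming each peer in the 'gluster peer status' output has a line of the form 'State: Peer in Cluster (Connected)'; `all_peers_connected` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 2: read `gluster peer status` output on stdin
# and succeed only if every "State:" line is "Peer in Cluster (Connected)".
# An output with no peers at all is treated as a failure.
all_peers_connected() {
  awk '
    /^State:/ { seen = 1; if ($0 !~ /Peer in Cluster \(Connected\)/) bad = 1 }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}
```

Run on each node, `gluster peer status | all_peers_connected && gluster volume heal data` only triggers the heal (step 3) once that node sees all of its peers as connected.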

Actual results:
The list of files that need healing grows ('gluster volume heal ${volumeName} info | grep entries').

Expected results:
The list of files should shrink steadily, because the Gluster FUSE client should write to all members of the cluster.

Additional info:
Fix:
To resolve this situation, we perform the following steps on our oVirt cluster.
The Gluster volume must be remounted; how this is done depends on how the storage domain is used in oVirt:
        * data volume (the volume all the VMs run on): one by one, put every host/hypervisor into maintenance mode and then activate it again. (This unmounts and remounts the data volume on that host.)
        * engine volume (the volume the hosted engine runs on):
            - on every host not running the engine, find the systemd scope unit that the engine mount runs in, and restart it
            - migrate the engine to another host and repeat the previous step on the hypervisor the engine migrated from
            - commands for finding the correct scope and restarting it:
```
[root@compute0103 ~]# volname='engine'
[root@compute0103 ~]# systemctl list-units|grep rhev| grep scope| grep ${volname}
 run-17819.scope                                                                                     loaded active running   /usr/bin/mount -t glusterfs -o backup-volfile-servers=compute0103.priv.domain.com:compute0104.priv.domain.com compute0102.priv.domain.com:/engine /rhev/data-center/mnt/glusterSD/compute0102.priv.domain.com:_engine
[root@compute0103 ~]# systemctl restart run-17819.scope
```
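The grep pipeline above can match more than intended (any unit whose line merely contains the volume name). A slightly stricter sketch, assuming the scope's description is the 'mount -t glusterfs ... host:/<volname> /rhev/...' command as in the listing; `find_mount_scope` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl list-units` output on stdin and print
# the scope unit(s) whose description is a glusterfs mount of volume $1
# under /rhev. Matches the literal ":/<volname> " volume spec, not any substring.
find_mount_scope() {
  local volname="$1"
  awk -v pat=":/${volname} " '
    $1 ~ /\.scope$/ && /mount -t glusterfs/ && /\/rhev\// && index($0, pat) { print $1 }
  '
}
```

For the listing above, `systemctl list-units --type=scope --no-legend | find_mount_scope engine` would print `run-17819.scope`, which can then be passed to `systemctl restart` after checking it is the expected unit.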

```
[root@compute0103 ~]# gluster volume info data 
 
Volume Name: data
Type: Replicate
Volume ID: 404ec6b1-731c-4e65-a07f-4ca646054eb4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: compute0102.priv.vangenechten.com:/gluster_bricks/data/data
Brick2: compute0103.priv.vangenechten.com:/gluster_bricks/data/data
Brick3: compute0104.priv.vangenechten.com:/gluster_bricks/data/data (arbiter)
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
features.read-only: off
features.barrier: disable
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
transport.address-family: inet
performance.client-io-threads: on
nfs.disable: on
disperse.shd-wait-qlength: 1024
storage.build-pgfid: on
```

Comment 1 tvanberlo@vangenechten.com 2019-08-29 07:14:08 UTC

What I forgot to mention, and the reason I filed this as a bug against Gluster FUSE:
When this happened, we tested on a Gluster mount where the files were written to: the data (metadata, in the case of the arbiter) was written to only 2 of the 3 members.
The file was not found on one member of the Gluster cluster.
When we remounted our Gluster mounts (as described in the opening post), the heal finished and no further files were added to the 'files to heal' list.

Yesterday a Gluster member node was rebooted, and today I repeated the test to see where the data is written. This time all data is written to all members, but the heal is still in progress.

How long does it take for a Gluster mount to notice that a member has reappeared and to start writing to the node that was down?

Comment 2 tvanberlo@vangenechten.com 2019-09-05 12:55:35 UTC

The cluster has been reinstalled with the latest version (oVirt + HCI). The heal is now fast and reliable. I won't be able to provide more information.

Comment 3 tvanberlo@vangenechten.com 2019-09-05 14:56:34 UTC

The issue reappeared when I rebooted a second node (the arbiter), so it is also present on a freshly installed system.

Comment 4 tvanberlo@vangenechten.com 2019-09-06 14:05:59 UTC

I tracked the issue down to the firewall. When the firewall is disabled, the Gluster volume heals quickly; when it is enabled, the heal never finishes.
Disabling the firewall causes other problems in oVirt, so it is not advisable (all VMs went offline and moved to one hypervisor, and migration was not possible).
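Since the heal only stalls with the firewall enabled, one way to narrow this down without disabling it is to test TCP reachability of the Gluster ports from each node to the others. A minimal sketch, assuming bash's `/dev/tcp` support and coreutils `timeout`; `check_ports` is a hypothetical helper name, and the ports to test are assumptions to confirm on your own setup (glusterd normally listens on 24007/tcp, and the actual brick ports are listed in the 'TCP Port' column of 'gluster volume status'):

```shell
#!/usr/bin/env bash
# Hypothetical helper: try a TCP connect (2 s timeout) to each given port on
# a host and report the result. Returns non-zero if any port is unreachable.
check_ports() {
  local host="$1"; shift
  local port rc=0
  for port in "$@"; do
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} unreachable (blocked or closed)"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_ports compute0103.priv.vangenechten.com 24007 <brick-port>` run from each of the other nodes; a port that is unreachable with firewalld running but reachable with it stopped points at a missing firewall rule. If your firewalld ships a `glusterfs` service definition, `firewall-cmd --permanent --add-service=glusterfs` followed by `firewall-cmd --reload` is one way to open the ports, but check that the range it covers includes the brick ports actually in use.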

I'm closing this bug report because it appears to be an oVirt-related issue.