Environment:

[192.168.0.1]# gluster volume create san replica 2 transport tcp 192.168.0.1:/var/gluster 192.168.0.2:/var/gluster
[192.168.0.1]# gluster volume start san
[192.168.0.1]# gluster volume set san network.ping-timeout 10
[192.168.0.3]# mount -t glusterfs 192.168.0.1:/san /mnt/gluster

/mnt/gluster is shared using Samba.

Start copying two files of about 2 GB each from a WinXP client. In the middle of the first file copy, run:

[192.168.0.1]# killall glusterd glusterfsd glusterfs

Wait about 10 seconds (the copy of the first file is still running) and run:

[192.168.0.1]# service glusterd start

(the copy of the first file has not finished yet)

The absence of the peer did not affect the data copy in any way; it kept running smoothly. The peer is now back, but the first file has only been written on 192.168.0.2 (this is normal behaviour).

When the copy of the first file finishes, and before the second one starts, self-heal of the first file is somehow forced, and the copy of the second file is postponed until self-heal has finished. It seems that the file-close action after the data copy triggers the self-heal process.

I think this is a serious limitation on GlusterFS performance. Isn't the self-heal process supposed to run in the background, using only minor system resources?
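
One way to check the suspicion that the file-close action is what blocks would be to time the write phase and the close() call separately on the client mount. The following is only a rough sketch under assumptions: the helper name time_write_and_close and the file name probe.bin are hypothetical, it writes directly to the GlusterFS mount rather than going through Samba, and a long close() is at most a hint that self-heal runs there, not proof.

```python
import os
import time


def time_write_and_close(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of zeros to path and time the write loop and
    the close() call separately (hypothetical helper, not a GlusterFS API)."""
    chunk = b"\0" * chunk_size
    remaining = total_bytes
    f = open(path, "wb")
    start = time.monotonic()
    while remaining > 0:
        n = min(chunk_size, remaining)
        f.write(chunk[:n])
        remaining -= n
    # Push buffered data out before close() so that close() is timed on its own.
    f.flush()
    os.fsync(f.fileno())
    written = time.monotonic()
    f.close()
    closed = time.monotonic()
    return {
        "write_seconds": written - start,
        "close_seconds": closed - written,
    }


if __name__ == "__main__":
    # Assumption: /mnt/gluster is the client mount from the report above.
    target = os.path.join("/mnt/gluster", "probe.bin")
    print(time_write_and_close(target, 2 * 1024 ** 3))
```

Running it once with both peers up and once after the kill/restart sequence above would give two close_seconds values to compare.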
With the current design it is supposed to work this way, but there is a plan to implement what you have suggested, so this is not a bug but a feature enhancement.
The design change brought in as part of 3182 fixes this.