Bug 764301 (GLUSTER-2569) - Peer briefly unreachable in the middle of data copy forces heal at the end of copy
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-2569
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.1.3
Hardware: i386
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-03-21 10:16 UTC by raf
Modified: 2011-08-23 06:37 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: fuse
Documentation: ---
CRM:
Verified Versions:


Description raf 2011-03-21 10:16:56 UTC
Environment:
[192.168.0.1]#gluster volume create san replica 2 transport tcp 192.168.0.1:/var/gluster 192.168.0.2:/var/gluster
[192.168.0.1]#gluster volume start san
[192.168.0.1]#gluster volume set san network.ping-timeout 10
[192.168.0.3]#mount -t glusterfs 192.168.0.1:/san /mnt/gluster
Share /mnt/gluster using Samba.

Start copying two files of about 2 GB each from a WinXP client.

In the middle of the 1st file copy, run:
[192.168.0.1]#killall glusterd glusterfsd glusterfs

Wait about 10 seconds (the copy of the 1st file is still running) and, while that copy has still not finished, enter:
[192.168.0.1]#service glusterd start
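
For anyone who wants to script the outage part of these steps, here is a minimal sketch. The run_on helper and the DRY_RUN switch are hypothetical and not part of GlusterFS. The sketch assumes passwordless root ssh to the peer when DRY_RUN=0; by default it only prints what it would run.

```shell
#!/bin/sh
# Sketch of the "peer goes away for ~10 seconds" step from this report.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# run_on HOST CMD...: hypothetical helper, not a GlusterFS tool.
# With DRY_RUN=0 it assumes passwordless root ssh to HOST.
run_on() {
    host="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "[$host]# $*"
    else
        ssh "root@$host" "$*"
    fi
}

# Kill all gluster processes on the first peer, wait about 10 seconds
# (the same value as network.ping-timeout in this setup), then restart glusterd.
bounce_peer() {
    run_on 192.168.0.1 killall glusterd glusterfsd glusterfs
    if [ "$DRY_RUN" != "1" ]; then
        sleep 10
    fi
    run_on 192.168.0.1 service glusterd start
}

bounce_peer
```

Start the copy from the client first, then run the script with DRY_RUN=0 while the 1st file is still being written.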

The peer's absence did not affect the data copy in any way (it kept running smoothly).
Now the peer is back, but the 1st file has only been written on 192.168.0.2 (this is normal behaviour).
When the 1st file copy finishes, before the 2nd one starts, self-heal of the 1st file is somehow forced, and the 2nd file copy is postponed until self-heal has finished.
It seems that the file-close action after the data copy triggers the self-heal process.
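
One way to confirm that the file is pending heal is to look at the AFR changelog extended attributes on the surviving brick (192.168.0.2), for example with getfattr -d -m trusted.afr -e hex on the file under /var/gluster. As far as I know, each trusted.afr.* value is 12 bytes holding three big-endian 32-bit counters (data, metadata, entry). The sketch below decodes such a value. The decode_afr helper and the sample value are made up for illustration and were not taken from this setup.

```shell
#!/bin/sh
# decode_afr HEXVALUE: hypothetical helper that splits a trusted.afr.* value
# (as printed by "getfattr -e hex") into its three pending counters.
# Assumes the 12-byte layout: data, metadata, entry (4 bytes each, big-endian).
decode_afr() {
    hex="${1#0x}"                      # drop the leading 0x if present
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    echo "data=$((0x$data)) metadata=$((0x$meta)) entry=$((0x$entry))"
}

# Illustrative value only: a non-zero data counter means data operations
# are still pending for the other replica.
decode_afr 0x000001f40000000000000000
```

A value of all zeros for the other brick would mean nothing is pending for it.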

I think this is a significant limitation on GlusterFS performance.
Isn't the self-heal process supposed to run in the background, using minimal system resources?

Comment 1 Pranith Kumar K 2011-03-22 03:47:49 UTC
With the current design it is supposed to work this way, but there is a plan to implement what you have suggested, so this is not a bug but a feature enhancement.

Comment 2 Pranith Kumar K 2011-08-23 03:37:48 UTC
The design change brought in as part of 3182 fixes this.

