| Summary: | Peer's death in a 3 replica cluster stops data transfer for up to 45 sec. | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | raf <milanraf> |
| Component: | replicate | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED WONTFIX | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.1.2 | CC: | gluster-bugs, joe |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | i386 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | fuse |
| Documentation: | DP | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
raf
2011-03-10 22:03:09 UTC
(In reply to comment #0)
> [192.168.0.1]#gluster volume create test replica 3 transport tcp
> 192.168.0.1:/var/gluster 192.168.0.2:/var/gluster 192.168.0.3:/var/gluster
>
> [192.168.0.1]#gluster volume start test
>
> [192.168.0.1]#mount -t glusterfs localhost:/test /mnt/gluster
>
> share /mnt/gluster using SAMBA and start copying a bunch of data from a Window$
> client
>
> during data copy let's kill (unplug from surge) 192.168.0.2
> data transfer stops for up to 45 secs. and then goes again without errors
>
> Raf

The network ping timeout for glusterfs is around 45 seconds. Could you check whether the same happens after setting the ping-timeout to something lower than the Samba client's timeout? Example:

gluster volume set test network.ping-timeout 10

Well, I entered

gluster volume set test network.ping-timeout 5

and no more hang-up is noticeable.

Thank you
Raf

Is 42 seconds really reasonable? I know it's the answer to life, the universe, and everything, but I'm not sure it's the best answer to ping timeouts. This is a common issue on the IRC channel and I'm thinking that unless you're trying to replicate over a WAN, 2 - 10 seconds seems a much more reasonable timeout.

(In reply to comment #3)
> Is 42 seconds really reasonable? I know it's the answer to life, the universe,
> and everything, but I'm not sure it's the best answer to ping timeouts. This is
> a common issue on the IRC channel and I'm thinking that unless you're trying to
> replicate over a WAN, 2 - 10 seconds seems a much more reasonable timeout.

Hi Joe,
In production, servers are expected to come back online within ~30 seconds. That is why the ping timeout is set to more than ~30 seconds: network reconnection is a costly operation. It involves resource cleanup on the server side, and the client has to redo locking etc. after coming back. The timeout is exposed as an option that users can change based on their needs, so we don't want to change the default. We will document this in the wiki.

Will be closing the bug.

Thanks
Pranith.
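
For anyone who lands here with the same stall, the workaround from the comments above can be wrapped in a small shell helper. This is only a sketch under stated assumptions: the function name `set_ping_timeout` and the `DRY_RUN` variable are made up for illustration and are not part of GlusterFS. The only real command used is `gluster volume set <volume> network.ping-timeout <seconds>`, as quoted in this bug.

```shell
#!/bin/sh
# Sketch only: set_ping_timeout and DRY_RUN are hypothetical names,
# not part of GlusterFS. The underlying command is the one quoted in this bug:
#   gluster volume set <volume> network.ping-timeout <seconds>

set_ping_timeout() {
    volume="$1"
    seconds="$2"

    # A volume name is required.
    if [ -z "$volume" ]; then
        echo "usage: set_ping_timeout <volume> <seconds>" >&2
        return 1
    fi

    # Reject empty or non-numeric timeout values before touching the volume.
    case "$seconds" in
        ''|*[!0-9]*)
            echo "ping-timeout must be a whole number of seconds" >&2
            return 1
            ;;
    esac

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "gluster volume set $volume network.ping-timeout $seconds"
    else
        gluster volume set "$volume" network.ping-timeout "$seconds"
    fi
}
```

For example, `DRY_RUN=1 set_ping_timeout test 10` prints the command without running it. As explained above, a lower timeout shortens the stall but makes reconnects more likely, and each reconnect costs server-side cleanup and re-locking, so the value is a trade-off and not a free win.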