Description of problem:
With a 1x3 replica setup, when a single brick goes down the client mount directory hangs for roughly 30 seconds to a minute before work continues.

"Mount" method: mount --bind abc xyz (Linux)

"Hang" explained:
- After "cd" into the client mount point directory, it is impossible to list any files or sub-directories with "ls" (the directory simply does not respond).
- When accessing a website that uses the mount point directory, the browser spins until the glusterfs client mount directory comes back.

Version-Release number of selected component (if applicable): 3.11.2

How reproducible:

Steps to Reproduce:
1. Set up a 1x3 replica volume
2. Bring a single brick down
3. Quickly try to list files in the client mount point (you have to be quick, or the service will be back within 30s)

Actual results:
- Client mount point directory inaccessible

Expected results:
- Should keep working smoothly (more than 50% of the bricks are online)

Additional info:
I was unable to find this issue with the keywords "brick down", "storage hang", or "hiccup"; if an existing report covers it, please tag me in. A 30s outage is too much. Is there a config so that if a brick is slow to answer (more than 500ms), it is rejected immediately (considered dead) and the service continues instantly? Thank you.
"Bring a single brick down" means rebooting or shutting down the machine hosting that brick.
Moving this to the replicate component as the volume is a 1x3 volume.
This is not a problem in replication per se. If you reboot a node of a plain distribute volume and list the files from the mount, you should see identical behaviour. When you power off or reboot a node, the client does not receive the disconnect event immediately (see https://bugzilla.redhat.com/show_bug.cgi?id=1054694 for background). The hang does not occur if you kill the brick process before powering off or rebooting the node, so for a planned reboot I would recommend using https://github.com/gluster/glusterfs/blob/master/extras/stop-all-gluster-processes.sh. Assigning to the RPC component to see if this can be fixed in gluster using TCP keepalive or other means.
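For a planned reboot, the idea above can be sketched as the following procedure. This is illustrative, not an official runbook: it assumes the stop-all script has been copied onto the node from the glusterfs extras/ directory linked above, and that brick processes run as glusterfsd.

```shell
# Planned-reboot sketch for a storage node (run as root).
# Stopping the brick processes first makes the kernel send a TCP FIN,
# so clients see the disconnect immediately instead of waiting for a timeout.
./stop-all-gluster-processes.sh

# Sanity check: no brick processes should remain before rebooting.
pgrep glusterfsd && echo "bricks still running, do not reboot yet"

systemctl reboot
```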
@Ravishankar, thanks for the script!! But what about cases like power loss, where there is no time to gracefully shut down gluster? To truly achieve HA, I think gluster should handle that as well :D
A patch has been merged upstream [1] to manage keepalive/tcp-user-timeout values. With this patch, an ungraceful shutdown should be detected much earlier than ping-timeout. @Milind, what is the maximum time for a client to identify a brick disconnect with [1]? What are the defaults? @ko_co_ten, what do you think are sane values for the options in [1] that would not lead to spurious disconnects, where the client disconnects even though the brick is up and reachable? [1] https://review.gluster.org/#/c/16731
Based on http://royal.pingdom.com/2007/06/01/theoretical-vs-real-world-speed-limit-of-ping/: theoretically, for a ping to travel halfway around the Earth would take 133ms, so in real life, double it and we get 266ms. But in my experience that is not always the case; my worst stable ping was around 330ms (I live in Vietnam, where the undersea cables are constantly broken). Adding a bit of margin, I would say 400ms is paranoid enough and pretty safe for everyone. If administrators are confident in their setup, they can always bring it down to 50ms (as in a datacenter).
(In reply to Raghavendra G from comment #6)
> A patch has been merged upstream [1], to manage keepalive/tcp-user-timeout
> values. With this patch ungraceful shutdown should be detected much earlier
> than ping-timeout.
>
> @Milind what is the maximum time for a client to identify brick disconnect
> with [1]? What are the defaults?

With the patch and appropriate settings for the different options, we can bring the brick disconnect detection down to below 7 seconds. However, the patch is not yet available downstream.

> @ko_co_ten,
> What do you think are sane values for these options in [1] without leading
> to spurious disconnects where client disconnects even though brick is up and
> reachable?
>
> [1] https://review.gluster.org/#/c/16731
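Figures like the sub-7-second one above follow from standard TCP keepalive arithmetic: with an idle time before the first probe, a probe interval, and a probe count, a silent peer is declared dead after roughly idle + interval * count seconds. A small sketch of that calculation (the values below are illustrative, not gluster defaults):

```python
def keepalive_detection_time(idle, interval, count):
    """Worst-case seconds before TCP keepalive declares a silent peer dead.

    idle:     seconds of inactivity before the first keepalive probe
    interval: seconds between unanswered probes
    count:    number of unanswered probes before the connection is dropped
    """
    return idle + interval * count

# Aggressive (illustrative) settings: probe after 2s idle, then 4 probes 1s apart.
print(keepalive_detection_time(2, 1, 4))  # -> 6
```

The same formula shows why overly aggressive values risk spurious disconnects: a brick that stalls for longer than the total window, even briefly, will be treated as dead.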
ko_co_ten, since you have filed the bug against the Red Hat Gluster Storage product, are you using the downstream version of gluster? I think all downstream bugs need to be filed via our customer support team. If you are using the upstream version, please change the product to glusterfs and the version to the appropriate one. Also see if Milind's patch helps solve your issue; the patch seems to be present in both the 3.10 and 3.12 upstream branches of glusterfs.
I've recently upgraded all machines to gluster 3.12, using this PPA:

https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.12

The problem is still there, but it is weirder than before.

Before:
- shutdown -> immediately access client mount -> spinning

Now:
- shutdown -> immediately access client mount point -> works for about 10-15s -> spinning

The new spinning time is nothing like 7s; it is the same old 30s.
(In reply to ko_co_ten_1992 from comment #10)
> I've recently upgrade all machine to gluster 3.12, with this ppa
>
> https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.12
>
> The problem still there, but it weirder than before,
>
> Before:
> - shutdown -> immediately access client mount -> spinning
>
> Now:
> - shutdown -> immediately access client mount point -> work for about 10-15s
> -> spinning
>
> The new spinning time nothing like 7s, it the same old 30s

Does it resume after 30s? Can you attach glusterfs client logs after it resumed?
This is an upstream bug and product version is wrong.
@ko_co_ten The patch makes tunables available for tuning the Gluster system. The defaults are equal to what a normal/out-of-the-box system configuration would provide. For aggressive recovery times, you will need to tweak the tunables for smaller values. For details about the tunables, please go through the tcp(7) man page.
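As an illustration of the kind of tuning meant above, the snippet below sets the transport tunables on a volume. The volume name "myvol" and the specific values are hypothetical, and option names and defaults vary between releases, so confirm what your build supports with `gluster volume set help` before applying anything.

```shell
# Illustrative tuning for faster dead-peer detection on volume "myvol".
# Smaller values detect ungraceful brick shutdowns sooner, at the cost of
# a higher risk of spurious disconnects on flaky networks.
gluster volume set myvol client.tcp-user-timeout 10
gluster volume set myvol client.keepalive-time 10
gluster volume set myvol client.keepalive-interval 2
gluster volume set myvol client.keepalive-count 3
```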
Did you perform the tuning and did it help?
> Does it resume after 30s? Can you attach glusterfs client logs after it resumed?

It has been a year since the last question. I recommend upgrading glusterfs to 6.x, testing the behavior, and reporting back here. If there are no further updates in the next month, I am inclined to close the issue as WORKSFORME / WONTFIX.
Closing as there are no updates on the bug. Please feel free to reopen it with data from a newer version, or with the information requested in the needinfo.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days