Bug 1493656 - Storage hiccup (inaccessible for a short while) when a single brick goes down
Summary: Storage hiccup (inaccessible for a short while) when a single brick goes down
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.12
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Milind Changire
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-20 16:12 UTC by ko_co_ten_1992
Modified: 2023-09-14 04:08 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-15 05:05:54 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description ko_co_ten_1992 2017-09-20 16:12:13 UTC
Description of problem:

Using a 1x3 replica setup: when a single brick goes down, the client mount directory hangs for about 30 seconds to a minute before it continues to work.

"Mount" method:
mount --bind abc xyz (linux)

"Hang" explain: 
-When use "cd" to client mount point directory, unable to list (using ls command) any files, sub-directory (directory simply not working).
-When access website that use mount point directory, browser spinning until glusterfs client mount directory came back.

Version-Release number of selected component (if applicable): 3.11.2


How reproducible:


Steps to Reproduce:
1. Set up a 1x3 replica volume (see the setup sketch below)
2. Bring a single brick down
3. QUICKLY try to list files in the client mount point (you have to be quick, or the service will be back within ~30s)
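
For reference, a minimal setup sketch (the volume name, hostnames and paths are placeholders, not my exact configuration):

# On one of the three nodes: create and start a 1x3 replica volume
gluster volume create testvol replica 3 host1:/data/brick host2:/data/brick host3:/data/brick
gluster volume start testvol

# On the client: fuse-mount the volume, then bind-mount it where the website expects it
mount -t glusterfs host1:/testvol /mnt/glusterfs
mount --bind /mnt/glusterfs /var/www/data

# Reboot/shutdown one of the three nodes, then quickly:
ls /var/www/data    # hangs for ~30s to a minute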

Actual results:

- Client mount point directory is inaccessible

Expected results:

- Should keep working smoothly (more than 50% of the bricks are online)

Additional info:

I was unable to find an existing report of this issue using the keywords "brick down", "storage hang", or "hiccup"; if there is one, please tag me in.

A 30s outage is too much. Is there a config option so that, if a brick is slow to answer (more than 500ms), it is rejected immediately (considered dead) and the service continues instantly?

Thank you.

Comment 2 ko_co_ten_1992 2017-09-20 17:03:18 UTC
"Bring a single brick down" means: reboot or shut down the machine hosting that brick.

Comment 3 Nithya Balachandran 2017-09-21 04:56:02 UTC
Moving this to the replicate component as the volume is a 1x3 volume.

Comment 4 Ravishankar N 2017-09-21 05:28:57 UTC
This is not a problem in replication per se. If you reboot a node of a plain distribute volume and list the files from the mount, you should see identical behaviour.

When you power off/reboot a node, the client does not receive the disconnect event immediately (see https://bugzilla.redhat.com/show_bug.cgi?id=1054694 for some background). It won't occur if you kill the brick process before powering off or rebooting the node, so if it is a planned reboot, I would recommend using https://github.com/gluster/glusterfs/blob/master/extras/stop-all-gluster-processes.sh.
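
For a planned reboot, something along these lines avoids the hang (a rough sketch; the script ships in the glusterfs sources under extras/ and its installed location depends on your packaging):

# On the node that is about to be rebooted:
# either run the helper script (path varies by distribution/packaging) ...
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
# ... or at minimum kill the brick processes so clients see the disconnect immediately
pkill glusterfsd
reboot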

Assigning to RPC component to see if this can be fixed in gluster using TCP keepalive or other means.

Comment 5 ko_co_ten_1992 2017-09-21 06:39:05 UTC
@Ravishankar, thanks for the script!!

But what about cases like power loss, where there is no time to gracefully shut down gluster? I think that to truly achieve HA, gluster should handle that as well :D

Comment 6 Raghavendra G 2017-09-22 05:07:20 UTC
A patch has been merged upstream [1] to manage keepalive/tcp-user-timeout values. With this patch, an ungraceful shutdown should be detected much earlier than ping-timeout.

@Milind what is the maximum time for a client to identify brick disconnect with [1]? What are the defaults?

@ko_co_ten,
What do you think are sane values for these options in [1] without leading to spurious disconnects where client disconnects even though brick is up and reachable?

[1] https://review.gluster.org/#/c/16731

Comment 7 ko_co_ten_1992 2017-09-22 06:10:19 UTC
Based on http://royal.pingdom.com/2007/06/01/theoretical-vs-real-world-speed-limit-of-ping/:

Theoretically, for a ping to travel halfway around the Earth, it would take 133ms.

So in real life, double it and we have 266ms.

But based on my experience, that is not always the case; my worst stable ping was around 330ms (I live in Vietnam, where the undersea cables constantly break). Adding a bit more, I would say 400ms is paranoid enough, pretty safe for everyone.

If the administrator is confident in his setup, he can always bring it down to 50ms (like within a DC).

Comment 8 Milind Changire 2017-09-22 07:37:17 UTC
(In reply to Raghavendra G from comment #6)
> A patch has been merged upstream [1] to manage keepalive/tcp-user-timeout
> values. With this patch, an ungraceful shutdown should be detected much
> earlier than ping-timeout.
> 
> @Milind what is the maximum time for a client to identify brick disconnect
> with [1]? What are the defaults?

With the patch and appropriate settings for the different options, we can bring brick-disconnect detection down to below 7 seconds. However, the patch is not yet available downstream.
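
For illustration (a rough sketch with hypothetical values, not shipped defaults; "testvol" is a placeholder volume name and the option names should be verified with 'gluster volume set help' on a build that carries [1]):

# With TCP keepalive, dead-peer detection takes roughly
#   keepalive-time + keepalive-count x keepalive-interval
gluster volume set testvol client.keepalive-time 3      # idle seconds before the first probe
gluster volume set testvol client.keepalive-interval 1  # seconds between probes
gluster volume set testvol client.keepalive-count 3     # unanswered probes before the peer is declared dead
# => roughly 3 + 3 x 1 = 6 seconds, i.e. below 7 seconds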

> 
> @ko_co_ten,
> What do you think are sane values for these options in [1] without leading
> to spurious disconnects where client disconnects even though brick is up and
> reachable?
> 
> [1] https://review.gluster.org/#/c/16731

Comment 9 Ravishankar N 2017-09-25 04:40:12 UTC
ko_co_ten, since you have filed the bug against the Red Hat Gluster Storage product, are you using a downstream version of gluster? I think all downstream bugs need to be filed via our customer support team.

If you are using an upstream version, please change the product to glusterfs and the version to the appropriate one. Also, see if Milind's patch helps solve your issue. The patch seems to be present in both the 3.10 and 3.12 upstream branches of glusterfs.

Comment 10 ko_co_ten_1992 2017-10-15 15:06:23 UTC
I've recently upgraded all machines to gluster 3.12, using this PPA:

https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.12

The problem is still there, but it is weirder than before.

Before:
- shutdown -> immediately access client mount -> spinning

Now:
- shutdown -> immediately access client mount point -> works for about 10-15s -> spinning

The new spinning time is nothing like 7s; it is the same old 30s.

Comment 12 Raghavendra G 2018-05-09 03:19:37 UTC
(In reply to ko_co_ten_1992 from comment #10)
> I've recently upgraded all machines to gluster 3.12, using this PPA:
> 
> https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.12
> 
> The problem is still there, but it is weirder than before.
> 
> Before:
> - shutdown -> immediately access client mount -> spinning
> 
> Now:
> - shutdown -> immediately access client mount point -> works for about 10-15s
> -> spinning
> 
> The new spinning time is nothing like 7s; it is the same old 30s.

Does it resume after 30s? Can you attach glusterfs client logs after it resumed?
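
(Assuming default log settings, the fuse client log is under /var/log/glusterfs/, in a file named after the mount path with slashes turned into dashes; adjust the placeholder mount point below to yours.)

# e.g. for a mount point of /mnt/glusterfs:
ls -l /var/log/glusterfs/mnt-glusterfs.log*
tar czf glusterfs-client-logs.tar.gz /var/log/glusterfs/mnt-glusterfs.log*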

Comment 13 Atin Mukherjee 2018-11-10 08:01:49 UTC
This is an upstream bug and the product version is wrong.

Comment 14 Milind Changire 2019-01-07 03:45:50 UTC
@ko_co_ten
The patch makes tunables available for tuning the Gluster system.
The defaults are equal to what a normal/out-of-the-box system configuration would provide.
For aggressive recovery times, you will need to tweak the tunables for smaller values.

For details about the tunables, please go through the tcp(7) man page.
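
For a rough idea of why the out-of-the-box values are not aggressive, these are the stock Linux kernel keepalive defaults (readable via sysctl) and the worst-case detection time they imply on an idle connection:

sysctl net.ipv4.tcp_keepalive_time    # 7200 s of idle time before the first probe
sysctl net.ipv4.tcp_keepalive_intvl   # 75 s between probes
sysctl net.ipv4.tcp_keepalive_probes  # 9 unanswered probes before the peer is declared dead
# worst case ~= 7200 + 9 x 75 = 7875 s (about 2h11m), hence the need for smaller values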

Comment 15 Yaniv Kaul 2019-04-17 13:44:27 UTC
Did you perform the tuning and did it help?

Comment 16 Amar Tumballi 2019-06-17 10:43:11 UTC
> Does it resume after 30s? Can you attach glusterfs client logs after it resumed?

It has been 1 year since the last question. I recommend upgrading glusterfs to 6.x, testing the behavior, and reporting back here. If there are no further updates in the next month, I am inclined to close the issue as WORKSFORME / WONTFIX.

Comment 17 Amar Tumballi 2019-07-15 05:05:54 UTC
Closing as there are no updates on the bug. Please feel free to reopen it with data from a newer version, or with the information requested in the needinfo.

Comment 18 Red Hat Bugzilla 2023-09-14 04:08:16 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

