Bug 1454338

Summary: first_lookup slow in disperse volumes
Product: [Community] GlusterFS
Reporter: Raghavendra Talur <rtalur>
Component: disperse
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Version: mainline
CC: bugs, jahernan, sheggodu
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-08-17 08:51:50 UTC
Type: Bug

Description Raghavendra Talur 2017-05-22 13:43:51 UTC
Description of problem:

When a brick is down or has not been healed yet, the "Going up" message in the client logs appears 10 seconds after the client has connected to most of the bricks. It would be good to have clients come up sooner if the minimum number of bricks is up.

How reproducible:
Have seen multiple times, not sure if 100%

Comment 1 Xavi Hernandez 2017-05-24 06:29:54 UTC
This is by design. It only affects the initial mount from a client, and it's done to avoid unnecessary heals when bricks are busy or are being started at the same time as the mount. Once the initial timeout has expired or all bricks have reported, the volume will work without any delay, even if bricks go down and come back up later (as long as there are enough healthy bricks).

EC waits for up to 10 seconds until all bricks have reported an UP or DOWN state. If all bricks report before the 10-second timeout expires, the volume is brought UP or DOWN at that point, depending on how many UP bricks are available. If not all bricks have reported within 10 seconds, the volume goes UP or DOWN depending on the number of bricks that have reported UP by then.
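
The rule described above can be illustrated with a small Python model. This is only a sketch of the behaviour as explained in this comment, not the actual EC translator code: the function name `initial_volume_state`, the `min_up` parameter (how many UP bricks are assumed to be enough for the volume to work), and the shape of the `reports` input are all hypothetical.

```python
def initial_volume_state(reports, total_bricks, min_up, timeout=10.0):
    """Model the initial UP/DOWN decision for a disperse volume.

    reports      -- iterable of (seconds_since_mount, brick_id, is_up)
    total_bricks -- number of bricks in the volume
    min_up       -- assumed number of UP bricks needed for the volume to be UP
    timeout      -- initial wait in seconds (10 in the behaviour described)

    Returns (decided_at_seconds, "UP" or "DOWN").
    """
    states = {}
    decided_at = timeout  # default: decide when the timer expires

    for t, brick, is_up in sorted(reports):
        if t > timeout:
            break  # this report arrives after the initial decision
        states[brick] = is_up
        if len(states) == total_bricks:
            decided_at = t  # every brick has reported, no need to keep waiting
            break

    up_count = sum(1 for is_up in states.values() if is_up)
    return decided_at, ("UP" if up_count >= min_up else "DOWN")


# Hypothetical 6-brick volume where 4 UP bricks are assumed to be enough.
all_up = [(0.1 * (i + 1), i, True) for i in range(6)]
one_silent = all_up[:5]  # brick 5 never reports (e.g. unreachable)

print(initial_volume_state(all_up, total_bricks=6, min_up=4))
print(initial_volume_state(one_silent, total_bricks=6, min_up=4))
```

In this model the first call decides as soon as the sixth brick reports, while the second call only decides when the 10-second timer expires, which matches the delay seen in the client logs when one brick is down and never reports.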