Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1541929

Summary:	A down brick is incorrectly considered to be online, causing the volume to be started without any brick available
Product:	[Community] GlusterFS	Reporter:	Xavi Hernandez <jahernan>
Component:	replicate	Assignee:	Xavi Hernandez <jahernan>
Status:	CLOSED EOL	QA Contact:
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	3.13	CC:	bugs
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1541038	Environment:
Last Closed:	2018-06-20 18:29:24 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1541038
Bug Blocks:

Description Xavi Hernandez 2018-02-05 08:54:58 UTC

+++ This bug was initially created as a clone of Bug #1541038 +++

Description of problem:

In a replica 2 volume, if one of the bricks is down and reports its state before the online one, AFR tries to find another online brick in find_best_down_child(). Since the priv->child_up array has been initialized with -1 and this function only checks whether an entry is 0, it considers the other brick, which has not reported yet, to be alive and sends a CHILD_UP notification.

At this point the other xlators start sending requests, which fail with ENOTCONN when they reach AFR. These failures can in turn cause several unexpected errors.

Version-Release number of selected component (if applicable): mainline


How reproducible:

It happens randomly, depending on the order in which bricks are started.

Comment 1 Worker Ant 2018-02-05 09:15:32 UTC

REVIEW: https://review.gluster.org/19498 (cluster/afr: remove unnecessary child_up initialization) posted (#1) for review on release-3.13 by Xavier Hernandez

Comment 2 Worker Ant 2018-02-06 14:28:08 UTC

COMMIT: https://review.gluster.org/19498 committed in release-3.13 by "Xavier Hernandez" <jahernan> with a commit message- cluster/afr: remove unnecessary child_up initialization

The child_up array was initialized with all elements being -1 to
allow afr_notify() to differentiate down bricks from bricks that
haven't reported yet. With the current implementation this is no
longer needed, and it was causing unexpected results because other
parts of the code treat child_up[i] != 0 as meaning the brick is up.

Backport of:
> BUG: 1541038

Change-Id: I2a9d712ee64c512f24bd5cd3a48dcb37e3139472
BUG: 1541929
Signed-off-by: Xavier Hernandez <jahernan>

Comment 3 Shyamsundar 2018-06-20 18:29:24 UTC

This bug is reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline
gluster repository, request that it be reopened and the Version field be marked
appropriately.