1530320 – Brick Multiplexing: brick still down in heal info context(glfs) even though brick is online


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	rhgs-3.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.3.1 Async
Assignee:	Atin Mukherjee
QA Contact:	Nag Pavan Chilakam
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-01-02 14:53 UTC by Nag Pavan Chilakam
Modified:	2019-01-09 14:57 UTC
CC List:	5 users
Fixed In Version:	glusterfs-3.8.4-52.4
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-01-11 02:47:45 UTC
Embargoed:
Dependent Products:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:0083	0	normal	SHIPPED_LIVE	glusterfs bug fix update	2018-01-11 07:46:21 UTC

Description Nag Pavan Chilakam 2018-01-02 14:53:36 UTC

Description of problem:
======================
While verifying bz#1526373 - Brick Multiplexing: Gluster volume start force complains with command "Error : Request timed out" when there are multiple volumes

heal info reports the brick with a transport endpoint error, i.e. as still offline, even though the brick is online.
As more I/O is pumped in, this leads to more entries pending heal.

The chance of hitting this problem is about 50%.
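
To make the symptom easier to spot across many volumes, the following is a minimal sketch of a parser for heal info output. It assumes the usual per-brick layout of `gluster volume heal <VOLNAME> info` (a `Brick ...` line followed by a `Status:` line); the sample text, host names and brick paths are hypothetical and are not taken from this bug's logs.

```python
def bricks_reported_down(heal_info_text):
    """Return (brick, status) pairs for bricks whose Status is not 'Connected'."""
    down = []
    current_brick = None
    for raw in heal_info_text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # Remember which brick the following Status line belongs to.
            current_brick = line[len("Brick "):]
        elif line.startswith("Status:") and current_brick is not None:
            status = line[len("Status:"):].strip()
            if status != "Connected":
                down.append((current_brick, status))
            current_brick = None
    return down


# Hypothetical sample; real output should be captured from the affected node.
SAMPLE = """\
Brick server1:/bricks/b1/vol15
Status: Transport endpoint is not connected
Number of entries: -

Brick server2:/bricks/b2/vol15
Status: Connected
Number of entries: 0
"""

print(bricks_reported_down(SAMPLE))
```

A brick listed here while `gluster volume status` shows it online would match the behaviour described in this report.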

Version-Release number of selected component (if applicable):
===========
glusterfs-server-3.8.4-52.3.el7rhgs.x86_64

How reproducible:
=================
2/4

Steps to Reproduce:

1. Create a brick mux setup.
2. Create about 60 EC volumes, all 1x(4+2).
3. Start the volumes.
4. Pump I/O to the base volume and to one other volume (I created an extra EC volume for this).
5. Kill a brick, say b1.
6. Run volume start force on any one volume (one later in ascending order, say vol15 or vol20, and not the base volume).
7. Now start the other volumes, i.e. the mounted volumes, using volume start force.
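
The steps above can be sketched as a list of gluster CLI commands. This is only an illustration under assumptions: the volume names (`ecvolN`), the six host names, the brick paths, and the choice of which volumes are mounted are all hypothetical, and the I/O and brick-kill steps are left as manual placeholders. The function only builds the command strings and does not run them.

```python
def repro_commands(num_volumes=60, hosts=None, forced_first="ecvol15",
                   mounted=("ecvol1", "ecvol60")):
    """Build the command sequence for the reproduction steps (nothing is executed)."""
    # Six hypothetical servers, one brick each, for a 1x(4+2) dispersed volume.
    hosts = hosts or [f"server{i}" for i in range(1, 7)]
    if len(hosts) != 6:
        raise ValueError("a 1x(4+2) volume needs exactly six bricks")

    # Step 1: enable brick multiplexing cluster-wide.
    cmds = ["gluster volume set all cluster.brick-multiplex on"]

    # Step 2: create the EC volumes (4 data + 2 redundancy bricks each).
    for n in range(1, num_volumes + 1):
        vol = f"ecvol{n}"
        bricks = " ".join(f"{h}:/bricks/{vol}" for h in hosts)
        cmds.append(f"gluster volume create {vol} disperse 6 redundancy 2 {bricks}")

    # Step 3: start every volume.
    cmds += [f"gluster volume start ecvol{n}" for n in range(1, num_volumes + 1)]

    # Steps 4 and 5 are done by hand, so they appear only as comments.
    cmds.append("# manual: mount the volumes in `mounted` and start I/O on them")
    cmds.append("# manual: kill one brick process, e.g. the one serving b1")

    # Step 6: force start a volume that is not the base volume.
    cmds.append(f"gluster volume start {forced_first} force")

    # Step 7: force start the mounted volumes, then check heal info on them.
    cmds += [f"gluster volume start {v} force" for v in mounted]
    cmds += [f"gluster volume heal {v} info" for v in mounted]
    return cmds


# Show the tail of the sequence: manual steps, force starts, heal info checks.
for cmd in repro_commands()[-7:]:
    print(cmd)
```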




Actual results:
=========
heal info still reports the brick as down.
shd fails to start on one of the volumes (raised a separate BZ#1530217 - Brick multiplexing: glustershd fails to start on a volume force start after a brick is down).

The logs of BZ#1530217 can be used.

Comment 10 errata-xmlrpc 2018-01-11 02:47:45 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0083
