Bug 1739123

Summary: Bricks are going down resulting in mount failure
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: John Mulligan <jmulligan>
Component: core
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED CURRENTRELEASE
QA Contact: Rahul Hinduja <rhinduja>
Severity: high
Priority: medium
Docs Contact:
Version: rhgs-3.5
CC: amukherj, knarra, ksubrahm, moagrawa, pasik, pkarampu, ravishankar, rhs-bugs, rtalur, sheggodu, storage-qa-internal
Target Milestone: ---
Target Release: ---
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-25 10:37:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1732703

Description John Mulligan 2019-08-08 14:55:03 UTC
Description of problem:

The mount fails even though a single brick (out of 3) is still up.

[root@dhcp47-30 ~]# ls /var/lib/origin/openshift.local.volumes/pods/1bc94e09-b7d7-11e9-a78c-005056b20f7c/volumes/kubernetes.io~glusterfs/db
ls: cannot access /var/lib/origin/openshift.local.volumes/pods/1bc94e09-b7d7-11e9-a78c-005056b20f7c/volumes/kubernetes.io~glusterfs/db: Transport endpoint is not connected
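
When an established FUSE mount starts returning this error, the quickest check is whether the client still holds any brick connections. A minimal sketch of that check on the node is below; the client log file name is an assumption here (glusterfs names client logs after the hyphenated mount path under /var/log/glusterfs/):

[root@dhcp47-30 ~]# mount -t fuse.glusterfs                     # is the fuse mount still listed?
[root@dhcp47-30 ~]# ls /var/log/glusterfs/                      # find the log named after the mount path
[root@dhcp47-30 ~]# grep -iE "disconnected|not connected" /var/log/glusterfs/<mount-path>.log | tail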


[root@dhcp46-151 ~]# oc rsh pod/glusterfs-storage-7l2h2 gluster volume status heketidbstorage
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.134:/var/lib/heketi/mounts/v
g_fc55eb0aa163a697d4e38c8f8f118d79/brick_49
ffc91a89f142ee40d42b2ccf1ac64d/brick        N/A       N/A        N       N/A  
Brick 10.70.47.1:/var/lib/heketi/mounts/vg_
18c96c7188b87e757921ea2688cf4b4c/brick_5c96
e4982dab85eb8944f84856ef0355/brick          49152     0          Y       166  
Brick 10.70.46.245:/var/lib/heketi/mounts/v
g_e71f46d991fc4ef534858d2e62912860/brick_a8
18266a862e91a15f216b8436fa1830/brick        N/A       N/A        N       N/A  
Self-heal Daemon on localhost               N/A       N/A        Y       63959
Self-heal Daemon on dhcp46-245.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3086 
Self-heal Daemon on 10.70.47.14             N/A       N/A        Y       47398
Self-heal Daemon on 10.70.47.134            N/A       N/A        Y       129961
 
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks


Tried to connect another client to the volume and it failed. Logs to be attached.




Version-Release number of selected component (if applicable):
sh-4.2# rpm -qa | grep glusterfs
glusterfs-api-6.0-8.el7rhgs.x86_64
glusterfs-fuse-6.0-8.el7rhgs.x86_64
glusterfs-server-6.0-8.el7rhgs.x86_64
glusterfs-libs-6.0-8.el7rhgs.x86_64
glusterfs-6.0-8.el7rhgs.x86_64
glusterfs-client-xlators-6.0-8.el7rhgs.x86_64
glusterfs-cli-6.0-8.el7rhgs.x86_64
glusterfs-geo-replication-6.0-8.el7rhgs.x86_64





Actual results:
Existing connection reports "Transport endpoint is not connected".
Subsequent mounts fail.


Expected results:
The volume stays connected, becoming read-only at most, since one brick is still up.
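
(For context: whether clients fall back to read-only or lose the mount entirely when bricks drop is governed by the AFR client-quorum options. A minimal way to inspect them for this volume, assuming the same gluster pod used for the status check above; on a replica 3 volume cluster.quorum-type is normally "auto", which is expected to block writes once two of the three bricks are down while generally leaving reads possible:)

sh-4.2# gluster volume get heketidbstorage cluster.quorum-type
sh-4.2# gluster volume get heketidbstorage cluster.quorum-count
sh-4.2# gluster volume get heketidbstorage cluster.server-quorum-type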

Comment 25 Mohit Agrawal 2019-11-25 10:25:33 UTC
In the latest 3.5 build we fixed an issue specific to a health-check thread failure.
The fix is tracked in the following bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1752713
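
If it helps while retesting, a minimal way to check whether a brick went down because of that posix health-check failure is sketched below (the log path and message text are assumptions based on the default brick log location and the usual posix health-check wording):

sh-4.2# gluster volume get heketidbstorage storage.health-check-interval
sh-4.2# grep -i "health-check" /var/log/glusterfs/bricks/*.log | tail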

Can you try to reproduce the same on the latest RHGS-3.5 release?

Thanks,
Mohit Agrawal

Comment 26 Mohit Agrawal 2019-11-25 10:37:17 UTC
I am closing the bug. 
Please reopen if you face the issue on the latest 3.5 release.