Bug 1739123

Summary: Bricks are going down resulting in mount failure
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: John Mulligan <jmulligan>
Component: core
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED CURRENTRELEASE
QA Contact: Rahul Hinduja <rhinduja>
Severity: high
Priority: medium
Docs Contact:
Version: rhgs-3.5
CC: amukherj, knarra, ksubrahm, moagrawa, pasik, pkarampu, ravishankar, rhs-bugs, rtalur, sheggodu, storage-qa-internal
Target Milestone: ---
Target Release: ---
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-25 10:37:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1732703

Description John Mulligan 2019-08-08 14:55:03 UTC
Description of problem:

The mount fails even though a single brick (out of 3) is still up.

[root@dhcp47-30 ~]# ls /var/lib/origin/openshift.local.volumes/pods/1bc94e09-b7d7-11e9-a78c-005056b20f7c/volumes/kubernetes.io~glusterfs/db
ls: cannot access /var/lib/origin/openshift.local.volumes/pods/1bc94e09-b7d7-11e9-a78c-005056b20f7c/volumes/kubernetes.io~glusterfs/db: Transport endpoint is not connected
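
When an established FUSE mount starts returning this error, the quickest check is whether the client still holds any brick connections. A minimal sketch of that check on the node is below; the client log file name is an assumption here (glusterfs names client logs after the hyphenated mount path under /var/log/glusterfs/):

[root@dhcp47-30 ~]# mount -t fuse.glusterfs                     # is the fuse mount still listed?
[root@dhcp47-30 ~]# ls /var/log/glusterfs/                      # find the log named after the mount path
[root@dhcp47-30 ~]# grep -iE "disconnected|not connected" /var/log/glusterfs/<mount-path>.log | tail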


[root@dhcp46-151 ~]# oc rsh pod/glusterfs-storage-7l2h2 gluster volume status heketidbstorage
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.134:/var/lib/heketi/mounts/v
g_fc55eb0aa163a697d4e38c8f8f118d79/brick_49
ffc91a89f142ee40d42b2ccf1ac64d/brick        N/A       N/A        N       N/A  
Brick 10.70.47.1:/var/lib/heketi/mounts/vg_
18c96c7188b87e757921ea2688cf4b4c/brick_5c96
e4982dab85eb8944f84856ef0355/brick          49152     0          Y       166  
Brick 10.70.46.245:/var/lib/heketi/mounts/v
g_e71f46d991fc4ef534858d2e62912860/brick_a8
18266a862e91a15f216b8436fa1830/brick        N/A       N/A        N       N/A  
Self-heal Daemon on localhost               N/A       N/A        Y       63959
Self-heal Daemon on dhcp46-245.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3086 
Self-heal Daemon on 10.70.47.14             N/A       N/A        Y       47398
Self-heal Daemon on 10.70.47.134            N/A       N/A        Y       129961
 
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks


Tried to connect another client to the volume and it failed. Logs to be attached.




Version-Release number of selected component (if applicable):
sh-4.2# rpm -qa | grep glusterfs
glusterfs-api-6.0-8.el7rhgs.x86_64
glusterfs-fuse-6.0-8.el7rhgs.x86_64
glusterfs-server-6.0-8.el7rhgs.x86_64
glusterfs-libs-6.0-8.el7rhgs.x86_64
glusterfs-6.0-8.el7rhgs.x86_64
glusterfs-client-xlators-6.0-8.el7rhgs.x86_64
glusterfs-cli-6.0-8.el7rhgs.x86_64
glusterfs-geo-replication-6.0-8.el7rhgs.x86_64





Actual results:
Existing connection reports "Transport endpoint is not connected".
Subsequent mounts fail.


Expected results:
The volume stays connected, becoming read-only at most, since one brick is still up.
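
(For context: whether clients fall back to read-only or lose the mount entirely when bricks drop is governed by the AFR client-quorum options. A minimal way to inspect them for this volume, assuming the same gluster pod used for the status check above; on a replica 3 volume cluster.quorum-type is normally "auto", which is expected to block writes once two of the three bricks are down while generally leaving reads possible:)

sh-4.2# gluster volume get heketidbstorage cluster.quorum-type
sh-4.2# gluster volume get heketidbstorage cluster.quorum-count
sh-4.2# gluster volume get heketidbstorage cluster.server-quorum-type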

Comment 25 Mohit Agrawal 2019-11-25 10:25:33 UTC
In the latest 3.5 build we fixed an issue specific to a health-check thread failure.
The fix is tracked in the following bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1752713
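
If it helps while retesting, a minimal way to check whether a brick went down because of that posix health-check failure is sketched below (the log path and message text are assumptions based on the default brick log location and the usual posix health-check wording):

sh-4.2# gluster volume get heketidbstorage storage.health-check-interval
sh-4.2# grep -i "health-check" /var/log/glusterfs/bricks/*.log | tail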

Can you try to reproduce the same on the latest RHGS-3.5 release?

Thanks,
Mohit Agrawal

Comment 26 Mohit Agrawal 2019-11-25 10:37:17 UTC
I am closing the bug. 
Please reopen if you face the issue on the latest 3.5 release.