Bug 1739123 - Bricks are going down resulting in mount failure
Summary: Bricks are going down resulting in mount failure
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Mohit Agrawal
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks: 1732703
Reported: 2019-08-08 14:55 UTC by John Mulligan
Modified: 2019-11-25 10:37 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-25 10:37:17 UTC
Embargoed:


Description John Mulligan 2019-08-08 14:55:03 UTC
Description of problem:

The mount fails even though one of the three bricks is still up.

[root@dhcp47-30 ~]# ls /var/lib/origin/openshift.local.volumes/pods/1bc94e09-b7d7-11e9-a78c-005056b20f7c/volumes/kubernetes.io~glusterfs/db
ls: cannot access /var/lib/origin/openshift.local.volumes/pods/1bc94e09-b7d7-11e9-a78c-005056b20f7c/volumes/kubernetes.io~glusterfs/db: Transport endpoint is not connected


[root@dhcp46-151 ~]# oc rsh pod/glusterfs-storage-7l2h2 gluster volume status heketidbstorage
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.134:/var/lib/heketi/mounts/v
g_fc55eb0aa163a697d4e38c8f8f118d79/brick_49
ffc91a89f142ee40d42b2ccf1ac64d/brick        N/A       N/A        N       N/A  
Brick 10.70.47.1:/var/lib/heketi/mounts/vg_
18c96c7188b87e757921ea2688cf4b4c/brick_5c96
e4982dab85eb8944f84856ef0355/brick          49152     0          Y       166  
Brick 10.70.46.245:/var/lib/heketi/mounts/v
g_e71f46d991fc4ef534858d2e62912860/brick_a8
18266a862e91a15f216b8436fa1830/brick        N/A       N/A        N       N/A  
Self-heal Daemon on localhost               N/A       N/A        Y       63959
Self-heal Daemon on dhcp46-245.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3086 
Self-heal Daemon on 10.70.47.14             N/A       N/A        Y       47398
Self-heal Daemon on 10.70.47.134            N/A       N/A        Y       129961
 
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks
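With long brick paths wrapped across several lines, the offline bricks are easy to misread in output like the above. The filter below is a minimal sketch, not part of Gluster: `offline_bricks` is a hypothetical helper, and it assumes the plain-text layout captured here, where each brick entry starts with "Brick ", the CLI wraps the name without inserting characters, and the last line of the entry ends with the TCP Port, RDMA Port, Online and Pid columns.

```shell
# offline_bricks (hypothetical helper, not a Gluster command):
# reads the plain-text output of `gluster volume status <VOLNAME>` on stdin
# and prints every brick whose Online column is "N". Brick names that the
# CLI wrapped across several lines are joined back into one string.
offline_bricks() {
  awk '
    # A brick entry starts with "Brick "; drop that prefix and re-split fields.
    /^Brick / { inbrick = 1; name = ""; $0 = substr($0, 7) }
    inbrick {
      # Every line of the entry contributes one fragment of the brick name.
      name = name $1
      # The final line of the entry also carries TCP port, RDMA port, Online, Pid.
      if (NF >= 5) {
        if ($(NF - 1) == "N") print name
        inbrick = 0
      }
    }
  '
}
```

Run on a node in the trusted storage pool (for example inside the glusterfs pod), `gluster volume status heketidbstorage | offline_bricks` should print the two bricks on 10.70.47.134 and 10.70.46.245 for the status captured in this report. Self-heal Daemon lines are ignored because they do not start with "Brick ".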


Tried to connect another client to the volume and it failed. Logs to be attached.

Version-Release number of selected component (if applicable):
sh-4.2# rpm -qa | grep glusterfs
glusterfs-api-6.0-8.el7rhgs.x86_64
glusterfs-fuse-6.0-8.el7rhgs.x86_64
glusterfs-server-6.0-8.el7rhgs.x86_64
glusterfs-libs-6.0-8.el7rhgs.x86_64
glusterfs-6.0-8.el7rhgs.x86_64
glusterfs-client-xlators-6.0-8.el7rhgs.x86_64
glusterfs-cli-6.0-8.el7rhgs.x86_64
glusterfs-geo-replication-6.0-8.el7rhgs.x86_64

Actual results:
The existing mount reports "Transport endpoint is not connected".
Subsequent mounts fail.


Expected results:
The mount stays connected, with the volume read-only.

Comment 25 Mohit Agrawal 2019-11-25 10:25:33 UTC
In the latest 3.5 release we fixed an issue specific to health-check thread failure, tracked in
https://bugzilla.redhat.com/show_bug.cgi?id=1752713

Could you try to reproduce this on the latest RHGS-3.5 release?

Thanks,
Mohit Agrawal

Comment 26 Mohit Agrawal 2019-11-25 10:37:17 UTC
I am closing the bug. 
Please reopen it if you hit the issue on the latest 3.5 release.