Description of problem:
The mount fails even though one brick (out of 3) is still up.
[root@dhcp47-30 ~]# ls /var/lib/origin/openshift.local.volumes/pods/1bc94e09-b7d7-11e9-a78c-005056b20f7c/volumes/kubernetes.io~glusterfs/db
ls: cannot access /var/lib/origin/openshift.local.volumes/pods/1bc94e09-b7d7-11e9-a78c-005056b20f7c/volumes/kubernetes.io~glusterfs/db: Transport endpoint is not connected
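
For reference, a stale endpoint like this can usually be cleared and the mount retried by hand; a minimal sketch, assuming the pod volume path above, the node with the live brick (10.70.47.1) as the volfile server, and default mount options:

# lazy-unmount the dead FUSE endpoint, then retry the mount
umount -l /var/lib/origin/openshift.local.volumes/pods/1bc94e09-b7d7-11e9-a78c-005056b20f7c/volumes/kubernetes.io~glusterfs/db
mount -t glusterfs 10.70.47.1:/heketidbstorage \
    /var/lib/origin/openshift.local.volumes/pods/1bc94e09-b7d7-11e9-a78c-005056b20f7c/volumes/kubernetes.io~glusterfs/db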
[root@dhcp46-151 ~]# oc rsh pod/glusterfs-storage-7l2h2 gluster volume status heketidbstorage
Status of volume: heketidbstorage
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.47.134:/var/lib/heketi/mounts/v
g_fc55eb0aa163a697d4e38c8f8f118d79/brick_49
ffc91a89f142ee40d42b2ccf1ac64d/brick N/A N/A N N/A
Brick 10.70.47.1:/var/lib/heketi/mounts/vg_
18c96c7188b87e757921ea2688cf4b4c/brick_5c96
e4982dab85eb8944f84856ef0355/brick 49152 0 Y 166
Brick 10.70.46.245:/var/lib/heketi/mounts/v
g_e71f46d991fc4ef534858d2e62912860/brick_a8
18266a862e91a15f216b8436fa1830/brick N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 63959
Self-heal Daemon on dhcp46-245.lab.eng.blr.
redhat.com N/A N/A Y 3086
Self-heal Daemon on 10.70.47.14 N/A N/A Y 47398
Self-heal Daemon on 10.70.47.134 N/A N/A Y 129961
Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks
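
Two of the three brick processes show Online 'N'. As a first step (and workaround, separate from the client-side bug), the dead bricks can be restarted with a force start, which leaves the running brick untouched; a sketch, run via the same gluster pod as above:

# restart only the bricks that are down; the running brick is left alone
oc rsh pod/glusterfs-storage-7l2h2 gluster volume start heketidbstorage force
# confirm all three bricks now report Online 'Y'
oc rsh pod/glusterfs-storage-7l2h2 gluster volume status heketidbstorage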
Tried to connect another client to the volume and it failed. Logs to be attached.
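
(The FUSE client writes its mount log under /var/log/glusterfs/ on the client node, named after the mount point with '/' replaced by '-'; the disconnect reason for the failed mount attempt should be there.)

# most recently written client logs on the node that attempted the mount
ls -lt /var/log/glusterfs/ | head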
Version-Release number of selected component (if applicable):
sh-4.2# rpm -qa | grep glusterfs
glusterfs-api-6.0-8.el7rhgs.x86_64
glusterfs-fuse-6.0-8.el7rhgs.x86_64
glusterfs-server-6.0-8.el7rhgs.x86_64
glusterfs-libs-6.0-8.el7rhgs.x86_64
glusterfs-6.0-8.el7rhgs.x86_64
glusterfs-client-xlators-6.0-8.el7rhgs.x86_64
glusterfs-cli-6.0-8.el7rhgs.x86_64
glusterfs-geo-replication-6.0-8.el7rhgs.x86_64
Actual results:
Existing connection reports "Transport endpoint is not connected".
Subsequent mounts fail.
Expected results:
The client should stay connected and the volume should go read-only (one brick is still up; client quorum is lost).
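
For context on why read-only is the expectation: on a replica 3 volume with client quorum at its default (auto), losing two bricks should make writes fail with EROFS while the mount stays connected for reads. The effective settings can be checked with (volume name from this report):

# client-side (AFR) quorum settings; 'auto' means a majority of bricks is required for writes
gluster volume get heketidbstorage cluster.quorum-type
gluster volume get heketidbstorage cluster.quorum-count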
In the latest RHGS 3.5 we fixed an issue specific to a health-check thread failure; the fix is tracked in
https://bugzilla.redhat.com/show_bug.cgi?id=1752713

Can we try to reproduce the same on the latest RHGS-3.5 release?
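
A rough reproduction sketch on RHGS 3.5, assuming a fresh replica 3 volume mounted at a hypothetical /mnt/heketidb: kill two of the three brick processes, then exercise the existing mount; the expected outcome is EROFS on writes, not ENOTCONN.

# on two of the three brick nodes, kill the brick process
# (<brick-pid> is a placeholder; PIDs come from 'gluster volume status heketidbstorage')
kill -9 <brick-pid>
# from the client: reads should still work ...
ls /mnt/heketidb
# ... and writes should fail with 'Read-only file system',
# not 'Transport endpoint is not connected'
touch /mnt/heketidb/probe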
Thanks,
Mohit Agrawal