Description of problem:
Customer is seeing a gluster mount point become either unmounted or unresponsive. Initially it was becoming unmounted and, when they saw a glusterfsd crash, they made a couple of changes recommended by us in previous cases with similar issues:
1) gluster volume set all cluster.op-version 31303
2) gluster volume set DC03SP2 performance.parallel-readdir off
The next issue occurred when a running "df -h" hung. They are able to run "df -h" individually for all other filesystems except the gluster mount. One of the nodes became inaccessible and required a reboot; PuTTY and the iRMC console just gave a blank screen after login. They configured kdump for the gluster nodes and generated an NMI kdump from iRMC to provide information for this case. Note that these gluster nodes mount the volume locally and are used as the target for the customer's application.

Version-Release number of selected component (if applicable):
OS: RHEL 7.6
Kernel: kernel-3.10.0-957.1.3.el7.x86_64
Gluster:
glusterfs-3.12.2-25.el7rhgs.x86_64
glusterfs-api-3.12.2-25.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-25.el7rhgs.x86_64
glusterfs-cli-3.12.2-25.el7rhgs.x86_64
glusterfs-events-3.12.2-25.el7rhgs.x86_64
glusterfs-fuse-3.12.2-25.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-25.el7rhgs.x86_64
glusterfs-libs-3.12.2-25.el7rhgs.x86_64
glusterfs-rdma-3.12.2-25.el7rhgs.x86_64
glusterfs-server-3.12.2-25.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch

How reproducible:
Happens roughly once every 24 hours on one node or another, but can't be reproduced on demand.

Steps to Reproduce:
N/A

Actual results:
Mount point becomes unresponsive.

Expected results:
Mount point should remain responsive.

Additional info:
Supporting info extracted on collab shell. First two NMI cores are on optimus and the last one is being processed now.
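For future occurrences, a non-blocking way to check whether the gluster mount is responsive is to wrap the stat call in a timeout, so the probing shell does not itself hang the way a bare "df -h" does. This is only a sketch; /mnt/DC03SP2 below is a placeholder and should be replaced with the customer's actual mount path.

```shell
#!/bin/sh
# Probe a (possibly hung) FUSE mount without blocking the shell.
# MOUNTPOINT is a placeholder; substitute the real gluster mount path.
MOUNTPOINT="${1:-/mnt/DC03SP2}"

# stat -f reads filesystem metadata; on a hung FUSE mount it would block
# forever, so cap it at 5 seconds with timeout(1).
if timeout 5 stat -f "$MOUNTPOINT" >/dev/null 2>&1; then
    echo "$MOUNTPOINT is responsive"
else
    echo "$MOUNTPOINT is unresponsive or unmounted (stat timed out or failed)"
fi
```

Running this from cron on each node would give a timestamped record of when the mount goes unresponsive, which can then be correlated with the brick and client logs.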
Sameer,

> [2019-06-12 17:47:10.135284] E [socket.c:2369:socket_connect_finish] 0-DC03SP2-client-68: connection to 192.168.88.209:24007 failed (No route to host); disconnecting socket
> [2019-06-12 17:47:10.135298] E [socket.c:2369:socket_connect_finish] 0-DC03SP2-client-71: connection to 192.168.88.209:24007 failed (No route to host); disconnecting socket

"No route to host" errors generally indicate network issues. Since this hang is happening from one of the clients, please check whether DNS resolves properly for all of the gluster nodes. If there is an issue with DNS resolution, the gluster process would certainly end up in a hung state.
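One quick way to run the DNS check suggested above from the affected client is to resolve each node name through the system's NSS configuration with getent. This is a sketch only; the gluster-node1/2/3 hostnames are placeholders and should be replaced with the actual node names (e.g. from "gluster pool list").

```shell
#!/bin/sh
# Verify that each gluster node's hostname resolves on this client.
# NOTE: gluster-node1..3 are hypothetical names; substitute the real ones.
for host in gluster-node1 gluster-node2 gluster-node3; do
    addr=$(getent hosts "$host" | awk '{print $1; exit}')
    if [ -n "$addr" ]; then
        echo "OK:   $host -> $addr"
    else
        echo "FAIL: $host did not resolve"
    fi
done
```

getent goes through /etc/nsswitch.conf (hosts file first, then DNS by default), so it reflects what gluster itself will see, unlike querying a nameserver directly with dig or nslookup.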
*** Bug 1751666 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3249