Bug 1722209 - [GSS] Issues accessing gluster mount / df -h hanging
Summary: [GSS] Issues accessing gluster mount / df -h hanging
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: rhgs-3.4
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.5.0
Assignee: Xavi Hernandez
QA Contact: Bala Konda Reddy M
URL:
Whiteboard:
Duplicates: 1751666
Depends On: 1727068
Blocks: 1696809
Reported: 2019-06-19 17:09 UTC by Cal Calhoun
Modified: 2019-11-20 06:03 UTC
CC: 20 users

Fixed In Version: glusterfs-6.0-8
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-30 12:22:00 UTC
Embargoed:
sankarshan: needinfo-


Links:
  Red Hat Product Errata RHEA-2019:3249 (last updated 2019-10-30 12:22:22 UTC)

Description Cal Calhoun 2019-06-19 17:09:39 UTC
Description of problem:
  Customer is seeing a gluster mount point becoming either unmounted or unresponsive.

  Initially it was becoming unmounted and, when they saw a glusterfsd crash, they made a couple of changes recommended by us in previous cases with similar issues:

  1) gluster volume set all cluster.op-version 31303
  2) gluster volume set DC03SP2 performance.parallel-readdir off
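
  As an illustrative check (not output from the customer's system), those
  settings can be verified on any of the nodes with the gluster CLI, using the
  same volume name as above:

     gluster volume get all cluster.op-version                  # expect: 31303
     gluster volume get DC03SP2 performance.parallel-readdir    # expect: off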

  The next occurrence was a hang when running df -h.  They are able to run df -h individually against every other filesystem; only the gluster mount hangs.
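
  As an illustrative way to confirm that only the gluster mount is blocking
  (the mount path below is a placeholder, not the customer's actual path):

     # skip FUSE/gluster mounts so df completes for everything else
     df -h -x fuse.glusterfs

     # probe only the gluster mount with a bounded wait instead of blocking the shell
     timeout 30 df -h /mnt/DC03SP2 || echo "gluster mount did not respond within 30s"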

  One of the nodes became inaccessible and required a reboot
  - PuTTY and the iRMC console just gave a blank screen after login

  They configured kdump for the gluster nodes and generated an NMI-triggered kdump from the iRMC to provide information for this case.
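
  For completeness, the usual RHEL 7 steps for NMI-triggered kdump are roughly
  as follows; this is a generic sketch, not the customer's exact configuration:

     # reserve crash-kernel memory and enable the kdump service
     # (a reboot is needed for the crashkernel= parameter to take effect)
     grubby --update-kernel=ALL --args="crashkernel=auto"
     systemctl enable kdump
     systemctl start kdump

     # make an incoming NMI (e.g. from the iRMC) panic the kernel so kdump fires
     echo "kernel.unknown_nmi_panic = 1" >> /etc/sysctl.d/90-nmi-kdump.conf
     echo "kernel.panic_on_io_nmi = 1"   >> /etc/sysctl.d/90-nmi-kdump.conf
     sysctl --system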

  Note that these gluster nodes mount the volume locally and are used as the target for the customer's application.

Version-Release number of selected component (if applicable):

  OS: 
    RHEL 7.6

  Kernel:
    kernel-3.10.0-957.1.3.el7.x86_64 

  Gluster:
    glusterfs-3.12.2-25.el7rhgs.x86_64
    glusterfs-api-3.12.2-25.el7rhgs.x86_64
    glusterfs-client-xlators-3.12.2-25.el7rhgs.x86_64
    glusterfs-cli-3.12.2-25.el7rhgs.x86_64
    glusterfs-events-3.12.2-25.el7rhgs.x86_64
    glusterfs-fuse-3.12.2-25.el7rhgs.x86_64
    glusterfs-geo-replication-3.12.2-25.el7rhgs.x86_64
    glusterfs-libs-3.12.2-25.el7rhgs.x86_64
    glusterfs-rdma-3.12.2-25.el7rhgs.x86_64
    glusterfs-server-3.12.2-25.el7rhgs.x86_64
    gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
    gluster-nagios-common-0.2.4-1.el7rhgs.noarch 

How reproducible:

  Happens roughly once every 24 hours on one node or another, but cannot be reproduced on demand.

Steps to Reproduce:

  N/A

Actual results:

  Mount point becomes unresponsive

Expected results:

  Mount point should remain responsive

Additional info:

  Supporting info extracted on collab shell.

  The first two NMI cores are on optimus and the last one is being processed now.

Comment 13 Amar Tumballi 2019-07-01 11:20:06 UTC
Sameer,

> > [2019-06-12 17:47:10.135284] E [socket.c:2369:socket_connect_finish] 0-DC03SP2-client-68: connection to 192.168.88.209:24007 failed (No route to host); disconnecting socket
> > [2019-06-12 17:47:10.135298] E [socket.c:2369:socket_connect_finish] 0-DC03SP2-client-71: connection to 192.168.88.209:24007 failed (No route to host); disconnecting socket

'No route to host' errors generally indicate network issues. Since this hang is happening on one of the clients, please check whether the hostnames of all the gluster nodes resolve properly. If there is an issue with DNS resolution, the gluster processes can certainly end up in a hung state.
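
As a generic illustration of that check (the hostnames below are placeholders, not the customer's actual node names):

    # list all peers known to this node
    gluster pool list

    # verify that each peer hostname resolves
    for h in node1.example.com node2.example.com node3.example.com; do
        getent hosts "$h" || echo "DNS lookup failed for $h"
    done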

Comment 47 Xavi Hernandez 2019-10-24 07:23:43 UTC
*** Bug 1751666 has been marked as a duplicate of this bug. ***

Comment 49 errata-xmlrpc 2019-10-30 12:22:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249

