Description of problem:
Customer is seeing a gluster mount point become either unmounted or unresponsive. Initially it was becoming unmounted and, when they saw a glusterfsd crash, they made a couple of changes recommended by us in previous cases with similar issues:
1) gluster volume set all cluster.op-version 31303
2) gluster volume set DC03SP2 performance.parallel-readdir off
The next issue occurred when a running "df -h" hung. They are able to run "df -h" individually for all other filesystems except the gluster mount. One of the nodes became inaccessible and required a reboot; PuTTY and the iRMC console just gave a blank screen after login. They configured kdump for the gluster nodes and generated an NMI kdump from iRMC to provide information for this case. Note that these gluster nodes mount the volume locally and are used as the target for the customer's application.

Version-Release number of selected component (if applicable):
OS: RHEL 7.6
Kernel: kernel-3.10.0-957.1.3.el7.x86_64
Gluster:
glusterfs-3.12.2-25.el7rhgs.x86_64
glusterfs-api-3.12.2-25.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-25.el7rhgs.x86_64
glusterfs-cli-3.12.2-25.el7rhgs.x86_64
glusterfs-events-3.12.2-25.el7rhgs.x86_64
glusterfs-fuse-3.12.2-25.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-25.el7rhgs.x86_64
glusterfs-libs-3.12.2-25.el7rhgs.x86_64
glusterfs-rdma-3.12.2-25.el7rhgs.x86_64
glusterfs-server-3.12.2-25.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch

How reproducible:
Happens roughly once every 24 hours on one node or another, but can't be reproduced on demand.

Steps to Reproduce:
N/A

Actual results:
Mount point becomes unresponsive.

Expected results:
Mount point should remain responsive.

Additional info:
Supporting info extracted on collab shell. First two NMI cores are on optimus and the last one is being processed now.
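For future occurrences, a non-blocking way to check whether the gluster mount is responsive is to wrap the stat call in a timeout, so the probing shell does not itself hang the way a bare "df -h" does. This is only a sketch; /mnt/DC03SP2 below is a placeholder and should be replaced with the customer's actual mount path.

```shell
#!/bin/sh
# Probe a (possibly hung) FUSE mount without blocking the shell.
# MOUNTPOINT is a placeholder; substitute the real gluster mount path.
MOUNTPOINT="${1:-/mnt/DC03SP2}"

# stat -f reads filesystem metadata; on a hung FUSE mount it would block
# forever, so cap it at 5 seconds with timeout(1).
if timeout 5 stat -f "$MOUNTPOINT" >/dev/null 2>&1; then
    echo "$MOUNTPOINT is responsive"
else
    echo "$MOUNTPOINT is unresponsive or unmounted (stat timed out or failed)"
fi
```

Running this from cron on each node would give a timestamped record of when the mount goes unresponsive, which can then be correlated with the brick and client logs.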
Sameer,

> [2019-06-12 17:47:10.135284] E [socket.c:2369:socket_connect_finish] 0-DC03SP2-client-68: connection to 192.168.88.209:24007 failed (No route to host); disconnecting socket
> [2019-06-12 17:47:10.135298] E [socket.c:2369:socket_connect_finish] 0-DC03SP2-client-71: connection to 192.168.88.209:24007 failed (No route to host); disconnecting socket

"No route to host" errors generally indicate network issues. Since this hang is happening from one of the clients, please check whether DNS resolves properly for all of the gluster nodes. If there is an issue with DNS resolution, the gluster process would certainly end up in a hung state.
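One quick way to run the DNS check suggested above from the affected client is to resolve each node name through the system's NSS configuration with getent. This is a sketch only; the gluster-node1/2/3 hostnames are placeholders and should be replaced with the actual node names (e.g. from "gluster pool list").

```shell
#!/bin/sh
# Verify that each gluster node's hostname resolves on this client.
# NOTE: gluster-node1..3 are hypothetical names; substitute the real ones.
for host in gluster-node1 gluster-node2 gluster-node3; do
    addr=$(getent hosts "$host" | awk '{print $1; exit}')
    if [ -n "$addr" ]; then
        echo "OK:   $host -> $addr"
    else
        echo "FAIL: $host did not resolve"
    fi
done
```

getent goes through /etc/nsswitch.conf (hosts file first, then DNS by default), so it reflects what gluster itself will see, unlike querying a nameserver directly with dig or nslookup.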
*** Bug 1751666 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3249