Bug 1758923 - [GSS] gluster volume heal info showing "Transport endpoint not connected"
Summary: [GSS] gluster volume heal info showing "Transport endpoint not connected"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.4
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 1
Assignee: Pranith Kumar K
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On: 1765017 1793085 1793096
Blocks:
 
Reported: 2019-10-06 20:49 UTC by Cal Calhoun
Modified: 2023-03-24 15:36 UTC
CC: 17 users

Fixed In Version: glusterfs-6.0-23
Doc Type: Bug Fix
Doc Text:
The 'gluster volume heal <volname> info' command always performed DNS name resolution. When DNS resolution took around 30 seconds to fail, the command itself failed because it exceeded the timeout value set for gluster AFR volumes. With this fix, 'gluster volume heal <volname> info' no longer resolves the erroneous address and works correctly even when the DNS server configuration has issues.
Clone Of:
Environment:
Last Closed: 2020-01-30 06:42:47 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:0288 0 None None None 2020-01-30 06:43:02 UTC

Description Cal Calhoun 2019-10-06 20:49:51 UTC
Description of problem:

  When running gluster volume heal <vol> info, several bricks are showing "Transport endpoint not connected"

Version-Release number of selected component (if applicable):

    RHEL: 7.2
    RHGS: glusterfs-3.12.2-47.el7rhgs.x86_64
          glusterfs-api-3.12.2-47.el7rhgs.x86_64
          glusterfs-cli-3.12.2-47.el7rhgs.x86_64
          glusterfs-client-xlators-3.12.2-47.el7rhgs.x86_64
          glusterfs-fuse-3.12.2-47.el7rhgs.x86_64
          glusterfs-geo-replication-3.12.2-47.el7rhgs.x86_64
          glusterfs-libs-3.12.2-47.el7rhgs.x86_64
          glusterfs-rdma-3.12.2-47.el7rhgs.x86_64
          glusterfs-server-3.12.2-47.el7rhgs.x86_64
  kernel: kernel-3.10.0-327.18.2.el7.x86_64

How reproducible:

  Ongoing

Steps to Reproduce:

  Run gluster volume heal <vol> info from nodes 7 or 8 and all bricks from node 4 show as "Transport endpoint not connected"
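One way to spot which bricks are affected is to filter the heal-info output for the transport status. A minimal sketch, assuming typical 'gluster volume heal <vol> info' output; the sample fed in below is fabricated for illustration, not taken from this case:

```shell
# Hypothetical helper: reads 'gluster volume heal <vol> info' output on
# stdin and prints each brick whose status line mentions a transport
# endpoint failure.
list_disconnected_bricks() {
  awk '/^Brick /{brick=$2} /Transport endpoint/{print brick}'
}

# In production one would pipe the real command:
#   gluster volume heal <vol> info | list_disconnected_bricks
# Here we feed a fabricated sample:
list_disconnected_bricks <<'EOF'
Brick node4:/bricks/b1
Status: Transport endpoint is not connected
Brick node7:/bricks/b1
Status: Connected
Number of entries: 0
EOF
```

This prints only node4:/bricks/b1, matching the symptom described above where all bricks on node 4 show as disconnected.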

Additional info:

  Client VMs running various applications are having trouble connecting to gluster volumes. This is what originally presented as the problem. After sequentially restarting the gluster nodes and checking for healing, the transport endpoint messages were noticed.

  During troubleshooting we performed the following:

  1. Initially noted that several bricks were down. Force-restarted the volumes and most bricks came back online. Afterwards, gluster volume status showed all bricks and self-heal daemons online for the most part; a couple of outliers remained, but most volumes appeared fine.

  2. We then tried stopping gluster services (systemctl stop glusterd; pkill glusterfs; pkill glusterfsd), followed by systemctl start glusterd, sequentially on each node. Again, gluster volume status showed only a couple of bricks offline, but the transport messages continued on nodes 7 and 8 for all bricks on node 4.

  3. We then tried stopping glusterd on all nodes and then starting it back up sequentially. No improvement in the transport messages.

  4. We noticed that op.version was set to 30712 (RHGS 3.1 update 3).  Had them set op.version to 31305.

  5. Requested that the customer check with their end users to see whether the applications were responding; at the time this BZ was opened, we did not yet have a response.

  6. New gluster node sosreports (post-changes) and at least one sosreport from a client having difficulties have been requested. Original sosreports are on collab-shell. We will continue adding information to this BZ as it becomes available.

At this point, we are not sure if the transport messages and the clients having trouble with gluster volumes are related.
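The per-node restart sequence from steps 2 and 3 above can be sketched as a helper function. This is illustrative only; the service and process names are taken from the description, and on a production cluster one would run it on one node at a time and wait for heals to settle before moving on:

```shell
# Sketch of the per-node restart sequence from steps 2 and 3 above.
restart_gluster_node() {
  systemctl stop glusterd        # stop the management daemon
  pkill glusterfs  || true       # client/self-heal processes (may be none)
  pkill glusterfsd || true       # brick processes (may be none)
  systemctl start glusterd       # brings bricks back up
}

# After all nodes are back, the op-version bump from step 4 would be:
#   gluster volume set all cluster.op-version 31305
```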

Comment 45 Pranith Kumar K 2019-11-06 08:20:28 UTC
https://review.gluster.org/c/glusterfs/+/23606
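Per the Doc Text, the root cause was heal info performing DNS resolution that could take ~30 seconds to fail, exceeding the AFR timeout. A rough, hedged way to check resolution latency for a peer hostname on an affected node (localhost stands in for a real peer name here):

```shell
# Rough check of DNS resolution latency for a peer hostname. If this
# takes anywhere near 30 seconds, heal info could hit the AFR timeout
# described in this bug. "localhost" stands in for a real peer name.
host=localhost
start=$(date +%s)
getent hosts "$host" > /dev/null
end=$(date +%s)
echo "resolution of $host took $((end - start))s"
```

A resolution that hangs for tens of seconds points at the DNS server configuration rather than at gluster itself.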

Comment 54 Anjana KD 2020-01-29 17:32:13 UTC
Kindly verify the updated doc text in the Doc Text field.

Comment 57 errata-xmlrpc 2020-01-30 06:42:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0288

