Bug 2011549

Summary: [AFR] Constant intermittent "Transport endpoint is not connected" errors disrupting operations
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Andrew Robinson <anrobins>
Component: core Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED DUPLICATE QA Contact: Pranav Prakash <prprakas>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rhgs-3.5 CC: erik.cederberg, jpankaja, mhackett, moagrawa, peter.karlsson.zetterberg, rcarrier, rhs-bugs, sajmoham, saraut, sheggodu, vdas
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-02-09 06:21:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2017641    
Bug Blocks:    

Description Andrew Robinson 2021-10-06 19:21:23 UTC
Before you record your issue, ensure you are using the latest version of Gluster.

Provide version-Release number of selected component (if applicable):

> glusterfs-6.0-56.2.el7rhgs.x86_64
 
Have you searched the Bugzilla archives for the same or similar reported issues?

> 
 
Have you discovered any workarounds? If not, read the troubleshooting documentation to help solve your issue: https://mojo.redhat.com/groups/gss-gluster (Gluster features and their troubleshooting) and https://access.redhat.com/articles/1365073 (specific debug data that needs to be collected to help troubleshoot GlusterFS).

> No


Describe the issue (please be as detailed as possible and provide log snippets):
[Provide a timestamp of when the issue is seen]

> The customer is getting persistent, intermittent "Transport endpoint is not connected" errors when performing gluster operations. 'gluster volume status' shows all bricks online. However, 'gluster volume heal <vol> info', and even a plain 'ls' on a volume mount, sometimes runs to completion and other times fails with "Transport endpoint is not connected". This is preventing the customer from getting any work done on the cluster. The problem started about 15 hours before this report was written.

The customer has rebooted all three gluster nodes and the three network switches they connect to. This has made no apparent difference.


Is this issue reproducible? If yes, share more details:


Steps to Reproduce:
1.
2.
3.
Actual results:
 
Expected results:
 
Mandatory Information for all Bugs:
1 - gluster v info <volname>
2 - gluster v heal <volname> info
3 - gluster v status <volname>
4 - Mount type: FUSE/SMB/NFS-Ganesha/OCS?
 

Additional info: