Bug 1746615

Summary: SSL Volumes Fail Intermittently in 6.5
Product: [Community] GlusterFS
Reporter: billycole
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED INSUFFICIENT_DATA
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6
CC: bugs, moagrawa, pasik, srakonde
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-01-28 10:47:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description billycole 2019-08-29 00:21:07 UTC
Description of problem: Volumes fail to mount properly when client.ssl and server.ssl are enabled on them.  This seems to apply to multiple volume types, though I have only tested it with distributed and dispersed volumes.  The mount command succeeds, but accessing the volume gives intermittent "Transport endpoint is not connected" errors.  This results in odd behavior such as `ls` returning nothing, then erroring, then occasionally returning a result.

Similarly, when `df` is issued several times in succession on the mount, it first reports the full drive size, then the reported size slowly "shrinks" until the command starts to throw "Transport endpoint is not connected" errors.

[test@ip-10-10-30-220 ~]$ df -h /gscratch
Filesystem                              Size  Used Avail Use% Mounted on
ip-10-10-31-10.ec2.internal:/scratch   44T  496G   44T   2% /gscratch
[test@ip-10-10-30-220 ~]$ df -h /gscratch
Filesystem                              Size  Used Avail Use% Mounted on
ip-10-10-31-10.ec2.internal:/scratch   44T  496G   44T   2% /gscratch
[test@ip-10-10-30-220 ~]$ df -h /gscratch
Filesystem                              Size  Used Avail Use% Mounted on
ip-10-10-31-10.ec2.internal:/scratch   44T  496G   44T   2% /gscratch
[test@ip-10-10-30-220 ~]$ df -h /gscratch
Filesystem                              Size  Used Avail Use% Mounted on
ip-10-10-31-10.ec2.internal:/scratch   44T  496G   44T   2% /gscratch
[test@ip-10-10-30-220 ~]$ df -h /gscratch
Errors.

It almost seems as if the connection is established and then immediately killed after an attempt to push data over it, and waiting a few seconds causes the connections to re-establish.
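
One way to check the connect/drop/reconnect pattern described above is to poll the mount at a fixed interval and timestamp each outcome. The following is a minimal sketch, not part of the original report: the function name `probe_mount` and its parameters are hypothetical, and it assumes a Linux client where "Transport endpoint is not connected" surfaces as the errno ENOTCONN.

```python
import errno
import os
import time


def probe_mount(path, attempts=10, interval=1.0):
    """Call statvfs on `path` repeatedly and record each outcome.

    Returns a list of (timestamp, status, size_in_bytes) tuples, where
    status is "ok" or the symbolic errno name (for example "ENOTCONN").
    `probe_mount` is a hypothetical helper, not a GlusterFS tool.
    """
    results = []
    for _ in range(attempts):
        stamp = time.time()
        try:
            st = os.statvfs(path)
            # Total filesystem size, comparable to the Size column of df.
            results.append((stamp, "ok", st.f_frsize * st.f_blocks))
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            results.append((stamp, name, None))
        time.sleep(interval)
    return results
```

Running `probe_mount("/gscratch", attempts=60, interval=1.0)` on the affected client and printing the tuples would show whether the reported size changes between calls and how long each failure window lasts, which can be lined up against timestamps in the client and brick logs.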

Disabling the "client.ssl" and "server.ssl" settings on the volume causes these errors to go away.


Version-Release number of selected component (if applicable): glusterfs 6.5


How reproducible:  It seems to be consistent on the cluster that I have. 


Steps to Reproduce:
1. Follow docs here on setting up certs: https://docs.gluster.org/en/latest/Administrator%20Guide/SSL/
2. Create a new volume and enable the client.ssl and server.ssl options on it.  Start the volume.
3. Mount volume on client.
4. Try to create a new file on the mount, ls the drive, or issue the df command.

Actual results: Intermittent transport errors.


Expected results: The drive should be mountable and usable without transport errors.


Additional info:

Comment 1 Sanju 2020-01-06 11:26:23 UTC
Mohit, can you please look at it?

Comment 2 Mohit Agrawal 2020-01-06 13:31:06 UTC
Hi,

Can you please share the logs from the client nodes along with a complete dump of the /var/log/gluster directory from all the server nodes?

Thanks,
Mohit Agrawal

Comment 3 Sanju 2020-01-28 10:47:31 UTC
Closing this bug as insufficient data. Please feel free to reopen the bug with all the requested information if you hit the issue again.

Thanks,
Sanju

Comment 4 Red Hat Bugzilla 2023-09-14 05:42:34 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.