Bug 1746615 - SSL Volumes Fail Intermittently in 6.5 [NEEDINFO]
Summary: SSL Volumes Fail Intermittently in 6.5
Keywords:
Status: NEW
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-29 00:21 UTC by billycole
Modified: 2020-01-06 13:31 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
srakonde: needinfo? (moagrawa)


Description billycole 2019-08-29 00:21:07 UTC
Description of problem: Volumes mount but do not work reliably when client.ssl and server.ssl are enabled on the volume. This seems to apply to multiple volume types, though I have only tested distributed and dispersed volumes. The mount command succeeds, but accessing the volume produces intermittent "Transport endpoint is not connected" errors. This results in odd behavior such as `ls` returning nothing, then erroring, then occasionally returning a result.

Similarly, when `df` is run repeatedly on the mount, it first reports the full volume size, then the reported size slowly "shrinks" until `df` starts to throw "Transport endpoint is not connected" errors.

[test@ip-10-10-30-220 ~]$ df -h /gscratch
Filesystem                              Size  Used Avail Use% Mounted on
ip-10-10-31-10.ec2.internal:/scratch   44T  496G   44T   2% /gscratch
[test@ip-10-10-30-220 ~]$ df -h /gscratch
Filesystem                              Size  Used Avail Use% Mounted on
ip-10-10-31-10.ec2.internal:/scratch   44T  496G   44T   2% /gscratch
[test@ip-10-10-30-220 ~]$ df -h /gscratch
Filesystem                              Size  Used Avail Use% Mounted on
ip-10-10-31-10.ec2.internal:/scratch   44T  496G   44T   2% /gscratch
[test@ip-10-10-30-220 ~]$ df -h /gscratch
Filesystem                              Size  Used Avail Use% Mounted on
ip-10-10-31-10.ec2.internal:/scratch   44T  496G   44T   2% /gscratch
[test@ip-10-10-30-220 ~]$ df -h /gscratch
Errors.
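To capture the repeated `df` pattern above without eyeballing each run, a minimal Python sketch can call `os.statvfs()` (the same kind of filesystem-statistics call `df` makes) in a loop and record either the total size or the errno name of the failure. It assumes Python 3 on the client and the /gscratch mount point from this report; the function name `probe_mount` and its `attempts`/`delay` parameters are hypothetical.

```python
import errno
import os
import time


def probe_mount(path, attempts=5, delay=1.0):
    """Hypothetical helper: call os.statvfs() on `path` repeatedly,
    roughly like running `df` in a loop.

    Returns a list of (attempt, size_bytes, error) tuples. `error` is None
    on success, otherwise the errno name, e.g. "ENOTCONN" for
    "Transport endpoint is not connected".
    """
    results = []
    for attempt in range(1, attempts + 1):
        try:
            st = os.statvfs(path)
            # Total size as df computes it: fragment size * number of blocks.
            results.append((attempt, st.f_frsize * st.f_blocks, None))
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            results.append((attempt, None, name))
        if attempt < attempts:
            time.sleep(delay)
    return results


if __name__ == "__main__":
    # /gscratch is the mount point used in this report.
    for attempt, size, err in probe_mount("/gscratch"):
        if err is None:
            print(f"attempt {attempt}: size={size} bytes")
        else:
            print(f"attempt {attempt}: {err}")
```

A log of alternating sizes and ENOTCONN entries with timestamps would make the "connects, drops, reconnects" timing easier to correlate with the server-side logs.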

It appears that the connection is established and then dropped as soon as data is sent over it; after a few seconds, the connections re-establish.

Disabling the "client.ssl" and "server.ssl" settings on the volume makes these errors go away.
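For A/B testing the observation above, the two options can be toggled with `gluster volume set <volume> client.ssl on|off` and `gluster volume set <volume> server.ssl on|off`. A small Python sketch that only builds those commands (it does not run them), assuming the volume is named `scratch` as in the mount output; the helper name `ssl_toggle_commands` is hypothetical. Depending on the setup, the volume may also need to be stopped and started again before the change takes effect; check the Gluster SSL documentation.

```python
import shlex


def ssl_toggle_commands(volume, enabled):
    """Hypothetical helper: build `gluster volume set` argv lists that turn
    client.ssl and server.ssl on or off for `volume`."""
    value = "on" if enabled else "off"
    return [
        ["gluster", "volume", "set", volume, option, value]
        for option in ("client.ssl", "server.ssl")
    ]


if __name__ == "__main__":
    for argv in ssl_toggle_commands("scratch", enabled=False):
        # Print only; nothing is executed here.
        print(shlex.join(argv))
```

Printing instead of executing keeps the sketch safe to run anywhere; on a server node, each argv list could be passed to `subprocess.run(argv, check=True)` to apply it.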


Version-Release number of selected component (if applicable): glusterfs 6.5


How reproducible: Consistently, on the cluster I have.


Steps to Reproduce:
1. Follow docs here on setting up certs: https://docs.gluster.org/en/latest/Administrator%20Guide/SSL/
2. Create a new volume, enable client.ssl and server.ssl, and start the volume.
3. Mount volume on client.
4. Try to create a new file on the mount, run `ls` on the mount, or run `df` on it.
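Step 4 can be scripted so that each attempt records which operation failed and with what error. A minimal Python sketch, assuming the volume is mounted at /gscratch as in the output above; the function name `exercise_mount` and the `.ssl-probe-*` file name are hypothetical.

```python
import errno
import os
import uuid


def exercise_mount(mount_point):
    """Hypothetical helper: run the three step-4 operations once each.

    Returns a dict mapping operation name to "ok" or an errno name such as
    "ENOTCONN" ("Transport endpoint is not connected").
    """
    def run(op):
        try:
            op()
            return "ok"
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc))

    # Hypothetical probe file; it is removed again right after creation.
    probe = os.path.join(mount_point, f".ssl-probe-{uuid.uuid4().hex}")

    def create_file():
        with open(probe, "w") as fh:
            fh.write("probe\n")
        os.remove(probe)

    return {
        "create": run(create_file),                   # create a new file
        "ls": run(lambda: os.listdir(mount_point)),   # ls the mount
        "df": run(lambda: os.statvfs(mount_point)),   # df on the mount
    }


if __name__ == "__main__":
    print(exercise_mount("/gscratch"))
```

Running it several times in a row should show whether the failures are tied to one operation or hit all three at random.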

Actual results: Intermittent transport errors.


Expected results: The volume should mount and be accessible without transport errors.


Additional info:

Comment 1 Sanju 2020-01-06 11:26:23 UTC
Mohit, can you please look at it?

Comment 2 Mohit Agrawal 2020-01-06 13:31:06 UTC
Hi,

 Can you please share the logs from the client nodes, along with a complete dump of the /var/log/glusterfs directory from all the server nodes?

Thanks,
Mohit Agrawal
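For the log request in comment 2, a minimal Python sketch that packs one node's log directory into a single archive, assuming the default GlusterFS log location /var/log/glusterfs; the function name `archive_logs` and the `<hostname>-glusterfs-logs.tar.gz` naming scheme are hypothetical. It would be run on every server node, and on the client, whose mount log is normally also kept under /var/log/glusterfs.

```python
import os
import socket
import tarfile


def archive_logs(log_dir="/var/log/glusterfs", out_dir="."):
    """Hypothetical helper: pack `log_dir` into
    <out_dir>/<hostname>-glusterfs-logs.tar.gz and return the archive path."""
    name = f"{socket.gethostname()}-glusterfs-logs.tar.gz"
    out_path = os.path.join(out_dir, name)
    with tarfile.open(out_path, "w:gz") as tar:
        # Store entries relative to the log directory's own name,
        # e.g. "glusterfs/glusterd.log" rather than an absolute path.
        tar.add(log_dir, arcname=os.path.basename(os.path.normpath(log_dir)))
    return out_path


if __name__ == "__main__":
    print(archive_logs())
```

Putting the hostname in the file name keeps the per-node archives distinguishable once they are all attached to the bug.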

