+++ This bug was initially created as a clone of Bug #1601356 +++

Hello,

This is my first time reporting a bug on Bugzilla, so let me know if I post something wrong.

Description of problem:
I am doing some tests with GlusterFS 4.0 and 4.1 and I cannot resolve some SSL/TLS issues. I am trying to set up a 2-node replicated Gluster volume with SSL/TLS. For this setup, I use 3 KVM VMs (2 storage nodes + 1 client node). For the networking part, I use a dedicated private LAN for the KVM VMs. Each VM can ping the others, so there is no problem with connectivity.

Version-Release number of selected component (if applicable):
These are the installed packages on gluster-client:

[root@gluster-client ~]# rpm -qa | grep "gluster\|fuse"
glusterfs-4.1.1-1.el7.x86_64
centos-release-gluster41-1.0-1.el7.centos.x86_64
glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64

And these are the installed packages on the gluster1 and gluster2 storage nodes:

[root@gluster1 ~]# rpm -qa | grep "gluster\|fuse"
glusterfs-api-4.1.1-1.el7.x86_64
centos-release-gluster41-1.0-1.el7.centos.x86_64
glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-4.1.1-1.el7.x86_64
glusterfs-cli-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64
glusterfs-server-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64

=====================================================

This is the information regarding the Gluster volume:

[root@gluster1 ~]# gluster volume info vol01

Volume Name: vol01
Type: Replicate
Volume ID: ab7426a5-23ab-40ff-91af-a5b977152553
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/data/glusterfs/gluster1/vol01/brick1
Brick2: gluster2:/data/glusterfs/gluster2/vol01/brick1
Options Reconfigured:
ssl.cipher-list: ALL
network.ping-timeout: 5
server.ssl: on
client.ssl: on
auth.ssl-allow: *
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

=====================================================

Here is the peer information:

[root@gluster1 ~]# gluster peer status
Number of Peers: 1

Hostname: gluster2
Uuid: f506bf62-6551-46b0-8a5b-457ae1fde839
State: Peer in Cluster (Connected)

=====================================================

Here is the volume status:

[root@gluster1 ~]# gluster volume status vol01
Status of volume: vol01
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1:/data/glusterfs/gluster1/vol01/brick1    49152     0          Y       11196
Brick gluster2:/data/glusterfs/gluster2/vol01/brick1    49152     0          Y       11013
Self-heal Daemon on localhost                           N/A       N/A        Y       11315
Self-heal Daemon on gluster2                            N/A       N/A        Y       11086

Task Status of Volume vol01
------------------------------------------------------------------------------
There are no active volume tasks

=====================================================

How reproducible:

Steps to Reproduce:
1. Install GlusterFS 4.0 or 4.1.
2. Create a 2-node replicated Gluster volume with SSL/TLS enabled.
3. After completing the necessary settings, copy a file to the FUSE mount on the client node (see the command sketch after this description).

I have also attached a .txt file with my procedure for installing the Gluster nodes and the client. Let me know if you see anything wrong with it.

Actual results:
I receive the error "Transport endpoint is not connected" after I issue the copy command.

Expected results:
I expected the file to be copied without a problem, as in version 3.12.
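For completeness, a minimal reproduction sketch, assuming the hostnames, brick paths, and volume name shown above, and assuming the TLS key, certificate, and CA bundle are already in place on all nodes (see the SSL discussion in the comments below). The mount point /mnt/vol01 and the test file are illustrative and not part of the original report:

# On gluster1, after "gluster peer probe gluster2" and creating the brick directories
# (gluster may ask for confirmation when creating a replica 2 volume):
gluster volume create vol01 replica 2 \
    gluster1:/data/glusterfs/gluster1/vol01/brick1 \
    gluster2:/data/glusterfs/gluster2/vol01/brick1
gluster volume set vol01 client.ssl on
gluster volume set vol01 server.ssl on
gluster volume start vol01

# On gluster-client:
mount -t glusterfs gluster1:/vol01 /mnt/vol01
touch /mnt/vol01/file1 /mnt/vol01/file2                     # small operations succeed
dd if=/dev/urandom of=/mnt/vol01/big.bin bs=1M count=900    # a larger transfer fails with
                                                            # "Transport endpoint is not connected"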
Additional info:
There is a Gluster mailing list thread about this issue. I am posting it here so that the two are linked:
https://lists.gluster.org/pipermail/gluster-users/2018-July/034353.html

The mount works fine until I try to copy an archive, multiple smaller files, or a bigger file onto it (it shows up correctly in "df -Th" and I can create several files with "touch file1 file2 ..."). Basically, after any data transfer, I get these errors. I followed the instructions from the Red Hat page:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.1/html/administration_guide/chap-network_encryption

UPDATE 1: I tried the exact same steps on Gluster 3.12 and had no problem. The steps worked, SSL/TLS was enabled, and there was no transport error; I also verified that SSL/TLS was in use. Afterwards, I tried the new 4.1 release again and the problem persists (the same "Transport endpoint is not connected" error).

Let me know if you need any other info. Any help is much appreciated.

Regards,
Andrei H.

--- Additional comment from Omar K on 2018-07-19 17:37:33 IST ---

I can confirm the same issue. Copying a few small files onto the FUSE mount is no problem, but as soon as you put any "load" onto it (more than a few files, or big files such as ISO images), the connection is interrupted with the error message shown above. Our current workaround is to disable server.ssl and client.ssl for the volumes. We never had this problem with Gluster 3.12.

--- Additional comment from Milind Changire on 2018-07-31 11:51:47 IST ---

As per Step 8:

8. Set up TLS/SSL encryption on all nodes and clients (gluster1, gluster2, gluster-client):
   openssl genrsa -out /etc/ssl/glusterfs.key 2048

   On the gluster1 node:
   openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster1" -out /etc/ssl/glusterfs.pem

   On the gluster2 node:
   openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster2" -out /etc/ssl/glusterfs.pem

   On the gluster-client node:
   openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster-client" -out /etc/ssl/glusterfs.pem

----------

As per Step 15:

15. Set up SSL/TLS access to the volume:
    gluster volume set vol01 auth.ssl-allow 'gluster01,gluster02,gluster-client'
    gluster volume set vol01 client.ssl on
    gluster volume set vol01 server.ssl on
    gluster volume set vol01 network.ping-timeout "5"
    gluster volume start vol01

----------

Please note that the Common Name used during SSL key/cert generation is "gluster1", but the name given in auth.ssl-allow is "gluster01" (note the '0' prefixed to the '1'). Is this a typo in the bug report or an actual typo in the volume configuration? If it is a typo in the volume configuration, it needs to be corrected. Please set auth.ssl-allow to:

gluster volume set vol01 auth.ssl-allow 'gluster1,gluster2,gluster-client'

--- Additional comment from Omar K on 2018-07-31 12:32:01 IST ---

We use auth.ssl-allow "*" and we have the same issue, so I'm guessing that's not the problem...

--- Additional comment from Havri on 2018-08-01 01:40:00 IST ---

Hello,

It's just a typo in the bug report. I also tried Omar's setting with auth.ssl-allow "*" and the issue was the same. Let me know if you need any other info.

Thank you.

--- Additional comment from Atin Mukherjee on 2018-08-11 16:59:45 IST ---

Milind - Please see comment 4. Have we done any further investigation?
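For reference, a sketch of the trust-store step that the Red Hat network-encryption guide linked above describes when self-signed certificates (as in Step 8) are used: every server and client needs a /etc/ssl/glusterfs.ca containing the certificates it should trust. The /tmp filenames and scp copy steps below are illustrative, not part of the original report:

# Collect the three self-signed certificates on one node, e.g. gluster1:
scp gluster2:/etc/ssl/glusterfs.pem       /tmp/gluster2.pem
scp gluster-client:/etc/ssl/glusterfs.pem /tmp/gluster-client.pem

# Build a common CA bundle containing every peer's certificate ...
cat /etc/ssl/glusterfs.pem /tmp/gluster2.pem /tmp/gluster-client.pem > /etc/ssl/glusterfs.ca

# ... and distribute the same bundle to every server and client:
scp /etc/ssl/glusterfs.ca gluster2:/etc/ssl/glusterfs.ca
scp /etc/ssl/glusterfs.ca gluster-client:/etc/ssl/glusterfs.ca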
--- Additional comment from Milind Changire on 2018-08-17 12:07:42 IST ---

I've built RPMs from the release-4.1 branch with commit f33a61086da43af5a5de2ba99b4045a63cf5bd79 at HEAD. There are no issues with the SSL configuration itself.

As per the steps listed in the attachment, the server and client pem files are not signed by a CA. Since this is an upstream BZ, I'll recommend that the user look at:
https://stackoverflow.com/questions/21297139/how-do-you-sign-a-certificate-signing-request-with-your-certification-authority

-----

That said, there is also no problem when using self-signed certificates.

--- Additional comment from Milind Changire on 2018-08-17 13:38:54 IST ---

I tried copying a 900MB ISO and saw the following problems.

In the client/mount log:

[2018-08-17 07:21:20.602283] E [socket.c:2167:__socket_read_frag] 0-rpc: wrong MSG-TYPE (574) received from 192.168.122.87:24007
[2018-08-17 07:21:20.602297] T [socket.c:2801:socket_poller] 0-patchy-client-0: disconnecting socket
[2018-08-17 07:21:20.602365] D [MSGID: 0] [client.c:2241:client_rpc_notify] 0-patchy-client-0: got RPC_CLNT_DISCONNECT
[2018-08-17 07:21:20.602379] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-patchy-client-0: disconnected from patchy-client-0. Client process will keep trying to connect to glusterd until brick's port is available

On the brick side:

[2018-08-17 07:21:00.723552] E [MSGID: 115067] [server-rpc-fops_v2.c:1316:server4_writev_cbk] 0-patchy-server: 562: WRITEV 0 (3fd3cf86-419e-43eb-88ad-72b12263fab6), client: CTX_ID:47717648-2a74-49b5-8e39-4069a86b2246-GRAPH_ID:0-PID:1553-HOST:centos7-2-PC_NAME:patchy-client-0-RECON_NO:-0, error-xlator: - [Bad file descriptor]

--- Additional comment from Omar K on 2018-08-24 19:58:01 IST ---

I just re-tested using the commit tagged as v4.1.2 (044f9df65) and the problems persist as described above. The log messages are the same as the ones Milind is getting. From the client's perspective, the copy operation of an ISO file aborts with an error message; a few small files can be copied with no problems.

Milind, do you therefore confirm that a problem exists, or is it still unclear?

--- Additional comment from Milind Changire on 2018-08-24 20:00:58 IST ---

I confirm that the problem is present.

--- Additional comment from Worker Ant on 2018-08-25 18:54:09 IST ---

REVIEW: https://review.gluster.org/20993 (rpc: handle EAGAIN when SSL_ERROR_SYSCALL is returned) posted (#2) for review on release-4.1 by Milind Changire

--- Additional comment from Milind Changire on 2018-08-25 18:57:30 IST ---

Please note that the master branch and the release-4.1 branch have diverged significantly, so the patch is not applicable to the master branch. This issue has already been addressed in the master branch.
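As a companion to the CA-signing suggestion in the 2018-08-17 comment above, here is a condensed sketch of that approach. The CA file names, subject, and validity period are placeholders and not part of the original report:

# One-time: create a private CA on any trusted machine:
openssl genrsa -out glusterCA.key 2048
openssl req -new -x509 -days 365 -key glusterCA.key -subj "/CN=gluster-ca" -out glusterCA.pem

# On each node (CN shown for gluster1), create a CSR instead of a self-signed certificate:
openssl req -new -key /etc/ssl/glusterfs.key -subj "/CN=gluster1" -out /tmp/gluster1.csr

# Sign the CSR with the CA and install the result as the node's certificate:
openssl x509 -req -in /tmp/gluster1.csr -CA glusterCA.pem -CAkey glusterCA.key \
    -CAcreateserial -days 365 -out /etc/ssl/glusterfs.pem

# The CA certificate then serves as the single trust anchor on all servers and clients:
cp glusterCA.pem /etc/ssl/glusterfs.ca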
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3827