+++ This bug was initially created as a clone of Bug #1601356 +++

Hello,

This is my first time reporting a bug on Bugzilla, so let me know if I post something wrong.

Description of problem:
I am doing some tests with GlusterFS 4.0 and 4.1 and I cannot resolve some SSL/TLS issues. I am trying to set up a 2-node replicated Gluster volume with SSL/TLS. For this setup, I use 3 KVM VMs (2 storage nodes + 1 client node). For the networking part, I use a dedicated private LAN for the KVM VMs. Each VM can ping the others, so there is no problem with connectivity.

Version-Release number of selected component (if applicable):
These are the installed packages on gluster-client:

[root@gluster-client ~]# rpm -qa | grep "gluster\|fuse"
glusterfs-4.1.1-1.el7.x86_64
centos-release-gluster41-1.0-1.el7.centos.x86_64
glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64

And these are the installed packages on the gluster1 and gluster2 storage nodes:

[root@gluster1 ~]# rpm -qa | grep "gluster\|fuse"
glusterfs-api-4.1.1-1.el7.x86_64
centos-release-gluster41-1.0-1.el7.centos.x86_64
glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-4.1.1-1.el7.x86_64
glusterfs-cli-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64
glusterfs-server-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64

=====================================================

This is the information regarding the Gluster volume:

[root@gluster1 ~]# gluster volume info vol01

Volume Name: vol01
Type: Replicate
Volume ID: ab7426a5-23ab-40ff-91af-a5b977152553
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/data/glusterfs/gluster1/vol01/brick1
Brick2: gluster2:/data/glusterfs/gluster2/vol01/brick1
Options Reconfigured:
ssl.cipher-list: ALL
network.ping-timeout: 5
server.ssl: on
client.ssl: on
auth.ssl-allow: *
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

=====================================================

Here is the peer information:

[root@gluster1 ~]# gluster peer status
Number of Peers: 1

Hostname: gluster2
Uuid: f506bf62-6551-46b0-8a5b-457ae1fde839
State: Peer in Cluster (Connected)

=====================================================

Here is the volume status:

[root@gluster1 ~]# gluster volume status vol01
Status of volume: vol01
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1:/data/glusterfs/gluster1/vol01/brick1    49152     0          Y       11196
Brick gluster2:/data/glusterfs/gluster2/vol01/brick1    49152     0          Y       11013
Self-heal Daemon on localhost                           N/A       N/A        Y       11315
Self-heal Daemon on gluster2                            N/A       N/A        Y       11086

Task Status of Volume vol01
------------------------------------------------------------------------------
There are no active volume tasks

=====================================================

How reproducible:

Steps to Reproduce:
1. Install GlusterFS 4.0 or 4.1.
2. Create a 2-node replicated Gluster volume with SSL/TLS enabled.
3. After completing the necessary settings, copy a file to the FUSE mount on the client node (see the command sketch after this description).

I have also attached a .txt file with my procedure for installing the Gluster nodes and the client. Let me know if you see anything wrong with it.

Actual results:
I receive the error "Transport endpoint is not connected" after I issue the copy command.

Expected results:
I expected the file to be copied without a problem, as in version 3.12.
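For completeness, a minimal reproduction sketch, assuming the hostnames, brick paths, and volume name shown above, and assuming the TLS key, certificate, and CA bundle are already in place on all nodes (see the SSL discussion in the comments below). The mount point /mnt/vol01 and the test file are illustrative and not part of the original report:

# On gluster1, after "gluster peer probe gluster2" and creating the brick directories
# (gluster may ask for confirmation when creating a replica 2 volume):
gluster volume create vol01 replica 2 \
    gluster1:/data/glusterfs/gluster1/vol01/brick1 \
    gluster2:/data/glusterfs/gluster2/vol01/brick1
gluster volume set vol01 client.ssl on
gluster volume set vol01 server.ssl on
gluster volume start vol01

# On gluster-client:
mount -t glusterfs gluster1:/vol01 /mnt/vol01
touch /mnt/vol01/file1 /mnt/vol01/file2                     # small operations succeed
dd if=/dev/urandom of=/mnt/vol01/big.bin bs=1M count=900    # a larger transfer fails with
                                                            # "Transport endpoint is not connected"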
Additional info:
There is a Gluster mailing list thread about this issue. I am posting it here so that the two are linked:
https://lists.gluster.org/pipermail/gluster-users/2018-July/034353.html

The mount works fine until I try to copy an archive, multiple smaller files, or a bigger file onto it (it shows up correctly in "df -Th" and I can create several files with "touch file1 file2 ..."). Basically, after any data transfer, I get these errors. I followed the instructions from the Red Hat page:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.1/html/administration_guide/chap-network_encryption

UPDATE 1: I tried the exact same steps on Gluster 3.12 and had no problem. The steps worked, SSL/TLS was enabled, and there was no transport error; I also verified that SSL/TLS was in use. Afterwards, I tried the new 4.1 release again and the problem persists (the same "Transport endpoint is not connected" error).

Let me know if you need any other info. Any help is much appreciated.

Regards,
Andrei H.

--- Additional comment from Omar K on 2018-07-19 17:37:33 IST ---

I can confirm the same issue. Copying a few small files onto the FUSE mount is no problem, but as soon as you put any "load" onto it (more than a few files, or big files such as ISO images), the connection is interrupted with the error message shown above. Our current workaround is to disable server.ssl and client.ssl for the volumes. We never had this problem with Gluster 3.12.

--- Additional comment from Milind Changire on 2018-07-31 11:51:47 IST ---

As per Step 8:

8. Set up TLS/SSL encryption on all nodes and clients (gluster1, gluster2, gluster-client):
   openssl genrsa -out /etc/ssl/glusterfs.key 2048

   On the gluster1 node:
   openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster1" -out /etc/ssl/glusterfs.pem

   On the gluster2 node:
   openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster2" -out /etc/ssl/glusterfs.pem

   On the gluster-client node:
   openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster-client" -out /etc/ssl/glusterfs.pem

----------

As per Step 15:

15. Set up SSL/TLS access to the volume:
    gluster volume set vol01 auth.ssl-allow 'gluster01,gluster02,gluster-client'
    gluster volume set vol01 client.ssl on
    gluster volume set vol01 server.ssl on
    gluster volume set vol01 network.ping-timeout "5"
    gluster volume start vol01

----------

Please note that the Common Name used during SSL key/cert generation is "gluster1", but the name given in auth.ssl-allow is "gluster01" (note the '0' prefixed to the '1'). Is this a typo in the bug report or an actual typo in the volume configuration? If it is a typo in the volume configuration, it needs to be corrected. Please set auth.ssl-allow to:

gluster volume set vol01 auth.ssl-allow 'gluster1,gluster2,gluster-client'

--- Additional comment from Omar K on 2018-07-31 12:32:01 IST ---

We use auth.ssl-allow "*" and we have the same issue, so I'm guessing that's not the problem...

--- Additional comment from Havri on 2018-08-01 01:40:00 IST ---

Hello,

It's just a typo in the bug report. I also tried Omar's setting with auth.ssl-allow "*" and the issue was the same. Let me know if you need any other info.

Thank you.

--- Additional comment from Atin Mukherjee on 2018-08-11 16:59:45 IST ---

Milind - Please see comment 4. Have we done any further investigation?
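For reference, a sketch of the trust-store step that the Red Hat network-encryption guide linked above describes when self-signed certificates (as in Step 8) are used: every server and client needs a /etc/ssl/glusterfs.ca containing the certificates it should trust. The /tmp filenames and scp copy steps below are illustrative, not part of the original report:

# Collect the three self-signed certificates on one node, e.g. gluster1:
scp gluster2:/etc/ssl/glusterfs.pem       /tmp/gluster2.pem
scp gluster-client:/etc/ssl/glusterfs.pem /tmp/gluster-client.pem

# Build a common CA bundle containing every peer's certificate ...
cat /etc/ssl/glusterfs.pem /tmp/gluster2.pem /tmp/gluster-client.pem > /etc/ssl/glusterfs.ca

# ... and distribute the same bundle to every server and client:
scp /etc/ssl/glusterfs.ca gluster2:/etc/ssl/glusterfs.ca
scp /etc/ssl/glusterfs.ca gluster-client:/etc/ssl/glusterfs.ca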
--- Additional comment from Milind Changire on 2018-08-17 12:07:42 IST ---

I've built RPMs from the release-4.1 branch with commit f33a61086da43af5a5de2ba99b4045a63cf5bd79 at HEAD. There are no issues with the SSL configuration itself.

As per the steps listed in the attachment, the server and client pem files are not signed by a CA. Since this is an upstream BZ, I'll recommend that the user look at:
https://stackoverflow.com/questions/21297139/how-do-you-sign-a-certificate-signing-request-with-your-certification-authority

-----

That said, there is also no problem when using self-signed certificates.

--- Additional comment from Milind Changire on 2018-08-17 13:38:54 IST ---

I tried copying a 900MB ISO and saw the following problems.

In the client/mount log:

[2018-08-17 07:21:20.602283] E [socket.c:2167:__socket_read_frag] 0-rpc: wrong MSG-TYPE (574) received from 192.168.122.87:24007
[2018-08-17 07:21:20.602297] T [socket.c:2801:socket_poller] 0-patchy-client-0: disconnecting socket
[2018-08-17 07:21:20.602365] D [MSGID: 0] [client.c:2241:client_rpc_notify] 0-patchy-client-0: got RPC_CLNT_DISCONNECT
[2018-08-17 07:21:20.602379] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-patchy-client-0: disconnected from patchy-client-0. Client process will keep trying to connect to glusterd until brick's port is available

On the brick side:

[2018-08-17 07:21:00.723552] E [MSGID: 115067] [server-rpc-fops_v2.c:1316:server4_writev_cbk] 0-patchy-server: 562: WRITEV 0 (3fd3cf86-419e-43eb-88ad-72b12263fab6), client: CTX_ID:47717648-2a74-49b5-8e39-4069a86b2246-GRAPH_ID:0-PID:1553-HOST:centos7-2-PC_NAME:patchy-client-0-RECON_NO:-0, error-xlator: - [Bad file descriptor]

--- Additional comment from Omar K on 2018-08-24 19:58:01 IST ---

I just re-tested using the commit tagged as v4.1.2 (044f9df65) and the problems persist as described above. The log messages are the same as the ones Milind is getting. From the client's perspective, the copy operation of an ISO file aborts with an error message; a few small files can be copied with no problems.

Milind, do you therefore confirm that a problem exists, or is it still unclear?

--- Additional comment from Milind Changire on 2018-08-24 20:00:58 IST ---

I confirm that the problem is present.

--- Additional comment from Worker Ant on 2018-08-25 18:54:09 IST ---

REVIEW: https://review.gluster.org/20993 (rpc: handle EAGAIN when SSL_ERROR_SYSCALL is returned) posted (#2) for review on release-4.1 by Milind Changire

--- Additional comment from Milind Changire on 2018-08-25 18:57:30 IST ---

Please note that the master branch and the release-4.1 branch have diverged significantly, so the patch is not applicable to the master branch. This issue has already been addressed in the master branch.
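As a companion to the CA-signing suggestion in the 2018-08-17 comment above, here is a condensed sketch of that approach. The CA file names, subject, and validity period are placeholders and not part of the original report:

# One-time: create a private CA on any trusted machine:
openssl genrsa -out glusterCA.key 2048
openssl req -new -x509 -days 365 -key glusterCA.key -subj "/CN=gluster-ca" -out glusterCA.pem

# On each node (CN shown for gluster1), create a CSR instead of a self-signed certificate:
openssl req -new -key /etc/ssl/glusterfs.key -subj "/CN=gluster1" -out /tmp/gluster1.csr

# Sign the CSR with the CA and install the result as the node's certificate:
openssl x509 -req -in /tmp/gluster1.csr -CA glusterCA.pem -CAkey glusterCA.key \
    -CAcreateserial -days 365 -out /etc/ssl/glusterfs.pem

# The CA certificate then serves as the single trust anchor on all servers and clients:
cp glusterCA.pem /etc/ssl/glusterfs.ca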
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3827