Bug 1601356
Summary: Problem with SSL/TLS encryption on Gluster 4.0 & 4.1
Product: [Community] GlusterFS
Component: core
Version: 4.1
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Hardware: x86_64
OS: Linux
Reporter: Havri <andreihavriliuc>
Assignee: Milind Changire <mchangir>
CC: amukherj, andreihavriliuc, atumball, bugs, david.spisla, i_chips, jstrunk, mchangir, omar.kohl, pasik
Fixed In Version: glusterfs-4.1.5
Clones: 1622308 1622405
Bug Blocks: 1622308, 1622405
Last Closed: 2018-09-26 14:02:57 UTC
Type: Bug
Description
Havri
2018-07-16 07:39:54 UTC
I can confirm the same issue. Copying a few small files onto the FUSE mount works, but as soon as you put any "load" on it (more than a few files, or big files such as ISO images) the connection is interrupted with the error message shown above. Our current workaround is to disable server.ssl and client.ssl for the volumes. We never had this problem with Gluster 3.12.

As per Step 8:

    8. Set up TLS/SSL encryption on all nodes and clients (gluster1, gluster2, gluster-client):
    openssl genrsa -out /etc/ssl/glusterfs.key 2048
    In gluster1 node:
    openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster1" -out /etc/ssl/glusterfs.pem
    In gluster2 node:
    openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster2" -out /etc/ssl/glusterfs.pem
    In gluster-client node:
    openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=gluster-client" -out /etc/ssl/glusterfs.pem

----------

As per Step 15:

    15. Set up SSL/TLS access to the volume:
    gluster volume set vol01 auth.ssl-allow 'gluster01,gluster02,gluster-client'
    gluster volume set vol01 client.ssl on
    gluster volume set vol01 server.ssl on
    gluster volume set vol01 network.ping-timeout "5"
    gluster volume start vol01

----------

Please note that the Common Name used during SSL key/cert generation is "gluster1", but the name in auth.ssl-allow is "gluster01" (note the '0' before the '1'). Is this a typo made while reporting the bug, or an actual typo in the volume configuration? If it is in the volume configuration, it needs to be corrected. Please set auth.ssl-allow to:

    gluster volume set vol01 auth.ssl-allow 'gluster1,gluster2,gluster-client'

We use auth.ssl-allow "*" and we have the same issue, so I'm guessing that's not the problem...

Hello,
It's just a typo made while reporting the bug. I also tried Omar's setting with auth.ssl-allow "*" and the issue was the same. Let me know if you need any other info.
Thank you.

Milind - Please see comment 4. Has any further investigation been done?

I've built RPMs using the release-4.1 branch with commit f33a61086da43af5a5de2ba99b4045a63cf5bd79 at HEAD. There are no issues with the SSL configuration.

As per the steps listed in the attachment, the server and client pem files are not signed by a CA. This being an upstream BZ, I'd recommend the user look at:
https://stackoverflow.com/questions/21297139/how-do-you-sign-a-certificate-signing-request-with-your-certification-authority

-----

Using self-signed certificates is not a problem either. I tried copying a 900MB ISO and saw the following errors in the client/mount log:

    [2018-08-17 07:21:20.602283] E [socket.c:2167:__socket_read_frag] 0-rpc: wrong MSG-TYPE (574) received from 192.168.122.87:24007
    [2018-08-17 07:21:20.602297] T [socket.c:2801:socket_poller] 0-patchy-client-0: disconnecting socket
    [2018-08-17 07:21:20.602365] D [MSGID: 0] [client.c:2241:client_rpc_notify] 0-patchy-client-0: got RPC_CLNT_DISCONNECT
    [2018-08-17 07:21:20.602379] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-patchy-client-0: disconnected from patchy-client-0. Client process will keep trying to connect to glusterd until brick's port is available

On the brick side:

    [2018-08-17 07:21:00.723552] E [MSGID: 115067] [server-rpc-fops_v2.c:1316:server4_writev_cbk] 0-patchy-server: 562: WRITEV 0 (3fd3cf86-419e-43eb-88ad-72b12263fab6), client: CTX_ID:47717648-2a74-49b5-8e39-4069a86b2246-GRAPH_ID:0-PID:1553-HOST:centos7-2-PC_NAME:patchy-client-0-RECON_NO:-0, error-xlator: - [Bad file descriptor]

I just re-tested using the commit tagged as v4.1.2 (044f9df65) and the problems persist as described above. The log messages are the same as the ones Milind is getting. From the client's perspective, copying an ISO file aborts with an error message, while a few small files can be copied with no problems. Milind, do you therefore confirm that a problem exists, or is it unclear?
I confirm that the problem is present.

REVIEW: https://review.gluster.org/20993 (rpc: handle EAGAIN when SSL_ERROR_SYSCALL is returned) posted (#2) for review on release-4.1 by Milind Changire

Please note that the master branch and the release-4.1 branch have diverged significantly, so the patch is not applicable to the master branch. Also, this issue has already been addressed in the master branch.

COMMIT: https://review.gluster.org/20993 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message-

    rpc: handle EAGAIN when SSL_ERROR_SYSCALL is returned

    Problem:
    A return value of ENODATA was forcibly returned in the case where
    SSL_get_error(r) returned SSL_ERROR_SYSCALL. Sometimes SSL_ERROR_SYSCALL
    is a transient error which is identified by setting errno to EAGAIN.
    EAGAIN is not a fatal error and indicates that the syscall needs to be
    retried.

    Solution:
    Bubble up the errno in case SSL_get_error(r) returns SSL_ERROR_SYSCALL
    and let the upper layers handle it appropriately.

    fixes: bz#1601356
    Change-Id: I76eff278378930ee79abbf9fa267a7e77356eed6
    Signed-off-by: Milind Changire <mchangir>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.1.5, please open a new bug report.

glusterfs-4.1.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-September/000113.html
[2] https://www.gluster.org/pipermail/gluster-users/

Mounted Dir Gets Error in GlusterFS Storage Cluster with SSL/TLS Encryption When Doing add-brick and remove-brick Repeatedly

Hi all,
I'm afraid there's something wrong with GlusterFS storage clusters using SSL/TLS encryption in the latest version, 4.1.5, and in older versions.
First, I enabled SSL for the GlusterFS replicated volume afr_vol, which is mounted on /mnt/gluster by the GlusterFS native client. Then I did add-brick and remove-brick repeatedly, while the mounted dir was read or written continuously. Here I used the command "find /mnt/gluster" or "ls /mnt/gluster" to reproduce it. Eventually, the following error message appears:

    find: ‘/mnt/gluster’: Transport endpoint is not connected

However, everything is OK if I disable SSL for the GlusterFS replicated volume afr_vol. So it seems that bug 1601356 ("Problem with SSL/TLS encryption on Gluster 4.0 & 4.1") hasn't been fixed yet. I hope someone can help me. Thanks a lot.

When the mounted dir cannot be accessed, the glusterfs process is at 100% CPU, and strace of the glusterfs process shows:

    strace -f -p 6576
    [pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
    [pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
    [pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
    ...
    [pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295^C
    Process 6576 detached
    Process 6577 detached
    Process 6578 detached
    Process 6579 detached
    Process 6580 detached
    Process 6581 detached
    Process 6584 detached
    Process 6585 detached
    Process 6596 detached
    Process 6597 detached
    Process 1578 detached
    Process 1581 detached
    Process 11623 detached
    Process 14581 detached
    Process 14601 detached
    <detached ...>
    Process 15032 detached
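The fix committed for this bug ("rpc: handle EAGAIN when SSL_ERROR_SYSCALL is returned") changes how the result of SSL_get_error() is reported to the upper layers. The C sketch below contrasts the behaviour described in the commit message before and after the change. It is not the GlusterFS socket.c code. The `enum ssl_result` values are hypothetical stand-ins for OpenSSL's SSL_ERROR_* constants, so the sketch builds without OpenSSL. The mappings for the non-SYSCALL cases, the treatment of errno == 0 as end-of-file, and the `should_disconnect()` and `run_with_retry()` helpers are all illustrative assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring OpenSSL's SSL_get_error() results
 * (SSL_ERROR_NONE, SSL_ERROR_WANT_READ, ...). The real constants live in
 * <openssl/ssl.h>; they are redefined here so the sketch builds alone. */
enum ssl_result {
    SSLR_NONE,
    SSLR_WANT_READ,
    SSLR_WANT_WRITE,
    SSLR_SYSCALL,
    SSLR_ZERO_RETURN,
    SSLR_SSL
};

/* Pre-fix behaviour as described in the commit message: SSL_ERROR_SYSCALL
 * is always reported as ENODATA, whatever errno says. */
static int map_ssl_error_old(enum ssl_result r, int saved_errno)
{
    (void)saved_errno; /* errno is ignored, which is the bug */
    switch (r) {
    case SSLR_NONE:
        return 0;
    case SSLR_WANT_READ:
    case SSLR_WANT_WRITE:
        return EAGAIN;  /* assumption: TLS-level "try again" */
    case SSLR_SYSCALL:
        return ENODATA; /* a transient EAGAIN is lost here */
    case SSLR_ZERO_RETURN:
        return ENODATA; /* assumption: peer closed the TLS session */
    default:
        return EIO;     /* assumption: protocol errors are fatal */
    }
}

/* Post-fix behaviour: for SSL_ERROR_SYSCALL, bubble up errno so the caller
 * can tell a transient EAGAIN from a real failure. */
static int map_ssl_error_new(enum ssl_result r, int saved_errno)
{
    if (r == SSLR_SYSCALL)
        /* assumption: errno == 0 here is treated as end-of-file */
        return saved_errno != 0 ? saved_errno : ENODATA;
    return map_ssl_error_old(r, saved_errno);
}

/* Hypothetical caller policy: EAGAIN means "retry", any other non-zero
 * value means the connection is torn down. */
static bool should_disconnect(int err)
{
    return err != 0 && err != EAGAIN;
}

/* Hypothetical I/O step: performs one TLS read/write attempt, returns the
 * SSL result and stores the errno observed right after the call. */
typedef enum ssl_result (*ssl_op_fn)(void *ctx, int *saved_errno);

/* Retry op while the mapped error is EAGAIN, up to max_tries attempts.
 * Returns the last mapped error (0 on success). */
static int run_with_retry(ssl_op_fn op, void *ctx, int max_tries)
{
    assert(op != NULL && max_tries > 0);
    int err = EAGAIN;
    for (int i = 0; i < max_tries && err == EAGAIN; i++) {
        int saved_errno = 0;
        enum ssl_result r = op(ctx, &saved_errno);
        err = map_ssl_error_new(r, saved_errno);
    }
    return err;
}
```

With `map_ssl_error_old()`, a transient EAGAIN under load surfaces as ENODATA, and the hypothetical `should_disconnect()` policy tears the connection down. With `map_ssl_error_new()`, the same condition is reported as EAGAIN and the operation is retried. That pattern is consistent with the symptom in this report (transfers failing only under load), but the sketch does not reproduce GlusterFS's actual code path. A real transport would also wait for poll() readiness between attempts rather than retrying immediately as `run_with_retry()` does.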