+++ This bug was initially created as a clone of Bug #1258931 +++

Description of problem:
With SSL/TLS enabled, the command "gluster v heal <VOLUME> info" returns:
"VOLNAME: Not able to fetch volfile from glusterd
Volume heal failed."

Version-Release number of selected component (if applicable):
glusterfs 3.7.4-2 on RHEL 7.1 with TLS/SSL enabled.

How reproducible:
Enable SSL/TLS.

Steps to Reproduce:
1. Set up an installation with a 2-way replica volume.
2. Enable SSL/TLS:
   Generate a private key for each system:
     openssl genrsa -out /etc/ssl/glusterfs.key 2048
   Use the generated private key to create a signed certificate by running the following command:
     openssl req -new -x509 -key /etc/ssl/glusterfs.key -subj "/CN=COMMONNAME" -out /etc/ssl/glusterfs.pem
   Concatenate all glusterfs.pem files into /etc/ssl/glusterfs.ca and copy it to all nodes.
   Unmount the mount point.
   Enable encryption of management traffic:
     touch /var/lib/glusterd/secure-access
   Stop the volume:
     gluster volume stop VOLNAME
   Set up the list of allowed servers and clients:
     gluster volume set VOLNAME auth.ssl-allow 'server1,server2,server3,client1,client2,client3'
   Enable the SSL options:
     gluster volume set VOLNAME client.ssl on
     gluster volume set VOLNAME server.ssl on
   Stop all glusterfs services:
     /etc/init.d/glusterfs-server stop
     pkill glusterd
     pkill glusterfs
     pkill glusterfsd
   Start the glusterfs service:
     /etc/init.d/glusterfs-server start
   Start the volume:
     gluster volume start VOLNAME
   Mount the share if desired.
3. Run "gluster v heal <VOLUME> info".

Actual results:
VOLNAME: Not able to fetch volfile from glusterd
Volume heal failed.
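The key and certificate generation in step 2 of the reproducer can be exercised in isolation. A minimal sketch, using a scratch directory instead of /etc/ssl so it can be run without root (the paths and the COMMONNAME placeholder are illustrative, matching the steps above):

```shell
# Generate a gluster-style key/certificate pair in a scratch directory
# (a real deployment uses /etc/ssl/glusterfs.{key,pem} on every node).
WORKDIR=$(mktemp -d)

# Private key for this node.
openssl genrsa -out "$WORKDIR/glusterfs.key" 2048

# Self-signed certificate; COMMONNAME stands in for the host's CN.
openssl req -new -x509 -key "$WORKDIR/glusterfs.key" \
    -subj "/CN=COMMONNAME" -out "$WORKDIR/glusterfs.pem"

# Each node's .pem gets concatenated into the shared glusterfs.ca,
# which is then copied to all nodes.
cat "$WORKDIR/glusterfs.pem" >> "$WORKDIR/glusterfs.ca"
```

In a real multi-node setup, the concatenation step runs once over every node's .pem, and the resulting glusterfs.ca is distributed to all servers and clients before the secure-access marker is created.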
Expected results:
A list of files that need healing.

Additional info:
OS: Ubuntu 14.04.3 LTS

In glfsheal-VOLNAME.log:
[2015-09-01 14:02:03.666757] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-09-01 14:02:03.670743] W [socket.c:642:__socket_rwv] 0-gfapi: readv on 127.0.0.1:24007 failed (No data available)
[2015-09-01 14:02:03.671053] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x196)[0x7feb297ebf46] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feb27fad54e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feb27fad65e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7feb27faef1c] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7feb27faf6b8] ))))) 0-gfapi: forced unwinding frame type(GlusterFS Handshake) op(GETSPEC(2)) called at 2015-09-01 14:02:03.670590 (xid=0x1)
[2015-09-01 14:02:03.671076] E [MSGID: 104007] [glfs-mgmt.c:637:glfs_mgmt_getspec_cbk] 0-glfs-mgmt: failed to fetch volume file (key:repofiles) [Invalid argument]
[2015-09-01 14:02:03.671102] E [MSGID: 104024] [glfs-mgmt.c:738:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: localhost (No data available) [No data available]
[2015-09-01 14:02:03.671115] I [MSGID: 104025] [glfs-mgmt.c:744:mgmt_rpc_notify] 0-glfs-mgmt: Exhausted all volfile servers [Transport endpoint is not connected]

In cli.log:
[2015-09-01 14:02:03.596437] I [cli.c:720:main] 0-cli: Started running gluster with version 3.7.3
[2015-09-01 14:02:03.599274] I [socket.c:3971:socket_init] 0-glusterfs: SSL support for glusterd is ENABLED
[2015-09-01 14:02:03.600027] I [socket.c:3971:socket_init] 0-glusterfs: SSL support for glusterd is ENABLED
[2015-09-01 14:02:03.659342] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-09-01 14:02:03.659426] I [socket.c:2409:socket_event_handler] 0-transport: disconnecting now
[2015-09-01 14:02:03.672269] I [input.c:36:cli_batch] 0-: Exiting with: 255

In etc-glusterfs-glusterd.vol.log, the following errors appear every 3 seconds:
[2015-09-01 14:22:01.690274] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/c6f1a58b839ee4334fc0c8731ca06078.socket failed (Invalid argument)
[2015-09-01 14:22:04.690595] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/c6f1a58b839ee4334fc0c8731ca06078.socket failed (Invalid argument)
[2015-09-01 14:22:07.690876] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/c6f1a58b839ee4334fc0c8731ca06078.socket failed (Invalid argument)
[2015-09-01 14:22:10.691273] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/c6f1a58b839ee4334fc0c8731ca06078.socket failed (Invalid argument)
[2015-09-01 14:22:13.691728] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/c6f1a58b839ee4334fc0c8731ca06078.socket failed (Invalid argument)
[2015-09-01 14:22:16.692098] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/c6f1a58b839ee4334fc0c8731ca06078.socket failed (Invalid argument)
[2015-09-01 14:22:19.692517] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/c6f1a58b839ee4334fc0c8731ca06078.socket failed (Invalid argument)
[2015-09-01 14:22:22.692831] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/c6f1a58b839ee4334fc0c8731ca06078.socket failed (Invalid argument)
[2015-09-01 14:22:25.693201] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/c6f1a58b839ee4334fc0c8731ca06078.socket failed (Invalid argument)

--- Additional comment from JWeir on 2015-10-07 12:09:03 EDT ---

Experiencing the exact same issue:
Gluster 3.7.4
Ubuntu 14.04.3

The bug does not occur when the secure-access file is absent.
Can I get an update regarding this BZ to provide to my customer?
Hi Pranith,

From the customer:

> Yes, basically I expect a fix as soon as it is possible. Release dates are
> too far. We also have to plan any updates, especially because they have an
> impact on our customers' application.
> Of course, the fix must be reliable; we only have prod environments, we
> cannot play with them.

So yes, if we can provide one that will fit the customer's requirements, a HOTFIX would be great.

Cal
Hello Ashish,

I've queried the customer to confirm the version they're running in their prod environment.

Can you confirm that this fix has been tested? The customer has expressed concerns because they have no environment to test in prior to applying it.

Regards,
Cal
RHGS 3.1.2
rpm: glusterfs-server-3.7.5-19.el7rhgs.x86_64
Customer has already applied a HOTFIX from BZ 1310740. If a HOTFIX results from this BZ, will they be cumulative?
I am sorry, I don't understand your comment, Cal. Only one bugfix is needed to fix this bug; is that what you are asking?
My apologies. One of my customers, on SF case 01587696, is interested in a hotfix when this BZ has been through QE. They have previously applied a hotfix from BZ 1310740 and would like to know if a hotfix resulting from this BZ would be safe to apply on top of the previous one.
Yes, please go ahead. If you face any problem in applying let us know, but I don't think there should be any problem.
QATP and the results:
=====================

BUG#1279628 - [GSS]-gluster v heal volname info does not work with enabled ssl/tls

Description of Problem:
With SSL/TLS enabled, the command "gluster v heal <VOLUME> info" returns:
"VOLNAME: Not able to fetch volfile from glusterd
Volume heal failed."

Patch Info: http://review.gluster.org/#/c/13815/
glfs/heal: Use encrypted connection in shd
When management encryption is enabled, GlusterD only allows encrypted connections on port 24007, but SHD was trying to fetch its volfile over an unencrypted connection. If /var/lib/glusterd/secure-access is present, i.e. if management SSL is enabled, use an encrypted connection to fetch the info from glusterd.

QATP:

TC#1: For a new volume, "gluster v heal info" should display the right information when SSL is enabled (both mngt and data traffic) --> PASS
1. Create a cluster and identify client(s).
2. Enable SSL for both mngt and data traffic using the steps in https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/chap-Network_Encryption.html
3. Create a dist-rep volume.
4. Issue heal info on that volume.
Heal info must give the right information and not throw the "unable to fetch volfile" error.

TC#2: For an existing volume, "gluster v heal info" should display the right information when SSL is enabled (both mngt and data traffic) --> PASS
1. Create a cluster and identify client(s).
2. Create a dist-rep volume and populate some data.
3. Enable SSL for both mngt and data traffic using the steps in https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/chap-Network_Encryption.html
4. Issue heal info on that volume.
Heal info must give the right information and not throw the "unable to fetch volfile" error.

TC#3: For a new volume, "gluster v heal info" should display the right information when only mngt-layer (glusterd) SSL is enabled --> PASS
1. Create a cluster and identify client(s).
2. Enable SSL for mngt only, using the steps in https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/chap-Network_Encryption.html
3. Create a dist-rep volume.
4. Issue heal info on that volume.
Heal info must give the right information and not throw the "unable to fetch volfile" error.

TC#4: For an existing volume, "gluster v heal info" should display the right information when only mngt-layer (glusterd) SSL is enabled --> PASS
1. Create a cluster and identify client(s).
2. Create a dist-rep volume.
3. Enable SSL for mngt only, using the steps in https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/chap-Network_Encryption.html
4. Issue heal info on that volume.
Heal info must give the right information and not throw the "unable to fetch volfile" error.

TC#5: While I/O is going on, "gluster v heal info" should display the right information when SSL is enabled (both mngt and data) --> FAIL
1. Create a cluster and identify client(s).
2. Create a dist-rep volume.
3. Enable SSL for mngt only, using the steps in https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/chap-Network_Encryption.html
4. Mount the volume over FUSE on multiple clients.
5. Trigger I/O using dd or any other method. The I/O must not hang.
6. Issue heal info on that volume. Heal info must give the right information and not throw the "unable to fetch volfile" error.
However, both the I/O and heal info hang; a bug has already been raised for this:
1337863 - [SSL] : I/O hangs when run from multiple clients on an SSL enabled volume

Note: I had also set "gluster volume set <volname> locking-scheme granular" while the I/O was going on, so as to avoid the false positives mentioned in BUG#1311839 - False positives in heal info.

Test version:
=============
[root@dhcp35-191 ~]# rpm -qa | grep gluster
glusterfs-cli-3.7.9-6.el7rhgs.x86_64
glusterfs-libs-3.7.9-6.el7rhgs.x86_64
glusterfs-fuse-3.7.9-6.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-6.el7rhgs.x86_64
glusterfs-server-3.7.9-6.el7rhgs.x86_64
python-gluster-3.7.9-5.el7rhgs.noarch
glusterfs-3.7.9-6.el7rhgs.x86_64
glusterfs-api-3.7.9-6.el7rhgs.x86_64
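The passing test cases above share the same final check. A hedged sketch of TC#1/TC#3 (volume and brick names here are illustrative, not from the report, and the commands are guarded so the snippet is harmless on a host without gluster installed):

```shell
# Sketch only: with SSL already enabled per the Network Encryption guide,
# create a new distributed-replicate volume and run heal info on it.
VOL=testvol   # hypothetical volume name
if command -v gluster >/dev/null 2>&1; then
    gluster volume create "$VOL" replica 2 \
        server1:/bricks/b1 server2:/bricks/b1 \
        server1:/bricks/b2 server2:/bricks/b2
    gluster volume start "$VOL"
    # Expected per the QATP: per-brick heal information, not
    # "Not able to fetch volfile from glusterd".
    gluster volume heal "$VOL" info
fi
```

The same check closes TC#2, TC#4, and TC#5; only the ordering of volume creation versus SSL enablement (and the presence of client I/O) differs.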
Hi Laura,

The text does not quite give the correct picture of the issue we faced: it was the self-heal daemon that was not using an SSL connection to communicate with glusterd. Modified text is:
------------------
When management encryption is enabled, glusterd only allows encrypted connections on port 24007, but the Self Heal Daemon was trying to fetch its volfile over an unencrypted connection. This meant that when management SSL was enabled, running the "gluster volume heal info" command resulted in error messages, and users could not see the list of files that needed to be healed. The Self Heal Daemon now communicates correctly over an encrypted connection, and "gluster volume heal info" works as expected.
------------------
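As a quick way to confirm the fixed behaviour described above, one might check for the secure-access marker and then run heal info. A hedged sketch (VOLNAME is a placeholder, and the gluster call is guarded so the snippet runs even where gluster is not installed):

```shell
# The secure-access marker file is what switches glusterd (and, with this
# fix, the self-heal daemon's volfile fetch) to encrypted connections.
if [ -f /var/lib/glusterd/secure-access ]; then
    MGMT_SSL=enabled
else
    MGMT_SSL=disabled
fi
echo "management encryption: $MGMT_SSL"

# With the fix applied, heal info should succeed in both cases.
if command -v gluster >/dev/null 2>&1; then
    gluster volume heal VOLNAME info
fi
```

Before the fix, the same heal info command failed with "Not able to fetch volfile from glusterd" whenever the marker file was present.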
Laura, Description provided by you in comment #21 looks perfect to me. I don't have any more comment on that.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240