Description of problem:
While attempting to set up SSL for remote management, I managed to get the CLI to segfault.

Version-Release number of selected component (if applicable):
3.6.0

How reproducible:
Every time

Steps to Reproduce:
1. On 2 hosts: add an SSL cert & key to /etc/ssl/glusterfs.{key,pem,ca}
2. On host 1: add `option transport.socket.ssl on` to /etc/glusterfs/glusterd.vol & restart glusterd
3. On host 2: run `gluster --remote-host=host1 peer probe host2`.

Actual results:
Segmentation fault (core dumped)

Expected results:
No segmentation fault

Additional info:
Here's the backtrace from GDB:

(gdb) bt
#0  0x00007f03692c98b0 in SSL_read () from /lib64/libssl.so.10
#1  0x00007f03694f8602 in ssl_do (buf=0x7f036d6bdd14, len=4, func=0x7f03692c98b0 <SSL_read>, this=0x7f036d6b84e0, this=0x7f036d6b84e0) at socket.c:281
#2  0x00007f03694f8885 in __socket_ssl_readv (this=this@entry=0x7f036d6b84e0, opvector=opvector@entry=0x7f036d6bdcc0, opcount=opcount@entry=1) at socket.c:416
#3  0x00007f03694f8b69 in __socket_cached_read (opcount=1, opvector=0x7f036d6bdcc0, this=0x7f036d6b84e0) at socket.c:504
#4  __socket_rwv (this=this@entry=0x7f036d6b84e0, vector=<optimized out>, count=count@entry=1, pending_vector=pending_vector@entry=0x7f036d6bdd08, pending_count=pending_count@entry=0x7f036d6bdd10, bytes=bytes@entry=0x0, write=write@entry=0) at socket.c:578
#5  0x00007f03694fc211 in __socket_readv (bytes=0x0, pending_count=0x7f036d6bdd10, pending_vector=0x7f036d6bdd08, count=1, vector=<optimized out>, this=0x7f036d6b84e0) at socket.c:671
#6  __socket_proto_state_machine (pollin=<synthetic pointer>, this=0x7f036d6b84e0) at socket.c:2049
#7  socket_proto_state_machine (pollin=<synthetic pointer>, this=0x7f036d6b84e0) at socket.c:2205
#8  socket_event_poll_in (this=this@entry=0x7f036d6b84e0) at socket.c:2221
#9  0x00007f03694fece4 in socket_event_handler (fd=<optimized out>, idx=0, data=data@entry=0x7f036d6b84e0, poll_in=1, poll_out=4, poll_err=16) at socket.c:2338
#10 0x00007f036c3bb322 in event_dispatch_epoll_handler (i=<optimized out>, events=0x7f036d6efb90, event_pool=0x7f036d68d410) at event-epoll.c:384
#11 event_dispatch_epoll (event_pool=0x7f036d68d410) at event-epoll.c:445
#12 0x00007f036c814f66 in main (argc=<optimized out>, argv=0x7fffc38a82f8) at cli.c:724
Some more info. I forgot to mention that on the client (host 2) I had performed `touch /var/lib/glusterd/secure-access`, but I had not done this on the server (host 1). After doing this on the server as well (and bouncing glusterd), glusterd itself segfaults.
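For anyone reproducing this, the mismatch described above (marker file present on one host but not the other) can be checked on each host with a few lines. This is a minimal sketch, assuming the /var/lib/glusterd directory used in this report; the helper name `secure_access_enabled` is made up for illustration and is not part of GlusterFS.

```python
import os

# Assumption: the glusterd working directory seen in this report.
DEFAULT_WORKDIR = "/var/lib/glusterd"


def secure_access_enabled(workdir=DEFAULT_WORKDIR):
    """Return True if the secure-access marker file exists in workdir.

    Hypothetical helper: it only tests for the file's presence, which is
    all that `touch /var/lib/glusterd/secure-access` creates.
    """
    return os.path.isfile(os.path.join(workdir, "secure-access"))


if __name__ == "__main__":
    state = "present" if secure_access_enabled() else "absent"
    print("secure-access marker is " + state)
```

Running it on both hosts and comparing the output shows whether the two ends agree on the setting.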
As far as I can tell, this problem has to do with sending requests on connections that failed - either CLI to glusterd, or glusterd to glusterd in the case of a "peer probe" operation. That's a bug somewhere above the transport layer - probably a new one, which would explain why this wasn't seen before. Still, it shouldn't cause a segfault. I've written a patch to address that part, so it doesn't segfault, but there are other "infelicities" involved that I'm still looking into.

Also, it's not advisable to change the transport.socket.ssl option directly in the glusterd volfile. It shouldn't be harmful, because that should only affect the I/O path and thus be irrelevant for glusterd, but it's also possible that it could interfere with the way we make decisions about when to use SSL and when not to (because in the portmapper case we need to switch based on two different settings). The correct way to enable SSL for the management layer is via the secure-access file. See this doc (unfortunately still in review) for more details.

http://review.gluster.org/#/c/8961/2/doc/admin-guide/en-US/markdown/admin_ssl.md
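The general idea of failing cleanly when a request is sent on a connection that never came up can be illustrated with a toy model. This is only a sketch of the pattern, not the patch under review and not GlusterFS code; the `Transport` class, its methods, and `NotConnectedError` are all hypothetical names.

```python
class NotConnectedError(Exception):
    """Raised when I/O is attempted on a transport that never connected."""


class Transport:
    """Toy stand-in for a socket transport (hypothetical, for illustration)."""

    def __init__(self):
        # Set only after a successful handshake; None means "no session".
        self.tls_session = None

    def handshake(self, succeed):
        # Simulate a TLS handshake; on failure the session stays unset.
        self.tls_session = object() if succeed else None
        return self.tls_session is not None

    def read(self, nbytes):
        # Guard: report an error instead of using a missing session.
        if self.tls_session is None:
            raise NotConnectedError("TLS session was never established")
        return b"\0" * nbytes  # placeholder payload
```

With a guard like this, a caller that ignores a failed handshake gets an error it can report, rather than a crash inside the read path.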
REVIEW: http://review.gluster.org/9059 (socket: fix segfaults when TLS management connections fail) posted (#1) for review on master by Jeff Darcy (jdarcy)
REVIEW: http://review.gluster.org/9059 (socket: fix segfaults when TLS management connections fail) posted (#2) for review on master by Jeff Darcy (jdarcy)
Fix works. Used the RPMs from http://build.gluster.org/job/glusterfs-devrpms/3762/ Thanks for the link to the doc as well. Though some feedback: having to create an empty file to enable the feature feels dirty.
It feels dirty because it is dirty, but there aren't many alternatives that work for the CLI as well as all of the daemons. The CLI doesn't use a config file for permanent options, so they have to be re-specified on every invocation. That means anyone switching from insecure to secure management communications has to change all of their scripts that use the CLI. Environment variables aren't as foolproof, and wrappers are even dirtier in their own way.

That said, perhaps it would be better to make this the start of a permanent CLI config file. If we define a place and a format, then we could use it for future options with similar behavior, even if this is the only one for now. Thanks for the feedback.

BTW, I'm working on a better fix that will pass all of our regression tests, and hopefully also eliminate the long delay before the CLI command fails. Keep watching this space. ;)
REVIEW: http://review.gluster.org/9059 (socket: fix segfaults when TLS management connections fail) posted (#3) for review on master by Jeff Darcy (jdarcy)
The most recent build still works fine (http://build.gluster.org/job/glusterfs-devrpms/3768/).

I actually think a CLI config makes sense. Now that you can have full remote management over SSL, I think using it might become a more common practice. Things I can think of that would go in the config:

* Path to glusterd unix domain socket or address of remote host.
* Whether to enable SSL
* Path to SSL certificate/key/CA

The docs even indicate that you can authenticate against bricks with a username & password (https://forge.gluster.org/glusterfs-core/glusterfs/blobs/master/doc/authentication.txt). I don't see a way to do that with the management connection, but perhaps that would be a future feature, in which case you'd then have config params for username & password.
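To make the list above concrete, here is a rough sketch of what reading such a file could look like. Everything about the format is an assumption: no CLI config file exists as of this thread, and the `[cli]` section, the key names, and `load_cli_config` are invented for illustration. Only the certificate paths come from the reproduction steps in this report.

```python
import configparser

# Hypothetical file contents; section and key names are made up.
EXAMPLE = """
[cli]
remote_host = host1
ssl = on
ssl_cert = /etc/ssl/glusterfs.pem
ssl_key = /etc/ssl/glusterfs.key
ssl_ca = /etc/ssl/glusterfs.ca
"""


def load_cli_config(text):
    """Parse the hypothetical config text into a dict, with defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        # None here would mean "use the local glusterd".
        "remote_host": parser.get("cli", "remote_host", fallback=None),
        # SSL stays off unless the file turns it on.
        "ssl": parser.getboolean("cli", "ssl", fallback=False),
        "ssl_cert": parser.get("cli", "ssl_cert", fallback=None),
        "ssl_key": parser.get("cli", "ssl_key", fallback=None),
        "ssl_ca": parser.get("cli", "ssl_ca", fallback=None),
    }


if __name__ == "__main__":
    print(load_cli_config(EXAMPLE))
```

Defaulting every option when the file or section is missing would keep existing scripts working unchanged, which is the concern raised earlier about re-specifying options on every invocation.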
OK, this latest one looks good to go. Not sure if that's the one you already had.
COMMIT: http://review.gluster.org/9059 committed in master by Vijay Bellur (vbellur)
------
commit 0b9a6a63b50e0c4947233aee33fc86f603f77dd1
Author: Jeff Darcy <jdarcy>
Date: Wed Nov 5 22:37:48 2014 -0500

    socket: fix segfaults when TLS management connections fail

    Change-Id: I1fd085b04ad1ee68c982d3736b322c19dd12e071
    BUG: 1160900
    Signed-off-by: Jeff Darcy <jdarcy>
    Reviewed-on: http://review.gluster.org/9059
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Harshavardhana <harsha>
    Reviewed-by: Vijay Bellur <vbellur>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user