Bug 1333317 - rpc_clnt will sometimes not reconnect when using encryption
Summary: rpc_clnt will sometimes not reconnect when using encryption
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1351672 1379216 1379217
 
Reported: 2016-05-05 09:28 UTC by Kaushal
Modified: 2017-03-27 18:22 UTC (History)

Fixed In Version: glusterfs-3.9.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1379216 1379217 (view as bug list)
Environment:
Last Closed: 2017-03-27 18:22:15 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Kaushal 2016-05-05 09:28:12 UTC
When using encrypted transport, the RPC clients in certain cases do not reconnect.

This happens mainly during the initial connection establishment phase. If a given hostname resolves to multiple addresses, or when multiple volfile servers are configured, and the first address is down, the client will not attempt to reconnect using the remaining addresses.

This happens because in such situations the rpc layer tries to use epoll for the encrypted connections instead of own-threads, and encrypted connections do not work correctly with epoll.
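For context, the expected behaviour is a connect-with-fallback walk over all addresses returned by the resolver. The following is a minimal, self-contained sketch of that pattern in plain C sockets (not the actual rpc_clnt code); it starts its own IPv4-only listener on an ephemeral port so the fallback is exercised even when "localhost" resolves to ::1 first:

```c
/* Sketch of connect-with-fallback over every resolved address.
 * Illustrative only; this is not the glusterfs rpc_clnt code. */
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try each address returned by getaddrinfo() until one connects. */
static int connect_with_fallback(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* both IPv4 and IPv6, as in glusterfs-3.8 */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected; stop trying */
        close(fd);                    /* this address is down; try the next */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

int main(void)
{
    /* IPv4-only listener on an ephemeral port, mimicking a server
     * that listens only on 0.0.0.0. */
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa;
    socklen_t slen = sizeof(sa);
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    assert(lfd >= 0);
    assert(bind(lfd, (struct sockaddr *)&sa, sizeof(sa)) == 0);
    assert(getsockname(lfd, (struct sockaddr *)&sa, &slen) == 0);
    assert(listen(lfd, 1) == 0);

    char port[16];
    snprintf(port, sizeof(port), "%u", ntohs(sa.sin_port));

    /* Even if the resolver returns ::1 first, the loop falls through
     * to 127.0.0.1 and connects. */
    int fd = connect_with_fallback("localhost", port);
    assert(fd >= 0);
    printf("connected\n");
    close(fd);
    close(lfd);
    return 0;
}
```

The bug is that for encrypted connections this "try the next address" step never happens, because the failure path is handled by the epoll machinery rather than the own-thread code.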

Comment 1 Vijay Bellur 2016-05-05 09:32:29 UTC
REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)

Comment 2 Vijay Bellur 2016-05-09 06:09:46 UTC
REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)

Comment 3 Vijay Bellur 2016-05-09 06:09:49 UTC
REVIEW: http://review.gluster.org/14253 (glusterfsd: explicitly turn on encryption for volfile fetch) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)

Comment 4 Vijay Bellur 2016-05-09 06:09:52 UTC
REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)

Comment 5 Vijay Bellur 2016-05-12 07:11:55 UTC
REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#3) for review on master by Kaushal M (kaushal@redhat.com)

Comment 6 Vijay Bellur 2016-05-12 07:11:57 UTC
REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)

Comment 7 Vijay Bellur 2016-05-12 07:11:59 UTC
REVIEW: http://review.gluster.org/14253 (glusterfsd: explicitly turn on encryption for volfile fetch) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)

Comment 8 Vijay Bellur 2016-05-13 13:22:00 UTC
COMMIT: http://review.gluster.org/14253 committed in master by Jeff Darcy (jdarcy@redhat.com) 
------
commit 60d235515e582319474ba7231aad490d19240642
Author: Kaushal M <kaushal@redhat.com>
Date:   Thu May 5 14:19:55 2016 +0530

    glusterfsd: explicitly turn on encryption for volfile fetch
    
    Change-Id: I58e1fe7f5edf0abb5732432291ff677e81429b79
    BUG: 1333317
    Signed-off-by: Kaushal M <kaushal@redhat.com>
    Reviewed-on: http://review.gluster.org/14253
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Jeff Darcy <jdarcy@redhat.com>

Comment 9 Vijay Bellur 2016-05-16 07:09:15 UTC
REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#4) for review on master by Kaushal M (kaushal@redhat.com)

Comment 10 Vijay Bellur 2016-05-16 07:09:18 UTC
REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#3) for review on master by Kaushal M (kaushal@redhat.com)

Comment 11 Kaushal 2016-06-20 06:52:30 UTC
(In reply to Kaushal from comment #0)
> When using encrypted transport, the RPC clients in certain cases not
> reconnect.
> 
> This happens mainly during the initial connection establishment phase. If a
> given hostname has multiple resolved addresses or when using multiple
> volfile servers, if the first address is down, the client will not attempt
> reconnects with the next addresses.
> 
> This happens because in such situations the rpc layer tries to use epoll for
> the encrypted connections, which don't work together, instead of using
> own-threads.

These reconnection failures are more visible when IPv6 addresses are resolved for a given hostname. IPv6 resolution was enabled in glusterfs-3.8 by having getaddrinfo use AF_UNSPEC. The default getaddrinfo() configuration gives higher preference to IPv6 addresses in this case, so an rpc_clnt will try to connect to the IPv6 address first. But glusterfs rpcsvc listeners bind to 0.0.0.0, which accepts only IPv4 connections. This causes the initial rpc_clnt connection to fail. A reconnection should then be triggered, which would use the next returned IPv4 address and connect successfully. This reconnection happens for non-encrypted connections, but fails for encrypted connections in the manner described before.

This issue was reported by Michael Wyraz on the gluster-devel mailing list. [1]

Fixing the reconnection issues will solve this as well. It could also be solved by having rpcsvc listen on IPv6 addresses.

A temporary workaround would be to give IPv4 addresses a higher preference by editing /etc/gai.conf and adding the line `precedence ::ffff:0:0/96  100`. Alternatively, the /etc/hosts file can be edited to remove the IPv6 address for localhost.

[1]: https://www.gluster.org/pipermail/gluster-devel/2016-June/049833.html

Comment 12 Vijay Bellur 2016-07-07 06:53:12 UTC
REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#5) for review on master by Kaushal M (kaushal@redhat.com)

Comment 13 Vijay Bellur 2016-07-07 06:53:14 UTC
REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#4) for review on master by Kaushal M (kaushal@redhat.com)

Comment 14 Vijay Bellur 2016-07-08 06:19:57 UTC
REVIEW: http://review.gluster.org/14877 (socket: Remove own-thread references from the socket code and use the ssl enable option to select the socket_poller code. This commit makes all encrypted connections use socket_poller, and removes the ability to configure encrypted connections to use epoll. All references to own-thread have been removed from the code. The socket code now requires all users to explicitly enable encryption on the RPC clients and servers; it no longer tries to guess whether a connection needs to use encryption.) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa@redhat.com)

Comment 15 Vijay Bellur 2016-07-08 08:01:12 UTC
REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#6) for review on master by Kaushal M (kaushal@redhat.com)

Comment 16 Vijay Bellur 2016-07-08 08:01:14 UTC
REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#5) for review on master by Kaushal M (kaushal@redhat.com)

Comment 17 Shyamsundar 2017-03-27 18:22:15 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.0, please open a new bug report.

glusterfs-3.9.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html
[2] https://www.gluster.org/pipermail/gluster-users/

