Bug 1379216 - rpc_clnt will sometimes not reconnect when using encryption
Summary: rpc_clnt will sometimes not reconnect when using encryption
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.8
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard:
Depends On: 1333317 1379217
Blocks: 1351672
Reported: 2016-09-26 05:06 UTC by Mohit Agrawal
Modified: 2016-10-20 14:03 UTC (History)

Fixed In Version: glusterfs-3.8.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1333317
Environment:
Last Closed: 2016-10-20 14:03:22 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Mohit Agrawal 2016-09-26 05:06:51 UTC
+++ This bug was initially created as a clone of Bug #1333317 +++

When using encrypted transport, RPC clients in certain cases do not reconnect.

This happens mainly during the initial connection establishment phase. If a hostname resolves to multiple addresses, or multiple volfile servers are configured, and the first address is down, the client does not attempt to reconnect using the next address.

This happens because in such situations the rpc layer tries to use epoll for the encrypted connections instead of own-threads, and epoll does not work with encrypted connections.

--- Additional comment from Vijay Bellur on 2016-05-05 05:32:29 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-09 02:09:46 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-09 02:09:49 EDT ---

REVIEW: http://review.gluster.org/14253 (glusterfsd: explicitly turn on encryption for volfile fetch) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-09 02:09:52 EDT ---

REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-12 03:11:55 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#3) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-12 03:11:57 EDT ---

REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-12 03:11:59 EDT ---

REVIEW: http://review.gluster.org/14253 (glusterfsd: explicitly turn on encryption for volfile fetch) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-13 09:22:00 EDT ---

COMMIT: http://review.gluster.org/14253 committed in master by Jeff Darcy (jdarcy@redhat.com) 
------
commit 60d235515e582319474ba7231aad490d19240642
Author: Kaushal M <kaushal@redhat.com>
Date:   Thu May 5 14:19:55 2016 +0530

    glusterfsd: explicitly turn on encryption for volfile fetch
    
    Change-Id: I58e1fe7f5edf0abb5732432291ff677e81429b79
    BUG: 1333317
    Signed-off-by: Kaushal M <kaushal@redhat.com>
    Reviewed-on: http://review.gluster.org/14253
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Jeff Darcy <jdarcy@redhat.com>

--- Additional comment from Vijay Bellur on 2016-05-16 03:09:15 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#4) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-16 03:09:18 EDT ---

REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#3) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Kaushal on 2016-06-20 02:52:30 EDT ---

(In reply to Kaushal from comment #0)
> When using encrypted transport, the RPC clients in certain cases not
> reconnect.
> 
> This happens mainly during the initial connection establishment phase. If a
> given hostname has multiple resolved addresses or when using multiple
> volfile servers, if the first address is down, the client will not attempt
> reconnects with the next addresses.
> 
> This happens because in such situations the rpc layer tries to use epoll for
> the encrypted connections, which don't work together, instead of using
> own-threads.

These reconnection failures are more visible when IPv6 addresses get resolved for a given hostname. IPv6 resolution was enabled in glusterfs-3.8 by having getaddrinfo() use AF_UNSPEC. The default getaddrinfo() configuration gives higher preference to IPv6 addresses in this case, so an rpc_clnt will try to connect to the IPv6 address first. But glusterfs rpcsvc listeners listen on 0.0.0.0, which only accepts IPv4 connections, so the initial rpc_clnt connection fails. A reconnection should then be triggered, which uses the next returned IPv4 address and connects successfully. This reconnection happens for non-encrypted connections, but fails for encrypted connections in a similar manner to what was described before.
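
The expected fallback behaviour can be illustrated with a minimal Python sketch. This is not the GlusterFS rpc_clnt code (which is written in C); the function names `resolve`, `tcp_connect` and `connect_with_fallback` are hypothetical and only show the idea of walking the getaddrinfo() result list until one address accepts the connection.

```python
import socket


def resolve(host, port):
    """Return (family, sockaddr) candidates in resolver order.

    AF_UNSPEC asks for both IPv4 and IPv6 results, so an IPv6
    address may come first depending on the resolver configuration.
    """
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    return [(info[0], info[4]) for info in infos]


def tcp_connect(family, sockaddr, timeout=2.0):
    """Open a plain TCP connection to one candidate (hypothetical helper)."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_with_fallback(candidates, connect=tcp_connect):
    """Try each candidate in order and return the first successful connection."""
    last_error = None
    for family, sockaddr in candidates:
        try:
            return connect(family, sockaddr)
        except OSError as exc:
            # This address is down or not listened on; move to the next one.
            last_error = exc
    raise ConnectionError("no candidate address was reachable") from last_error
```

A caller would use it as `connect_with_fallback(resolve(host, port))`. If the server only listens on 0.0.0.0 and the resolver returns an IPv6 address first, the first attempt fails and the loop falls through to the IPv4 address, which is the behaviour the encrypted connections were missing.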

This issue was reported by Michael Wyraz on the gluster-devel mailing list. [1]

Fixing the reconnection issues will solve this issue as well. This can also be solved by having rpcsvc listen on IPv6 addresses.

A temporary workaround would be to give higher preference to IPv4 addresses by editing /etc/gai.conf and adding the line `precedence ::ffff:0:0/96  100`. Alternatively, the /etc/hosts file can be edited to remove the IPv6 address for localhost.
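
One way to check whether such a workaround took effect is to look at the order in which the resolver returns address families. The following Python sketch is only an illustration; the helper names `family_names` and `resolver_order` are hypothetical and not part of GlusterFS.

```python
import socket


def family_names(infos):
    """Map getaddrinfo() result tuples to 'IPv4' / 'IPv6' labels, keeping order."""
    return ["IPv6" if info[0] == socket.AF_INET6 else "IPv4" for info in infos]


def resolver_order(host, port):
    """Return the address-family order the local resolver prefers for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    return family_names(infos)
```

Calling `resolver_order(hostname, port)` on the client machine before and after the change shows whether an IPv4 address is now tried first. The actual output depends on the local resolver configuration and on which addresses the hostname has.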

[1]: https://www.gluster.org/pipermail/gluster-devel/2016-June/049833.html

--- Additional comment from Vijay Bellur on 2016-07-07 02:53:12 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#5) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-07-07 02:53:14 EDT ---

REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#4) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-07-08 02:19:57 EDT ---

REVIEW: http://review.gluster.org/14877 (socket: Remove own-thread references from the socket code and use ssl enable option to use socket_poller code. This commit makes all encrypted connections use socket_poller, and removes the ability to configure encrypted connections to use epoll. All references to own-thread have been removed from the code. The socket code now requires all users to explicitly enable encryption on the RPC clients and servers. It no longer tries to guess if a connection needs to use encryption or not.) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa@redhat.com)

--- Additional comment from Vijay Bellur on 2016-07-08 04:01:12 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#6) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-07-08 04:01:14 EDT ---

REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#5) for review on master by Kaushal M (kaushal@redhat.com)

Comment 1 Worker Ant 2016-09-26 05:10:42 UTC
REVIEW: http://review.gluster.org/15567 (glusterfsd: explicitly turn on encryption for volfile fetch) posted (#1) for review on release-3.8 by MOHIT AGRAWAL (moagrawa@redhat.com)

Comment 2 Worker Ant 2016-09-26 05:25:27 UTC
REVIEW: http://review.gluster.org/15567 (glusterfsd: explicitly turn on encryption for volfile fetch) posted (#2) for review on release-3.8 by MOHIT AGRAWAL (moagrawa@redhat.com)

Comment 3 Worker Ant 2016-09-28 06:59:24 UTC
REVIEW: http://review.gluster.org/15567 (glusterfsd: explicitly turn on encryption for volfile fetch Problem: In case of encrypted transport, RPC clients are not able to reconnect. Due to this, daemons (glustershd etc.) are not able to fetch the volfile and are not started.) posted (#3) for review on release-3.8 by MOHIT AGRAWAL (moagrawa@redhat.com)

Comment 4 Worker Ant 2016-09-28 07:19:33 UTC
REVIEW: http://review.gluster.org/15567 (glusterfsd: explicitly turn on encryption for volfile fetch) posted (#4) for review on release-3.8 by MOHIT AGRAWAL (moagrawa@redhat.com)

Comment 5 Worker Ant 2016-09-28 14:55:38 UTC
COMMIT: http://review.gluster.org/15567 committed in release-3.8 by Niels de Vos (ndevos@redhat.com) 
------
commit 7ade01fc75e35eede7071acb4381f5580102e6c2
Author: Kaushal M <kaushal@redhat.com>
Date:   Thu May 5 14:19:55 2016 +0530

    glusterfsd: explicitly turn on encryption for volfile fetch
    
    Problem: In case of encrypted transport, RPC clients are not able to
             reconnect. Due to this, daemons (glustershd etc.) are not
             able to fetch the volfile and are not started.
    
    Solution: Explicitly turning on encryption for the volfile fetch
              resolves the issue.
    
    > Change-Id: I58e1fe7f5edf0abb5732432291ff677e81429b79
    > BUG: 1333317
    > Signed-off-by: Kaushal M <kaushal@redhat.com>
    > Reviewed-on: http://review.gluster.org/14253
    > Smoke: Gluster Build System <jenkins@build.gluster.com>
    > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    > Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
    > (cherry picked from commit 60d235515e582319474ba7231aad490d19240642)
    
    Change-Id: I15193837dc692b0cd7df942843bcf27a1c47e695
    BUG: 1379216
    Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
    Reviewed-on: http://review.gluster.org/15567
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>

Comment 6 Niels de Vos 2016-10-20 14:03:22 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.5, please open a new bug report.

glusterfs-3.8.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/announce/2016-October/000061.html
[2] https://www.gluster.org/pipermail/gluster-users/

