Bug 1379217 - rpc_clnt will sometimes not reconnect when using encryption
Summary: rpc_clnt will sometimes not reconnect when using encryption
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.9
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard:
Depends On: 1333317
Blocks: 1351672 1379216
Reported: 2016-09-26 05:08 UTC by Mohit Agrawal
Modified: 2016-09-26 05:17 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1333317
Environment:
Last Closed: 2016-09-26 05:17:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Mohit Agrawal 2016-09-26 05:08:02 UTC
+++ This bug was initially created as a clone of Bug #1333317 +++

When using encrypted transport, RPC clients in certain cases do not reconnect.

This happens mainly during the initial connection establishment phase. If a given hostname has multiple resolved addresses or when using multiple volfile servers, if the first address is down, the client will not attempt reconnects with the next addresses.

This happens because in such situations the rpc layer tries to use epoll for the encrypted connections instead of own-threads, and encrypted connections do not work with epoll.

--- Additional comment from Vijay Bellur on 2016-05-05 05:32:29 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-09 02:09:46 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-09 02:09:49 EDT ---

REVIEW: http://review.gluster.org/14253 (glusterfsd: explicitly turn on encryption for volfile fetch) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-09 02:09:52 EDT ---

REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-12 03:11:55 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#3) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-12 03:11:57 EDT ---

REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-12 03:11:59 EDT ---

REVIEW: http://review.gluster.org/14253 (glusterfsd: explicitly turn on encryption for volfile fetch) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-13 09:22:00 EDT ---

COMMIT: http://review.gluster.org/14253 committed in master by Jeff Darcy (jdarcy@redhat.com) 
------
commit 60d235515e582319474ba7231aad490d19240642
Author: Kaushal M <kaushal@redhat.com>
Date:   Thu May 5 14:19:55 2016 +0530

    glusterfsd: explicitly turn on encryption for volfile fetch
    
    Change-Id: I58e1fe7f5edf0abb5732432291ff677e81429b79
    BUG: 1333317
    Signed-off-by: Kaushal M <kaushal@redhat.com>
    Reviewed-on: http://review.gluster.org/14253
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Jeff Darcy <jdarcy@redhat.com>

--- Additional comment from Vijay Bellur on 2016-05-16 03:09:15 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#4) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-16 03:09:18 EDT ---

REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#3) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Kaushal on 2016-06-20 02:52:30 EDT ---

(In reply to Kaushal from comment #0)
> When using encrypted transport, the RPC clients in certain cases not
> reconnect.
> 
> This happens mainly during the initial connection establishment phase. If a
> given hostname has multiple resolved addresses or when using multiple
> volfile servers, if the first address is down, the client will not attempt
> reconnects with the next addresses.
> 
> This happens because in such situations the rpc layer tries to use epoll for
> the encrypted connections, which don't work together, instead of using
> own-threads.

These reconnection failures are more visible when IPv6 addresses are resolved for a given hostname. IPv6 resolution was enabled in glusterfs-3.8 by having getaddrinfo() use AF_UNSPEC. The normal getaddrinfo() configuration gives higher preference to IPv6 addresses in this case, so an rpc_clnt will try to connect to the IPv6 address first. But glusterfs rpcsvc listeners listen on 0.0.0.0, which accepts connections only on IPv4 addresses, so the initial rpc_clnt connection fails. A reconnection should then be triggered, which would use the next returned address (IPv4) and connect successfully. This reconnection happens for non-encrypted connections, but fails for encrypted connections in a similar manner to what was described before.
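
The expected fallback can be sketched in a few lines. This is only an illustration in Python, not GlusterFS code: `resolve_candidates` and `connect_first_reachable` are hypothetical helper names, and the sketch assumes the address list is tried in the order `socket.getaddrinfo` returns it for `AF_UNSPEC`.

```python
import socket


def resolve_candidates(host, port):
    """Resolve host for both IPv4 and IPv6, in resolver preference order."""
    return socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)


def connect_first_reachable(candidates, timeout=2.0):
    """Try each (family, type, proto, canonname, sockaddr) entry in order.

    Returns a connected socket, or raises the last OSError if every
    candidate fails.
    """
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in candidates:
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # This address is unusable; remember why and try the next one.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses to try")
```

With a listener bound only to an IPv4 address, an IPv6 candidate listed first fails and the loop moves on to the IPv4 entry, which is the behaviour the encrypted connections were missing.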

This issue was reported by Michael Wyraz on the gluster-devel mailing list. [1]

Fixing the reconnection issues will solve this issue as well. This can also be solved by having rpcsvc listen on IPv6 addresses.
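
For illustration, a listener that accepts both IPv6 and IPv4 clients can be sketched as follows. This is a generic Python example, not the rpcsvc implementation: `open_dual_stack_listener` is a hypothetical name, and it assumes a platform where an `AF_INET6` socket bound to `::` can have `IPV6_V6ONLY` cleared.

```python
import socket


def open_dual_stack_listener(port=0):
    """Listen on '::' with IPV6_V6ONLY cleared so IPv4 clients are accepted too.

    Returns None when the platform cannot provide such a socket.
    """
    if not socket.has_dualstack_ipv6():
        return None
    try:
        return socket.create_server(
            ("::", port), family=socket.AF_INET6, dualstack_ipv6=True
        )
    except OSError:
        # IPv6 may be compiled in but disabled on this host.
        return None
```

On such a socket, IPv4 peers typically show up as IPv4-mapped IPv6 addresses (for example `::ffff:127.0.0.1`).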

A temporary workaround would be to give higher preference to IPv4 addresses by editing /etc/gai.conf and adding the line `precedence ::ffff:0:0/96  100`. Alternatively, the /etc/hosts file can be edited to remove the IPv6 address for localhost.
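
The effect of the gai.conf change is that IPv4 results are sorted ahead of IPv6 ones. The same ordering can be imitated at application level, shown here as a small Python sketch; `prefer_ipv4` is a hypothetical helper and is not something GlusterFS provides.

```python
import socket


def prefer_ipv4(candidates):
    """Stable-sort getaddrinfo results so AF_INET entries come first."""
    # False sorts before True, so IPv4 entries move to the front while
    # the relative order within each family is preserved.
    return sorted(candidates, key=lambda info: info[0] != socket.AF_INET)
```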

[1]: https://www.gluster.org/pipermail/gluster-devel/2016-June/049833.html

--- Additional comment from Vijay Bellur on 2016-07-07 02:53:12 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#5) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-07-07 02:53:14 EDT ---

REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#4) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-07-08 02:19:57 EDT ---

REVIEW: http://review.gluster.org/14877 (socket: Remove own-thread references from the socket code and use ssl enable option to use socket_poller code. This commit makes all encrypted connections use socket_poller, and removes the ability to configure encrypted connections to use epoll. All references to own-thread have been removed from the code. The socket code now requires all users to explicitly set enable encryption on the RPC clients and servers. It no longer tries to guess if a connection needs to use encryption or not.) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa@redhat.com)

--- Additional comment from Vijay Bellur on 2016-07-08 04:01:12 EDT ---

REVIEW: http://review.gluster.org/14224 (socket: use own threads for all encrypted connections) posted (#6) for review on master by Kaushal M (kaushal@redhat.com)

--- Additional comment from Vijay Bellur on 2016-07-08 04:01:14 EDT ---

REVIEW: http://review.gluster.org/14254 (protocol/client: explicitly specify encryption for portmap) posted (#5) for review on master by Kaushal M (kaushal@redhat.com)

Comment 1 Mohit Agrawal 2016-09-26 05:17:39 UTC
The fix is already merged in the 3.9 release.

