Bug 1313206 - Encrypted rpc clients do not reconnect sometimes
Product: GlusterFS
Classification: Community
Component: rpc
Assigned To: Kaushal
Depends On:
Blocks: 1310740 1314641
Reported: 2016-03-01 03:14 EST by Kaushal
Modified: 2016-06-16 09:58 EDT

See Also:
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1314641 (view as bug list)
Last Closed: 2016-06-16 09:58:44 EDT
Type: Bug

Description Kaushal 2016-03-01 03:14:38 EST
When encryption is enabled on an rpc client, it can sometimes fail to reconnect.

This happens because the underlying transport is freed if the connect fails on the first reconnect attempt. Later reconnect attempts cannot happen, as the transport no longer exists.

This can leave clusters showing inconsistent peer status information and behaving incorrectly when management encryption is enabled.
Comment 1 Vijay Bellur 2016-03-01 03:16:34 EST
REVIEW: http://review.gluster.org/13554 (socket: Launch socket_poller only if connect succeeded) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)
Comment 2 Vijay Bellur 2016-03-02 10:40:40 EST
REVIEW: http://review.gluster.org/13554 (socket: Launch socket_poller only if connect succeeded) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)
Comment 3 Vijay Bellur 2016-03-03 23:42:19 EST
COMMIT: http://review.gluster.org/13554 committed in master by Raghavendra G (rgowdapp@redhat.com) 
commit d117466422b2fe97390b9ccc7a3c277e7a64285a
Author: Kaushal M <kaushal@redhat.com>
Date:   Tue Mar 1 13:04:03 2016 +0530

    socket: Launch socket_poller only if connect succeeded
    For an encrypted connection, socket_connect() used to launch
    socket_poller() in its own thread (ON by default), even if the connect
    failed. This would cause two unrefs to be done on the transport, once in
    socket_poller() and once in socket_connect(), causing the transport to
    be freed and cleaned up. This would cause further reconnect attempts
    to fail, as the transport wouldn't be available.
    By starting socket_poller() only if connect succeeded, this is avoided.
    Change-Id: Ie22090dbb1833bdd0f636a76cb3935d766711917
    BUG: 1313206
    Signed-off-by: Kaushal M <kaushal@redhat.com>
    Reviewed-on: http://review.gluster.org/13554
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
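
The double unref described in the commit message can be illustrated with a toy reference-count model. This is only a sketch, not the GlusterFS code: the names `toy_transport`, `toy_ref`, `toy_unref`, `toy_poller`, `toy_connect_fails` and `toy_survives_failed_connect` are hypothetical, and it assumes the transport starts with one reference held by its owner and that a connect attempt takes one extra reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an rpc transport; not the real GlusterFS type. */
struct toy_transport {
    int  refcount;
    bool freed;
};

static void toy_ref(struct toy_transport *t)
{
    t->refcount++;
}

static void toy_unref(struct toy_transport *t)
{
    assert(t->refcount > 0);
    if (--t->refcount == 0)
        t->freed = true; /* real code would destroy the transport here */
}

/* Stand-in for the poller thread: it drops one reference when it exits. */
static void toy_poller(struct toy_transport *t)
{
    toy_unref(t);
}

/* Models a single connect attempt that fails. */
static void toy_connect_fails(struct toy_transport *t,
                              bool launch_poller_on_failure)
{
    bool connected = false;  /* this model only covers the failure case */

    toy_ref(t);              /* assumed: one reference per connect attempt */

    if (connected || launch_poller_on_failure)
        toy_poller(t);       /* old behaviour: launched even on failure */

    if (!connected)
        toy_unref(t);        /* error path of the connect function */
}

/* Returns true if the transport is still alive after a failed connect. */
bool toy_survives_failed_connect(bool launch_poller_on_failure)
{
    struct toy_transport t = { .refcount = 1, .freed = false };

    toy_connect_fails(&t, launch_poller_on_failure);
    return !t.freed;
}
```

In this model, `toy_survives_failed_connect(true)` stands for the old behaviour: one ref is balanced by two unrefs, the owner's reference is consumed, and the transport is marked freed. `toy_survives_failed_connect(false)` stands for the fixed behaviour, where the poller is not started on failure and the owner's reference survives for the next reconnect attempt.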
Comment 4 Niels de Vos 2016-06-16 09:58:44 EDT
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
