Bug 1312845 - Protocol server/client handshake gap
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Avra Sengupta
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-02-29 11:32 UTC by Avra Sengupta
Modified: 2016-06-16 13:58 UTC (History)

Fixed In Version: glusterfs-3.8rc2
Clone Of:
Environment:
Last Closed: 2016-06-16 13:58:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Avra Sengupta 2016-02-29 11:32:33 UTC
Description of problem:
Currently, on a successful connection between protocol server and client, the protocol client initiates a CHILD_UP event in the client stack. At that point only the connection between server and client is established, and there is no guarantee that the server-side stack is ready to serve requests.

This works today because most server-side translators do not depend on any other factors before they can serve requests, so they are already up by the time the client stack translators receive the CHILD_UP (initiated by the client handshake).

The gap is exposed when a server-side translator, NSR-Server for example, has a couple of protocol clients as its children (connecting it to other bricks) and cannot serve requests until a quorum of those children is up. Such translators should defer sending CHILD_UP until enough children are up, and that state needs to be propagated to the client stack translators.
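
To illustrate the deferral described above, here is a minimal sketch in C. It is not GlusterFS code: the event enum, `struct quorum_xlator`, `have_quorum()` and `quorum_notify()` are hypothetical names, and "quorum" is assumed to mean a strict majority of children, which the report does not specify.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event codes for this sketch; GlusterFS has its own. */
enum event { EVENT_CHILD_UP, EVENT_CHILD_DOWN };

/* Hypothetical state of a server-side translator that needs a quorum. */
struct quorum_xlator {
    int  child_count;   /* total number of protocol-client children */
    int  up_children;   /* children currently reported as up */
    bool announced_up;  /* true once CHILD_UP was sent to the parent */
};

/* Assumption: quorum means a strict majority of the children. */
static bool have_quorum(const struct quorum_xlator *x)
{
    return x->up_children * 2 > x->child_count;
}

/*
 * Handle an event coming from one child.
 * Returns  1 if CHILD_UP should now be sent to the parent,
 *         -1 if CHILD_DOWN should now be sent to the parent,
 *          0 if nothing should be sent (the event is absorbed).
 */
static int quorum_notify(struct quorum_xlator *x, enum event ev)
{
    assert(x->child_count > 0);

    if (ev == EVENT_CHILD_UP && x->up_children < x->child_count)
        x->up_children++;
    else if (ev == EVENT_CHILD_DOWN && x->up_children > 0)
        x->up_children--;

    bool quorum = have_quorum(x);
    if (quorum && !x->announced_up) {
        x->announced_up = true;
        return 1;
    }
    if (!quorum && x->announced_up) {
        x->announced_up = false;
        return -1;
    }
    return 0;
}
```

With three children, the first CHILD_UP from a child is absorbed and the second one is what triggers CHILD_UP towards the parent; a client that treats "connected" as "ready" would have started sending requests one event too early.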



Comment 1 Vijay Bellur 2016-02-29 11:49:47 UTC
REVIEW: http://review.gluster.org/13549 (protocol client/server: Fix client-server handshake) posted (#1) for review on master by Avra Sengupta (asengupt)

Comment 2 Vijay Bellur 2016-03-02 08:12:48 UTC
REVIEW: http://review.gluster.org/13549 (protocol client/server: Fix client-server handshake) posted (#2) for review on master by Avra Sengupta (asengupt)

Comment 3 Vijay Bellur 2016-03-07 09:15:52 UTC
REVIEW: http://review.gluster.org/13549 (protocol client/server: Fix client-server handshake) posted (#3) for review on master by Avra Sengupta (asengupt)

Comment 4 Vijay Bellur 2016-03-10 17:24:39 UTC
COMMIT: http://review.gluster.org/13549 committed in master by Jeff Darcy (jdarcy) 
------
commit 2bfdc30e0e7fba6f97d8829b2618a1c5907dc404
Author: Avra Sengupta <asengupt>
Date:   Mon Feb 29 14:43:58 2016 +0530

    protocol client/server: Fix client-server handshake
    
    Problem:
    Currently on a successful connection between protocol
    server and client, the protocol client initiates a
    CHILD_UP event in the client stack. At this point in
    time, only the connection between server and client is
    established, and there is no guarantee that the server
    side stack is ready to serve requests.
    
    This works today because most server-side translators do not
    depend on any other factors before they can serve requests, so
    they are already up by the time the client stack translators
    receive the CHILD_UP (initiated by the client handshake).
    
    The gap is exposed when a server-side translator, NSR-Server
    for example, has a couple of protocol clients as its children
    (connecting it to other bricks) and cannot serve requests until
    a quorum of those children is up. Such translators should defer
    sending CHILD_UP until enough children are up, and that state
    needs to be propagated to the client stack translators.
    
    Fix:
    Maintain a child_up variable in both the protocol client
    and protocol server translators. The protocol server should
    update this value based on the CHILD_UP and CHILD_DOWN
    events it receives from the translators below it. On receiving
    such an event it should forward that event to the client.
    The protocol client on receiving such an event should forward
    it up the client stack, thereby letting the client translators
    correctly know that the server is up and ready to serve.
    
    Clients that connect later (long after a server has initialized
    and processed its CHILD_UP events) will receive a child_up status
    as part of the handshake and, based on the status of the server's
    child_up, can either propagate a CHILD_UP event or defer it.
    
    Change-Id: I0807141e62118d8de9d9cde57a53a607be44a0e0
    BUG: 1312845
    Signed-off-by: Avra Sengupta <asengupt>
    Reviewed-on: http://review.gluster.org/13549
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Jeff Darcy <jdarcy>
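
The fix described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed patch: only the `child_up` variable and the CHILD_UP / CHILD_DOWN events come from the commit message, while the structs and the functions `server_notify()`, `client_handshake_done()` and `client_server_event()` are hypothetical names, and the network transport is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes for this sketch. */
enum event { EVENT_NONE, EVENT_CHILD_UP, EVENT_CHILD_DOWN };

/* Hypothetical protocol server state: is the stack below it ready? */
struct server_state {
    bool child_up;
};

/* Hypothetical protocol client state. */
struct client_state {
    bool connected;  /* handshake with the server completed */
    bool child_up;   /* last child_up status learned from the server */
};

/*
 * Server side: record an event received from the translators below
 * and return the event that should be forwarded to connected clients.
 */
static enum event server_notify(struct server_state *s, enum event ev)
{
    assert(s != NULL);
    if (ev == EVENT_CHILD_UP)
        s->child_up = true;
    else if (ev == EVENT_CHILD_DOWN)
        s->child_up = false;
    return ev;
}

/*
 * Client side: the handshake reply carries the server's child_up.
 * Returns the event to send up the client stack; EVENT_NONE means
 * CHILD_UP is deferred until the server reports it later.
 */
static enum event client_handshake_done(struct client_state *c,
                                        bool server_child_up)
{
    assert(c != NULL);
    c->connected = true;
    c->child_up = server_child_up;
    return server_child_up ? EVENT_CHILD_UP : EVENT_NONE;
}

/*
 * Client side: the server forwarded an event after the handshake.
 * Returns the event to send up the client stack.
 */
static enum event client_server_event(struct client_state *c, enum event ev)
{
    assert(c != NULL);
    if (!c->connected)
        return EVENT_NONE;  /* ignore events before the handshake */
    if (ev == EVENT_CHILD_UP)
        c->child_up = true;
    else if (ev == EVENT_CHILD_DOWN)
        c->child_up = false;
    return ev;
}
```

In this sketch a client that completes its handshake before the server stack is ready gets EVENT_NONE and only sees CHILD_UP once the server forwards it, whereas a client that connects afterwards gets CHILD_UP directly from the handshake.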

Comment 5 Niels de Vos 2016-06-16 13:58:44 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
