Bug 1037267

Summary: network disconnect/reconnect does not resume data access to server
Product: [Community] GlusterFS
Reporter: Anand Avati <aavati>
Component: rpc
Assignee: Anand Avati <aavati>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: mainline
CC: chrisw, gluster-bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1037274
Environment:
Last Closed: 2014-04-17 11:51:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1037274

Description Anand Avati 2013-12-03 03:37:06 UTC
Description of problem:

When the network to a server goes down (for example, by pulling out the network cable or adding an iptables -j DROP rule) and comes back up, the reconnection to the server is always incomplete if the rpc client had witnessed a disconnect.

Version-Release number of selected component (if applicable):


How reproducible:

always


Steps to Reproduce:
0. Create a distribute volume, mount it on a client, run df, and note the size.
1. On one server, run iptables -I INPUT -p tcp --dport 111:65535 -j DROP
2. Run df on the client and wait for the size to decrease.
3. Run iptables -F on that server to "resume" the network.

Actual results:

df never shows the original size, no matter how long you wait or how many times you retry.

Expected results:

df should show the original size, indicating that connectivity to the server that was down has been restored.

Additional info:

The logs contain the line "Server and Client lk-version numbers are same, no need to reopen the fds"

Comment 1 Anand Avati 2013-12-03 03:38:30 UTC
REVIEW: http://review.gluster.org/6396 (protocol/client: handle network disconnect/reconnect properly) posted (#1) for review on master by Anand Avati (avati)

Comment 2 Anand Avati 2013-12-03 04:00:54 UTC
REVIEW: http://review.gluster.org/6397 (protocol/client: handle network disconnect/reconnect properly) posted (#1) for review on release-3.4 by Anand Avati (avati)

Comment 3 Anand Avati 2013-12-03 04:01:22 UTC
REVIEW: http://review.gluster.org/6398 (protocol/client: handle network disconnect/reconnect properly) posted (#1) for review on release-3.5 by Anand Avati (avati)

Comment 4 Anand Avati 2013-12-03 09:50:24 UTC
COMMIT: http://review.gluster.org/6396 committed in master by Vijay Bellur (vbellur) 
------
commit ed31918c2cf80d6c875e0b31eff4ab634d9375f2
Author: Anand Avati <avati>
Date:   Tue Nov 26 19:38:01 2013 -0800

    protocol/client: handle network disconnect/reconnect properly
    
    if client/server state versions match, we still need to notify
    parent xlators of reconnection (CHILD_UP) because they were
    notified of CHILD_DOWN at the time of disconnection.
    
    Change-Id: I36c4bde6d8c3db9cb0c48eeb10663b56897c932e
    BUG: 1037267
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6396
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
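
The reasoning in the commit message above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual protocol/client code: the types conn_t and event_t, the helper notify_parent, and the functions on_disconnect, on_reconnect_before_fix and on_reconnect_after_fix are all hypothetical names invented for illustration. In particular, the "before fix" path assumes that a version match caused an early return without any notification, which is inferred from the log line in the description rather than taken from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Events named after the CHILD_UP / CHILD_DOWN notifications in the commit
 * message; the enum itself is a hypothetical stand-in. */
typedef enum { EVENT_NONE, EVENT_CHILD_DOWN, EVENT_CHILD_UP } event_t;

/* Hypothetical, simplified state for one client-to-server connection. */
typedef struct {
    uint32_t lk_version;   /* state version the client last agreed on */
    event_t  last_event;   /* last event delivered to the parent xlator */
    int      reopen_count; /* number of times fds were reopened */
} conn_t;

/* Stand-in for notifying the parent translator: just record the event. */
static void notify_parent(conn_t *conn, event_t event)
{
    conn->last_event = event;
}

/* On disconnect the parent is told that the child is down. */
void on_disconnect(conn_t *conn)
{
    assert(conn != NULL);
    notify_parent(conn, EVENT_CHILD_DOWN);
}

/* Assumed pre-fix behavior: matching versions skip both the reopen and the
 * notification, so the parent keeps treating the child as down. */
void on_reconnect_before_fix(conn_t *conn, uint32_t server_lk_version)
{
    assert(conn != NULL);
    if (conn->lk_version == server_lk_version)
        return; /* no need to reopen fds, but CHILD_UP is never sent */
    conn->reopen_count++;
    conn->lk_version = server_lk_version;
    notify_parent(conn, EVENT_CHILD_UP);
}

/* Behavior described by the commit message: reopen only when the versions
 * differ, but always send CHILD_UP to balance the earlier CHILD_DOWN. */
void on_reconnect_after_fix(conn_t *conn, uint32_t server_lk_version)
{
    assert(conn != NULL);
    if (conn->lk_version != server_lk_version) {
        conn->reopen_count++;
        conn->lk_version = server_lk_version;
    }
    notify_parent(conn, EVENT_CHILD_UP);
}
```

In this model, calling on_disconnect and then on_reconnect_before_fix with an unchanged version leaves last_event at EVENT_CHILD_DOWN, which mirrors df never recovering the original size. The same sequence with on_reconnect_after_fix ends at EVENT_CHILD_UP without reopening any fds.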

Comment 5 Anand Avati 2013-12-03 19:35:33 UTC
COMMIT: http://review.gluster.org/6397 committed in release-3.4 by Anand Avati (avati) 
------
commit 07c8960d0c744709492148df71540e57f96d52e9
Author: Anand Avati <avati>
Date:   Tue Nov 26 19:38:01 2013 -0800

    protocol/client: handle network disconnect/reconnect properly
    
    if client/server state versions match, we still need to notify
    parent xlators of reconnection (CHILD_UP) because they were
    notified of CHILD_DOWN at the time of disconnection.
    
    Change-Id: I36c4bde6d8c3db9cb0c48eeb10663b56897c932e
    BUG: 1037267
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6397
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>

Comment 6 Anand Avati 2013-12-03 19:35:41 UTC
COMMIT: http://review.gluster.org/6398 committed in release-3.5 by Anand Avati (avati) 
------
commit 13a4830e9c913c9a24c6b69cd300b80302a49b65
Author: Anand Avati <avati>
Date:   Tue Nov 26 19:38:01 2013 -0800

    protocol/client: handle network disconnect/reconnect properly
    
    if client/server state versions match, we still need to notify
    parent xlators of reconnection (CHILD_UP) because they were
    notified of CHILD_DOWN at the time of disconnection.
    
    Change-Id: I36c4bde6d8c3db9cb0c48eeb10663b56897c932e
    BUG: 1037267
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6398
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>

Comment 7 Niels de Vos 2014-04-17 11:51:46 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user