Bug 762004 (GLUSTER-272)

Summary: Server backend storage hang should not cause the mount point to hang
Product: [Community] GlusterFS Reporter: Krishna Srinivas <krishna>
Component: posix Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: low Docs Contact:
Priority: low    
Version: mainline CC: aavati, amarts, fharshav, gluster-bugs, shehjart, vijay, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Krishna Srinivas 2009-09-17 21:27:33 UTC
Typically, when the backend FS hangs (say, during kernel soft lockups), the mount point hangs for processes accessing files from that backend.

GlusterFS should recover from such a situation and not allow apps to hang at all.

Comment 1 Anand Avati 2010-01-23 07:58:34 UTC
PATCH: http://patches.gluster.com/patch/2676 in master (core: fix initialization of disjoint xlator graph)

Comment 2 Anand Avati 2010-01-23 07:58:37 UTC
PATCH: http://patches.gluster.com/patch/2677 in master (Server backend storage hang should not cause the mount point to hang.)

Comment 3 Anand Avati 2010-01-23 07:58:41 UTC
PATCH: http://patches.gluster.com/patch/2678 in master (protocol/server: cleanup whitespaces)

Comment 4 Anand Avati 2010-01-26 12:23:18 UTC
PATCH: http://patches.gluster.com/patch/2708 in master (Revert "Server backend storage hang should not cause the mount point to hang.")

Comment 5 Anand Avati 2010-01-26 12:23:22 UTC
PATCH: http://patches.gluster.com/patch/2707 in release-3.0 (Revert "Server backend storage hang should not cause the mount point to hang.")

Comment 6 Amar Tumballi 2010-03-09 08:30:49 UTC
We are evaluating whether this needs to be handled in the filesystem itself; instead, we could write some scripts to take care of this issue.

Comment 7 Amar Tumballi 2010-07-27 08:52:56 UTC
It's hard to handle this within filesystem code; instead, the script below by Sacchi <sac> fixes the issue. Will be adding this under extras/ in the codebase, and closing the bug.

------------------------------------------
#!/bin/sh

# This script keeps checking in the background if the filesystem is alive.
# If the filesystem does not respond then kill the glusterfsd daemon using
# the pid file generated. This test is performed by running `df' repeatedly
# on the backend.

EXPORTDIR=/mnt/dist-1
PIDFILE=/var/run/glusterfsd.pid
INTERVAL=30

while :
do
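    # Test if glusterfs is running, before doing anything. If glusterfs is
    # not running we don't have to do anything, so wait and check again
    # (sleeping first avoids spinning in a busy loop).

    test -f "$PIDFILE" || { sleep "$INTERVAL"; continue; }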

    df "$EXPORTDIR" > /dev/null 2>&1 &
    DF_PID=$!
    sleep "$INTERVAL"

    # kill -0 sends no signal; it only tests whether the df process
    # started above still exists after the interval.
    if kill -0 "$DF_PID" 2> /dev/null
    then
        # df is hung, kill glusterfs process and remove the pid file
        kill -9 "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
    fi
done
-------------------------------------
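
The hang check in the script above can be tried out without a hung backend by isolating it in a small function. The following is only a sketch under stated assumptions: the name is_hung and its argument order are hypothetical and not part of GlusterFS or of the script in extras/. It assumes the shell reaps finished background jobs while it waits for the foreground sleep, which holds for the common sh implementations where sleep is an external command.

```shell
#!/bin/sh

# Hypothetical helper (not part of GlusterFS): is_hung INTERVAL CMD [ARGS...]
# Starts CMD in the background, waits INTERVAL seconds, then reports
# whether CMD is still running.
# Returns 0 if CMD is still running (treated as hung), 1 if it finished.
is_hung() {
    interval=$1
    shift

    # Run the probe command in the background and discard its output.
    "$@" > /dev/null 2>&1 &
    probe_pid=$!

    sleep "$interval"

    # kill -0 delivers no signal; it only checks that the process exists.
    if kill -0 "$probe_pid" 2> /dev/null; then
        return 0
    fi
    return 1
}
```

With the values used in the script above, the check would read `is_hung 30 df /mnt/dist-1 && kill -9 "$(cat /var/run/glusterfsd.pid)"`. The probe is run in the background and polled, rather than waited on, because a process blocked on a hung filesystem may not react to signals.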