Bug 762004 (GLUSTER-272) - Server backend storage hang should not cause the mount point to hang
Summary: Server backend storage hang should not cause the mount point to hang
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-272
Product: GlusterFS
Classification: Community
Component: posix
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-09-17 21:27 UTC by Krishna Srinivas
Modified: 2015-12-01 16:45 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Krishna Srinivas 2009-09-17 21:27:33 UTC
Typically when the backend FS hangs (say during kernel soft lockups) the mount point will hang for processes accessing files from that backend.

GlusterFS should recover from such a situation and not allow apps to hang at all.

Comment 1 Anand Avati 2010-01-23 07:58:34 UTC
PATCH: http://patches.gluster.com/patch/2676 in master (core: fix initialization of disjoint xlator graph)

Comment 2 Anand Avati 2010-01-23 07:58:37 UTC
PATCH: http://patches.gluster.com/patch/2677 in master (Server backend storage hang should not cause the mount point to hang.)

Comment 3 Anand Avati 2010-01-23 07:58:41 UTC
PATCH: http://patches.gluster.com/patch/2678 in master (protocol/server: cleanup whitespaces)

Comment 4 Anand Avati 2010-01-26 12:23:18 UTC
PATCH: http://patches.gluster.com/patch/2708 in master (Revert "Server backend storage hang should not cause the mount point to hang.")

Comment 5 Anand Avati 2010-01-26 12:23:22 UTC
PATCH: http://patches.gluster.com/patch/2707 in release-3.0 (Revert "Server backend storage hang should not cause the mount point to hang.")

Comment 6 Amar Tumballi 2010-03-09 08:30:49 UTC
We are evaluating whether this needs to be handled in the filesystem itself; instead, we could write some scripts to take care of this issue.

Comment 7 Amar Tumballi 2010-07-27 08:52:56 UTC
It's hard to handle this within filesystem code; instead, the script below by Sacchi <sac> fixes the issue. I will add it under extras/ in the codebase and close the bug.

------------------------------------------
#!/bin/sh

# This script keeps checking in the background if the backend filesystem
# is alive. If the filesystem does not respond, it kills the glusterfsd
# daemon using the pid file generated. The test is performed by running
# `df' periodically on the backend.

EXPORTDIR=/mnt/dist-1
PIDFILE=/var/run/glusterfsd.pid
INTERVAL=30

while :
do
    # Test if glusterfsd is running before doing anything. If it is not
    # running there is nothing to do; sleep so the loop does not spin.
    if [ ! -f "$PIDFILE" ]; then
        sleep "$INTERVAL"
        continue
    fi

    df "$EXPORTDIR" > /dev/null 2>&1 &
    DF_PID=$!
    sleep "$INTERVAL"

    # If the df started above is still running after INTERVAL seconds,
    # treat the backend as hung.
    if ps -p "$DF_PID" -o comm= | grep -qx df
    then
        # df is hung, kill the glusterfsd process and remove the pid file
        kill -9 "$(cat "$PIDFILE")"
        if [ $? -eq 0 ]; then rm -f "$PIDFILE"; fi
    fi
done
-------------------------------------
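
The hang-detection step of the script above can be tried without a hung backend by running a long sleep in place of df. The following is only a minimal sketch: is_hung and probe_pid are hypothetical names that do not appear in the script, and the liveness check uses `kill -0`, which assumes the shell has already reaped a probe that exited during the wait.

```shell
#!/bin/sh
# is_hung is a hypothetical helper, not part of the script in this bug.
# Usage: is_hung SECONDS COMMAND [ARGS...]
# Starts COMMAND in the background, waits SECONDS, and returns 0 if the
# command is still running (treated as hung) or 1 if it has finished.
is_hung() {
    wait_secs=$1
    shift
    "$@" > /dev/null 2>&1 &
    probe_pid=$!
    sleep "$wait_secs"
    # kill -0 sends no signal; it only tests whether the process exists.
    if kill -0 "$probe_pid" 2>/dev/null; then
        return 0
    fi
    # Collect the exit status of the finished probe.
    wait "$probe_pid" 2>/dev/null
    return 1
}
```

For example, `is_hung 1 sleep 5` reports a hang because the sleep outlives the one-second wait, while `is_hung 1 true` does not. The probe is polled rather than waited on because a process blocked on hung storage may never return.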

