Bug 147798 - make_panic hangs for several minutes
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm
Version: 4
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: David Teigland
QA Contact: Cluster QE
Reported: 2005-02-11 09:22 EST by David Teigland
Modified: 2009-04-16 16:30 EDT

Doc Type: Bug Fix
Last Closed: 2005-02-15 23:07:35 EST


Description David Teigland 2005-02-11 09:22:08 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041111 Firefox/1.0

Description of problem:
I ran "make_panic -f3" on a seven-node cluster.  After running
for some time, the make_panic processes on all machines stop.
A lock dump shows they are all waiting on a single lock
that GFS is caching on one node but not releasing.  It appears
that a blocking callback isn't being delivered to GFS for this
lock.  After several minutes, GFS releases the locks it isn't
using, which lets all the processes acquire this lock; they
then run fine until the problem recurs.

There are several cases in process_asts() where we decide to
skip sending a bast.  One of those may be incorrect; some
debugging should tell.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Run make_panic -f 3 on several nodes

Additional info:
Comment 1 David Teigland 2005-02-15 23:07:35 EST
Changes by:     teigland@sourceware.org 2005-02-16 03:53:16

Modified files:
        dlm-kernel/src : lockqueue.c

Log message:
        Blocking asts were being ignored for all locks being
        converted, which resulted in some necessary basts being
        skipped.  In particular, after a failed NOQUEUE conversion,
        gfs could be left holding a lock and getting no callback for
        it while others were left waiting.

        This changes things so that a bast message is ignored if the
        lock is being converted and NOQUEUE isn't set, or if the lock
        is being unlocked.  Fixes bz 147798.


Changes by:     teigland@sourceware.org 2005-02-16 04:03:50

Modified files:
        gfs-kernel/src/dlm: lock.c

Log message:
        We were ignoring blocking callbacks for locks being converted
        which caused us to skip some that were necessary.
        Fixes bz 147798 (along with similar dlm fix)
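
The rule described in the lockqueue.c log message can be sketched as a
small predicate.  This is only an illustration of the stated logic, not
the actual dlm-kernel code: the struct, its field names, and both
function names are hypothetical, and the "old" function models only the
conversion case, since the log does not say how unlocks were handled
before the change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock state; these fields are illustrative only and are
 * not the real dlm-kernel structures. */
struct lock_state {
    bool converting;  /* a conversion request is in progress */
    bool noqueue;     /* the conversion was requested with NOQUEUE */
    bool unlocking;   /* an unlock request is in progress */
};

/* Old behavior as described in the log: a bast was ignored for every
 * lock being converted (assumption: unlock handling is not modeled). */
static bool ignore_bast_old(const struct lock_state *lk)
{
    return lk->converting;
}

/* New behavior as described in the log: ignore the bast if the lock is
 * being unlocked, or if it is being converted without NOQUEUE. */
static bool ignore_bast_new(const struct lock_state *lk)
{
    if (lk->unlocking)
        return true;
    return lk->converting && !lk->noqueue;
}

/* Self-check for the case from the bug: a NOQUEUE conversion can fail
 * and leave the holder at its old mode, so the bast must be delivered. */
static void check_noqueue_case(void)
{
    struct lock_state lk = { .converting = true, .noqueue = true,
                             .unlocking = false };
    assert(ignore_bast_old(&lk));   /* old rule dropped the callback */
    assert(!ignore_bast_new(&lk));  /* new rule delivers it */
}
```

Under this reading, a queued (non-NOQUEUE) conversion still ignores the
bast, presumably because the holder's mode is about to change anyway,
while a NOQUEUE conversion keeps receiving callbacks because it may fail
and leave the lock held at its current mode.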
