Description of problem:
Application threads get stuck in "D" state while doing GFS posix lock calls and can't be killed. The following are two thread backtraces that show the issue (taken from Eric Ayers' live system yesterday):

PID: 355    TASK: 100ebd8b030   CPU: 1   COMMAND: "dcs_event_mgr"
 #0 [101de649c28] schedule at ffffffff8030a1c1
 #1 [101de649d00] wait_for_completion at ffffffff8030a405
 #2 [101de649d80] wait_async at ffffffffa029fb2b
 #3 [101de649da0] wait_async_list at ffffffffa029fc42
 #4 [101de649dd0] lm_dlm_punlock at ffffffffa02a03a6
 #5 [101de649e30] gfs_lm_punlock at ffffffffa0263df2
 #6 [101de649e40] gfs_lock at ffffffffa026edad
 #7 [101de649e70] locks_remove_posix at ffffffff8018e49b
 #8 [101de649e80] wait_on_page_writeback_range at ffffffff80159f49
 #9 [101de649f30] filp_close at ffffffff80178e3c
#10 [101de649f50] sys_close at ffffffff80178ecd
#11 [101de649f80] cstar_do_call at ffffffff80125d33

PID: 32632  TASK: 100dbb1b030   CPU: 1   COMMAND: "dcs_event_mgr"
 #0 [100e749db28] schedule at ffffffff8030a1c1
 #1 [100e749dc00] wait_for_completion at ffffffff8030a405
 #2 [100e749dc80] wait_async at ffffffffa029fb2b
 #3 [100e749dca0] wait_async_list at ffffffffa029fc42
 #4 [100e749dcd0] lm_dlm_plock at ffffffffa02a0fd2
 #5 [100e749de10] gfs_lm_plock at ffffffffa0263db7
 #6 [100e749de20] gfs_lock at ffffffffa026edbf
 #7 [100e749de50] fcntl_setlk at ffffffff8018e244
 #8 [100e749de70] __dentry_open at ffffffff801789ea
 #9 [100e749dea0] filp_open at ffffffff80178b88
#10 [100e749df10] sys_fcntl at ffffffff8018a613
#11 [100e749df40] compat_sys_fcntl64 at ffffffff8019ecad
#12 [100e749df80] cstar_do_call at ffffffff80125d33

Version-Release number of selected component (if applicable):
* RHEL4 U4 x86_64 kernel (2.6.9-42.0.2.ELsmp)
* gfs/dlm R4.4 errata built by Chris on Sept 26
  http://post-office.corp.redhat.com/archives/cluster-list/2006-September/msg00312.html
I've recreated the hang by putting a 300 usec delay into wait_async() before it waits for completion:

static int wait_async_new(dlm_lock_t *lp)
{
	udelay(300);	/* injected delay before waiting */
	wait_for_completion(&lp->uast_wait);
	del_async(lp->posix, lp->posix->resource);
	return lp->lksb.sb_status;
}

Then I altered wait_async_list() so it alternates between the old (no-delay) and new (300 usec) wait_async():

static void wait_async_list(struct dlm_resource *r, unsigned long owner)
{
	struct posix_lock *po;
	int error, found;
	static int i;
	.....
	spin_unlock(&r->async_spin);

	if (found) {
		DLM_ASSERT(po->lp, );
		i++;
		if (i % 2)
			error = wait_async_new(po->lp);
		else
			error = wait_async(po->lp);
		DLM_ASSERT(!error, );
		goto restart;
	}

This is to simulate processes getting interrupted by various kernel interrupt events. So I would assume GFS *will* lose the wake-up calls. I need some time to think about the implications (whether my test is valid) and/or how to fix this.
Created attachment 137433 [details] the thread backtrace from my test machines that show this hang
Beginner's luck yesterday - got a hang on the first try. The second one happened 6 hours later, after I rebooted the node. After examining more (on my test machine), the hang looks different from Eric's. The thread that blocks everyone else (this morning) is:

crash> bt 6080
PID: 6080   TASK: 1006fee87f0   CPU: 0   COMMAND: "fl_64"
 #0 [10064a69c58] schedule at ffffffff8030a1c1
 #1 [10064a69d30] wait_for_completion at ffffffff8030a405
 #2 [10064a69db0] do_dlm_lock_sync at ffffffffa02fea4c
 #3 [10064a69dd0] unlock_resource at ffffffffa03006be
 #4 [10064a69df0] lm_dlm_punlock at ffffffffa0301384
 #5 [10064a69e50] gfs_lm_punlock at ffffffffa02c4df2
 #6 [10064a69e60] gfs_lock at ffffffffa02cfdad
 #7 [10064a69e90] fcntl_setlk at ffffffff8018e244
 #8 [10064a69ea8] thread_return at ffffffff8030a219
 #9 [10064a69f50] sys_fcntl at ffffffff8018a613
#10 [10064a69f80] system_call at ffffffff8011026a

Looks like the DLM callback (process_complete and "uast_wait") either gets lost or never happened. I also discussed this with the base kernel folks yesterday. It is ok for complete() to be called before wait_for_completion(). So we don't have a clue or a solution yet.
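For reference, the point about ordering follows from the semantics of the kernel completion API: a completion keeps a "done" count, so a complete() that fires before the waiter arrives is not lost. A minimal sketch (illustrative only, not code from lock_dlm):

#include <linux/completion.h>

static DECLARE_COMPLETION(demo);

static void producer(void)
{
	complete(&demo);	/* increments demo.done */
}

static void consumer(void)
{
	/* returns immediately if demo.done > 0, otherwise sleeps
	   until complete() is called - the signal cannot be lost */
	wait_for_completion(&demo);
}

So if the waiter never wakes up with uast_wait.done == 0, the likelier explanation is that complete() was never called at all, not that the wake-up raced with the wait.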
My thread consistently hangs (it actually took another 10 hours for this to happen) in:

PID: 5160   TASK: 1006fde27f0   CPU: 0   COMMAND: "fl_64"
 #0 [100674edb78] schedule at ffffffff8030a1c1
 #1 [100674edc50] wait_for_completion at ffffffff8030a405
 #2 [100674edcd0] do_dlm_lock_sync at ffffffffa02feb98
 #3 [100674edcf0] unlock_resource at ffffffffa030080a
 #4 [100674edd10] lm_dlm_plock at ffffffffa0302106
 #5 [100674edd90] inode_has_perm at ffffffff801d1e3f
 #6 [100674ede50] gfs_lm_plock at ffffffffa02c4db7
 #7 [100674ede60] gfs_lock at ffffffffa02cfdbf
 #8 [100674ede90] fcntl_setlk at ffffffff8018e244
 #9 [100674eded0] sys_select at ffffffff8018bcc5
#10 [100674edf50] sys_fcntl at ffffffff8018a613
#11 [100674edf80] system_call at ffffffff8011026a

The following is the "lp" structure that causes this hang. Every field looks ok, except lp->flags, which is (oddly) set to 0x9080 ?? The LFL_INLOCK bit had probably been cleared, so I would expect this request to have been processed:

lp = 1007a78d9c0
dlm_t *dlm = 10037d75a00
ln_number = c3830
ln_type = 17
char *lvb = 0
lksb.sb_status = 0
lksb.sb_lkid = 13f00c5
lksb.sb_flags = 0
int16_t cur = 5 (DLM_LOCK_EX)
req = 0
prev_req = 0
lkf = 4
type = 0
flags = 9080
base_mode = 0
uast_wait.done = 0
uast_wait.prev = 1007a78da28
uast_wait.next = 1007a78da28

All the completion queue lists in the lp->dlm structure are NULL. So now we go back to square one - the following two puzzles hold the info for this mystery:

1. why is lp->flags = 0x9080 ?
2. why did this line show up in dlm_debug (and console printk):
   dlm: mygfs: process_lockqueue_reply id 13f00c5 state 0
For every file, there's a "plock_update" lock that's held in EX while doing anything with the dlm locks representing the actual plocks on the file. The update lock is locked (or promoted) to EX in lock_resource() and demoted to NL in unlock_resource(). This update lock has the INLOCK and NOBAST lp flags set on it (it's a dlm lock internal to lock_dlm; it doesn't represent anything to gfs), and we don't need any blocking callbacks for it. The WAIT_COMPLETE flag is set when we make a dlm request and cleared when we get the completion callback for it. So it appears that unlock_resource() is demoting the lock from EX to NL and is waiting for a completion callback for that conversion. A dump of the dlm locks should show whether or not the dlm has done the EX->NL conversion. The process_lockqueue_reply() message might be a clue as to whether or why a completion callback is missed; I'm not sure.
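To make the description above concrete, here is a rough sketch of the update-lock protocol (the shape is inferred from the backtraces; r->update and the field names are illustrative, not the actual lock_dlm source):

static void lock_resource(struct dlm_resource *r)
{
	/* take/promote the per-file "plock_update" lock to EX before
	   touching the dlm locks that represent the file's plocks;
	   LFL_INLOCK and LFL_NOBAST mark it as internal to lock_dlm */
	do_dlm_lock_sync(r->update, DLM_LOCK_EX);
}

static void unlock_resource(struct dlm_resource *r)
{
	/* demote EX -> NL; do_dlm_lock_sync() sets LFL_WAIT_COMPLETE,
	   calls dlm_lock(), and blocks in wait_for_completion() until
	   the dlm's completion callback fires - the step that never
	   arrives in the hangs above */
	do_dlm_lock_sync(r->update, DLM_LOCK_NL);
}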
I see - so 0x9080 is actually LFL_NOBAST + LFL_INLOCK + LFL_WAIT_COMPLETE. This is exactly what do_dlm_lock_sync() passes to dlm_lock(). So somehow the DLM astd thread never calls back (queue_complete()) ... This doesn't look like a GFS issue at all.
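A quick check of that decomposition, using hypothetical bit values (the authoritative definitions live in lock_dlm's header; these are chosen only so the arithmetic is visible):

/* hypothetical flag values - see lock_dlm's header for the real ones */
#define LFL_WAIT_COMPLETE	0x0080	/* assumed bit */
#define LFL_INLOCK		0x1000	/* assumed bit */
#define LFL_NOBAST		0x8000	/* assumed bit */

/* 0x0080 | 0x1000 | 0x8000 == 0x9080, matching the lp->flags dump above */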
Created attachment 137689 [details]
test case to recreate the hang

* compile with "gcc -o fl fl.c"
* run with "fl <filename>"; the filename should be a gfs file.
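For readers without access to the attachment, a minimal sketch of this kind of plock exerciser (NOT the actual fl.c, just an illustration of hammering fcntl() locks on a GFS file so the lm_dlm_plock/punlock paths are exercised in a tight loop):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct flock fl;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <gfs file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (;;) {
		fl.l_type = F_WRLCK;	/* plock -> lm_dlm_plock */
		fl.l_whence = SEEK_SET;
		fl.l_start = 0;
		fl.l_len = 1;
		if (fcntl(fd, F_SETLKW, &fl) < 0)
			perror("F_SETLKW");
		fl.l_type = F_UNLCK;	/* punlock -> lm_dlm_punlock */
		if (fcntl(fd, F_SETLK, &fl) < 0)
			perror("F_UNLCK");
	}
	return 0;
}

Running several instances concurrently on the same file from multiple nodes would be the obvious way to stress the dlm plock paths.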
A quick summary of the problems:

1) The GFS crash was stopped by:
   * removing EMC PowerPath (which implied a newly formatted GFS filesystem)
   * turning off panic-on-oops on the x86_64 node

2) The newest problem is that applications hang inside the kernel (D state) waiting for dlm posix lock operations to complete - unkillable until the box gets rebooted. Recreating the hang has been attempted inside the Red Hat lab. One hang was found, as described in comment #6: the thread is blocked waiting for dlm internal synchronization to complete. Intec has a different hang - it hangs while waiting for dlm asynchronous posix lock requests to complete.
Sorry if I was unclear about this, but before I tried it, I assumed the worst - that I would have to reformat GFS. In fact, I did not have to do that and was able to fsck and remount the existing filesystem after using 'rpm' to remove the PowerPath installation from our blades.
Sorry - I thought you had reformatted GFS. In theory, fsck should be good enough. Dave is looking into this issue as well. We'll keep you informed.
Given the current status of development (the posix lock module has been rewritten in RHEL5) and the difficulty of recreating the problem (we could not recreate it in-house), we would like to close this bugzilla for now. If there is more info (or new leads on how to recreate this problem), we will re-open.