Bug 144364

Summary: dlm stuck while processing locks/releasing lockspace
Product: [Retired] Red Hat Cluster Suite
Reporter: Dean Jansa <djansa>
Component: dlm
Assignee: David Teigland <teigland>
Status: CLOSED NEXTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cluster-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-04-13 15:47:50 UTC

Description Dean Jansa 2005-01-06 15:25:10 UTC
DLM <CVS> (built Dec 20 2004 14:25:12) installed 
Lock_DLM (built Dec 20 2004 17:13:39) installed 
 
I hit this while running the program below (grab a bunch of
LKM_CRMODE locks, hold them, then release the lockspace). Running
this test in a loop on 6 nodes works for a few hours, then the test
hangs.
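
Reproducing this means rerunning the test until one run stops making
progress. A minimal C sketch of such a driver follows; run_with_timeout(),
the run_result codes and the timeout value are hypothetical, not part of
the original test. On a timeout it deliberately leaves the child running
and hands back its pid, since a task blocked inside the kernel may not be
killable and is exactly what needs inspecting.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Outcome of one run; these names are illustrative, not from any library. */
enum run_result { RUN_ERROR = -1, RUN_OK = 0, RUN_FAILED = 1, RUN_HUNG = 2 };

/* Run argv[0] (an absolute path) with the arguments in argv and wait up to
 * timeout_sec seconds. On RUN_HUNG the child is left running and its pid is
 * stored in *hung_pid so it can be inspected (e.g. via /proc/<pid>/wchan). */
int run_with_timeout(char *const argv[], int timeout_sec, pid_t *hung_pid)
{
    struct timespec tick = { 0, 100L * 1000 * 1000 };   /* 100 ms poll */
    long waited_ms;
    pid_t pid = fork();

    if (pid < 0)
        return RUN_ERROR;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);                        /* exec failed */
    }

    /* Poll without blocking so a hung child cannot hang the driver too. */
    for (waited_ms = 0; waited_ms < timeout_sec * 1000L; waited_ms += 100) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                   ? RUN_OK : RUN_FAILED;
        if (r < 0)
            return RUN_ERROR;
        nanosleep(&tick, NULL);
    }

    *hung_pid = pid;                       /* still running: report as hung */
    return RUN_HUNG;
}
```

Wrapping it as `while (run_with_timeout(args, 600, &pid) == RUN_OK) ;` with
`args` holding a hypothetical `"./dlmtest"` path keeps rerunning the test on
one node until a run fails or hangs; the 600-second limit is an arbitrary
assumption.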
 
Three of the nodes are starting the test again, presumably blocked in
dlm_create_lockspace(). I can't attach to the processes to confirm
this, but /proc/<pid>/wchan shows that they are waiting in
kcl_join_service().
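
Reading /proc/<pid>/wchan is a way to see where a task is blocked
without attaching a debugger. A minimal C equivalent of
`cat /proc/<pid>/wchan` might look like this; read_first_line() and
read_wchan() are hypothetical helpers, not part of libdlm.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Copy the first line of the file at path into buf, without its newline.
 * Returns 0 on success (an empty file yields ""), -1 on open or read error. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f;

    if (len == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        int failed = ferror(f);

        fclose(f);
        buf[0] = '\0';
        return failed ? -1 : 0;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Like "cat /proc/<pid>/wchan": the kernel function the task is sleeping
 * in (the kernel may print "0" instead). Returns -1 if pid does not exist. */
int read_wchan(pid_t pid, char *buf, size_t len)
{
    char path[64];

    snprintf(path, sizeof path, "/proc/%ld/wchan", (long)pid);
    return read_first_line(path, buf, len);
}
```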
 
The other three nodes are stuck at iteration 99 of 100.
 
/proc/cluster/dlm_debug on each node: 
tank-01 (stuck in startup): 
0 0 0 "bogus" 
dlmtest (8481) un 10143 80000 0 0 "bogus" 
dlmtest (8481) un 1014e 80000 0 0 "bogus" 
dlmtest (8481) un 103b7 80000 0 0 "bogus" 
dlmtest (8481) un 1034e 80000 0 0 "bogus" 
dlmtest (8481) un 10276 80000 0 0 "bogus" 
dlmtest (8481) un 2010c 80000 0 0 "bogus" 
dlmtest (8481) un 102e8 80000 0 0 "bogus" 
dlmtest (8481) un 100ea 80000 0 0 "bogus" 
dlmtest (8481) un 1024d 80000 0 0 "bogus" 
dlmtest (8481) un 10119 80000 0 0 "bogus" 
dlmtest (8481) un 100da 80000 0 0 "bogus" 
dlmtest (8481) un 1034a 80000 0 0 "bogus" 
dlmtest (8481) un 101fc 80000 0 0 "bogus" 
dlmtest (8481) un 102d4 80000 0 0 "bogus" 
dlmtest (8481) un 102ae 80000 0 0 "bogus" 
dlmtest (8481) un 10134 80000 0 0 "bogus" 
dlmtest (8481) un 1007c 80000 0 0 "bogus" 
dlmtest (8481) un 10382 80000 0 0 "bogus" 
dlmtest (8481) un 10396 80000 0 0 "bogus" 
dlmtest (8481) un 10275 80000 0 0 "bogus" 
dlmtest (8481) un 1010c 80000 0 0 "bogus" 
dlmtest (8481) un 102f7 80000 0 0 "bogus" 
dlmtest (8481) un 1031f 80000 0 0 "bogus" 
dlmtest move flags 1,0,0 ids 2106,2106,2106 
 
tank-02 (stuck in startup): 
ogus" 
dlmtest (6688) un 10106 80000 0 0 "bogus" 
dlmtest (6688) un 1032a 80000 0 0 "bogus" 
dlmtest (6688) un 10092 80000 0 0 "bogus" 
dlmtest (6688) un 101fd 80000 0 0 "bogus" 
dlmtest (6688) un 1032d 80000 0 0 "bogus" 
dlmtest (6688) un 1006b 80000 0 0 "bogus" 
dlmtest (6688) un 10110 80000 0 0 "bogus" 
dlmtest (6688) un 103f2 80000 0 0 "bogus" 
dlmtest (6688) un 1030d 80000 0 0 "bogus" 
dlmtest (6688) un 1034a 80000 0 0 "bogus" 
dlmtest (6688) un 10005 80000 0 0 "bogus" 
dlmtest (6688) un 10341 80000 0 0 "bogus" 
dlmtest (6688) un 103d8 80000 0 0 "bogus" 
dlmtest (6688) un 1023c 80000 0 0 "bogus" 
dlmtest (6688) un 101c9 80000 0 0 "bogus" 
dlmtest (6688) un 103b9 80000 0 0 "bogus" 
dlmtest (6688) un 1012c 80000 0 0 "bogus" 
dlmtest (6688) un 1003f 80000 0 0 "bogus" 
dlmtest (6688) un 10157 80000 0 0 "bogus" 
dlmtest move flags 1,0,0 ids 2198,2198,2198 
dlmtest move flags 0,1,0 ids 0,2200,0 
dlmtest move use event 2200 
dlmtest recover event 2200 (first) 
dlmtest add nodes 
dlmtest rcom status 4 to 2 
dlmtest rcom send 1 to 4 id 2 
 
tank-03 (stuck in startup): 
0 0 0 "bogus" 
dlmtest (7451) un 1026e 80000 0 0 "bogus" 
dlmtest (7451) un 1000f 80000 0 0 "bogus" 
dlmtest (7451) un 101b0 80000 0 0 "bogus" 
dlmtest (7451) un 10127 80000 0 0 "bogus" 
dlmtest (7451) un 10239 80000 0 0 "bogus" 
dlmtest (7451) un 1002a 80000 0 0 "bogus" 
dlmtest (7451) un 1009c 80000 0 0 "bogus" 
dlmtest (7451) un 10338 80000 0 0 "bogus" 
dlmtest (7451) un 102f3 80000 0 0 "bogus" 
dlmtest (7451) un 10249 80000 0 0 "bogus" 
dlmtest (7451) un 1037c 80000 0 0 "bogus" 
dlmtest (7451) un 10177 80000 0 0 "bogus" 
dlmtest (7451) un 1014d 80000 0 0 "bogus" 
dlmtest (7451) un 10326 80000 0 0 "bogus" 
dlmtest (7451) un 10094 80000 0 0 "bogus" 
dlmtest (7451) un 1000b 80000 0 0 "bogus" 
dlmtest (7451) un 1038b 80000 0 0 "bogus" 
dlmtest (7451) un 101f0 80000 0 0 "bogus" 
dlmtest (7451) un 10164 80000 0 0 "bogus" 
dlmtest (7451) un 103e9 80000 0 0 "bogus" 
dlmtest (7451) un 1032b 80000 0 0 "bogus" 
dlmtest (7451) un 103ee 80000 0 0 "bogus" 
dlmtest (7451) un 10075 80000 0 0 "bogus" 
dlmtest move flags 1,0,0 ids 2024,2024,2024 
 
tank-04 (stuck at iteration 99): 
dlmtest move flags 1,1,0 ids 1958,1959,1958 
dlmtest move use event 1959 
dlmtest recover event 1959 
dlmtest remove node 1 
dlmtest rcom send 1 to 4 id 246 
dlmtest rcom status 4 to 4 
dlmtest rcom names len 8 to 4 id 219 
dlmtest rcom names len 8 to 5 id 262 
dlmtest rcom send 1 to 4 id 247 
dlmtest total nodes 3 
dlmtest rebuild resource directory 
dlmtest rcom send 2 to 4 id 248 
dlmtest rcom send 2 to 5 id 249 
dlmtest rcom names len 8 to 6 id 250 
dlmtest rebuilt 0 resources 
dlmtest purge requests 
dlmtest purged 0 requests 
dlmtest rcom send 1 to 4 id 251 
dlmtest rcom status 5 to 4 
dlmtest rcom send 1 to 4 id 252 
dlmtest mark waiting requests 
dlmtest marked 0 requests 
dlmtest purge locks of departed nodes 
dlmtest purged 0 locks 
dlmtest update remastered resources 
dlmtest updated 0 resources 
dlmtest rebuild locks 
dlmtest rebuilt 0 locks 
dlmtest recover event 1959 done 
dlmtest move flags 1,1,1 ids 1959,1960,1959 
dlmtest move use event 1960 
dlmtest recover event 1960 
dlmtest add node 2 
dlmtest rcom send 1 to 2 id 253 
 
tank-05 (stuck at iteration 99): 
m send 1 to 6 id 216 
dlmtest total nodes 3 
dlmtest rebuild resource directory 
dlmtest rcom names len 8 to 4 id 217 
dlmtest rcom send 2 to 5 id 218 
dlmtest rcom send 2 to 6 id 219 
dlmtest rebuilt 0 resources 
dlmtest purge requests 
dlmtest purged 0 requests 
dlmtest rcom status d to 4 
dlmtest rcom send 1 to 5 id 221 
dlmtest rcom status d to 5 
dlmtest rcom names len 8 to 5 id 260 
dlmtest rcom status d to 5 
dlmtest rcom status d to 6 
dlmtest rcom names len 8 to 6 id 248 
dlmtest rcom status d to 6 
dlmtest rcom send 1 to 5 id 222 
dlmtest rcom send 1 to 6 id 223 
dlmtest mark waiting requests 
dlmtest marked 0 requests 
dlmtest purge locks of departed nodes 
dlmtest purged 0 locks 
dlmtest update remastered resources 
dlmtest updated 0 resources 
dlmtest rebuild locks 
dlmtest rebuilt 0 locks 
dlmtest recover event 2017 done 
dlmtest rcom status f to 5 
dlmtest rcom status f to 6 
dlmtest move flags 1,1,1 ids 2017,2018,2017 
dlmtest move use event 2018 
dlmtest recover event 2018 
dlmtest add node 2 
dlmtest rcom send 1 to 2 id 224 
 
tank-06 (stuck at iteration 99): 
ids 1982,1983,1982 
dlmtest move use event 1983 
dlmtest recover event 1983 
dlmtest remove node 1 
dlmtest rcom send 1 to 4 id 258 
dlmtest rcom status 4 to 4 
dlmtest rcom names len 8 to 4 id 218 
dlmtest rcom status 4 to 4 
dlmtest rcom send 1 to 4 id 259 
dlmtest total nodes 3 
dlmtest rebuild resource directory 
dlmtest rcom send 2 to 4 id 260 
dlmtest rcom names len 8 to 5 id 261 
dlmtest rcom send 2 to 6 id 262 
dlmtest rebuilt 0 resources 
dlmtest purge requests 
dlmtest purged 0 requests 
dlmtest rcom send 1 to 4 id 263 
dlmtest rcom names len 8 to 6 id 249 
dlmtest rcom status 5 to 4 
dlmtest rcom send 1 to 4 id 264 
dlmtest mark waiting requests 
dlmtest marked 0 requests 
dlmtest purge locks of departed nodes 
dlmtest purged 0 locks 
dlmtest update remastered resources 
dlmtest updated 0 resources 
dlmtest rebuild locks 
dlmtest rebuilt 0 locks 
dlmtest recover event 1983 done 
dlmtest move flags 1,1,1 ids 1983,1984,1983 
dlmtest move use event 1984 
dlmtest recover event 1984 
dlmtest add node 2 
dlmtest rcom send 1 to 2 id 265 
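
Comparing the six dumps is easier with the "move flags ... ids ..." lines
pulled out of each one. The sketch below (parse_move_line() and
print_move_lines() are hypothetical names) captures the three flag values
and three ids exactly as printed and assigns them no meaning; lines cut
off by the debug buffer, such as the first tank-06 line, simply fail to
match.

```c
#include <stdio.h>
#include <string.h>

/* One "move flags" line from /proc/cluster/dlm_debug. The values are kept
 * exactly as printed; this sketch does not interpret them. */
struct move_line {
    char lockspace[64];
    int flags[3];
    int ids[3];
};

/* Parse e.g. "dlmtest move flags 1,1,0 ids 1958,1959,1958".
 * Returns 1 on a full match and 0 otherwise. */
int parse_move_line(const char *line, struct move_line *out)
{
    memset(out, 0, sizeof *out);
    return sscanf(line, "%63s move flags %d,%d,%d ids %d,%d,%d",
                  out->lockspace,
                  &out->flags[0], &out->flags[1], &out->flags[2],
                  &out->ids[0], &out->ids[1], &out->ids[2]) == 7;
}

/* Print every parsable "move flags" line read from in (e.g. stdin). */
void print_move_lines(FILE *in)
{
    char line[256];
    struct move_line m;

    while (fgets(line, sizeof line, in) != NULL)
        if (parse_move_line(line, &m))
            printf("%s flags %d,%d,%d ids %d,%d,%d\n", m.lockspace,
                   m.flags[0], m.flags[1], m.flags[2],
                   m.ids[0], m.ids[1], m.ids[2]);
}
```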
 
 
-------------------------- dlmtest.c --------------------- 
 
 
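#define _REENTRANT      /* feature macro: define before the first #include */

#include <stdio.h>      /* printf, fprintf, perror (was missing) */
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <netdb.h>

#include <libdlm.h>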
 
#define NAMESPACE   "dlmtest" 
 
void eat_locks(int attempts); 
 
static dlm_lshandle_t mylockspace; 
 
int 
main(void) 
{ 
    int i; 
 
    /* Create the lockspace the test takes its locks in. */
    mylockspace = dlm_create_lockspace(NAMESPACE, 0600);
    if (mylockspace == NULL) {
        fprintf(stderr, "dlm_create_lockspace() failed.\n");
        exit(1);
    }
 
    dlm_ls_pthread_init(mylockspace); 
 
    for (i = 1; i <= 100; i++) { 
        printf("\neat_locks(%d)  ", i); 
        eat_locks(i); 
    } 
 
    if (dlm_release_lockspace(NAMESPACE, mylockspace, 0) < 0) { 
        fprintf(stderr, "dlm_release_lockspace() failed.\n"); 
        exit(1); 
    } 
 
    dlm_close_lockspace(mylockspace); 
 
    return 0; 
} 
 
 
void 
eat_locks(int attempts) 
{ 
    struct dlm_range range; 
    struct dlm_lksb lksb; 
    int retval; 
    int i; 
 
    for (i = 0; i < attempts; i+=2) { 
        range.ra_start = i; 
        range.ra_end = i+1; 
 
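        /* Request a concurrent-read lock on "bogus" over [i, i+1].
         * LKF_NOQUEUE: fail rather than wait if it cannot be granted.
         * The lock is never unlocked, so locks accumulate across calls. */
        retval = dlm_ls_lock_wait(mylockspace, LKM_CRMODE, &lksb,
                                  LKF_NOQUEUE, "bogus", strlen("bogus"),
                                  0, NULL, NULL, &range);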
 
        if (retval != 0) { 
            fprintf(stderr, "dlm_ls_lock_wait() failed.\n"); 
            perror("eat_locks, dlm_ls_lock_wait()"); 
            exit(1); 
        } 
 
        printf("."); 
 
    } 
 
    return; 
}
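
Because eat_locks() never unlocks, locks pile up across iterations:
eat_locks(n) issues ceil(n/2) requests, so if every request is granted
(the program exits on the first failure), the process holds 2500 locks
after iteration 99 and 2550 when it reaches dlm_release_lockspace(). The
arithmetic, with hypothetical helper names:

```c
/* Lock requests made by eat_locks(attempts): its loop runs for
 * i = 0, 2, 4, ... while i < attempts, i.e. ceil(attempts / 2) times. */
int locks_per_call(int attempts)
{
    return attempts > 0 ? (attempts + 1) / 2 : 0;
}

/* Locks held after main() has called eat_locks(1) .. eat_locks(iterations),
 * assuming every request was granted and nothing was unlocked. */
int locks_held_after(int iterations)
{
    int total = 0;

    for (int i = 1; i <= iterations; i++)
        total += locks_per_call(i);
    return total;
}
```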

Comment 1 Dean Jansa 2005-01-10 20:43:53 UTC
Derek and I have both also seen the dlm_release_lockspace() call fail
from time to time. It always seems to happen at iteration 99.

Comment 2 David Teigland 2005-02-22 06:13:30 UTC
The problem appears to be fixed based on my testing.


Comment 3 Dean Jansa 2005-04-13 15:47:50 UTC
Fix verified