130148 – Oops while releasing DLM lock space

Summary: Oops while releasing DLM lock space

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	gfs
Sub Component:
Version:	4
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Christine Caulfield
QA Contact:	GFS Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-08-17 15:15 UTC by Dean Jansa
Modified:	2010-01-12 02:56 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-08-18 15:44:24 UTC
Embargoed:

Description Dean Jansa 2004-08-17 15:15:03 UTC

Description of problem:
Running dlmlock (which takes a lock in a lockspace, requests the
same lock again, then releases the lockspace) causes the dlmlock
program to hang on an SMP kernel.  Running the test again while the
first copy is hung results in an Oops (note: I had to run dlmlock a
few times to hit the Oops).

On a UP kernel the machine just "reboots," with no stack trace.
 
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip: 
f8a88ad2 
*pde = f23dc067 
Oops: 0000 [#1] 
SMP 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness lpfc scsi_transport_fc sd_mod ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg scsi_mod microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd
CPU:    2 
EIP:    0060:[<f8a88ad2>]    Not tainted 
EFLAGS: 00010282   (2.6.7) 
EIP is at dlm_close+0x162/0x270 [dlm] 
eax: ffffffea   ebx: ffffffea   ecx: 00000017   edx: f632ec00 
esi: 00000000   edi: ffffffd8   ebp: f7b43394   esp: f5b0dec4 
ds: 007b   es: 007b   ss: 0068 
Process dlmlock (pid: 3904, threadinfo=f5b0c000 task=f78d8bd0) 
Stack: f5b0df2c f5b0df28 00000000 f63aaed8 ffffffd8 00000000 00000000 ffffffff
       ffffffff 00000000 f78d8bd0 c011f600 00000000 00000000 f5b0df4c f5b0df4c
       00000000 00000000 f78d8bd0 c011f600 f5b0df40 f5b0df40 f5905670 00000000
Call Trace: 
 [<c011f600>] default_wake_function+0x0/0x10 
 [<c011f600>] default_wake_function+0x0/0x10 
 [<c015c7c6>] __fput+0xf6/0x110 
 [<c015afcf>] filp_close+0x4f/0x80 
 [<c0105e3d>] sysenter_past_esp+0x52/0x71 
 
Code: 8b 47 28 83 e8 28 89 44 24 10 8d 47 28 39 e8 75 8d 8d 54 24 
 Segmentation fault 
 
 
Version-Release number of selected component (if applicable): 
DLM <CVS> (built Aug 17 2004 09:37:25) installed  
 
How reproducible: 
Always 
 
Steps to Reproduce: 
1. ./dlmlock
2. The run above will hang at iteration #3.
3. ./dlmlock  (repeat a few times and the Oops will trip)
     
 
Additional info:  dlmlock.c 
 
#include <stdio.h> 
#include <stdlib.h> 
#include <string.h> 
#include <limits.h> 
#include <assert.h> 
#include <pthread.h> 
#include <sys/types.h> 
#include <netdb.h> 
 
#define _REENTRANT 
#include <libdlm.h> 
 
#define NAMESPACE       "dlmtest" 
 
void eat_locks(int attempts); 
 
static dlm_lshandle_t mylockspace; 
 
int 
main(void) 
{ 
        int i; 
 
        for (i = 1; i <= 10; i++) { 
                printf("eat_locks(%d)\n", i); 
                mylockspace = NULL; 
                if ((mylockspace = dlm_create_lockspace(NAMESPACE, 0600)) == NULL) {
                        fprintf(stderr, "dlm_create_lockspace() failed.\n");
                        exit(1); 
                } 
 
                dlm_ls_pthread_init(mylockspace); 
 
                eat_locks(i); 
 
                if (dlm_release_lockspace(NAMESPACE, mylockspace, 0) < 0) {
                        fprintf(stderr, "dlm_release_lockspace() failed.\n");
                        exit(1); 
                } 
 
                dlm_close_lockspace(mylockspace); 
        } 
 
 
        return 0; 
} 
 
 
void 
eat_locks(int attempts) 
{ 
        struct dlm_range range; 
        struct dlm_lksb lksb; 
        int retval; 
        int i; 
 
        for (i = 0; i < attempts; i+=2) { 
                range.ra_start = i; 
                range.ra_end = i+1; 
 
                /* First request: exclusive, no-queue lock on "bogus". */
                retval = dlm_ls_lock_wait(mylockspace, LKM_EXMODE, &lksb,
                                          LKF_NOQUEUE, "bogus", strlen("bogus"),
                                          0, NULL, NULL, &range);

                fprintf(stderr, "eat_locks: (1)dlm_ls_lock_wait returned %d\n", retval);
                fprintf(stderr, "eat_locks: (1)lksb.sb_status is %d\n", lksb.sb_status);

                /* Second request for the same lock and range, reusing lksb. */
                retval = dlm_ls_lock_wait(mylockspace, LKM_EXMODE, &lksb,
                                          LKF_NOQUEUE, "bogus", strlen("bogus"),
                                          0, NULL, NULL, &range);

                fprintf(stderr, "eat_locks: (2)dlm_ls_lock_wait returned %d\n", retval);
                fprintf(stderr, "eat_locks: (2)lksb.sb_status is %d\n", lksb.sb_status);
        } 
 
        return; 
}

Comment 1 Christine Caulfield 2004-08-18 12:58:44 UTC

Checking in device.c;
/cvs/cluster/cluster/dlm-kernel/src/device.c,v  <--  device.c
new revision: 1.13; previous revision: 1.12
done
Checking in dlm_internal.h;
/cvs/cluster/cluster/dlm-kernel/src/dlm_internal.h,v  <--  dlm_internal.h
new revision: 1.17; previous revision: 1.16
done


Don't hang LKBs off the ownerqueue list, as we don't have any control
over their lifetime. Now that LKBs are destroyed before the ASTs are
run, this causes real problems.

The ownerqueue is now strung through the lock_info structs themselves,
and we free those up when we can see that the LKB has been removed
by the DLM core.

Comment 2 Dean Jansa 2004-08-18 15:44:24 UTC

Test case now passes.

Comment 3 Kiersten (Kerri) Anderson 2004-11-16 19:06:36 UTC

Updating version to the right level in the defects.  Sorry for the storm.
