Bug 201325

Summary: Kernel Oops when passing LKF_CANCEL to dlm_ls_unlock_wait
Product: [Retired] Red Hat Cluster Suite
Reporter: Carsten Clasohm <clasohm>
Component: dlm
Assignee: David Teigland <teigland>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: ccaulfie, cluster-maint
Hardware: i386
OS: Linux
Fixed In Version: RHBA-2007-0137
Doc Type: Bug Fix
Last Closed: 2007-05-10 21:26:47 UTC
Attachments:
  Test program to reproduce the kernel oops. (flags: none)
  netdump log of the kernel oops (flags: none)
  fixes the list_del call in ast_routine (flags: none)

Description Carsten Clasohm 2006-08-04 12:25:27 UTC
Description of problem:

When dlm_ls_unlock_wait is called with the LKF_CANCEL flag for a lock request
that the process is still waiting on, the kernel panics. This appears to happen
because ast_routine tries to delete an entry from an empty kernel list.

Version-Release number of selected component (if applicable):

dlm-1.0.0-5
kernel-2.6.9-34.0.2.EL
dlm-kernel-2.6.9-41.7.2

How reproducible:

always


Steps to Reproduce:

1. Set up a Cluster Suite 4 cluster
2. Optional: set up netdump
3. Compile the attached test program and run it
  
Actual results:

Kernel oops

Expected results:

No kernel oops

Additional info:

I have also attached the netdump log and a patch for dlm-kernel. The patch
works for this test case and looks correct to me, but I am not a dlm expert.

Comment 1 Carsten Clasohm 2006-08-04 12:25:29 UTC
Created attachment 133625 [details]
Test program to reproduce the kernel oops.

Comment 2 Carsten Clasohm 2006-08-04 12:27:51 UTC
Created attachment 133626 [details]
netdump log of the kernel oops

ast_routine+0x149/0x204 is the list_del call on line 336 of src/device.c in
package dlm-kernel.
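The attached patch is not reproduced in this report, so the following is only an illustrative sketch, not the dlm-kernel fix. It assumes the oops comes from unlinking a list entry that is no longer on any queue, and models the kernel's circular list_head in user space. In 2.6 kernels a plain list_del() leaves the removed entry's next/prev set to poison values, so a second list_del() on the same entry dereferences them and faults. list_del_init() instead makes the entry point at itself, which lets a later list_empty() check detect that it was already removed. The helper remove_if_queued() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal user-space model of the kernel's circular doubly linked list.
 * Illustration only: this is not code from dlm-kernel. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* True when the head (or a detached entry) points only at itself. */
static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    assert(list_empty(n)); /* entry must not already be on a queue */
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink an entry and re-initialise it to point at itself, so that a
 * later removal attempt can be detected instead of following stale
 * pointers (the kernel's plain list_del() poisons them instead). */
static void list_del_init(struct list_head *e)
{
    e->next->prev = e->prev;
    e->prev->next = e->next;
    INIT_LIST_HEAD(e);
}

/* Hypothetical helper: remove an entry only if it is still queued, as a
 * cancelled request may already have been taken off its queue. */
static bool remove_if_queued(struct list_head *e)
{
    if (list_empty(e))
        return false;
    list_del_init(e);
    return true;
}
```

Each entry is initialised with INIT_LIST_HEAD() before being queued; calling remove_if_queued() a second time on the same entry then returns false rather than touching pointers that no longer belong to a list.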

Comment 3 Carsten Clasohm 2006-08-04 12:30:03 UTC
Created attachment 133627 [details]
fixes the list_del call in ast_routine

Tested with package dlm-kernel-2.6.9-41.7.2.

Comment 6 David Teigland 2006-08-14 21:04:53 UTC
Fixed in the RHEL4 branch:
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.9; previous revision: 1.24.2.8

and in the STABLE branch:
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.1.4.1.2.9; previous revision: 1.24.2.1.4.1.2.8


Comment 10 Red Hat Bugzilla 2007-05-10 21:26:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0137.html