Bug 201325 - Kernel Oops when passing LKF_CANCEL to dlm_ls_unlock_wait
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: dlm
Version: 4
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Teigland
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-08-04 12:25 UTC by Carsten Clasohm
Modified: 2018-10-19 20:34 UTC

Fixed In Version: RHBA-2007-0137
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-10 21:26:47 UTC
Embargoed:


Attachments
Test program to reproduce the kernel oops. (2.21 KB, text/x-csrc)
2006-08-04 12:25 UTC, Carsten Clasohm
netdump log of the kernel oops (45.40 KB, text/plain)
2006-08-04 12:27 UTC, Carsten Clasohm
fixes the list_del call in ast_routine (514 bytes, patch)
2006-08-04 12:30 UTC, Carsten Clasohm


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2007:0137 | 0 | normal | SHIPPED_LIVE | dlm-kernel bug fix update | 2007-05-10 21:26:45 UTC

Description Carsten Clasohm 2006-08-04 12:25:27 UTC
Description of problem:

When dlm_ls_unlock_wait is called with the LKF_CANCEL flag for a lock that the
process is still waiting for, the kernel panics. The apparent cause is that
ast_routine tries to delete an entry from an empty kernel list.

Version-Release number of selected component (if applicable):

dlm-1.0.0-5
kernel-2.6.9-34.0.2.EL
dlm-kernel-2.6.9-41.7.2

How reproducible:

always


Steps to Reproduce:

1. Set up a Cluster Suite 4 cluster
2. Optional: set up netdump
3. Compile and run the attached program
  
Actual results:

Kernel oops

Expected results:

no kernel oops

Additional info:

I have also attached the netdump log, and a patch for dlm-kernel. The patch
works for this test case and looks correct to me, but I am no dlm expert.

Comment 1 Carsten Clasohm 2006-08-04 12:25:29 UTC
Created attachment 133625
Test program to reproduce the kernel oops.

Comment 2 Carsten Clasohm 2006-08-04 12:27:51 UTC
Created attachment 133626
netdump log of the kernel oops

ast_routine+0x149/0x204 is the list_del call on line 336 of src/device.c in
package dlm-kernel.
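
To illustrate the class of problem, here is a minimal userspace sketch of a
kernel-style doubly linked list with a guarded removal. It is not the attached
patch and not the code from device.c: struct lock_info, remove_if_queued and
demo_cancel_path are hypothetical names made up for this example, and the list
helpers are simplified stand-ins for the kernel's list.h. The point it shows is
that an unconditional unlink assumes the entry is currently on a list, while a
check plus a re-initialising unlink tolerates an entry that was never queued or
was already removed.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's struct list_head (illustrative only). */
struct list_head {
    struct list_head *next, *prev;
};

/* Make a node point to itself: the "empty / not queued" state. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Link node n in just before head h, i.e. at the tail of the list. */
static void list_add_tail_node(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink n and re-initialise it, so a later emptiness check sees "not queued". */
static void list_del_init_node(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    init_list_head(n);
}

/* Hypothetical lock record; the real structure in device.c differs. */
struct lock_info {
    struct list_head queue;   /* linkage on a pending-lock queue */
};

/*
 * Guarded removal: only unlink when the record is actually queued.
 * Returns 1 if it was unlinked, 0 if there was nothing to do.
 */
static int remove_if_queued(struct lock_info *li)
{
    if (list_is_empty(&li->queue))
        return 0;
    list_del_init_node(&li->queue);
    return 1;
}

/* Walk through the cancel scenario; returns 0 when every step behaves. */
int demo_cancel_path(void)
{
    struct list_head pending;
    struct lock_info li;

    init_list_head(&pending);
    init_list_head(&li.queue);

    /* Cancel arrives before the record was ever queued: must be a no-op. */
    if (remove_if_queued(&li) != 0)
        return 1;

    list_add_tail_node(&li.queue, &pending);
    assert(!list_is_empty(&pending));

    /* Normal completion unlinks exactly once. */
    if (remove_if_queued(&li) != 1)
        return 2;

    /* A repeated removal is again a no-op instead of a bad dereference. */
    if (remove_if_queued(&li) != 0)
        return 3;

    return list_is_empty(&pending) ? 0 : 4;
}
```

Calling demo_cancel_path() from a small main() and checking for a return value
of 0 exercises all three cases. Whether the actual fix uses a check of this
kind or a different approach can only be determined from the attached patch.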

Comment 3 Carsten Clasohm 2006-08-04 12:30:03 UTC
Created attachment 133627
fixes the list_del call in ast_routine

tested with package dlm-kernel-2.6.9-41.7.2

Comment 6 David Teigland 2006-08-14 21:04:53 UTC
fixed in RHEL4 branch:
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.9; previous revision: 1.24.2.8

and STABLE branch:
/cvs/cluster/cluster/dlm-kernel/src/Attic/device.c,v  <--  device.c
new revision: 1.24.2.1.4.1.2.9; previous revision: 1.24.2.1.4.1.2.8


Comment 10 Red Hat Bugzilla 2007-05-10 21:26:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0137.html


