Bug 449098 - Setting lock_timeout to 0 causes all sorts of problems
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: dlm-kernel
Version: 4
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
 
Reported: 2008-05-30 12:26 UTC by Christine Caulfield
Modified: 2009-04-16 20:32 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-13 16:50:27 UTC
Embargoed:


Attachments:
Patch to fix (1.11 KB, patch)
2008-06-02 12:25 UTC, Christine Caulfield

Description Christine Caulfield 2008-05-30 12:26:22 UTC
Description of problem:

Some people have been given the impression that setting
/proc/cluster/config/dlm/lock_timeout to 0 will disable the lock_timeout feature
and get rid of the ETIMEDOUT errors returned from dlm_locks. This is not the case.

Setting lock_timeout to zero does exactly what it says: it sets the timeout to
zero! This makes the ETIMEDOUT errors far MORE likely, and nodes can also oops
or get stuck in a tight loop (and thus be removed from the cluster).

How reproducible:
Very easily. It happens 3 times out of 4 roughly.

Steps to Reproduce:
1. Take a 3 node cluster
2. On all 3 nodes create a lockspace using dlmtest:
   ./dlmtest -mnl -d 100000000 &
3. On all 3 nodes set lock_timeout to zero:
   echo 0 > /proc/cluster/config/dlm/lock_timeout
4. On all 3 nodes request another lock:
   ./dlmtest

Actual results:

One node returns ETIMEDOUT from the lock operation, one Oopses and one gets
stuck in a tight loop and has to be power cycled! Sometimes you might only get
one or two of these symptoms.

Expected results:

Normal locking. The nodes each get the lock in turn then release it. It should
also be possible to disable lock_timeouts using this configuration variable.

Additional info:

At least one site seems to have been advised to use this setting.

Comment 1 Christine Caulfield 2008-06-02 12:25:36 UTC
Created attachment 307354 [details]
Patch to fix

This probably needs a review ... for the brackets as much as anything else.

Because the timer is triggered using the lock_timeout value, we can't just
ignore it if it's zero, or it would be impossible to re-enable it! It would also
disable deadlock checking for all lockspaces.

So what I've done here is to default the timer to 30 seconds (this only
affects the deadlock checker) if lock_timeout is set to zero. 30 seconds is the
compiled-in default anyway.
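
For anyone reading without the attachment, the fallback described above could be sketched roughly as follows. This is not the attached patch: the function name timer_interval_secs and the macro DEFAULT_LOCK_TIMEOUT_SECS are hypothetical, and only the 30 second value comes from this comment.

```c
#include <assert.h>

/* Assumption: 30 seconds, the compiled-in default mentioned in this comment. */
#define DEFAULT_LOCK_TIMEOUT_SECS 30u

/*
 * Hypothetical helper (not from the attached patch): pick the interval, in
 * seconds, used to re-arm the periodic timer. A lock_timeout of zero falls
 * back to the default so the timer keeps firing and deadlock checking
 * stays active.
 */
unsigned int timer_interval_secs(unsigned int lock_timeout)
{
    /* The fallback itself must never be zero, or the timer would stop. */
    assert(DEFAULT_LOCK_TIMEOUT_SECS > 0);

    return lock_timeout != 0 ? lock_timeout : DEFAULT_LOCK_TIMEOUT_SECS;
}
```

With a helper like this, a zero lock_timeout changes only how often the timer fires; the real patch may handle it differently.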

Comment 2 Christine Caulfield 2009-01-13 16:50:27 UTC
The fix for this is just "don't do that". The patch is not worth the trouble of integrating into 4.8 and testing.
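
Since the advice is simply to avoid a zero value, a site could check the setting before applying it. The following is only a sketch under assumptions: classify_lock_timeout is a hypothetical name, and the input is assumed to be the text a caller has read from /proc/cluster/config/dlm/lock_timeout or is about to write to it.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical check (not part of dlm-kernel): classify the text form of a
 * lock_timeout value.
 * Returns 1 for a non-zero value, 0 for zero (the setting this bug warns
 * against), and -1 if the text is not a plain unsigned decimal number.
 */
int classify_lock_timeout(const char *text)
{
    char *end = NULL;
    unsigned long value;

    assert(text != NULL);

    /* Skip leading whitespace, then insist on a digit (rejects "-5", ""). */
    while (isspace((unsigned char)*text))
        text++;
    if (!isdigit((unsigned char)*text))
        return -1;

    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno != 0)
        return -1;

    /* Allow trailing whitespace such as the newline from echo. */
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return -1;

    return value == 0 ? 0 : 1;
}
```

A wrapper script or tool could refuse to write the value when this returns 0 or -1.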

