Bug 449098 - Setting lock_timeout to 0 causes all sorts of problems
Status: CLOSED WONTFIX
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm-kernel
Version: 4
Hardware: All Linux
Priority: low, Severity: low
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Reported: 2008-05-30 08:26 EDT by Christine Caulfield
Modified: 2009-04-16 16:32 EDT (History)
Doc Type: Bug Fix
Last Closed: 2009-01-13 11:50:27 EST

Attachments:
Patch to fix (1.11 KB, patch), 2008-06-02 08:25 EDT, Christine Caulfield

Description Christine Caulfield 2008-05-30 08:26:22 EDT
Description of problem:

Some people have been given the impression that setting
/proc/cluster/config/dlm/lock_timeout to 0 will disable the lock_timeout feature
and get rid of the ETIMEDOUT errors returned from dlm_locks. This is not the case.

Setting lock_timeout to zero does exactly what it says: it sets the timeout to
zero! This means that ETIMEDOUT errors are far MORE likely, and nodes can also
oops or get into a tight loop (and thus be removed from the cluster).
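To illustrate why a zero timeout makes ETIMEDOUT more likely instead of disabling it, here is a minimal C sketch. It is not the dlm-kernel source: the function name lock_has_timed_out, the use of seconds, and the exact comparison are assumptions chosen only to show the effect of a "waited at least lock_timeout" style check.

```c
#include <stdbool.h>

/*
 * Hypothetical expiry check (NOT taken from dlm-kernel): a waiting lock
 * request is treated as timed out once it has waited at least
 * lock_timeout seconds.
 */
static bool lock_has_timed_out(unsigned long waited_secs,
                               unsigned long lock_timeout)
{
    /* With lock_timeout == 0 this is true for every waiting request. */
    return waited_secs >= lock_timeout;
}
```

Under these assumptions, lock_has_timed_out(1, 30) is false, while lock_has_timed_out(0, 0) is true: a request that has not waited at all is already considered expired.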

How reproducible:
Very easily. It happens roughly 3 times out of 4.

Steps to Reproduce:
1. Take a 3-node cluster
2. On all 3 nodes create a lockspace using dlmtest:
   ./dlmtest -mnl -d 100000000 &
3. On all 3 nodes set lock_timeout to zero:
   echo 0 > /proc/cluster/config/dlm/lock_timeout
4. On all 3 nodes request another lock:
   ./dlmtest

Actual results:

One node returns ETIMEDOUT from the lock operation, one Oopses and one gets
stuck in a tight loop and has to be power cycled! Sometimes you might only get
one or two of these symptoms.

Expected results:

Normal locking. The nodes each get the lock in turn then release it. It should
also be possible to disable lock_timeouts using this configuration variable.

Additional info:

At least one site seems to have been advised to use this setting.
Comment 1 Christine Caulfield 2008-06-02 08:25:36 EDT
Created attachment 307354
Patch to fix

This probably needs a review ... for the brackets as much as anything else.

Because the timer is triggered using the lock_timeout value, we can't just
ignore it if it's zero, or it would be impossible to re-enable it! Ignoring it
would also disable deadlock checking for all lockspaces.

So what I've done here is to default the timer to 30 seconds (this only
affects the deadlock checker) if lock_timeout is set to zero. 30 seconds is the
compiled-in default anyway.
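
The fallback described in this comment can be sketched roughly as follows. This is not the attached patch: the macro DEFAULT_LOCK_TIMEOUT_SECS and the function timer_interval_secs are hypothetical names, and only the 30-second default comes from the comment itself.

```c
/* 30 seconds is the compiled-in default mentioned in the comment;
 * the macro name itself is an assumption. */
#define DEFAULT_LOCK_TIMEOUT_SECS 30u

/*
 * Hypothetical helper (not the real patch): choose the interval used to
 * drive the timer. A configured value of zero falls back to the default
 * so the timer, and with it the deadlock checker, keeps running.
 */
static unsigned int timer_interval_secs(unsigned int lock_timeout)
{
    return lock_timeout ? lock_timeout : DEFAULT_LOCK_TIMEOUT_SECS;
}
```

With this sketch, timer_interval_secs(0) yields 30 and any non-zero setting is passed through unchanged, so writing a new non-zero value later still takes effect.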
Comment 2 Christine Caulfield 2009-01-13 11:50:27 EST
The fix for this is just "don't do that". The patch is not worth the trouble of integrating into 4.8 and testing.
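
For sites that script this setting, "don't do that" could be enforced with a small guard such as the sketch below. The helper set_lock_timeout is hypothetical and not part of any Red Hat tool; it simply refuses to write zero and otherwise writes the value to the path it is given (for example /proc/cluster/config/dlm/lock_timeout).

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical guard (an illustration, not an existing utility):
 * write a new lock_timeout value to `path`, rejecting zero.
 * Returns 0 on success, -1 on error. errno is EINVAL for a zero value.
 */
static int set_lock_timeout(const char *path, unsigned int secs)
{
    FILE *f;

    if (secs == 0) {
        /* Zero does not disable the timeout; refuse it outright. */
        errno = EINVAL;
        return -1;
    }

    f = fopen(path, "w");
    if (!f)
        return -1;

    if (fprintf(f, "%u\n", secs) < 0) {
        fclose(f);
        return -1;
    }

    return fclose(f) == 0 ? 0 : -1;
}
```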
