Description of problem:
Some people have been given the impression that setting /proc/cluster/config/dlm/lock_timeout to 0 will disable the lock_timeout feature and get rid of the ETIMEDOUT errors returned from dlm_lock. This is not the case. Setting lock_timeout to zero does exactly what it says: it sets the timeout to zero. As a result, ETIMEDOUT errors become far MORE likely, and nodes can also oops or get into a tight loop (and thus be removed from the cluster).

How reproducible:
Very easily; roughly 3 times out of 4.

Steps to Reproduce:
1. Take a 3-node cluster.
2. On all 3 nodes, create a lockspace using the dlm test program: ./dlmtest -mnl -d 100000000 &
3. On all 3 nodes, set lock_timeout to zero: echo 0 > /proc/cluster/config/dlm/lock_timeout
4. On all 3 nodes, request another lock: ./dlmtest

Actual results:
One node returns ETIMEDOUT from the lock operation, one oopses, and one gets stuck in a tight loop and has to be power cycled! Sometimes only one or two of these symptoms appear.

Expected results:
Normal locking: the nodes each get the lock in turn and then release it. It should also be possible to disable lock timeouts using this configuration variable.

Additional info:
At least one site appears to have been advised to use this setting.
Created attachment 307354 [details]
Patch to fix

This probably needs a review ... for the brackets as much as anything else. Because the timer is triggered using the lock_timeout value, we can't simply ignore it when it is zero, or it would be impossible to re-enable it! It would also disable deadlock checking for all lockspaces. So what I've done here is default the timer to 30 seconds (this only affects the deadlock checker) when lock_timeout is set to zero. 30 seconds is the compiled-in default anyway.
The fix for this is just "don't do that". The patch is not worth the trouble of integrating into 4.8 and testing.