Bug 232878

Summary: failed to acquire lockspace [rgmanager related]
Product: Red Hat Enterprise Linux 5
Reporter: Fabio Massimo Di Nitto <fdinitto>
Component: cman
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED CURRENTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 5.0
CC: ccaulfie, cluster-maint, jobot, rmccabe
Hardware: All
OS: Linux
Fixed In Version: 2.6.18-8.el5
Doc Type: Bug Fix
Last Closed: 2007-08-14 19:00:14 UTC
Attachments (all flags: none):
- cluster configuration
- strace -a 80 -s 80 -f -o /home/fabbione/rgmanager.stracer clurgmgrd
- dlm_controld debug output from node diapolon
- dlm_controld debug output from node golion
- /proc/misc node diapolon before crash
- /proc/misc node diapolon after crash
- /proc/misc node golion before crash
- /proc/misc node golion after crash
- Very basic patch to test theory
- strace of rgmanager that dies on node diapolon
- rgmanager strace on node diapolon at the first start and first stop

Description Fabio Massimo Di Nitto 2007-03-19 06:48:38 UTC
Description of problem:

A service switching between the nodes of a 2-node cluster (ip.sh in this case) will
fail after a couple of migrations unless the nodes are rebooted.
Detailed description in "Steps to Reproduce".

Version-Release number of selected component (if applicable):

cvs head checkout as of 2007 03 16 for userland.
kernel 2.6.20 + gfs2-2.6-nmw at 549ae0ac3d574a682e82b02e79259a65445a675b

How reproducible:

always

Steps to Reproduce:
1. configure a 2-node cluster with a shared IP service (config in attachment).
2. boot the cluster, let it become quorate, and start rgmanager; the IP
service becomes active on node diapolon (see config).

cman_tool services
type             level name       id       state       
fence            0     default    00010001 none        
[1 2]
dlm              1     clvmd      00020001 none        
[1 2]
dlm              1     rgmanager  00030001 none        
[1 2]

This confirms that locking is OK for rgmanager (stopping clvmd makes no difference).

3. stop rgmanager on node diapolon.

syslog and ip addr confirm that the service has been stopped on node diapolon
and properly started on node golion.

cman_tool services
type             level name       id       state       
fence            0     default    00010001 none        
[1 2]
dlm              1     clvmd      00020001 none        
[1 2]
dlm              1     rgmanager  00030001 none        
[1 2]

This confirms the dlm lockspace for rgmanager is still allocated on both nodes even
though rgmanager is no longer running on node diapolon.

4. start rgmanager on diapolon.

ps ax confirms rgmanager running.

5. stop rgmanager on node golion.

Actual results:

rgmanager on node diapolon is killed and cannot start anymore (strace -f
clurgmgrd from node diapolon in the attachments), and the service is clearly not
started on the node.

Expected results:

Service migration should succeed and rgmanager should not crash.

Additional info:

Note: one of the nodes (golion) is ia64 and the other (diapolon) is x86, but they
both fail the same way depending on which one has the active service.

Comment 1 Fabio Massimo Di Nitto 2007-03-19 06:48:38 UTC
Created attachment 150351 [details]
cluster configuration

Comment 2 Fabio Massimo Di Nitto 2007-03-19 06:52:00 UTC
Created attachment 150352 [details]
strace -a 80 -s 80 -f -o /home/fabbione/rgmanager.stracer clurgmgrd

Comment 3 Fabio Massimo Di Nitto 2007-03-19 06:54:47 UTC
the strace shows that:

13035 open("/dev/misc/dlm_rgmanager", O_RDWR) = -1 ENODEV (No such device)

but:

root@diapolon:/dev/misc# ls -als
total 0
0 drwxr-xr-x  2 root root    120 2007-03-19 07:45 .
0 drwxr-xr-x 15 root root  13620 2007-03-19 06:42 ..
0 crw-rw----  1 root root 10, 60 2007-03-19 06:42 dlm_clvmd
0 crw-rw----  1 root root 10, 62 2007-03-19 06:42 dlm-control
0 crw-r--r--  1 root root 10,  0 2007-03-19 07:45 dlm_rgmanager
0 crw-------  1 root root 10, 61 2007-03-19 06:42 lock_dlm_plock
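
Note that dlm_rgmanager shows minor 0 here while the other dlm devices carry real
minors (60-62): the device node exists on disk but no longer matches any registered
misc device, which is why open() returns ENODEV. A minimal, self-contained sketch
(hypothetical helper, not the actual libdlm code) of the /proc/misc lookup that
would catch this stale state:

```c
#include <stdio.h>
#include <string.h>

/* Scan /proc/misc-style text ("  60 dlm_clvmd\n...") for a device name.
 * Returns its minor number, or -1 if the kernel no longer lists it.
 * Hypothetical helper mirroring the check libdlm effectively needs. */
int misc_minor_from_text(const char *text, const char *name)
{
    char line[128];
    const char *p = text;

    while (p && *p) {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);

        if (len >= sizeof(line))
            len = sizeof(line) - 1;
        memcpy(line, p, len);
        line[len] = '\0';

        int minor;
        char devname[64];
        if (sscanf(line, "%d %63s", &minor, devname) == 2 &&
            strcmp(devname, name) == 0)
            return minor;          /* device is registered */

        p = nl ? nl + 1 : NULL;
    }
    return -1;                     /* stale /dev node: no misc entry */
}
```

A device node whose name has no /proc/misc entry is exactly the broken state seen
above; the safe response is to treat the device as absent rather than open it.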

and finally:

golion:

cman_tool services
type             level name     id       state       
fence            0     default  00010001 none        
[1 2]
dlm              1     clvmd    00020001 none        
[1 2]

diapolon (where rgmanager fails to start and strace taken):

cman_tool services
type             level name       id       state       
fence            0     default    00010001 none        
[1 2]
dlm              1     clvmd      00020001 none        
[1 2]
dlm              1     rgmanager  00030001 none        
[1]

I hope this is enough info. I can reproduce this every time in a matter of
minutes, so if you need more info please don't hesitate to ask. I can also grant
access to the cluster since it's just a sandbox.

Fabio

Comment 4 Christine Caulfield 2007-03-19 08:42:54 UTC
There's something odd going on with udev that we've seen in other places. It
seems (sometimes!) to create the dlm lockspace devices in /dev rather than
/dev/misc. This can cause a 10-second delay in libdlm as it waits fruitlessly
for the device to appear before giving up and creating it itself. Maybe there's
a race condition here?

Comment 5 Fabio Massimo Di Nitto 2007-03-19 09:46:07 UTC
udev seems to behave correctly. I checked the rules and tested for the existence
of /dev/dlm* while operating.

Attached is the dlm_controld -D output from both nodes while reproducing the
problem. (I added a bunch of --- status --- markers to make it easier to
understand what's happening when.)

Fabio

Comment 6 Fabio Massimo Di Nitto 2007-03-19 09:46:54 UTC
Created attachment 150357 [details]
dlm_controld debug output from node diapolon

Comment 7 Fabio Massimo Di Nitto 2007-03-19 09:47:34 UTC
Created attachment 150358 [details]
dlm_controld debug output from node golion

Comment 8 Lon Hohberger 2007-03-19 13:32:07 UTC
13035 open("/dev/misc/dlm_rgmanager", O_RDWR)                                  
= -1 ENODEV (No such device)

Eeeeeeek.

Comment 9 Lon Hohberger 2007-03-19 13:36:34 UTC
OK, the open() call fails with ENODEV, ostensibly in libdlm.  Patrick - if
there's a race where the device is sometimes created in the wrong location -
would it make sense to have libdlm check the alternate location?

The problem is that the device reports that it's there.  It's almost as if the
kernel->device linkage-magic is gone.

Comment 10 Lon Hohberger 2007-03-19 13:37:34 UTC
...and by "device reports" I mean "file system reports"

Comment 11 Lon Hohberger 2007-03-19 13:50:56 UTC
Adding Ryan - I am pretty sure he's seen this on his cluster.

Comment 13 Fabio Massimo Di Nitto 2007-03-19 14:12:20 UTC
Created attachment 150369 [details]
/proc/misc  node diapolon before crash

Comment 14 Fabio Massimo Di Nitto 2007-03-19 14:13:06 UTC
Created attachment 150370 [details]
/proc/misc node diapolon after crash

Comment 15 Fabio Massimo Di Nitto 2007-03-19 14:14:23 UTC
Created attachment 150371 [details]
/proc/misc node golion before crash

Comment 16 Fabio Massimo Di Nitto 2007-03-19 14:16:02 UTC
Created attachment 150373 [details]
/proc/misc node golion after crash

Comment 17 Christine Caulfield 2007-03-19 14:25:56 UTC
This looks like it might be a race condition in lockspace teardown.

What seems to be happening is that rgmanager is removing the lockspace when it
shuts down, but this takes some time to happen.

When rgmanager starts up again, it tries to recreate the lockspace. But the old
one is still shutting down, so the create returns -EEXIST because it's still
there. By the time rgmanager (in libdlm) gets to the point where it needs to
check the device minor, the lockspace has gone (no record of it in /proc/misc),
so it creates a bogus device with a minor of 0. Accessing this always returns
-ENODEV.

The easiest way to fix this (I think) is to patch libdlm so that if the
lockspace does not exist in /proc/misc it loops back to attempt to create it again.
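The proposed retry can be sketched as follows. This is an illustrative model with
invented names (sim_*, create_lockspace_retry), not the actual libdlm patch: it
simulates a lockspace that keeps answering EEXIST for a few attempts while the old
instance tears down, then lets the create succeed.

```c
/* Illustrative model only -- names are invented, not the libdlm API. */
struct sim_ls {
    int teardown_ticks;  /* create attempts left before old LS is gone */
    int created;
};

enum { LS_CREATED = 0, LS_EEXIST = 17 };

static int sim_try_create(struct sim_ls *ls)
{
    if (ls->teardown_ticks > 0) {
        ls->teardown_ticks--;    /* old lockspace still tearing down */
        return LS_EEXIST;        /* kernel says it already exists... */
    }
    ls->created = 1;
    return LS_CREATED;
}

static int sim_in_proc_misc(struct sim_ls *ls)
{
    (void)ls;
    return 0;  /* ...but its misc device has already been removed */
}

/* The proposed fix: on EEXIST with no /proc/misc entry, loop back and
 * retry the create instead of fabricating a minor-0 device node. */
int create_lockspace_retry(struct sim_ls *ls, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        int rc = sim_try_create(ls);
        if (rc == LS_CREATED)
            return 0;                      /* fresh lockspace created */
        if (rc == LS_EEXIST && sim_in_proc_misc(ls))
            return 0;                      /* genuinely alive: reuse it */
        /* teardown race: fall through and try again */
    }
    return -1;
}
```

Note the bounded retry count: an unbounded version of this loop is what the
100% CPU spin in Comment 20 below looks like when the old lockspace never
finishes dying.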

Comment 18 Christine Caulfield 2007-03-19 14:29:14 UTC
Created attachment 150378 [details]
Very basic patch to test theory

Comment 19 Fabio Massimo Di Nitto 2007-03-20 05:35:10 UTC
The patch helps only partially. The service still does not fail over as it
should, but clurgmgrd keeps running (it doesn't crash and it can start again);
it just doesn't take any action on what needs to be done.
cman_tool services still shows the same discrepancy in the dlm services as
posted before.

Fabio

Comment 20 Fabio Massimo Di Nitto 2007-03-20 07:15:02 UTC
After another run I noticed that the rgmanager process that previously used to
crash is now spinning at 100% CPU on this:

write(7, "\5\0\0\0\0\0\0\0\0\0\0\0\4\0\360\267\0\0\0\0\1\0\0\0rg"..., 89) = -1
EEXIST (File exists)
open("/proc/misc", O_RDONLY)            = 8
fstat64(8, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0xb7eed000
read(8, " 58 acpi\n 60 dlm_clvmd\n 61 lock_"..., 1024) = 155
read(8, "", 1024)                       = 0
close(8)                                = 0
munmap(0xb7eed000, 4096)                = 0
write(7, "\5\0\0\0\0\0\0\0\0\0\0\0\4\0\360\267\0\0\0\0\1\0\0\0rg"..., 89) = -1
EEXIST (File exists)
open("/proc/misc", O_RDONLY)            = 8
fstat64(8, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0xb7eed000
read(8, " 58 acpi\n 60 dlm_clvmd\n 61 lock_"..., 1024) = 155
read(8, "", 1024)                       = 0
close(8)                                = 0
munmap(0xb7eed000, 4096)                = 0
write(7, "\5\0\0\0\0\0\0\0\0\0\0\0\4\0\360\267\0\0\0\0\1\0\0\0rg"..., 89) = -1
EEXIST (File exists)

root@diapolon:~# cat /proc/misc 
 58 acpi
 60 dlm_clvmd
 61 lock_dlm_plock
 62 dlm-control
175 agpgart
130 watchdog
183 hw_random
 63 device-mapper
  1 psaux
228 hpet
135 rtc
231 snapshot

So I guess this is the libdlm patch from this bug trying to wait but never
exiting.

Fabio

Comment 21 Christine Caulfield 2007-03-20 08:48:05 UTC
So that looks like it might be a DLM kernel bug then. It's returning -EEXIST for
a lockspace that exists but doesn't have a misc device associated with it.

Comment 22 Fabio Massimo Di Nitto 2007-03-20 09:43:00 UTC
Created attachment 150470 [details]
strace of rgmanager that dies on node diapolon.

After rgmanager dies we can't restart it any longer.

Comment 23 Fabio Massimo Di Nitto 2007-03-20 10:25:10 UTC
Created attachment 150472 [details]
rgmanager strace on node diapolon at the first start and first stop

Comment 24 Christine Caulfield 2007-03-20 14:40:40 UTC
I see what's happening.

The DLM removes the misc device, then attempts to remove the lockspace. But if
the lockspace removal fails (as it can if the LS is mastering locks) then we are
left with a lockspace and no associated misc device.

A temporary solution is to force-release the lockspace in rgmanager. This is OK
because rgmanager is the only application using that lockspace on this node, and
it has the beneficial side-effect of remastering any locks away from this node.

I'm working on a DLM patch that will recreate the misc device if the release
lockspace fails and will post it shortly.

Removing the lockspace BEFORE the misc device is a bad idea because it opens a
window where an application could open a misc device that is not associated with
any lockspace; with, er, interesting results I'm sure. The current order leaves a
small window where the misc device does not exist, but at that point we are
already trying to destroy the lockspace, so it's not really bad.
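The ordering argument above can be modeled in a few lines. This is a simulation
with invented names (sim_space, teardown_*), not kernel code: it shows how
device-first teardown strands a lockspace when the removal fails, and how a
forced release (the rgmanager workaround) avoids that.

```c
/* Illustrative model of the teardown ordering -- invented names,
 * not the actual DLM kernel code. */
struct sim_space {
    int has_misc;       /* misc device node registered */
    int has_ls;         /* lockspace object alive */
    int masters_locks;  /* non-forced removal fails while mastering */
};

static int remove_lockspace(struct sim_space *s, int force)
{
    if (s->masters_locks && !force)
        return -1;       /* kernel refuses: lockspace still busy */
    s->has_ls = 0;
    return 0;
}

/* Device-first order: if the lockspace removal then fails, we are
 * left with a lockspace and no misc device (the bug's broken state). */
int teardown_device_first(struct sim_space *s)
{
    s->has_misc = 0;
    return remove_lockspace(s, 0);
}

/* rgmanager workaround: force the release, which also remasters any
 * locks away from this node, so the removal cannot fail here. */
int teardown_forced(struct sim_space *s)
{
    s->has_misc = 0;
    return remove_lockspace(s, 1);
}
```

In real code the workaround corresponds to the force argument of libdlm's
dlm_release_lockspace(); the kernel-side fix described above instead recreates
the misc device when the release fails.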

Comment 25 Lon Hohberger 2007-03-26 15:46:26 UTC
Hi, 

I applied Patrick's fix to rgmanager in the RHEL5/50/HEAD trees.

Comment 26 Christine Caulfield 2007-03-28 12:34:53 UTC
And the kernel fix is now in Steve's git tree.


Comment 27 Kiersten (Kerri) Anderson 2007-04-23 16:49:20 UTC
Fixing Product Name.  Cluster Suite was merged into Red Hat Enterprise Linux for
5.0.  In addition, dlm, fence and ccs were merged into the cman package, so
bugzilla should reflect the package name where those utilities are located.

Comment 29 RHEL Program Management 2007-05-01 17:31:20 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 30 Chris Feist 2007-08-14 19:00:14 UTC
Closing as this was fixed in kernel-2.6.18-8.el5.