Bug 134559 - pvcreate hangs with today's build
Summary: pvcreate hangs with today's build
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 4
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-10-04 16:42 UTC by Corey Marthaler
Modified: 2010-01-12 02:59 UTC
CC List: 0 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-10-27 18:49:18 UTC
Embargoed:



Description Corey Marthaler 2004-10-04 16:42:04 UTC
Description of problem:
I am seeing this repeatedly on my nodes when attempting to
create PVs. The 'pvcreate /dev/sda' cmd just appears to hang.
This is very similar to bz134353. Here is where strace shows
it is stuck:

open("/lib/liblvm2clusterlock.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0L\7\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0555, st_size=9284, ...}) = 0
old_mmap(NULL, 9716, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40218000
old_mmap(0x4021a000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x1000) = 0x4021a000
close(3)                                = 0
socket(PF_UNIX, SOCK_STREAM, 0)         = 3
connect(3, {sa_family=AF_UNIX, path=@clvmd}, 110) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN], [], 8) = 0
write(3, "3\1\0\277\0\0\0\0\0\0\0\0\f\0\0\0\0\4\0P_orphans\0+", 30) = 30
read(3,
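
The trace stops inside a blocking read() on the Unix-domain socket to
clvmd: the lock request has been written (the "P_orphans" bytes are
visible in the write() call) and pvcreate is waiting for a reply that
never arrives. As a rough illustration only (socat and its
ABSTRACT-CONNECT address type are assumptions, not part of the original
report), the same abstract-namespace socket -- strace renders the
leading NUL byte as '@' -- can be poked by hand:

    # Hypothetical probe: connect to the abstract socket "@clvmd" the
    # same way liblvm2clusterlock.so does.  -T 5 gives up after 5 idle
    # seconds; a healthy clvmd answers, a wedged one leaves the read
    # blocked exactly like pvcreate's.
    echo -n probe | socat -T 5 - ABSTRACT-CONNECT:clvmd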

How reproducible:
Always

Comment 1 Christine Caulfield 2004-10-05 07:39:24 UTC
This is almost certainly the same thing as bug 134353, and the same
comment applies.

Can you provide some clvmd debugging, please? I can't make it fail here.

Reading as much as I can into the strace, it looks like a simple VG
lock which is waiting (in clvmd). There are three possible scenarios
for that (the first two checks are combined in the sketch after this list):

1. Locking is suspended because either the cluster isn't quorate or  
   the lockspace is in recovery.
   cat /proc/cluster/services is the diagnostic here, obviously
2. Some other process/machine already has the P_orphans lock.
   echo "clvmd" > /proc/cluster/dlm_locks
   cat /proc/cluster/dlm_locks
   will help to check for this
3. There's a bug somewhere in the chain, most likely in clvmd but
   possibly in the dlm userland interface.
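
For convenience, the first two checks can be run in one go; this is a
minimal sketch using only the /proc/cluster commands named above,
nothing clvmd-specific beyond them:

    #!/bin/sh
    # Scenario 1: is the cluster quorate / is the lockspace in recovery?
    cat /proc/cluster/services

    # Scenario 2: does another process/machine hold the P_orphans lock?
    echo "clvmd" > /proc/cluster/dlm_locks
    cat /proc/cluster/dlm_locks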

Comment 2 Corey Marthaler 2004-10-05 15:11:19 UTC
At the time of the hang all nodes show that all services are in the 
run state: 
 
[root@morph-01 clvmd]# cat /proc/cluster/services 
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 3 4 5 2]

DLM Lock Space:  "clvmd"                             2   3 run       -
[1 2 3 5 4]
 
 
[root@morph-01 clvmd]# ps -ef | grep pvcreate 
root      4490  4488  0 10:02 ?        00:00:00 pvcreate /dev/sda 
root      4501  2153  0 10:12 pts/0    00:00:00 grep pvcreate 
 
 
[root@morph-01 clvmd]# echo "clvmd" > /proc/cluster/dlm_locks 
[root@morph-01 clvmd]# cat /proc/cluster/dlm_locks 
DLM lockspace 'clvmd' 
 
Resource f4d60934 (parent 00000000). Name (len=9) "P_orphans" 
Master Copy 
Granted Queue 
0001002c PW 
Conversion Queue 
Waiting Queue 
[root@morph-01 clvmd]# 

Comment 3 Christine Caulfield 2004-10-05 15:34:56 UTC
OK, so we know it's getting the lock fine. That means it's probably
clvmd not replying for some reason. I copied your clvmd onto my
cluster and it still works fine, so I'm going to need some help. Can
you build a debug clvmd and get some output from it? Or leave me your
cluster to play with one morning?

Comment 4 Corey Marthaler 2004-10-05 16:17:08 UTC
Here's a crazy twist: I can only see this bug if clvmd is run as a
daemon. If I run it in the foreground (no daemonization), I don't see
the bug and everything runs just fine.
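
A minimal way to compare the two modes, assuming the clvmd of this era
accepts a debug switch that keeps it in the foreground (the -d flag
here is an assumption -- check clvmd(8) on the build in question):

    # daemonized: pvcreate hangs
    clvmd

    # foreground/debug: pvcreate completes
    clvmd -d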

Comment 5 Christine Caulfield 2004-10-06 10:13:22 UTC
That was the clue I needed. See 134353 for more details.

Comment 6 Corey Marthaler 2004-10-27 18:49:18 UTC
Fix verified.

Comment 7 Kiersten (Kerri) Anderson 2004-11-16 19:03:43 UTC
Updating version to the right level in the defects.  Sorry for the storm.

