Description of problem:
[revolver] Iteration 2 started at Fri Sep 23 16:14:21 CDT 2005
[revolver] Gulm Status
[revolver] ===========
[revolver] link-01: Slave
[revolver] link-02: Slave
[revolver] link-08: Master
[revolver] Those facing the revolver=link-08 link-02
[revolver] Feeling lucky link-08? Well do ya? Go'head make my day...
[revolver] Feeling lucky link-02? Well do ya? Go'head make my day...
[revolver] checking pulses for death:
[revolver] link-08... DEAD
[revolver] link-02... DEAD
[revolver] Sleeping a minute and a half for victim(s) to come back to life...
[revolver] Read from remote host link-02: Connection reset by peer
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] Still not all alive, sleeping another 10 seconds
[revolver] All killed nodes are back up, making sure they're sshable...
[revolver] Still not all sshable, sleeping another 10 seconds
[revolver] Still not all sshable, sleeping another 10 seconds
[revolver] Still not all sshable, sleeping another 10 seconds
[revolver] Still not all sshable, sleeping another 10 seconds
[revolver] Still not all sshable, sleeping another 10 seconds
[revolver] Still not all sshable, sleeping another 10 seconds
[revolver] mounting /dev/mapper/gfs-gfs0 on /mnt/gfs0 on link-08
[HANG]

I then attempted a mount of a different filesystem on a different node, and that hung as well:

[...]
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a95576000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a95577000
mprotect(0x37be328000, 12288, PROT_READ) = 0
arch_prctl(0x1002, 0x2a95576b00)        = 0
munmap(0x2a95557000, 123393)            = 0
brk(0)                                  = 0x615000
brk(0x636000)                           = 0x636000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=39554608, ...}) = 0
mmap(NULL, 39554608, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2a95578000
close(3)                                = 0
umask(022)                              = 022
open("/dev/null", O_RDWR)               = 3
close(3)                                = 0
getuid()                                = 0
geteuid()                               = 0
lstat("/etc/mtab", {st_mode=S_IFREG|0644, st_size=321, ...}) = 0
stat("/sbin/mount.gfs", 0x7fbffff570)   = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_BLOCK, ~[TRAP SEGV RTMIN RT_1], NULL, 8) = 0
mount("/dev/gfs/gfs2", "/mnt/gfs2", "gfs", 0xc0ed0000, 0

Version-Release number of selected component (if applicable):
GFS 2.6.9-42.1 (built Sep 20 2005 16:14:27) installed
Gulm 2.6.9-42.1 (built Sep 20 2005 16:14:11) installed

How reproducible:
Have only seen this once during regression testing. Will try to reproduce to
gather more info.
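For reference, the truncated mount() line at the end of that strace is the raw
mount(2) system call (0xc0ed0000 is the legacy MS_MGC_VAL mount magic; mount(8)
calls the kernel directly since /sbin/mount.gfs does not exist). A minimal
userspace sketch of the same call, using the device and mountpoint from this
report, is below; in the state described above the call simply never returns:

/* Minimal sketch of the hanging mount(2) call seen in the strace above.
 * Uses the device and mountpoint from this report; MS_MGC_VAL is the
 * 0xc0ed0000 magic that mount(8) passes on 2.6-era systems. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/mount.h>

int main(void)
{
    /* In the failure state this blocks in the kernel and never returns. */
    if (mount("/dev/gfs/gfs2", "/mnt/gfs2", "gfs", MS_MGC_VAL, NULL) < 0)
        fprintf(stderr, "mount: %s\n", strerror(errno));
    return 0;
}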
GFS: Trying to join cluster "lock_gulm", "LINK_128:gfs2"
GFS: fsid=LINK_128:gfs2.2: Joined cluster. Now mounting FS...
Sep 23 11:04:13 link-08 kernel: GFS: Trying to join cluster "lock_gulm", "LINK_128:gfs2"
Sep 23 11:04:13 link-08 kernel: GFS: fsid=LINK_128:gfs2.2: Joined cluster. Now mounting FS...

More info:

[root@link-01 ~]# gulm_tool nodelist $(hostname)
 Name: link-01
  ip           = ::ffff:10.15.84.151
  state        = Logged in
  last state   = Was Logged in
  mode         = Master
  missed beats = 0
  last beat    = 1127494104165538
  delay avg    = 10001062
  max delay    = 18446744072689134512

 Name: link-08
  ip           = ::ffff:10.15.84.158
  state        = Logged in
  last state   = Logged out
  mode         = Slave
  missed beats = 0
  last beat    = 1127494104385518
  delay avg    = 9998692
  max delay    = 18446744072687992192

 Name: link-02
  ip           = ::ffff:10.15.84.152
  state        = Logged in
  last state   = Was Logged in
  mode         = Slave
  missed beats = 0
  last beat    = 1127494108379824
  delay avg    = 10000074
  max delay    = 18446744072689134512

[root@link-01 ~]# gulm_tool getstats link-01:lt000
I_am = Master
run time = 3392
pid = 2225
verbosity = Default
id = 0
partitions = 1
out_queue = 0
drpb_queue = 0
exclusive = 2936
shared = 1333
deferred = 0
lvbs = 7640
expired = 4980
lock ops = 4811750
conflicts = 8
incomming_queue = 0
conflict_queue = 8
reply_queue = 0
free_locks = 9
used_locks = 12886
free_lkrqs = 8
used_lkrqs = 8
free_holders = 0
used_holders = 16889
elss = 4416016
highwater = 1048576
I'm unable to reproduce... Waiting for a way to reproduce this.
I think I hit this bug. I was running revolver when two slaves were shot.

Iteration 4 started at Fri Jan 6 10:30:25 CST 2006
Gulm Status
===========
tank-01: Slave
tank-02: Client
tank-03: Slave
tank-04: Client
tank-05: Master
Those facing the revolver=tank-01 tank-03
...
checking Gulm recovery...
mounting /dev/mapper/gfs-gfs0 on /mnt/gfs0 on tank-01
mounting /dev/mapper/gfs-gfs1 on /mnt/gfs1 on tank-01
mounting /dev/mapper/gfs-gfs2 on /mnt/gfs2 on tank-01
[HANG]

/mnt/gfs0 and /mnt/gfs1 were mounted on tank-01. The mount for /mnt/gfs2 was
hung on tank-01. I was able to mount /mnt/gfs0 and /mnt/gfs1 on tank-03. I
tried mounting /mnt/gfs2 on tank-03, but that hung too. It doesn't appear that
recovery completed on tank-05.

- Syslog from tank-05:

Jan 6 10:30:55 tank-05 lock_gulmd_core[2502]: tank-01 missed a heartbeat (time:1136565055589138 mb:1)
Jan 6 10:31:10 tank-05 lock_gulmd_core[2502]: tank-03 missed a heartbeat (time:1136565070591488 mb:1)
Jan 6 10:31:10 tank-05 lock_gulmd_core[2502]: tank-01 missed a heartbeat (time:1136565070591488 mb:2)
Jan 6 10:31:25 tank-05 lock_gulmd_core[2502]: tank-03 missed a heartbeat (time:1136565085594840 mb:2)
Jan 6 10:31:25 tank-05 lock_gulmd_core[2502]: tank-01 missed a heartbeat (time:1136565085594840 mb:3)
Jan 6 10:31:25 tank-05 lock_gulmd_core[2502]: Client (tank-01) expired
Jan 6 10:31:25 tank-05 lock_gulmd_core[4109]: Gonna exec fence_node -O tank-01
Jan 6 10:31:25 tank-05 lock_gulmd_core[2502]: Forked [4109] fence_node -O tank-01 with a 0 pause.
Jan 6 10:31:28 tank-05 fence_node[4109]: Fence of "tank-01" was successful
Jan 6 10:31:28 tank-05 lock_gulmd_core[2502]: found match on pid 4109, marking node tank-01 as logged out.
Jan 6 10:31:28 tank-05 kernel: lock_gulm: Checking for journals for node "tank-01"
Jan 6 10:31:29 tank-05 clurgmgrd[3019]: <info> Magma Event: Membership Change
Jan 6 10:31:29 tank-05 clurgmgrd[3019]: <info> State change: tank-01 DOWN
Jan 6 10:31:40 tank-05 lock_gulmd_core[2502]: tank-03 missed a heartbeat (time:1136565100597186 mb:3)
Jan 6 10:31:40 tank-05 lock_gulmd_core[2502]: Client (tank-03) expired
Jan 6 10:31:40 tank-05 lock_gulmd_core[2502]: Core lost slave quorum. Have 1, need 2. Switching to Arbitrating.
Jan 6 10:31:40 tank-05 lock_gulmd_core[4111]: Gonna exec fence_node -O tank-03
Jan 6 10:31:40 tank-05 lock_gulmd_core[2502]: Forked [4111] fence_node -O tank-03 with a 0 pause.
Jan 6 10:31:40 tank-05 clurgmgrd[3019]: <emerg> #1: Quorum Dissolved
Jan 6 10:31:44 tank-05 fence_node[4111]: Fence of "tank-03" was successful
Jan 6 10:31:44 tank-05 lock_gulmd_core[2502]: found match on pid 4111, marking node tank-03 as logged out.
Jan 6 10:31:44 tank-05 kernel: lock_gulm: Checking for journals for node "tank-03"
Jan 6 10:31:48 tank-05 ccsd[2440]: Cluster is not quorate. Refusing connection.
Jan 6 10:31:48 tank-05 ccsd[2440]: Error while processing connect: Connection refused
Jan 6 10:31:58 tank-05 ccsd[2440]: Cluster is not quorate. Refusing connection.
Jan 6 10:31:58 tank-05 ccsd[2440]: Error while processing connect: Connection refused
Jan 6 10:32:08 tank-05 ccsd[2440]: Cluster is not quorate. Refusing connection.
Jan 6 10:32:08 tank-05 ccsd[2440]: Error while processing connect: Connection refused
Jan 6 10:32:18 tank-05 ccsd[2440]: Cluster is not quorate. Refusing connection.
Jan 6 10:32:18 tank-05 ccsd[2440]: Error while processing connect: Connection refused
Jan 6 10:32:28 tank-05 ccsd[2440]: Cluster is not quorate. Refusing connection.
Jan 6 10:32:28 tank-05 ccsd[2440]: Error while processing connect: Connection refused
Jan 6 10:32:38 tank-05 ccsd[2440]: Cluster is not quorate. Refusing connection.
Jan 6 10:32:38 tank-05 ccsd[2440]: Error while processing connect: Connection refused
Jan 6 10:32:48 tank-05 ccsd[2440]: Cluster is not quorate. Refusing connection.
Jan 6 10:32:48 tank-05 ccsd[2440]: Error while processing connect: Connection refused
Jan 6 10:32:58 tank-05 ccsd[2440]: Cluster is not quorate. Refusing connection.
Jan 6 10:32:58 tank-05 ccsd[2440]: Error while processing connect: Connection refused
Jan 6 10:33:06 tank-05 lock_gulmd_core[2502]: Now have Slave quorum, going full Master.
Jan 6 10:33:06 tank-05 lock_gulmd_core[2502]: New Client: idx:1 fd:6 from tank-01 ::ffff:10.15.84.91
Jan 6 10:33:06 tank-05 clurgmgrd[3019]: <notice> Quorum Achieved
Jan 6 10:33:06 tank-05 clurgmgrd[3019]: <info> Magma Event: Membership Change
Jan 6 10:33:06 tank-05 clurgmgrd[3019]: <info> State change: Local UP
Jan 6 10:33:06 tank-05 clurgmgrd[3019]: <info> Loading Service Data
Jan 6 10:33:06 tank-05 ccsd[2440]: Cluster is quorate. Allowing connections.
Jan 6 10:33:07 tank-05 clurgmgrd[3019]: <info> Initializing Services
Jan 6 10:33:07 tank-05 clurgmgrd[3019]: <info> Services Initialized
Jan 6 10:33:07 tank-05 clurgmgrd[3019]: <info> State change: tank-04 UP
Jan 6 10:33:07 tank-05 clurgmgrd[3019]: <info> State change: tank-02 UP
Jan 6 10:33:07 tank-05 clurgmgrd[3019]: <info> State change: tank-01 UP
Jan 6 10:33:07 tank-05 lock_gulmd_LT000[2506]: Attached slave tank-01 ::ffff:10.15.84.91 idx:3 fd:8 (soff:2 connected:0x4)
Jan 6 10:33:07 tank-05 lock_gulmd_LT000[2506]: New Client: idx 4 fd 9 from tank-01 ::ffff:10.15.84.91
Jan 6 10:33:07 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=0: Trying to acquire journal lock...
Jan 6 10:33:07 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=0: Trying to acquire journal lock...
Jan 6 10:33:07 tank-05 kernel: GFS: fsid=tank-cluster:gfs2.4: jid=0: Trying to acquire journal lock...
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=0: Looking at journal...
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=0: Looking at journal...
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs2.4: jid=0: Busy
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs2.4: jid=0: Trying to acquire journal lock...
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs2.4: jid=0: Busy
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs2.4: jid=2: Trying to acquire journal lock...
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs2.4: jid=2: Busy
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=0: Acquiring the transaction lock...
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=0: Acquiring the transaction lock...
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs2.4: jid=2: Trying to acquire journal lock...
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs2.4: jid=2: Busy
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=0: Replaying journal...
Jan 6 10:33:08 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=0: Replaying journal...
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=0: Replayed 435 of 503 blocks
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=0: replays = 435, skips = 15, sames = 53
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=0: Journal replayed in 1s
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=0: Done
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=0: Trying to acquire journal lock...
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=0: Busy
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=2: Trying to acquire journal lock...
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=2: Busy
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=2: Trying to acquire journal lock...
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: jid=2: Busy
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=0: Replayed 579 of 668 blocks
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=0: replays = 579, skips = 25, sames = 64
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=0: Journal replayed in 2s
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=0: Done
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=0: Trying to acquire journal lock...
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=0: Busy
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=2: Trying to acquire journal lock...
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=2: Looking at journal...
Jan 6 10:33:09 tank-05 kernel: GFS: fsid=tank-cluster:gfs1.4: jid=2: Done
Jan 6 10:33:56 tank-05 lock_gulmd_core[2502]: New Client: idx:9 fd:14 from tank-03 ::ffff:10.15.84.93

- Backtrace of mount /mnt/gfs2 on tank-01

PID: 3968   TASK: f5ba0630  CPU: 3   COMMAND: "mount"
 #0 [f586cb94] schedule at c02d0655
 #1 [f586cbf8] wait_for_completion at c02d077a
 #2 [f586cc4c] jid_get_lock_state_inr at f8ca0a8e
 #3 [f586cd30] gulm_mount at f8ca029a
 #4 [f586cd8c] gfs_lm_mount at f8c0e2f9
 #5 [f586cdbc] fill_super at f8c1a676
 #6 [f586cea0] gfs_get_sb at f8c1b246
 #7 [f586ced8] do_kern_mount at c01604c8
 #8 [f586cefc] do_new_mount at c017334e
 #9 [f586cf20] do_mount at c017399b
#10 [f586cf9c] sys_mount at c0173cf2
#11 [f586cfc0] system_call at c02d271c
    EAX: 00000015  EBX: 093b6aa8  ECX: 093b6ac8  EDX: 093b6ad8
    DS:  007b      ESI: c0ed0000  ES:  007b      EDI: 093b6b08
    SS:  007b      ESP: bff194fc  EBP: bff19528
    CS:  0073      EIP: 004527a2  ERR: 00000015  EFLAGS: 00000286

- Backtraces on tank-02

crash> bt 3801
PID: 3801   TASK: f62df130  CPU: 1   COMMAND: "gfs_recoverd"
 #0 [f6b3dcf8] schedule at c02d0655
 #1 [f6b3dd5c] wait_for_completion at c02d077a
 #2 [f6b3ddb0] jid_get_lock_state_inr at f8c64a8e
 #3 [f6b3dde4] check_for_stale_expires at f8c65193
 #4 [f6b3de80] gulm_recovery_done at f8c64714
 #5 [f6b3dea8] gfs_recover_journal at f8ce48a1
 #6 [f6b3dfc8] gfs_check_journals at f8ce49a6
 #7 [f6b3dfe0] gfs_recoverd at f8cbbedf
 #8 [f6b3dff0] kernel_thread_helper at c01041f3

crash> bt 3797
PID: 3797   TASK: f623b830  CPU: 1   COMMAND: "gulm_Cb_Handler"
 #0 [f6ab0e40] schedule at c02d0655
 #1 [f6ab0ea4] __sched_text_start at c02cfc40
 #2 [f6ab0ed8] __down_failed at c02cfdb7
 #3 [f6ab0ee8] .text.lock.gulm_jid (via check_for_stale_expires) at f8c653d0
 #4 [f6ab0f74] handler at f8c67e34
 #5 [f6ab0ff0] kernel_thread_helper at c01041f3

- Backtraces on tank-03

PID: 3937   TASK: ee1de730  CPU: 0   COMMAND: "mount"
 #0 [f5fd4b94] schedule at c02d0655
 #1 [f5fd4bf8] wait_for_completion at c02d077a
 #2 [f5fd4c4c] jid_get_lock_state_inr at f8c9da8e
 #3 [f5fd4d30] gulm_mount at f8c9d29a
 #4 [f5fd4d8c] gfs_lm_mount at f8bf82f9
 #5 [f5fd4dbc] fill_super at f8c04676
 #6 [f5fd4ea0] gfs_get_sb at f8c05246
 #7 [f5fd4ed8] do_kern_mount at c01604c8
 #8 [f5fd4efc] do_new_mount at c017334e
 #9 [f5fd4f20] do_mount at c017399b
#10 [f5fd4f9c] sys_mount at c0173cf2
#11 [f5fd4fc0] system_call at c02d271c
    EAX: ffffffda  EBX: 09b07bc8  ECX: 09b07be8  EDX: 09b07bf8
    DS:  007b      ESI: c0ed0000  ES:  007b      EDI: 09b07c28
    SS:  007b      ESP: bfee8a9c  EBP: bfee8ac8
    CS:  0073      EIP: 0072e7a2  ERR: 00000015  EFLAGS: 00000286

- Backtraces on tank-04

crash> bt 3559
PID: 3559   TASK: f651b1b0  CPU: 1   COMMAND: "gfs_recoverd"
 #0 [f6ceacf8] schedule at c02d0655
 #1 [f6cead5c] wait_for_completion at c02d077a
 #2 [f6ceadb0] jid_get_lock_state_inr at f8c9da8e
 #3 [f6ceade4] check_for_stale_expires at f8c9e193
 #4 [f6ceae80] gulm_recovery_done at f8c9d714
 #5 [f6ceaea8] gfs_recover_journal at f8c0b8a1
 #6 [f6ceafc8] gfs_check_journals at f8c0b9a6
 #7 [f6ceafe0] gfs_recoverd at f8be2edf
 #8 [f6ceaff0] kernel_thread_helper at c01041f3

crash> bt 3555
PID: 3555   TASK: f76845b0  CPU: 0   COMMAND: "gulm_Cb_Handler"
 #0 [f6d54e40] schedule at c02d0655
 #1 [f6d54ea4] __sched_text_start at c02cfc40
 #2 [f6d54ed8] __down_failed at c02cfdb7
 #3 [f6d54ee8] .text.lock.gulm_jid (via check_for_stale_expires) at f8c9e3d0
 #4 [f6d54f74] handler at f8ca0e34
 #5 [f6d54ff0] kernel_thread_helper at c01041f3

- The last message on tank-01 and tank-03 after the mount attempt is:
GFS: Trying to join cluster "lock_gulm", "tank-cluster:gfs2"
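Reading the gfs_recoverd and gulm_Cb_Handler traces together, this looks like a
lock-versus-completion deadlock: gfs_recoverd appears to hold the gulm_jid
semaphore (taken via check_for_stale_expires) while it sleeps in
wait_for_completion(), and the callback handler that would deliver that
completion is stuck in __down_failed trying to take the same semaphore. The
sketch below only illustrates that pattern; the names jid_sem and jid_done are
hypothetical stand-ins, not the actual lock_gulm symbols.

/* Illustrative sketch (not the lock_gulm source) of the apparent deadlock:
 * one thread holds a semaphore while waiting for a completion that can only
 * be signalled by a thread that first needs the same semaphore. */
#include <linux/completion.h>
#include <asm/semaphore.h>

static DECLARE_MUTEX(jid_sem);          /* hypothetical: the gulm_jid lock  */
static DECLARE_COMPLETION(jid_done);    /* hypothetical: the awaited reply  */

/* gfs_recoverd / mount path (frames #1-#3 above). */
static void recovery_path(void)
{
        down(&jid_sem);                 /* taken in check_for_stale_expires */
        wait_for_completion(&jid_done); /* jid_get_lock_state_inr sleeps... */
        up(&jid_sem);
}

/* gulm_Cb_Handler path (frames #2-#4 above). */
static void callback_path(void)
{
        down(&jid_sem);                 /* __down_failed: blocks forever    */
        complete(&jid_done);            /* ...so this is never reached      */
        up(&jid_sem);
}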
Waiting to see if this bug reappears with the latest gulm (1.0.5).
This appears to be a problem with name resolution. When a master and a slave are killed, the remaining slave attempts to mark tank-01.lab.msp.redhat.com as expired instead of tank-01.
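To illustrate why that matters, here is a hedged userspace sketch (the helper
same_node is made up; this is not gulm's actual comparison code): if the expired
node is recorded under its fully qualified name while the node list holds the
short name, a plain string compare never matches, so the expire/recovery never
finds the right entry. Normalizing both sides to the part before the first dot
is one way the names would line up.

/* Hypothetical illustration of the FQDN vs. short-name mismatch.
 * Not gulm code: shows why "tank-01.lab.msp.redhat.com" never
 * matches a node registered as "tank-01", and one way to normalize. */
#include <stdio.h>
#include <string.h>

/* Compare host names only up to the first dot, so a short name and its
 * fully qualified form are treated as the same node. */
static int same_node(const char *a, const char *b)
{
        size_t la = strcspn(a, ".");
        size_t lb = strcspn(b, ".");

        return la == lb && strncmp(a, b, la) == 0;
}

int main(void)
{
        const char *registered = "tank-01";                    /* name in the node list      */
        const char *resolved   = "tank-01.lab.msp.redhat.com"; /* name the slave resolved to */

        printf("strcmp match:    %s\n", strcmp(registered, resolved) == 0 ? "yes" : "no");
        printf("same_node match: %s\n", same_node(registered, resolved) ? "yes" : "no");
        return 0;
}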
Closing this bug because the initial problems have been fixed. (Further issues are addressed in bz #183383).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0238.html