Description of problem:

Version-Release number of selected component (if applicable):
1.9.53

How reproducible:
Have seen this several times, but unsure how to reproduce

Steps to Reproduce:
Unknown

Actual results:
Access to data through VIP services is disabled

Expected results:
This doesn't happen

Additional info:
Info from /var/log/messages.

Node 1:
Sep 30 19:44:46 frii01 kernel: dlm: capture: process_lockqueue_reply id 87c3007b state 0
Sep 30 20:24:54 frii01 kernel: dlm: capture: process_lockqueue_reply id 8b4903b5 state 0
Sep 30 20:25:03 frii01 kernel: dlm: midcomms: bad header version 34000045
Sep 30 20:25:03 frii01 kernel: dlm: midcomms: cmd=0, flags=41, length=64, lkid=2226062912, lockspace=17435146
Sep 30 20:25:03 frii01 kernel: dlm: midcomms: base=000001005220d000, offset=1720, len=1736, ret=1720, limit=00001000 newbuf=1
Sep 30 20:25:03 frii01 kernel: 45 00 00 34 00 29 40 00-40 06 af 84 0a 0a 0a 01
Sep 30 20:25:03 frii01 kernel: 0a 0a 0a 02 52 48 95 54-e0 d7 14 c2 50 84 4d c4
Sep 30 20:25:03 frii01 kernel: 80 10 11 33 43 33 00 00-01 01 08 0a 0e 00 96 18
Sep 30 20:25:03 frii01 kernel: 0f 03 cb a5
Sep 30 20:25:03 frii01 kernel: ff ff ff ff
Sep 30 20:25:03 frii01 kernel: 46 02 00 00
Sep 30 20:25:03 frii01 kernel: 00
Sep 30 20:25:03 frii01 last message repeated 3 times
Sep 30 20:25:03 frii01 kernel: dlm: lowcomms: addr=000001005220d000, base=0, len=3456, iov_len=4096, iov_base[0]=000001005220dd80, read=3456
Sep 30 20:25:03 frii01 kernel: dlm: capture: process_lockqueue_reply id 8d390207 state 0
Sep 30 20:25:03 frii01 kernel: dlm: capture: process_lockqueue_reply state 0
Sep 30 20:25:03 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:03 frii01 kernel: dlm: reply
Sep 30 20:25:03 frii01 kernel: rh_cmd 5
Sep 30 20:25:03 frii01 kernel: rh_lkid 8c6402fe
Sep 30 20:25:03 frii01 kernel: lockstate 0
Sep 30 20:25:03 frii01 kernel: nodeid 2
Sep 30 20:25:03 frii01 kernel: status 4294901758
Sep 30 20:25:03 frii01 kernel: lkid 0
Sep 30 20:25:03 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:03 frii01 kernel: dlm: reply
Sep 30 20:25:04 frii01 kernel: rh_cmd 5
Sep 30 20:25:04 frii01 kernel: rh_lkid 8b580314
Sep 30 20:25:04 frii01 kernel: lockstate 0
Sep 30 20:25:04 frii01 kernel: nodeid 2
Sep 30 20:25:04 frii01 kernel: status 4294901758
Sep 30 20:25:04 frii01 kernel: lkid 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: process_lockqueue_reply id 8b6103d0 state 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: process_lockqueue_reply id 8b31021b state 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:04 frii01 kernel: dlm: reply
Sep 30 20:25:04 frii01 kernel: rh_cmd 5
Sep 30 20:25:04 frii01 kernel: rh_lkid 8ac1010c
Sep 30 20:25:04 frii01 kernel: lockstate 0
Sep 30 20:25:04 frii01 kernel: nodeid 1
Sep 30 20:25:04 frii01 kernel: status 4294901758
Sep 30 20:25:04 frii01 kernel: lkid 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: process_lockqueue_reply id 88d901b9 state 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:04 frii01 kernel: dlm: reply
Sep 30 20:25:04 frii01 kernel: rh_cmd 5
Sep 30 20:25:04 frii01 kernel: rh_lkid 8c340274
Sep 30 20:25:04 frii01 kernel: lockstate 0
Sep 30 20:25:04 frii01 kernel: nodeid 1
Sep 30 20:25:04 frii01 kernel: status 4294901758
Sep 30 20:25:04 frii01 kernel: lkid 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:04 frii01 kernel: dlm: reply
Sep 30 20:25:04 frii01 kernel: rh_cmd 5
Sep 30 20:25:04 frii01 kernel: rh_lkid 8afb024a
Sep 30 20:25:04 frii01 kernel: lockstate 0
Sep 30 20:25:04 frii01 kernel: nodeid 1
Sep 30 20:25:04 frii01 kernel: status 4294901758
Sep 30 20:25:04 frii01 kernel: lkid 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: process_lockqueue_reply id 89150231 state 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: process_lockqueue_reply id 8a0b0264 state 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: process_lockqueue_reply id 89b50211 state 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:04 frii01 kernel: dlm: reply
Sep 30 20:25:04 frii01 kernel: rh_cmd 5
Sep 30 20:25:04 frii01 kernel: rh_lkid 8b4e00b3
Sep 30 20:25:04 frii01 kernel: lockstate 0
Sep 30 20:25:04 frii01 kernel: nodeid 1
Sep 30 20:25:04 frii01 kernel: status 4294901758
Sep 30 20:25:04 frii01 kernel: lkid 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: process_lockqueue_reply id 8bbd02de state 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:04 frii01 kernel: dlm: reply
Sep 30 20:25:04 frii01 kernel: rh_cmd 5
Sep 30 20:25:04 frii01 kernel: rh_lkid 8b0000ff
Sep 30 20:25:04 frii01 kernel: lockstate 0
Sep 30 20:25:04 frii01 kernel: nodeid 1
Sep 30 20:25:04 frii01 kernel: status 4294901758
Sep 30 20:25:04 frii01 kernel: lkid 0
Sep 30 20:25:04 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:05 frii01 kernel: dlm: reply
Sep 30 20:25:05 frii01 kernel: rh_cmd 5
Sep 30 20:25:05 frii01 kernel: rh_lkid 8c6303df
Sep 30 20:25:05 frii01 kernel: lockstate 0
Sep 30 20:25:05 frii01 kernel: nodeid 1
Sep 30 20:25:05 frii01 kernel: status 4294901758
Sep 30 20:25:05 frii01 kernel: lkid 0
Sep 30 20:25:05 frii01 kernel: dlm: capture: process_lockqueue_reply id 89c902a2 state 0
Sep 30 20:25:05 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:05 frii01 kernel: dlm: reply
Sep 30 20:25:05 frii01 kernel: rh_cmd 5
Sep 30 20:25:05 frii01 kernel: rh_lkid 8a6901d9
Sep 30 20:25:05 frii01 kernel: lockstate 0
Sep 30 20:25:05 frii01 kernel: nodeid 1
Sep 30 20:25:05 frii01 kernel: status 4294901758
Sep 30 20:25:05 frii01 kernel: lkid 0
Sep 30 20:25:05 frii01 kernel: dlm: capture: process_lockqueue_reply id 8c4c0063 state 0
Sep 30 20:25:05 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:05 frii01 kernel: dlm: reply
Sep 30 20:25:05 frii01 kernel: rh_cmd 5
Sep 30 20:25:05 frii01 kernel: rh_lkid 8b9b0199
Sep 30 20:25:05 frii01 kernel: lockstate 0
Sep 30 20:25:05 frii01 kernel: nodeid 1
Sep 30 20:25:05 frii01 kernel: status 4294901758
Sep 30 20:25:05 frii01 kernel: lkid 0
Sep 30 20:25:05 frii01 kernel: dlm: capture: process_lockqueue_reply id 8b600314 state 0
Sep 30 20:25:05 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:05 frii01 kernel: dlm: reply
Sep 30 20:25:05 frii01 kernel: rh_cmd 5
Sep 30 20:25:05 frii01 kernel: rh_lkid 8ab30074
Sep 30 20:25:05 frii01 kernel: lockstate 0
Sep 30 20:25:05 frii01 kernel: nodeid 2
Sep 30 20:25:05 frii01 kernel: status 4294901758
Sep 30 20:25:05 frii01 kernel: lkid 0
Sep 30 20:25:05 frii01 kernel: dlm: midcomms: bad header version 34000045
Sep 30 20:25:05 frii01 kernel: dlm: midcomms: cmd=0, flags=41, length=64, lkid=2226062912, lockspace=17435146
Sep 30 20:25:05 frii01 kernel: dlm: midcomms: base=000001005220d000, offset=1720, len=2168, ret=1720, limit=00001000 newbuf=1
Sep 30 20:25:05 frii01 kernel: 45 00 00 34 00 29 40 00-40 06 af 84 0a 0a 0a 01
Sep 30 20:25:05 frii01 kernel: 0a 0a 0a 02 52 48 95 54-e0 d7 14 c2 50 84 4d c4
Sep 30 20:25:05 frii01 kernel: 80 10 11 33 43 33 00 00-01 01 08 0a 0e 00 96 18
Sep 30 20:25:05 frii01 kernel: 0f 03 cb a5
Sep 30 20:25:05 frii01 kernel: ff ff ff ff
Sep 30 20:25:05 frii01 kernel: 46 02 00 00
Sep 30 20:25:05 frii01 kernel: 00
Sep 30 20:25:05 frii01 last message repeated 3 times
Sep 30 20:25:05 frii01 kernel: dlm: lowcomms: addr=000001005220d000, base=0, len=3888, iov_len=640, iov_base[0]=000001005220df30, read=432
Sep 30 20:25:05 frii01 kernel: dlm: capture: process_lockqueue_reply id 8d390207 state 0
Sep 30 20:25:05 frii01 kernel: dlm: capture: process_lockqueue_reply id 8b3c00d7 state 0
Sep 30 20:25:05 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:05 frii01 kernel: dlm: reply
Sep 30 20:25:05 frii01 kernel: rh_cmd 5
Sep 30 20:25:05 frii01 kernel: rh_lkid 8c6402fe
Sep 30 20:25:05 frii01 kernel: lockstate 0
Sep 30 20:25:05 frii01 kernel: nodeid 2
Sep 30 20:25:05 frii01 kernel: status 4294901758
Sep 30 20:25:05 frii01 kernel: lkid 0
Sep 30 20:25:05 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:05 frii01 kernel: dlm: reply
Sep 30 20:25:06 frii01 kernel: rh_cmd 5
Sep 30 20:25:06 frii01 kernel: rh_lkid 8b580314
Sep 30 20:25:06 frii01 kernel: lockstate 0
Sep 30 20:25:06 frii01 kernel: nodeid 2
Sep 30 20:25:06 frii01 kernel: status 4294901758
Sep 30 20:25:06 frii01 kernel: lkid 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: process_lockqueue_reply id 8b6103d0 state 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: process_lockqueue_reply id 8b31021b state 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:06 frii01 kernel: dlm: reply
Sep 30 20:25:06 frii01 kernel: rh_cmd 5
Sep 30 20:25:06 frii01 kernel: rh_lkid 8ac1010c
Sep 30 20:25:06 frii01 kernel: lockstate 0
Sep 30 20:25:06 frii01 kernel: nodeid 1
Sep 30 20:25:06 frii01 kernel: status 4294901758
Sep 30 20:25:06 frii01 kernel: lkid 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: process_lockqueue_reply id 88d901b9 state 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:06 frii01 kernel: dlm: reply
Sep 30 20:25:06 frii01 kernel: rh_cmd 5
Sep 30 20:25:06 frii01 kernel: rh_lkid 8c340274
Sep 30 20:25:06 frii01 kernel: lockstate 0
Sep 30 20:25:06 frii01 kernel: nodeid 1
Sep 30 20:25:06 frii01 kernel: status 4294901758
Sep 30 20:25:06 frii01 kernel: lkid 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:06 frii01 kernel: dlm: reply
Sep 30 20:25:06 frii01 kernel: rh_cmd 5
Sep 30 20:25:06 frii01 kernel: rh_lkid 8afb024a
Sep 30 20:25:06 frii01 kernel: lockstate 0
Sep 30 20:25:06 frii01 kernel: nodeid 1
Sep 30 20:25:06 frii01 kernel: status 4294901758
Sep 30 20:25:06 frii01 kernel: lkid 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: process_lockqueue_reply id 89150231 state 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: process_lockqueue_reply id 8a0b0264 state 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: process_lockqueue_reply id 89b50211 state 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:06 frii01 kernel: dlm: reply
Sep 30 20:25:06 frii01 kernel: rh_cmd 5
Sep 30 20:25:06 frii01 kernel: rh_lkid 8b4e00b3
Sep 30 20:25:06 frii01 kernel: lockstate 0
Sep 30 20:25:06 frii01 kernel: nodeid 1
Sep 30 20:25:06 frii01 kernel: status 4294901758
Sep 30 20:25:06 frii01 kernel: lkid 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: process_lockqueue_reply id 8bbd02de state 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:06 frii01 kernel: dlm: reply
Sep 30 20:25:06 frii01 kernel: rh_cmd 5
Sep 30 20:25:06 frii01 kernel: rh_lkid 8b0000ff
Sep 30 20:25:06 frii01 kernel: lockstate 0
Sep 30 20:25:06 frii01 kernel: nodeid 1
Sep 30 20:25:06 frii01 kernel: status 4294901758
Sep 30 20:25:06 frii01 kernel: lkid 0
Sep 30 20:25:06 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:06 frii01 kernel: dlm: reply
Sep 30 20:25:07 frii01 kernel: rh_cmd 5
Sep 30 20:25:07 frii01 kernel: rh_lkid 8c6303df
Sep 30 20:25:07 frii01 kernel: lockstate 0
Sep 30 20:25:07 frii01 kernel: nodeid 1
Sep 30 20:25:07 frii01 kernel: status 4294901758
Sep 30 20:25:07 frii01 kernel: lkid 0
Sep 30 20:25:07 frii01 kernel: dlm: capture: process_lockqueue_reply id 89c902a2 state 0
Sep 30 20:25:07 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:07 frii01 kernel: dlm: reply
Sep 30 20:25:07 frii01 kernel: rh_cmd 5
Sep 30 20:25:07 frii01 kernel: rh_lkid 8a6901d9
Sep 30 20:25:07 frii01 kernel: lockstate 0
Sep 30 20:25:07 frii01 kernel: nodeid 1
Sep 30 20:25:07 frii01 kernel: status 4294901758
Sep 30 20:25:07 frii01 kernel: lkid 0
Sep 30 20:25:07 frii01 kernel: dlm: capture: process_lockqueue_reply id 8c4c0063 state 0
Sep 30 20:25:07 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:07 frii01 kernel: dlm: reply
Sep 30 20:25:07 frii01 kernel: rh_cmd 5
Sep 30 20:25:07 frii01 kernel: rh_lkid 8b9b0199
Sep 30 20:25:07 frii01 kernel: lockstate 0
Sep 30 20:25:07 frii01 kernel: nodeid 1
Sep 30 20:25:07 frii01 kernel: status 4294901758
Sep 30 20:25:07 frii01 kernel: lkid 0
Sep 30 20:25:07 frii01 kernel: dlm: capture: process_lockqueue_reply id 8b600314 state 0
Sep 30 20:25:07 frii01 kernel: dlm: capture: reply from 2 no lock
Sep 30 20:25:07 frii01 kernel: dlm: reply
Sep 30 20:25:07 frii01 kernel: rh_cmd 5
Sep 30 20:25:07 frii01 kernel: rh_lkid 8ab30074
Sep 30 20:25:07 frii01 kernel: lockstate 0
Sep 30 20:25:07 frii01 kernel: nodeid 2
Sep 30 20:25:07 frii01 kernel: status 4294901758
Sep 30 20:25:07 frii01 kernel: lkid 0
Sep 30 20:25:07 frii01 kernel: dlm: midcomms: bad header version 34000045
Sep 30 20:25:07 frii01 kernel: dlm: midcomms: cmd=0, flags=41, length=64, lkid=2226062912, lockspace=17435146
Sep 30 20:25:07 frii01 kernel: dlm: midcomms: base=000001005220d000, offset=1720, len=2376, ret=1720, limit=00001000 newbuf=1
Sep 30 20:25:07 frii01 kernel: 45 00 00 34 00 29 40 00-40 06 af 84 0a 0a 0a 01
Sep 30 20:25:07 frii01 kernel: 0a 0a 0a 02 52 48 95 54-e0 d7 14 c2 50 84 4d c4
Sep 30 20:25:07 frii01 kernel: 80 10 11 33 43 33 00 00-01 01 08 0a 0e 00 96 18
Sep 30 20:25:07 frii01 kernel: 0f 03 cb a5
Sep 30 20:25:07 frii01 kernel: ff ff ff ff
Sep 30 20:25:07 frii01 kernel: 46 02 00 00
Sep 30 20:25:07 frii01 kernel: 00
Sep 30 20:25:07 frii01 last message repeated 3 times
Sep 30 20:25:07 frii01 kernel: dlm: lowcomms: addr=000001005220d000, base=0, len=4096, iov_len=208, iov_base[0]=000001005220e000, read=208
Sep 30 20:25:50 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:26:35 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:27:05 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:27:50 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:27:51 frii01 clvmd: Cluster LVM daemon started - connected to CMAN
Sep 30 20:28:35 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:29:05 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:29:50 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:30:35 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:31:20 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:31:50 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:32:20 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:33:05 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:33:35 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:34:05 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:34:50 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:35:35 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:36:20 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:37:05 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:37:50 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:38:20 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:39:05 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:39:50 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:40:20 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:41:05 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:41:50 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:42:20 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:43:05 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:43:50 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:44:35 frii01 clurgmgrd[15603]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:45:20 frii01 clurgmgrd[15603]: <err> #50: Unable to obtain cluster lock: Connection timed out

Node 2:
Sep 30 20:26:26 frii02 clurgmgrd[17122]: <err> #49: Failed getting status for RG 172.16.107.225
Sep 30 20:27:56 frii02 clurgmgrd[17122]: <err> #49: Failed getting status for RG 172.16.106.225
Sep 30 20:29:26 frii02 clurgmgrd[17122]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:30:11 frii02 clurgmgrd[17122]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:30:56 frii02 clurgmgrd[17122]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:31:41 frii02 clurgmgrd[17122]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:32:26 frii02 clurgmgrd[17122]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:33:11 frii02 clurgmgrd[17122]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:33:56 frii02 clurgmgrd[17122]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:34:41 frii02 clurgmgrd[17122]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:35:26 frii02 clurgmgrd[17122]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:35:56 frii02 clurgmgrd[17122]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:36:41 frii02 clurgmgrd[17122]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:37:26 frii02 clurgmgrd[17122]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:38:11 frii02 clurgmgrd[17122]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:38:56 frii02 clurgmgrd[17122]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:40:26 frii02 clurgmgrd[17122]: <err> #51: Failed getting status for RG 172.16.107.225
Sep 30 20:41:56 frii02 clurgmgrd[17122]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:42:41 frii02 clurgmgrd[17122]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:43:26 frii02 clurgmgrd[17122]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:44:11 frii02 clurgmgrd[17122]: <err> #50: Unable to obtain cluster lock: Connection timed out
Sep 30 20:44:56 frii02 clurgmgrd[17122]: <err> #48: Unable to obtain cluster lock: Connection timed out
Sep 30 20:45:41 frii02 clurgmgrd[17122]: <err> #50: Unable to obtain cluster lock: Connection timed out
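
A note on reading the Node 1 dump: the bytes that midcomms printed after "bad header version 34000045" appear to parse cleanly as an IPv4 header followed by a TCP header, which would suggest that the buffer position DLM treated as a message header actually held raw packet data. The sketch below is only an aid for decoding the logged values and is not taken from the DLM source. The function names and the returned dictionary layout are assumptions made for illustration, and only the first 52 bytes of the dump are decoded.

```python
import socket
import struct

# First 52 bytes of the hex dump from the Node 1 log ("-" is the dump's
# mid-row separator and is stripped before parsing).
DUMP_HEX = (
    "45 00 00 34 00 29 40 00-40 06 af 84 0a 0a 0a 01 "
    "0a 0a 0a 02 52 48 95 54-e0 d7 14 c2 50 84 4d c4 "
    "80 10 11 33 43 33 00 00-01 01 08 0a 0e 00 96 18 "
    "0f 03 cb a5"
)


def decode_dump(hex_text):
    """Interpret the dumped bytes as IPv4 + TCP headers (hypothetical helper)."""
    data = bytes.fromhex(hex_text.replace("-", " "))

    # The first four bytes read as a little-endian 32-bit word give the value
    # that the log reports as the "bad header version".
    (first_word,) = struct.unpack_from("<I", data, 0)

    # Fixed 20-byte IPv4 header, network byte order.
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum,
     src, dst) = struct.unpack_from("!BBHHHBBH4s4s", data, 0)
    ip_header_len = (ver_ihl & 0x0F) * 4

    # The TCP source and destination ports follow the IP header.
    src_port, dst_port = struct.unpack_from("!HH", data, ip_header_len)

    return {
        "first_word_le": format(first_word, "08x"),
        "ip_version": ver_ihl >> 4,
        "ip_header_len": ip_header_len,
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,  # 6 is TCP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "src_port": src_port,
        "dst_port": dst_port,
    }


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    for key, val in decode_dump(DUMP_HEX).items():
        print(key, val)
    # The repeated "status 4294901758" field, viewed as a signed value.
    print("status", to_signed32(4294901758))
```

Under this reading the packet runs from 10.10.10.1 to 10.10.10.2 with a total length of 52 bytes, and the source port decodes to 21064, which I believe is the default DLM TCP port. The repeated status value 4294901758 reads as -65538 (-0x10002) when treated as a signed 32-bit integer. Which DLM status code that corresponds to would need to be checked against the DLM headers for this release.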
*** Bug 200841 has been marked as a duplicate of this bug. ***
This may already be fixed in U4. If it isn't, we'll need a reliable test case that reproduces it.
Setting to component 'dlm', but waiting for more information.
After comparing notes, the symptoms here are caused by a memory leak in the DLM that is triggered by rgmanager. This leak has been fixed in Red Hat Cluster Suite 4 Update 5. If you continue to see this problem on a prior release, please upgrade magma, magma-plugins, dlm, dlm-kernel, and rgmanager.
Note - per bug #247766, there is a chance that this was caused not specifically by the rgmanager lock leak in RHCS 4.4, but more generally by a large number of outstanding locks.