Bug 149119
Summary: | recovery causes stuck lock_gulm while mounting | ||
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
Component: | gulm | Assignee: | michael conrad tadpol tilstra <mtilstra> |
Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | cluster-maint |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2005-02-21 16:31:29 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Corey Marthaler
2005-02-18 22:14:47 UTC
Here's the syslog from morph-02 the node who became the new master: Feb 18 15:30:59 morph-02 lock_gulmd_core[5975]: Failed to receive a timely heartbeat reply from Master. (t:110 8758659731017 mb:1) Feb 18 15:31:14 morph-02 lock_gulmd_core[5975]: Failed to receive a timely heartbeat reply from Master. (t:110 8758674731737 mb:2) Feb 18 15:31:29 morph-02 lock_gulmd_core[5975]: Failed to receive a timely heartbeat reply from Master. (t:110 8758689732455 mb:3) Feb 18 15:31:49 morph-02 lock_gulmd_core[5975]: I see no Masters, So I am Arbitrating until enough Slaves talk to me. Feb 18 15:31:49 morph-02 lock_gulmd_core[5975]: Could not send quorum update to slave morph-03.lab.msp.redhat.com Feb 18 15:31:49 morph-02 lock_gulmd_core[5975]: Could not send quorum update to slave morph-05.lab.msp.redhat.com Feb 18 15:31:49 morph-02 lock_gulmd_core[5975]: Could not send quorum update to slave morph-04.lab.msp.redhat.com Feb 18 15:31:49 morph-02 lock_gulmd_core[5975]: Could not send quorum update to slave morph-02.lab.msp.redhat.com Feb 18 15:31:49 morph-02 lock_gulmd_core[5975]: LastMaster morph-01.lab.msp.redhat.com ::ffff:192.168.44.61, is being marked Expired. Feb 18 15:31:49 morph-02 lock_gulmd_core[5975]: Could not send membership update "Expired" about morph-01.lab. msp.redhat.com to slave morph-03.lab.msp.redhat.com Feb 18 15:31:49 morph-02 lock_gulmd_core[5975]: Could not send membership update "Expired" about morph-01.lab. msp.redhat.com to slave morph-05.lab.msp.redhat.com Feb 18 15:31:49 morph-02 lock_gulmd_core[5975]: Could not send membership update "Expired" about morph-01.lab. msp.redhat.com to slave morph-04.lab.msp.redhat.com Feb 18 15:31:49 morph-02 lock_gulmd_core[5975]: Could not send membership update "Expired" about morph-01.lab. msp.redhat.com to slave morph-02.lab.msp.redhat.com Feb 18 15:31:49 morph-02 lock_gulmd_core[7372]: Gonna exec fence_node -O morph-01.lab.msp.redhat.com Feb 18 15:31:49 morph-02 lock_gulmd_core[5975]: Forked [7372] fence_node -O morph-01.lab.msp.redhat.com with a 0 pause. Feb 18 15:31:49 morph-02 lock_gulmd_LTPX[5983]: New Master at morph-02.lab.msp.redhat.com ::ffff:192.168.44.62 Feb 18 15:32:03 morph-02 fence_node[7372]: Fence of "morph-01.lab.msp.redhat.com" was successful Feb 18 15:32:03 morph-02 lock_gulmd_core[5975]: found match on pid 7372, marking node morph-01.lab.msp.redhat. com as logged out. Feb 18 15:32:03 morph-02 lock_gulmd_core[5975]: Member update message Fenced about morph-01.lab.msp.redhat.com to morph-03.lab.msp.redhat.com is lost because node is in OM Feb 18 15:32:03 morph-02 lock_gulmd_core[5975]: Member update message Fenced about morph-01.lab.msp.redhat.com to morph-05.lab.msp.redhat.com is lost because node is in OM Feb 18 15:32:03 morph-02 lock_gulmd_core[5975]: Member update message Fenced about morph-01.lab.msp.redhat.com to morph-04.lab.msp.redhat.com is lost because node is in OM Feb 18 15:32:03 morph-02 kernel: lock_gulm: Checking for journals for node "morph-01.lab.msp.redhat.com" Feb 18 15:32:04 morph-02 lock_gulmd_core[5975]: morph-03.lab.msp.redhat.com missed a heartbeat (time:110875872 4751132 mb:1) Feb 18 15:32:04 morph-02 lock_gulmd_core[5975]: morph-05.lab.msp.redhat.com missed a heartbeat (time:110875872 4751132 mb:1) Feb 18 15:32:04 morph-02 lock_gulmd_core[5975]: morph-04.lab.msp.redhat.com missed a heartbeat (time:110875872 4751132 mb:1) Feb 18 15:32:06 morph-02 lock_gulmd_core[5975]: Timeout (15000000) on fd:9 (morph-01.lab.msp.redhat.com ::ffff :192.168.44.61) Feb 18 15:32:15 morph-02 lock_gulmd_core[5975]: Still in Arbitrating: Have 2, need 3 for quorum. Feb 18 15:32:15 morph-02 lock_gulmd_core[5975]: New Client: idx:7 fd:12 from morph-04.lab.msp.redhat.com ::ffff:192.168.44.64 Feb 18 15:32:15 morph-02 lock_gulmd_core[5975]: Member update message Logged in about morph-04.lab.msp.redhat.com to morph-03.lab.msp.redhat.com is lost because node is in OM Feb 18 15:32:15 morph-02 lock_gulmd_core[5975]: Member update message Logged in about morph-04.lab.msp.redhat.com to morph-05.lab.msp.redhat.com is lost because node is in OM Feb 18 15:32:15 morph-02 lock_gulmd_LT000[5979]: Attached slave morph-04.lab.msp.redhat.com ::ffff:192.168.44.64 idx:2 fd:7 (soff:3 connected:0x8) Feb 18 15:32:19 morph-02 lock_gulmd_core[5975]: morph-03.lab.msp.redhat.com missed a heartbeat (time:1108758739755874 mb:2) Feb 18 15:32:19 morph-02 lock_gulmd_core[5975]: morph-05.lab.msp.redhat.com missed a heartbeat (time:1108758739755874 mb:2) Feb 18 15:32:20 morph-02 lock_gulmd_core[5975]: Now have Slave quorum, going full Master. Feb 18 15:32:20 morph-02 lock_gulmd_core[5975]: New Client: idx:4 fd:9 from morph-05.lab.msp.redhat.com ::ffff:192.168.44.65 Feb 18 15:32:20 morph-02 lock_gulmd_core[5975]: Member update message Logged in about morph-05.lab.msp.redhat.com to morph-03.lab.msp.redhat.com is lost because node is in OM Feb 18 15:32:20 morph-02 lock_gulmd_LT000[5979]: New Client: idx 3 fd 8 from morph-02.lab.msp.redhat.com ::ffff:192.168.44.62 Feb 18 15:32:20 morph-02 lock_gulmd_LTPX[5983]: Logged into LT000 at morph-02.lab.msp.redhat.com ::ffff:192.168.44.62 Feb 18 15:32:20 morph-02 lock_gulmd_LTPX[5983]: Finished resending to LT000 Feb 18 15:32:20 morph-02 lock_gulmd_LT000[5979]: New Client: idx 4 fd 9 from morph-05.lab.msp.redhat.com ::ffff:192.168.44.65 Feb 18 15:32:20 morph-02 lock_gulmd_LT000[5979]: Attached slave morph-05.lab.msp.redhat.com ::ffff:192.168.44.65 idx:5 fd:10 (soff:2 connected:0xc) Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs3.1: jid=0: Trying to acquire journal lock... Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs2.1: jid=0: Trying to acquire journal lock... Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs1.1: jid=0: Trying to acquire journal lock... Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=0: Trying to acquire journal lock... Feb 18 15:32:21 morph-02 ccsd[5929]: Cluster is quorate. Allowing connections. Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs3.1: jid=0: Busy Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs2.1: jid=0: Busy Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs1.1: jid=0: Busy Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=0: Looking at journal... Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs3.1: jid=0: Trying to acquire journal lock... Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs2.1: jid=0: Trying to acquire journal lock... Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs1.1: jid=0: Trying to acquire journal lock... Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs3.1: jid=0: Busy Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs2.1: jid=0: Busy Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs1.1: jid=0: Busy Feb 18 15:32:21 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=0: Acquiring the transaction lock... Feb 18 15:32:21 morph-02 lock_gulmd_LT000[5979]: Attached slave morph-04.lab.msp.redhat.com ::ffff:192.168.44.64 idx:2 fd:7 (soff:3 connected:0xc) Feb 18 15:32:21 morph-02 lock_gulmd_LT000[5979]: morph-04.lab.msp.redhat.com ::ffff:192.168.44.64 Gave us a 1733051136:gulm_lock_cb_state 1006:Not Allowed, closing the connection. Feb 18 15:32:21 morph-02 lock_gulmd_LT000[5979]: Attached slave morph-04.lab.msp.redhat.com ::ffff:192.168.44.64 idx:2 fd:7 (soff:3 connected:0xc) Feb 18 15:32:21 morph-02 last message repeated 3 times Feb 18 15:32:21 morph-02 lock_gulmd_LT000[5979]: New Client: idx 6 fd 11 from morph-04.lab.msp.redhat.com ::ffff:192.168.44.64 Feb 18 15:32:34 morph-02 lock_gulmd_core[5975]: morph-03.lab.msp.redhat.com missed a heartbeat (time:1108758754760570 mb:3) Feb 18 15:32:34 morph-02 lock_gulmd_core[5975]: Client (morph-03.lab.msp.redhat.com) expired Feb 18 15:32:34 morph-02 lock_gulmd_core[7374]: Gonna exec fence_node -O morph-03.lab.msp.redhat.com Feb 18 15:32:34 morph-02 lock_gulmd_core[5975]: Forked [7374] fence_node -O morph-03.lab.msp.redhat.com with a 0 pause. Feb 18 15:32:34 morph-02 lock_gulmd_LT000[5979]: morph-04.lab.msp.redhat.com ::ffff:192.168.44.64 Gave us a 1733051136:gulm_lock_cb_state 1006:Not Allowed, closing the connection. Feb 18 15:32:34 morph-02 lock_gulmd_LT000[5979]: Attached slave morph-04.lab.msp.redhat.com ::ffff:192.168.44.64 idx:2 fd:7 (soff:3 connected:0xc) Feb 18 15:32:56 morph-02 fence_node[7374]: Fence of "morph-03.lab.msp.redhat.com" was successful Feb 18 15:32:56 morph-02 lock_gulmd_core[5975]: found match on pid 7374, marking node morph-03.lab.msp.redhat.com as logged out. Feb 18 15:32:56 morph-02 kernel: lock_gulm: Checking for journals for node "morph-03.lab.msp.redhat.com" Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs3.1: jid=2: Trying to acquire journal lock... Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs2.1: jid=2: Trying to acquire journal lock... Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs3.1: jid=2: Looking at journal... Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs2.1: jid=2: Looking at journal... Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs3.1: jid=2: Acquiring the transaction lock... Feb 18 15:32:56 morph-02 lock_gulmd_LT000[5979]: Attached slave morph-04.lab.msp.redhat.com ::ffff:192.168.44.64 idx:2 fd:7 (soff:3 connected:0xc) Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs1.1: jid=2: Trying to acquire journal lock... Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs1.1: jid=2: Looking at journal... Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=0: Replaying journal... Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=0: Replayed 0 of 1 blocks Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=0: replays = 0, skips = 1, sames = 0 Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=0: Journal replayed in 36s Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs2.1: jid=2: Done Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=0: Done Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=0: Trying to acquire journal lock... Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=0: Busy Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=2: Trying to acquire journal lock... Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs1.1: jid=2: Done Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=2: Busy Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=2: Trying to acquire journal lock... Feb 18 15:32:56 morph-02 kernel: GFS: fsid=morph-cluster:gfs0.1: jid=2: Busy Feb 18 15:33:41 morph-02 smartd[2040]: Device: /dev/hda, 1 Currently unreadable (pending) sectors Feb 18 15:33:41 morph-02 smartd[2040]: Device: /dev/hda, 1 Offline uncorrectable sectors Feb 18 15:36:54 morph-02 lock_gulmd_core[5975]: New Client: idx:8 fd:13 from morph-01.lab.msp.redhat.com ::ffff:192.168.44.61 Feb 18 15:36:54 morph-02 lock_gulmd_LT000[5979]: Attached slave morph-01.lab.msp.redhat.com ::ffff:192.168.44.61 idx:7 fd:12 (soff:1 connected:0xe) Feb 18 15:36:55 morph-02 lock_gulmd_LT000[5979]: New Client: idx 8 fd 13 from morph-01.lab.msp.redhat.com ::ffff:192.168.44.61 Feb 18 15:36:56 morph-02 lock_gulmd_core[5975]: New Client: idx:9 fd:14 from morph-03.lab.msp.redhat.com ::ffff:192.168.44.63 Feb 18 15:36:56 morph-02 lock_gulmd_LT000[5979]: Attached slave morph-03.lab.msp.redhat.com ::ffff:192.168.44.63 idx:9 fd:14 (soff:0 connected:0xf) Feb 18 15:36:57 morph-02 lock_gulmd_LT000[5979]: New Client: idx 10 fd 15 from morph-03.lab.msp.redhat.com ::ffff:192.168.44.63 just to note, pjc saw something like this as well. ....... Feb 18 15:32:21 morph-02 lock_gulmd_LT000[5979]: morph-04.lab.msp.redhat.com ::ffff:192.168.44.64 Gave us a 1733051136:gulm_lock_cb_state 1006:Not Allowed, closing the connection. ....... mixing up slave and client sockets. <sarcasm>wonderful.</sarcasm> Looks a lot like bug #148029 I checked stuff in on the 16 that should have fixed this. I see the code you're using was built on the 17. Can you confirm if that commit was in it? *sigh* it is there. Its the same patch that added the "Gave us..." message. cruptity. This is bug #148029, marking as such. *** This bug has been marked as a duplicate of 148029 *** |