Bug 239080
Summary: | gulm and clustermon interaction causes gulm to fence cluster members while running IO load. | ||
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Dean Jansa <djansa> |
Component: | gulm | Assignee: | Chris Feist <cfeist> |
Status: | CLOSED WONTFIX | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | cluster-maint |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2010-01-04 21:19:45 UTC | Type: | --- |
Description
Dean Jansa
2007-05-04 19:24:04 UTC
A more detailed portion of the log, showing a typical scenario:

May 1 13:36:15 link-13 lock_gulmd_core[6829]: "Magma::9794" is logged out. fd:14
May 1 13:36:19 link-13 kernel: lock_gulmd(6836): unaligned access to 0x600000000003d941, ip=0x400000000005d741
May 1 13:36:52 link-13 lock_gulmd_LT000[6836]: EOF on xdr (link-14 ::ffff:10.15.89.164 idx:4 fd:9)
May 1 13:37:10 link-13 lock_gulmd_core[6829]: link-14 missed a heartbeat (time:1178044630197028 mb:1)
May 1 13:37:10 link-13 kernel: lock_gulmd(6836): unaligned access to 0x600000000003d941, ip=0x400000000005d300
May 1 13:37:10 link-13 lock_gulmd_core[6829]: link-15 missed a heartbeat (time:1178044630197028 mb:1)
May 1 13:37:10 link-13 lock_gulmd_LT000[6836]: EOF on xdr (link-15 ::ffff:10.15.89.165 idx:6 fd:11)
May 1 13:37:10 link-13 kernel: lock_gulmd(6836): unaligned access to 0x600000000003d941, ip=0x400000000005dc10
May 1 13:37:10 link-13 lock_gulmd_core[6829]: link-16 missed a heartbeat (time:1178044630197028 mb:1)
May 1 13:37:10 link-13 lock_gulmd_LT000[6836]: ERROR [src/lock_io.c:1685] Warning! When trying to send a 0x674c4300:gulm_lock_cb_state packet, we got a -32:32:Broken pipe
May 1 13:37:10 link-13 kernel: lock_gulmd(6839): unaligned access to 0x60000000009dbca1, ip=0x400000000005d741
May 1 13:37:10 link-13 lock_gulmd_core[6829]: link-13 missed a heartbeat (time:1178044630197028 mb:1)
May 1 13:37:10 link-13 kernel: lock_gulmd(6836): unaligned access to 0x600000000003e521, ip=0x400000000005d741
May 1 13:37:10 link-13 lock_gulmd_core[6829]: ERROR [src/core_io.c:2082] POLLHUP on idx:4 fd:9 name:link-14
May 1 13:37:10 link-13 kernel: lock_gulmd(6836): unaligned access to 0x600000000003e521, ip=0x400000000005d300
May 1 13:37:10 link-13 lock_gulmd_core[6829]: ERROR [src/core_io.c:2082] POLLHUP on idx:5 fd:10 name:link-15
May 1 13:37:10 link-13 kernel: lock_gulmd(6836): unaligned access to 0x600000000003e521, ip=0x400000000005dc10
May 1 13:37:10 link-13 lock_gulmd_core[6829]: Core lost slave quorum. Have 1, need 2. Switching to Arbitrating.
May 1 13:37:10 link-13 kernel: lock_gulmd(6836): unaligned access to 0x600000000003e521, ip=0x400000000005d611
May 1 13:37:10 link-13 lock_gulmd_core[6829]: Could not send quorum update to slave link-14
May 1 13:37:10 link-13 kernel: lock_gulmd(6839): unaligned access to 0x60000000009dc551, ip=0x400000000005d741
May 1 13:37:10 link-13 lock_gulmd_core[6829]: ERROR [src/core_resources.c:302] Error sending core state information to child Magma::9796: Broken pipe

This appears to still be reproducible.

May 21 14:38:46 grant-03 lock_gulmd_main[3087]: Forked lock_gulmd_core.
May 21 14:38:46 grant-03 lock_gulmd_core[3089]: Starting lock_gulmd_core 1.0.10. (built Mar 14 2007 16:40:42) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
May 21 14:38:46 grant-03 lock_gulmd_core[3089]: I am running in Fail-over mode.
May 21 14:38:46 grant-03 lock_gulmd_core[3089]: I am (grant-03) with ip (::ffff:10.15.89.153)
May 21 14:38:46 grant-03 lock_gulmd_core[3089]: This is cluster GRANT
May 21 14:38:46 grant-03 lock_gulmd_core[3089]: EOF on xdr (Magma::3024 ::1 idx:2 fd:7)
May 21 14:38:47 grant-03 lock_gulmd_main[3087]: Forked lock_gulmd_LT.
May 21 14:38:47 grant-03 lock_gulmd_LT[3092]: Starting lock_gulmd_LT 1.0.10. (built Mar 14 2007 16:40:42) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
May 21 14:38:47 grant-03 lock_gulmd_LT[3092]: I am running in Fail-over mode.
May 21 14:38:47 grant-03 lock_gulmd_LT[3092]: I am (grant-03) with ip (::ffff:10.15.89.153)
May 21 14:38:47 grant-03 lock_gulmd_LT[3092]: This is cluster GRANT
May 21 14:38:47 grant-03 lock_gulmd_core[3089]: EOF on xdr (Magma::3024 ::1 idx:3 fd:8)
May 21 14:38:48 grant-03 lock_gulmd_main[3087]: Forked lock_gulmd_LTPX.
May 21 14:38:48 grant-03 lock_gulmd_LTPX[3096]: Starting lock_gulmd_LTPX 1.0.10. (built Mar 14 2007 16:40:42) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
May 21 14:38:48 grant-03 lock_gulmd_LTPX[3096]: I am running in Fail-over mode.
May 21 14:38:48 grant-03 lock_gulmd_LTPX[3096]: I am (grant-03) with ip (::ffff:10.15.89.153)
May 21 14:38:48 grant-03 lock_gulmd_LTPX[3096]: This is cluster GRANT
May 21 14:38:48 grant-03 ccsd[3023]: Connected to cluster infrastruture via: GuLM Plugin v1.0.5
May 21 14:38:48 grant-03 ccsd[3023]: Initial status:: Inquorate
May 21 14:38:49 grant-03 lock_gulmd_core[3089]: ERROR [src/core_io.c:1066] Got error from reply: (grant-02.lab.msp.redhat.com ::ffff:10.15.89.152) 1008:Bad State Change
May 21 14:38:52 grant-03 lock_gulmd_core[3089]: ERROR [src/core_io.c:1066] Got error from reply: (grant-02.lab.msp.redhat.com ::ffff:10.15.89.152) 1008:Bad State Change
May 21 14:38:52 grant-03 lock_gulmd_LTPX[3096]: finished.
May 21 14:38:52 grant-03 lock_gulmd_core[3089]: finished.
May 21 14:38:52 grant-03 lock_gulmd_LT000[3092]: EOF on xdr (_ core _ ::1 idx:1 fd:6)
May 21 14:38:52 grant-03 lock_gulmd_LT000[3092]: In src/lock_io.c:419 (1.0.10) death by: Lost connection to core, cannot continue.node reset required to re-activate cluster operations.
May 21 14:38:52 grant-03 ccsd[3023]: Cluster manager shutdown. Attemping to reconnect...
May 21 14:38:53 grant-03 lock_gulmd: startup failed
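
When triaging logs like the ones above, it can help to tally which nodes missed heartbeats and when the core reported losing quorum. The following is only a rough sketch for that purpose, not part of gulm or any Red Hat tooling. The function name `summarize` and both regular expressions are hypothetical and were written against the message wording seen in these excerpts, so they may not match other gulm versions.

```python
import re
from collections import Counter

# Both patterns are assumptions derived from the log excerpts in this report.
HEARTBEAT_RE = re.compile(
    r"lock_gulmd_core\[\d+\]: (\S+) missed a heartbeat \(time:(\d+) mb:(\d+)\)"
)
QUORUM_RE = re.compile(r"Core lost slave quorum\. Have (\d+), need (\d+)\.")


def summarize(lines):
    """Return (missed-heartbeat count per node, list of (have, need) quorum losses)."""
    missed = Counter()
    quorum_losses = []
    for line in lines:
        hb = HEARTBEAT_RE.search(line)
        if hb:
            missed[hb.group(1)] += 1
            continue
        q = QUORUM_RE.search(line)
        if q:
            quorum_losses.append((int(q.group(1)), int(q.group(2))))
    return missed, quorum_losses


# A few lines copied from the May 1 excerpt, one syslog entry per string.
SAMPLE = [
    "May 1 13:37:10 link-13 lock_gulmd_core[6829]: link-14 missed a heartbeat (time:1178044630197028 mb:1)",
    "May 1 13:37:10 link-13 lock_gulmd_core[6829]: link-15 missed a heartbeat (time:1178044630197028 mb:1)",
    "May 1 13:37:10 link-13 lock_gulmd_core[6829]: link-16 missed a heartbeat (time:1178044630197028 mb:1)",
    "May 1 13:37:10 link-13 lock_gulmd_core[6829]: link-13 missed a heartbeat (time:1178044630197028 mb:1)",
    "May 1 13:37:10 link-13 lock_gulmd_core[6829]: Core lost slave quorum. Have 1, need 2. Switching to Arbitrating.",
]

if __name__ == "__main__":
    missed, quorum_losses = summarize(SAMPLE)
    for node, count in sorted(missed.items()):
        print(f"{node}: {count} missed heartbeat(s)")
    for have, need in quorum_losses:
        print(f"quorum lost: have {have}, need {need}")
```

The sketch expects one syslog entry per input line. In the sample, all four nodes (including link-13 itself) are reported as missing a heartbeat at the same timestamp.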