Bug 917914
Summary: | the cluster is down because of [TOTEM ] FAILED TO RECEIVE and INFO: task clvmd:4741 blocked | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | muse <fyjm2010> |
Component: | corosync | Assignee: | Jan Friesse <jfriesse> |
Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.4 | CC: | ccaulfie, cluster-maint, fdinitto, rpeterso, sdake, teigland |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2013-03-05 10:27:18 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
Description
muse
2013-03-05 06:25:39 UTC
(In reply to comment #0)
> Description of problem:
> RHEL 6.4 is installed on a Cisco UCS B200 with a NIC 1240. Before I create the
> cluster, the OS works well, but within 2 minutes of creating the cluster nodes
> from luci, the cluster service becomes unresponsive and crashes. The cluster
> service cannot be stopped with "service XXX stop", and rebooting is very slow
> and does not complete. After several hours, the cluster service management
> can finally be closed, with errors.
> In fact, the cluster contains only the node names. It is not possible to add
> any other resources because the service has crashed.
>
> OS info
> rpm -q corosync cman rgmanager fence-agents gfs2-utils lvm2-cluster
> corosync-1.4.1-15.el6.x86_64
> cman-3.0.12.1-49.el6.x86_64
> rgmanager-3.0.12.1-17.el6.x86_64
> fence-agents-3.1.5-25.el6.x86_64
> gfs2-utils-3.0.12.1-49.el6.x86_64
> lvm2-cluster-2.02.98-9.el6.x86_64
>
> uname -a
> Linux 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
>
> Error Log:
> Mar 4 16:10:22 ARCUCSB2007712M corosync[4519]: [TOTEM ] Retransmit List: 78 7a 7b 7c 7e 7f 80 82 83
> Mar 4 16:10:24 ARCUCSB2007712M corosync[4519]: [TOTEM ] Retransmit List: 78 7a 7b 7c 7e 7f 80 82 83
> Mar 4 16:10:26 ARCUCSB2007712M corosync[4519]: [TOTEM ] Retransmit List: 78 7a 7b 7c 7e 7f 80 82 83
> Mar 4 16:10:28 ARCUCSB2007712M corosync[4519]: [TOTEM ] Retransmit List: 78 7a 7b 7c 7e 7f 80 82 83
> Mar 4 16:10:28 ARCUCSB2007712M corosync[4519]: [TOTEM ] FAILED TO RECEIVE
> Mar 4 16:10:30 ARCUCSB2007712M abrt[8051]: File '/usr/sbin/corosync' seems to be deleted

This generally indicates a network (multicast) issue between the nodes. Also, why would abrt report that corosync has been deleted?

*** This bug has been marked as a duplicate of bug 854216 ***
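In the log excerpt in this report, the same Retransmit List (78 7a 7b 7c 7e 7f 80 82 83) repeats every two seconds until corosync logs FAILED TO RECEIVE. The reply attributes this kind of pattern to lost multicast traffic between the nodes. The minimal Python sketch below flags that pattern in a syslog file. It assumes that an unchanging Retransmit List followed by FAILED TO RECEIVE is the signature worth reporting. The function name `find_stuck_retransmits`, the `min_repeats` threshold, and both regexes are hypothetical: they are not part of corosync and only match the line format quoted above.

```python
import re
from collections import defaultdict

# Hypothetical patterns: they match only the corosync 1.x syslog format quoted in this report.
RETRANSMIT_RE = re.compile(
    r"corosync\[(?P<pid>\d+)\]: \[TOTEM \] Retransmit List: (?P<seqs>[0-9a-fA-F ]+)$"
)
FAILED_RE = re.compile(r"corosync\[(?P<pid>\d+)\]: \[TOTEM \] FAILED TO RECEIVE")


def find_stuck_retransmits(lines, min_repeats=3):
    """Hypothetical helper: for each FAILED TO RECEIVE line, return
    (pid, seqs, repeats) if the same corosync process logged an identical
    Retransmit List at least min_repeats times in a row just before it."""
    last_list = {}              # pid -> sequence numbers in its most recent list
    repeats = defaultdict(int)  # pid -> consecutive occurrences of that list
    findings = []
    for line in lines:
        line = line.rstrip()
        m = RETRANSMIT_RE.search(line)
        if m:
            pid = m.group("pid")
            seqs = tuple(m.group("seqs").split())
            if last_list.get(pid) == seqs:
                repeats[pid] += 1
            else:
                last_list[pid] = seqs
                repeats[pid] = 1
            continue
        m = FAILED_RE.search(line)
        if m:
            pid = m.group("pid")
            if repeats[pid] >= min_repeats:
                findings.append((pid, last_list[pid], repeats[pid]))
            repeats[pid] = 0  # start counting afresh after each failure
    return findings


# Sample input copied from the error log in this report.
LOG = """\
Mar 4 16:10:22 ARCUCSB2007712M corosync[4519]: [TOTEM ] Retransmit List: 78 7a 7b 7c 7e 7f 80 82 83
Mar 4 16:10:24 ARCUCSB2007712M corosync[4519]: [TOTEM ] Retransmit List: 78 7a 7b 7c 7e 7f 80 82 83
Mar 4 16:10:26 ARCUCSB2007712M corosync[4519]: [TOTEM ] Retransmit List: 78 7a 7b 7c 7e 7f 80 82 83
Mar 4 16:10:28 ARCUCSB2007712M corosync[4519]: [TOTEM ] Retransmit List: 78 7a 7b 7c 7e 7f 80 82 83
Mar 4 16:10:28 ARCUCSB2007712M corosync[4519]: [TOTEM ] FAILED TO RECEIVE
Mar 4 16:10:30 ARCUCSB2007712M abrt[8051]: File '/usr/sbin/corosync' seems to be deleted
"""

for pid, seqs, n in find_stuck_retransmits(LOG.splitlines()):
    # prints: corosync[4519]: 4 identical retransmit lists (78 7a 7b 7c 7e 7f 80 82 83) before FAILED TO RECEIVE
    print(f"corosync[{pid}]: {n} identical retransmit lists ({' '.join(seqs)}) before FAILED TO RECEIVE")
```

The sketch counts only identical consecutive lists. It assumes that a list whose contents keep changing means retransmissions are still getting through. Raising `min_repeats` reduces false positives at the cost of missing shorter stalls.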