Bug 881694
Summary: | corosync process is heavy load, deadlocks in plug/unplug network cable test
---|---
Product: | Red Hat Enterprise Linux 6
Reporter: | Shining <nshi_nb>
Component: | corosync
Assignee: | Jan Friesse <jfriesse>
Status: | CLOSED WONTFIX
QA Contact: | cluster-qe <cluster-qe>
Severity: | medium
Priority: | medium
Version: | 6.3
CC: | fdinitto, jkortus, pzimek
Target Milestone: | rc
Hardware: | x86_64
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2015-08-11 14:31:23 UTC
Type: | Bug
From the logs: the network interface is down. Are you running NetworkManager (NM)? If so, please turn it off and use a static configuration. Corosync has serious problems when an interface is shut down (at the very least, the routing table changes a lot). A somewhat better explanation is here: https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface

A general fix for the ifdown problem is on the TODO list, but although it is fairly high priority, there are bugs with even higher priority. I will keep this BZ open as a TODO.

*** Bug 883080 has been marked as a duplicate of this bug. ***

*** Bug 989934 has been marked as a duplicate of this bug. ***

A proper solution to this bug would mean changing a large part of very sensitive code. The bug also has well-known causes and a workaround (don't test cluster failover with ifdown, and don't use NetworkManager), so I am closing it as WONTFIX.
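One common alternative to taking the interface down, when testing failover, is to drop corosync's traffic with a firewall and leave the interface itself up. The sketch below is only an illustration and not something prescribed in this bug: it builds (but does not run) `iptables` rules for a given `mcastport`. It assumes corosync uses UDP ports `mcastport` and `mcastport - 1`, as described in the corosync.conf man page; the function name `corosync_block_rules` is hypothetical.

```python
def corosync_block_rules(mcastport, action="-A"):
    """Build iptables argument lists that drop corosync totem traffic.

    Assumption: the totem UDP ports are mcastport and mcastport - 1.
    Nothing is executed here; the caller decides how to run the rules.
    Use action="-A" to append the rules and action="-D" to delete them.
    """
    rules = []
    for chain in ("INPUT", "OUTPUT"):
        for port in (mcastport - 1, mcastport):
            rules.append(
                ["iptables", action, chain,
                 "-p", "udp", "--dport", str(port), "-j", "DROP"]
            )
    return rules


if __name__ == "__main__":
    # Isolate the node (simulated network failure) ...
    for rule in corosync_block_rules(5498):
        print(" ".join(rule))
    # ... and later restore traffic by deleting the same rules.
    for rule in corosync_block_rules(5498, "-D"):
        print(" ".join(rule))
```

Each returned list could be passed to `subprocess.run` as root on the node under test. Blocking both INPUT and OUTPUT keeps the simulated failure symmetric, which is closer to an unplugged cable than a one-way block.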
Created attachment 654195: corosync log files

Description of problem:

Version-Release number of selected component (if applicable):

[root@gcluster74 ~]# lsb_release -a
LSB Version: :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.2 (Santiago)
Release: 6.2
Codename: Santiago

[gbase@gcluster77 ~]$ rpm -qa | grep corosync
corosync-1.4.1-4.el6.x86_64
corosynclib-1.4.1-4.el6.x86_64

/etc/corosync/corosync.conf
--------------
totem {
    version: 2
    secauth: on
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.9.74
        mcastaddr: 226.94.1.9
        mcastport: 5498
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/corosync.log
    debug: on ## only 74 is on, other nodes are off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
--------------

How reproducible:
Four cluster nodes, with IPs 192.168.9.71 through 192.168.9.74. Unplug the network cable on one or two nodes, wait a few seconds, then plug the network cable back in.

Steps to Reproduce:
1. Unplug the network cable.
2. Wait a few seconds.
3. Plug the network cable back in.

Actual results:
The cluster node is stuck in an infinite loop of Gather state 11.

Expected results:
The cluster nodes are in a consistent state.

Additional info:
See the attached log file.
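When comparing configurations across the four nodes, it can help to read the totem interface settings programmatically. The following is a minimal sketch of a reader for the `name { ... }` / `key: value` layout used in the corosync.conf above. It is not corosync's own parser: it assumes one item per line, treats everything after `#` as a comment, and lets a repeated section name overwrite the earlier one. The function name `parse_corosync_conf` and the shortened `SAMPLE` text are illustrative only.

```python
def parse_corosync_conf(text):
    """Parse 'name { ... }' sections and 'key: value' lines into nested dicts.

    Simplifications (assumptions, not corosync's real parser): one item per
    line, anything after '#' is a comment, repeated section names overwrite.
    """
    root = {}
    stack = [root]  # stack[-1] is the section currently being filled
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.endswith("{"):
            section = {}
            stack[-1][line[:-1].strip()] = section
            stack.append(section)
        elif line == "}":
            stack.pop()
        elif ":" in line:
            key, value = line.split(":", 1)
            stack[-1][key.strip()] = value.strip()
    return root


# Shortened copy of the reported configuration, for illustration.
SAMPLE = """
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.9.74
        mcastaddr: 226.94.1.9
        mcastport: 5498
    }
}
logging {
    debug: on ## only 74 is on
}
"""

conf = parse_corosync_conf(SAMPLE)
iface = conf["totem"]["interface"]
print(iface["bindnetaddr"], iface["mcastaddr"], iface["mcastport"])
# prints: 192.168.9.74 226.94.1.9 5498
```

Values are kept as strings, so `mcastport` would need `int(...)` before any arithmetic on it.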