Bug 616095
| Summary: | corosync process eats 90+% of CPU, node fenced during add/remove test | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dean Jansa <djansa> |
| Component: | corosync | Assignee: | Angus Salkeld <asalkeld> |
| Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 6.0 | CC: | cluster-maint, sdake |
| Target Milestone: | rc | Keywords: | RHELNAK |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-07-22 19:38:45 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Dean Jansa
2010-07-19 16:24:03 UTC
Angus, Can you please verify if this is a dup of https://bugzilla.redhat.com/show_bug.cgi?id=580741. I.e., run with -9 (in this bug report) then run with -11. The symptoms sound like they should be resolved with the -11 build.

This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. ** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

Sly, Have not yet set flags as requested in your earlier email. Angus will triage and when that is finished we will either close as dup (most likely scenario), target for 6.1, or 6.0 blocker? depending on how serious the defect is. Should have answer in 1-2 days. Thanks -steve

Hi guys, I keep getting the error below. Any ideas on getting this test to run? Do I need anything special in config file (I am using the same one from a laryngitis test run)?

$ rpm -q corosync cman
corosync-1.2.3-9.el6.x86_64
cman-3.0.12-9.el6.x86_64
$ ./ccs/bin/pruner -R /home/asalkeld/virty4.xml
ricci (pid 1858) is running...
ricci (pid 1806) is running...
ricci (pid 1828) is running...
ricci (pid 1811) is running...
Grabbing running cluster.conf from r4
r4:/etc/cluster/cluster.conf -> /tmp/cluster.conf.orig.pruner.
Removing r4 from cluster
Stopping cluster:
Leaving fence domain... [ OK ]
Stopping gfs_controld... [ OK ]
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Waiting for corosync to shutdown:[ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
Removing r4 from /tmp/cluster.conf.pruner.30035
1 nodes removed from /tmp/cluster.conf.pruner.30035
Bumping config_version to 3
Distributing /tmp/cluster.conf.pruner.30035 to one remaning node
/tmp/cluster.conf.pruner.30035 -> r1:/etc/cluster/cluster.conf
Updating cman's view of the cluster
Checking node count, vote and quorum on r1
New Node count: 3
r1: Node count: 3
New Expected votes: 3
r1: Expected votes: 3
New Quorum: 2
r1: Quorum: 3
Quorum votes on the running cluster does not match the expected value!
Expected Quorum: 2 Cman has: 3
---------------
$ cman_tool status
Version: 6.2.0
Config Version: 3
Cluster Name: virty
Cluster Id: 13185
Cluster Member: Yes
Cluster Generation: 19996
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 3
Active subsystems: 1
Flags:
Ports Bound: 0
Node name: r2
Node ID: 2
Multicast addresses: 239.192.51.180
Node addresses: 192.168.100.92

Chrissie indicated comment #10 may be explained by Bug 606989.

Thanks guys, I am now using
corosync-1.2.3-9.el6.x86_64
cman-3.0.12-14.el6.x86_64
and the quorum problem seems to be fixed. The only problem
is that I can't reproduce the bug. I have run through 200
iterations without seeing anything unusual.
Dean, does the test actually fail (return != 0)?
My little script does this (and doesn't fail); the output looks good too:
#!/bin/bash
# Stop at the first pruner run that returns a non-zero exit code.
set -e
for i in {1..200}
do
    echo "=================================="
    echo "=> Iteration $i"
    ./ccs/bin/pruner -R /home/asalkeld/virty4.xml
done
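
Since the question above is whether the test actually returns non-zero, a variant that records each return code instead of stopping at the first failure may be useful. This is only a sketch: `run_iterations` is a hypothetical helper, not part of the pruner test, and it assumes the command signals failure through its exit status.

```bash
#!/bin/bash
# Hypothetical helper: run a command N times, report every non-zero
# return code, and count failures instead of aborting on the first one.
run_iterations() {
    local count=$1; shift
    local failures=0
    local i rc
    for (( i = 1; i <= count; i++ )); do
        echo "=> Iteration $i"
        rc=0
        "$@" || rc=$?          # capture the exit status without tripping set -e
        if [ "$rc" -ne 0 ]; then
            echo "iteration $i failed with return code $rc"
            failures=$(( failures + 1 ))
        fi
    done
    echo "failures: $failures of $count"
    [ "$failures" -eq 0 ]      # helper succeeds only if every run succeeded
}

# Possible usage with the command from the script above:
# run_iterations 200 ./ccs/bin/pruner -R /home/asalkeld/virty4.xml
```

Unlike the `set -e` loop, this keeps going after a failed iteration, so an intermittent failure shows up in the final count.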
With the fix in -13 I don't see this test fail either. Looks like we can close this as a duplicate of Bug 606989.
*** This bug has been marked as a duplicate of bug 606989 ***
*** This bug has been marked as a duplicate of bug 580741 ***
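
The quorum mismatch in the pruner output (expected 2, cman reports 3) is easier to follow with the arithmetic written out. A minimal sketch, assuming the usual majority rule (half the expected votes, rounded down, plus one) and one vote per node; `majority_quorum` is a hypothetical helper, not part of cman or the pruner test:

```bash
#!/bin/bash
# Hypothetical helper: majority quorum for a given number of expected votes,
# assuming the "more than half" rule (integer division, then add one).
majority_quorum() {
    local expected_votes=$1
    echo $(( expected_votes / 2 + 1 ))
}

majority_quorum 4   # four single-vote nodes, before r4 is removed
majority_quorum 3   # three single-vote nodes, after r4 is removed
```

Under that assumption the first call prints 3 and the second prints 2, which is consistent with a node still holding the four-node quorum value after expected votes dropped to 3.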