Red Hat Bugzilla – Bug 166652
Barriers are broken
Last modified: 2009-04-16 16:00:14 EDT
Description of problem:
Barriers in RHEL4 cman are completely broken. The only reason the thing works at all is that:
1. Most of the time the two clients of barriers (membership & sm) are already largely synchronised, and
2. The upper layers tolerate timeouts and will retry or ignore the failure.
The observable symptoms of this are few, but usually manifest themselves as very slow transition times (membership retrying until everyone is synchronised) or slow joining: lots of "CMAN sending membership request" messages on the new node.
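To illustrate why a caller that tolerates timeouts can hide a barrier problem at the cost of slower transitions, here is a minimal sketch of the retry pattern. It uses Python's threading.Barrier as a stand-in and is not the cman barrier code; wait_with_retry, demo, and all the timing values are hypothetical names and numbers chosen for illustration.

```python
import threading
import time


def wait_with_retry(barrier, attempts=5, timeout=0.2):
    """Wait on the barrier, retrying after each timeout.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            barrier.wait(timeout=timeout)
            return attempt
        except threading.BrokenBarrierError:
            # A timed-out wait leaves a threading.Barrier in the broken
            # state, so it has to be reset before the next attempt.
            barrier.reset()
    return None


def demo(delay=0.3):
    """One party arrives immediately, the other only after `delay` seconds."""
    barrier = threading.Barrier(2)
    results = {}

    def late_party():
        time.sleep(delay)  # simulate a node that is not yet synchronised
        results["late"] = wait_with_retry(barrier)

    worker = threading.Thread(target=late_party)
    worker.start()
    results["early"] = wait_with_retry(barrier)
    worker.join()
    return results["early"], results["late"]


if __name__ == "__main__":
    early, late = demo()
    print(f"early party needed {early} attempt(s), late party {late}")
```

The early party only gets through after at least one timed-out attempt, so the operation still completes, just more slowly than it would if both parties were already in step.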
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Bring up a lot of nodes into a cluster at the same time OR
2. Bring a new node into a large(ish) cluster
Actual Results: They can take a while to settle.
Expected Results: These things should happen fairly quickly, but not instantly.
I'll check a fix into the STABLE branch but hold off on RHEL4 until we have evidence that it is annoying customers.
Nobody outside of the RHCS code is using barriers anyway - if they were we'd have noticed this sooner!
After further thought and testing, do NOT apply this to U2. It's faulty in
I'll do a proper fix when I return from holiday.
Looks like I needed that holiday.
Barriers were broken, but not nearly as badly as I'd thought. The fix is simple
and now applied to STABLE & RHEL4 branches. It's obviously too late for U2 but
the problem is nowhere near serious enough to warrant any panic.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.