Bug 208954 - groupd doesn't support mixed recovery with join/leave events
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Hardware: All   OS: Linux
Priority: medium   Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: David Teigland
QA Contact: Cluster QE
Depends On:
Reported: 2006-10-02 15:18 EDT by Robert Peterson
Modified: 2009-04-16 18:49 EDT
CC: 4 users

See Also:
Fixed In Version: 5.0.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2006-11-28 16:11:18 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Robert Peterson 2006-10-02 15:18:32 EDT
Description of problem:
The groupd daemon does not currently support mixing
recovery with join and leave events.

Version-Release number of selected component (if applicable):
RHEL5 Beta 1 plus cluster development code from 01 Oct 2006.

How reproducible:
Difficult to recreate, but we've seen it occasionally with the
'revolver' test from QE.

Steps to Reproduce:
Actual results:

Expected results:

Additional info:
Comment 1 Robert Peterson 2006-10-02 15:28:31 EDT
Additional information:
After running revolver without GFS for several hours, system
'camel' reported:
Assertion failed on line 218 of file app.c
which means "add_recovery_set" ended up with an empty recovery set
for nodeid 2, which is system 'merit'.

Further analysis by Dave Teigland found this:

1159570695 0:default process_node_join 2
1159570695 0:default cpg add node 2 total 3
1159570695 0:default make_event_id 200030001 nodeid 2 memb_count 3 type 1
1159570695 0:default queue join event for nodeid 2
1159570696 0:default confchg left 0 joined 1 total 4
1159570696 0:default process_node_join 3
1159570696 0:default cpg add node 3 total 4
1159570696 0:default queue_app_join: current event 3 300030003 FAIL_START_WAIT
1159570696 0:default make_event_id 300040001 nodeid 3 memb_count 4 type 1
1159570696 0:default queue join event for nodeid 3
1159570696 0:default     queued ev 2 200030001 JOIN_BEGIN
1159570700 0:default confchg left 1 joined 0 total 3
1159570700 0:default confchg removed node 2 reason 3
1159570700 0:default process_node_down 2

This shows that node 2 joined and then failed four seconds later, while
groupd was still processing the join events for it and for other nodes.
Comment 2 Kiersten (Kerri) Anderson 2006-10-03 12:57:44 EDT
Devel ACK for RHEL 5.0.0 Beta 2
Comment 3 RHEL Product and Program Management 2006-10-03 13:03:21 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux release.  Product Management has requested further review
of this request by Red Hat Engineering.  This request is not yet committed for
inclusion in release.
Comment 4 Jay Turner 2006-10-10 15:48:28 EDT
QE ack for RHEL5.
Comment 5 David Teigland 2006-10-12 10:32:47 EDT
The changes I've been working on in this area are tested and
checked in now.  Recoveries mixed with joins do work in some
scenarios now.  There will be more work to do here.
Comment 7 Nate Straz 2007-12-13 12:22:22 EST
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5 which never existed.
