
Bug 1095657

Summary: receive_start should make it clear when it fails
Product: Red Hat Enterprise Linux 6
Reporter: Cedric Buissart <cbuissar>
Component: cluster
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: low
Docs Contact:
Priority: low
Version: 6.5
CC: ccaulfie, cluster-maint, jkortus, jpayne, rpeterso, teigland, tlavigne
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-22 07:04:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: Patch, Flags: none

Description Cedric Buissart 2014-05-08 09:50:42 UTC
Description of problem:

When receive_start fails to add a node to its group (for instance because start_count is not 0 on both sides), the log should make it clear, even to an untrained eye, that the whole group has failed and somehow has to be recreated.

This is reproducible for at least the fence domain and the DLM lockspaces.


Version-Release number of selected component (if applicable): 
cman-3.0.12.1-59.el6_5.2.x86_64


How reproducible:
Any time a partitioned group tries to merge.


Steps to Reproduce:
1. have a slowish network
2. set a very low FENCED_MEMBER_DELAY (0 or 1 ?)
3. start all nodes simultaneously (cman and clvmd)

Actual results:
* The current 'error' is simply :

fenced[XXXX]: receive_start 2:2 add node with started_count 1

However, an untrained eye does not recognize this as an error.

Expected results:
- Add an additional line, such as "receive_start error in %s, membership is disallowed.", with %s being an identifier of the problematic mountgroup or DLM lockspace.

Additional info:
* This has to be set in 3 locations :

 - Fence domain : fence/fenced/cpg.c
 - DLM lockspaces : group/dlm_controld/cpg.c
 - GFS groups : group/gfs_controld/cpg-new.c
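
For illustration only, here is a minimal, self-contained sketch of the kind of check and extra message requested under "Expected results". It is not the attached patch and not the actual cman code: struct group_state, check_added_node(), the two started-count parameters and the "%d:%u" layout of the existing line are all assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical stand-in for per-group state; not the real cman structure. */
struct group_state {
    const char *name;   /* fence domain, DLM lockspace or mountgroup name */
    int disallowed;     /* set once an add has been refused */
};

/*
 * Decide whether a node being added can be accepted.
 * Assumption: the add is refused when both our own started count and the
 * added node's started count are non-zero (a partitioned group merging).
 * Returns 0 when the add is fine, -1 when it is refused. On refusal, buf
 * holds the existing message followed by the explicit error line.
 */
static int check_added_node(struct group_state *g, int nodeid,
                            unsigned int seq,
                            unsigned int our_started_count,
                            unsigned int node_started_count,
                            char *buf, size_t len)
{
    if (our_started_count == 0 || node_started_count == 0) {
        if (len > 0)
            buf[0] = '\0';  /* nothing to report */
        return 0;
    }

    g->disallowed = 1;

    /* First line mimics the current message, second line is the addition. */
    snprintf(buf, len,
             "receive_start %d:%u add node with started_count %u\n"
             "receive_start error in %s, membership is disallowed.",
             nodeid, seq, node_started_count, g->name);
    return -1;
}
```

In the real daemons the two lines would presumably go through the existing logging call in each of the three files above rather than into a buffer; the buffer is used here only to keep the sketch standalone.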

Comment 2 Christine Caulfield 2014-06-03 14:26:26 UTC
Created attachment 901810 [details]
Patch

That seems pretty straightforward. For reference here's a patch that's been vaguely (I had to force the condition) tested.

Comment 3 Cedric Buissart 2014-06-04 07:38:24 UTC
Oupsie, I had forgotten to send it.

Thanks Chrissie!

Comment 7 Justin Payne 2015-03-19 17:29:18 UTC
Could not reproduce on my network. Verified SanityOnly in cman-3.0.12.1-73.el6

Comment 8 errata-xmlrpc 2015-07-22 07:04:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1363.html