Bug 675099

Summary: when existing ring file is zero bytes, corosync aborts
Product: Red Hat Enterprise Linux 6
Component: corosync
Version: 6.1
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: rc
Target Release: 6.1
Keywords: ZStream
Reporter: Fabio Massimo Di Nitto <fdinitto>
Assignee: Steven Dake <sdake>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, djansa, jfriesse, jkortus, jwest
Fixed In Version: corosync-1.2.3-27.el5
Doc Type: Bug Fix
Last Closed: 2011-05-19 14:24:18 UTC
Bug Blocks: 675206, 696734
Attachments: upstream submitted patch to resolve this issue (flags: none)

Description Fabio Massimo Di Nitto 2011-02-04 09:58:50 UTC
Description of problem:

on x86_64:
[root@rhel6-node2 corosync]# corosync -f
corosync: totemsrp.c:3073: memb_ring_id_create_or_load: Assertion `res == sizeof (unsigned long long)' failed.
Aborted

On i386 it starts, but any access results in corosync spinning at 100% CPU.

Version-Release number of selected component (if applicable):

corosync-1.2.3-21.el6

How reproducible:

As shown above.

Additional info:

nodes 1 and 2 are latest RHEL6.1.

Enabling or disabling SELinux makes no difference (SELinux was disabled when the above assertion was hit).

iptables are off.

Issue is triggered either via corosync standalone startup or via cman.
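
The assertion quoted above compares the result of a read against sizeof (unsigned long long). As a rough illustration of why a zero-byte file trips it: read() on an empty file returns 0 (end of file), so the comparison fails. The sketch below is not corosync code; read_ring_seq is a hypothetical helper that reports a short read to the caller instead of aborting.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from corosync: read a stored ring
 * sequence number from `path`.
 * Returns 0 on success, -1 if the file cannot be opened or does not
 * yield a full unsigned long long.
 */
int read_ring_seq(const char *path, unsigned long long *seq)
{
    assert(path != NULL && seq != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    ssize_t res = read(fd, seq, sizeof(*seq));
    close(fd);

    /* A zero-byte file makes read() return 0, which is a short read. */
    if (res != (ssize_t)sizeof(*seq)) {
        fprintf(stderr, "%s: short or failed read (result %zd)\n", path, res);
        return -1;
    }
    return 0;
}
```

A caller could treat the -1 result as "no usable ring file" and fall back to a fresh sequence number rather than aborting the daemon.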

Comment 1 Fabio Massimo Di Nitto 2011-02-04 15:41:50 UTC
Some more information.

I found a bunch of files in /var/lib/corosync, including a ring_$somedata.

After removing that file (it was 0 bytes), corosync starts again.

The 100% cpu spinning is a different problem that I am investigating now.

Comment 2 Fabio Massimo Di Nitto 2011-02-04 15:50:28 UTC
Very easy to reproduce too:

start corosync

ls -als /var/lib/corosync/ring*

(take a note of the file name)

stop corosync

rm -rf /var/lib/corosync/*

touch /var/lib/corosync/ringid_ (same file name as noted above)
chmod 700 /var/lib/corosync/ringid_ (corosync creates the file with mode 700)
chown root:root ....

The file should now match the one noted above, but with size 0 instead of 4/8.

corosync -f

corosync: totemsrp.c:3106: memb_ring_id_create_or_load: Assertion `res == sizeof (unsigned long long)' failed.
Aborted

independent of the architecture.

Suggested fix: always unlink the file at startup and recreate it as needed, instead of relying on an existing one.
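
A minimal sketch of the unlink-and-recreate idea suggested in this comment, under stated assumptions: reset_ring_file is a hypothetical name, the initial sequence value is chosen by the caller, and this is not the patch that was later submitted upstream.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the upstream patch: discard any ring file
 * left over from an earlier run and recreate it holding `seq`.
 * Returns 0 on success, -1 on failure.
 */
int reset_ring_file(const char *path, unsigned long long seq)
{
    assert(path != NULL);

    /* Ignore the result: the file may legitimately not exist yet. */
    (void)unlink(path);

    /* O_EXCL guarantees a freshly created file; mode 0700 as in the report. */
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0700);
    if (fd < 0) {
        return -1;
    }

    ssize_t res = write(fd, &seq, sizeof(seq));
    if (close(fd) != 0 || res != (ssize_t)sizeof(seq)) {
        return -1;
    }
    return 0;
}
```

The trade-off of this approach is that whatever value was stored by a previous run is thrown away, so it only fits if the stored value does not need to survive a restart.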

Comment 3 Fabio Massimo Di Nitto 2011-02-04 15:53:50 UTC
One more side note: I have no idea _how_ I got a zero-length file there, but it was there.

Comment 6 Steven Dake 2011-02-22 19:20:43 UTC
Created attachment 480217 [details]
upstream submitted patch to resolve this issue

Comment 10 Jaroslav Kortus 2011-03-02 15:34:25 UTC
Verified with corosync-1.2.3-28.el6.x86_64.
corosync starts correctly when the old file is present, when a new zero-byte file is present, and when a large file (50M) is present instead.

Comment 14 errata-xmlrpc 2011-05-19 14:24:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0764.html