Bug 675099

Summary: when existing ring file is zero bytes, corosync aborts
Product: Red Hat Enterprise Linux 6
Reporter: Fabio Massimo Di Nitto <fdinitto>
Component: corosync
Assignee: Steven Dake <sdake>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 6.1
CC: cluster-maint, djansa, jfriesse, jkortus, jwest
Target Milestone: rc
Keywords: ZStream
Target Release: 6.1
Hardware: All
OS: Linux
Fixed In Version: corosync-1.2.3-27.el5
Doc Type: Bug Fix
Clones: 675206
Last Closed: 2011-05-19 14:24:18 UTC
Bug Blocks: 675206, 696734
Attachments:
Description: upstream submitted patch to resolve this issue; Flags: none

Description Fabio Massimo Di Nitto 2011-02-04 09:58:50 UTC
Description of problem:

On x86_64:
[root@rhel6-node2 corosync]# corosync -f
corosync: totemsrp.c:3073: memb_ring_id_create_or_load: Assertion `res == sizeof (unsigned long long)' failed.
Aborted

On i386 it starts, but any access results in 100% CPU spinning.

Version-Release number of selected component (if applicable):

corosync-1.2.3-21.el6

How reproducible:

As shown above.

Additional info:

Nodes 1 and 2 run the latest RHEL 6.1.

Enabling or disabling SELinux makes no difference (SELinux was disabled when the above assertion occurred).

iptables is off.

The issue is triggered whether corosync is started standalone or via cman.

Comment 1 Fabio Massimo Di Nitto 2011-02-04 15:41:50 UTC
Some more information.

I found a bunch of files in /var/lib/corosync, including a ring_$somedata file.

After removing that file (it was 0 bytes), corosync starts again.

The 100% CPU spinning is a different problem that I am investigating now.

Comment 2 Fabio Massimo Di Nitto 2011-02-04 15:50:28 UTC
Very easy to reproduce too:

start corosync

ls -als /var/lib/corosync/ring*

(note the file name)

stop corosync

rm -rf /var/lib/corosync/*

touch /var/lib/corosync/ringid_ (same file name as above)
chmod 700 /var/lib/corosync/ringid_ (corosync creates the file with mode 700)
chown root:root ....

The file should now match the one above, except that its size is 0 instead of 4/8 bytes.

corosync -f

corosync: totemsrp.c:3106: memb_ring_id_create_or_load: Assertion `res == sizeof (unsigned long long)' failed.
Aborted

This happens independent of the architecture.
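Why the failure does not depend on the architecture: the assertion compares the return value of read() with sizeof (unsigned long long), which is 8 bytes on both i386 and x86_64 Linux, and read() on a zero-length file hits end of file immediately and returns 0. A minimal C sketch of that pattern follows; the helpers read_ring_seq() and load_ring_seq_or_abort() are hypothetical names and only approximate the logic, they are not the actual totemsrp.c code.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one unsigned long long (the ring sequence
 * number) from the ring id file. Returns whatever read() returned,
 * or -1 if the file cannot be opened.
 */
ssize_t read_ring_seq(const char *path, unsigned long long *seq)
{
    int fd = open(path, O_RDONLY);
    ssize_t res;

    if (fd == -1) {
        return -1;
    }
    res = read(fd, seq, sizeof (unsigned long long));
    close(fd);
    return res; /* 0 for a zero-length file: end of file right away */
}

/*
 * Approximates the failing pattern: once the file exists, anything
 * other than a full sizeof (unsigned long long) read aborts the
 * process, like the assertion reported above.
 */
unsigned long long load_ring_seq_or_abort(const char *path)
{
    unsigned long long seq = 0;
    ssize_t res = read_ring_seq(path, &seq);

    if (res != -1) {
        assert(res == (ssize_t) sizeof (unsigned long long));
    }
    return seq; /* missing file: this sketch just returns 0 */
}
```

Calling load_ring_seq_or_abort() on a file created with touch aborts with the same kind of assertion failure, while a file holding a full 8-byte value loads normally.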

The suggested fix is to always unlink the file at startup and recreate it as needed, instead of relying on an existing one.
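A sketch of a more tolerant load-or-create routine, assuming the ring id file holds a single unsigned long long in native byte order. It is a variant of the suggestion above rather than a literal implementation: instead of always unlinking, it keeps a file that yields a full value and rewrites the file (mode 0700, as in the reproduction steps) only when it is missing, empty, or truncated. ring_seq_load_or_create() is a hypothetical name, and this is not the content of the submitted patch.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical load-or-create routine for a ring id file.
 * Returns 0 on success (*seq holds the loaded or fresh value) and -1
 * if the file could not be rewritten. A missing, unreadable, empty,
 * or truncated file is treated as absent instead of asserting.
 */
int ring_seq_load_or_create(const char *path, unsigned long long *seq)
{
    ssize_t res = -1;
    int fd = open(path, O_RDONLY);

    if (fd != -1) {
        res = read(fd, seq, sizeof (unsigned long long));
        close(fd);
    }
    if (res == (ssize_t) sizeof (unsigned long long)) {
        return 0; /* existing file held a full value */
    }

    /* Missing, empty, or truncated: start over from zero. */
    *seq = 0;
    /* O_TRUNC discards any partial content; the mode is subject to umask. */
    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0700);
    if (fd == -1) {
        return -1;
    }
    res = write(fd, seq, sizeof (unsigned long long));
    close(fd);
    return res == (ssize_t) sizeof (unsigned long long) ? 0 : -1;
}
```

Keeping a valid file preserves the stored sequence number across restarts, which unconditional unlinking would reset to 0. The trade-off is that any file of at least 8 bytes, such as the 50M file used during verification, is accepted based on its first 8 bytes.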

Comment 3 Fabio Massimo Di Nitto 2011-02-04 15:53:50 UTC
One more side note: I have no idea _how_ I got a 0-length file there, but it was there.

Comment 6 Steven Dake 2011-02-22 19:20:43 UTC
Created attachment 480217
upstream submitted patch to resolve this issue

Comment 10 Jaroslav Kortus 2011-03-02 15:34:25 UTC
Verified with corosync-1.2.3-28.el6.x86_64.
corosync starts correctly when the old file is present, when a new zero-length file is present, or when a large file (50M) is present instead.

Comment 14 errata-xmlrpc 2011-05-19 14:24:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0764.html