Bug 675099 - when existing ring file is zero bytes, corosync aborts
Summary: when existing ring file is zero bytes, corosync aborts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 6.1
Assignee: Steven Dake
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 675206 696734
 
Reported: 2011-02-04 09:58 UTC by Fabio Massimo Di Nitto
Modified: 2016-04-26 13:31 UTC
CC List: 5 users

Fixed In Version: corosync-1.2.3-27.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 675206
Environment:
Last Closed: 2011-05-19 14:24:18 UTC
Target Upstream Version:


Attachments
upstream submitted patch to resolve this issue (2.54 KB, patch)
2011-02-22 19:20 UTC, Steven Dake


Links
Red Hat Product Errata RHBA-2011:0764 (priority: normal, status: SHIPPED_LIVE): corosync bug fix update, last updated 2011-05-18 18:08:44 UTC

Description Fabio Massimo Di Nitto 2011-02-04 09:58:50 UTC
Description of problem:

on x86_64:
[root@rhel6-node2 corosync]# corosync -f
corosync: totemsrp.c:3073: memb_ring_id_create_or_load: Assertion `res == sizeof (unsigned long long)' failed.
Aborted

On i386 it starts, but any access results in 100% CPU spinning.

Version-Release number of selected component (if applicable):

corosync-1.2.3-21.el6

How reproducible:

As shown above.

Additional info:

Nodes 1 and 2 are running the latest RHEL 6.1.

Enabling or disabling SELinux makes no difference (SELinux was disabled when the above assertion was triggered).

iptables is off.

The issue is triggered both when corosync is started standalone and when it is started via cman.

Comment 1 Fabio Massimo Di Nitto 2011-02-04 15:41:50 UTC
Some more information.

I found a bunch of files in /var/lib/corosync, including a ring_$somedata.

After removing that file (it was 0 bytes), corosync starts again.

The 100% CPU spinning is a separate problem that I am now investigating.
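Until a fixed package is installed, the workaround above (deleting the zero-byte ring file while corosync is stopped) could be scripted. The C sketch below is a hypothetical helper, not part of corosync: the function `remove_empty_ring_files()` and the `ringid_` name prefix (taken from the reproduction steps in Comment 2) are assumptions. It removes only regular files that are empty, so a ring file holding a valid value is left alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical workaround helper (not a corosync API): delete
 * zero-length regular files whose names start with "ringid_" in dir.
 * Intended to run only while corosync is stopped. Returns the number
 * of files removed, or -1 if the directory cannot be opened.
 */
static int remove_empty_ring_files(const char *dir)
{
	static const char prefix[] = "ringid_"; /* assumed name prefix */

	assert(dir != NULL);
	DIR *d = opendir(dir);
	if (d == NULL)
		return -1;

	int dfd = dirfd(d);
	int removed = 0;
	struct dirent *ent;
	while ((ent = readdir(d)) != NULL) {
		if (strncmp(ent->d_name, prefix, sizeof prefix - 1) != 0)
			continue;
		struct stat st;
		/* Do not follow symlinks; only plain files of size 0 qualify. */
		if (fstatat(dfd, ent->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
		    S_ISREG(st.st_mode) && st.st_size == 0 &&
		    unlinkat(dfd, ent->d_name, 0) == 0)
			removed++;
	}
	closedir(d);
	return removed;
}
```

Called as `remove_empty_ring_files("/var/lib/corosync")`, it reports how many empty ring files it deleted; non-empty ring files and unrelated files are not touched.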

Comment 2 Fabio Massimo Di Nitto 2011-02-04 15:50:28 UTC
Very easy to reproduce too:

start corosync

ls -als /var/lib/corosync/ring*

(take note of the file name)

stop corosync

rm -rf /var/lib/corosync/*

touch /var/lib/corosync/ringid_ (use the file name noted above)
chmod 700 /var/lib/corosync/ringid_ (corosync creates the file with mode 700)
chown root:root ....

Now the file should match the original one, except that its size is 0 instead of 4/8 bytes.

corosync -f

corosync: totemsrp.c:3106: memb_ring_id_create_or_load: Assertion `res == sizeof (unsigned long long)' failed.
Aborted

This happens independently of the architecture.
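The assertion message indicates that `memb_ring_id_create_or_load()` requires the ring ID file to yield exactly `sizeof (unsigned long long)` bytes. Assuming the value is fetched with a single `read()` call, a zero-length file makes `read()` report end of file by returning 0, so the check fails regardless of architecture. The sketch below reproduces only that read; `read_ring_seq()` and `make_empty_file()` are hypothetical helpers, not corosync functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper mimicking the load path implied by the assertion:
 * one read() of sizeof (unsigned long long) bytes. Returns whatever
 * read() returned (or -1 if the file cannot be opened).
 */
static ssize_t read_ring_seq(const char *path, unsigned long long *seq)
{
	assert(path != NULL && seq != NULL);
	int fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;
	/* On an empty file read() hits end of file at once and returns 0. */
	ssize_t res = read(fd, seq, sizeof (unsigned long long));
	close(fd);
	return res;
}

/*
 * Create an empty file, like `touch` in the reproduction steps.
 * mkstemp() replaces the trailing XXXXXX in path_template.
 */
static int make_empty_file(char *path_template)
{
	assert(path_template != NULL);
	int fd = mkstemp(path_template);
	if (fd == -1)
		return -1;
	return close(fd);
}
```

For a file created by `make_empty_file()`, `read_ring_seq()` returns 0 and leaves the output value untouched; that 0 is exactly the `res` value the original `res == sizeof (unsigned long long)` assertion rejects.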

The suggested fix is to always unlink the file at startup and recreate it as needed, instead of relying on an existing one.
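A minimal sketch of a tolerant loader, loosely following the suggestion above. It is not the upstream patch attached to this bug; the function names `ring_seq_load_or_create()` and `ring_seq_store()` and the choice of restarting the sequence at 0 are assumptions. Rather than always discarding the file, it keeps a file that contains a full `unsigned long long` and treats a missing, empty, or truncated file as absent: it unlinks it and recreates it with mode 0700 (subject to the process umask).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write seq to path, creating the file with
 * mode 0700 or truncating an existing one. Returns 0 on success.
 */
static int ring_seq_store(const char *path, unsigned long long seq)
{
	int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0700);
	if (fd == -1)
		return -1;
	ssize_t res = write(fd, &seq, sizeof seq);
	int close_res = close(fd);
	return (res == (ssize_t)sizeof seq && close_res == 0) ? 0 : -1;
}

/*
 * Hypothetical loader: use the stored value only if a full
 * unsigned long long can be read; otherwise discard the file and
 * start again from 0 instead of asserting. Returns 0 on success.
 */
static int ring_seq_load_or_create(const char *path, unsigned long long *seq)
{
	assert(path != NULL && seq != NULL);

	int fd = open(path, O_RDONLY);
	if (fd != -1) {
		unsigned long long value;
		ssize_t res = read(fd, &value, sizeof value);
		close(fd);
		if (res == (ssize_t)sizeof value) {
			*seq = value; /* extra trailing bytes are ignored */
			return 0;
		}
		/* Empty, truncated or unreadable: fall through and recreate. */
	} else if (errno != ENOENT) {
		return -1; /* e.g. permission denied: report, do not recreate */
	}

	/* Unlink first so the new file always gets the intended mode. */
	if (unlink(path) == -1 && errno != ENOENT)
		return -1;
	*seq = 0;
	return ring_seq_store(path, *seq);
}
```

Keeping a valid file preserves the stored ring sequence across restarts, which unconditionally unlinking it would lose. Because only the first `sizeof (unsigned long long)` bytes are read, an oversized file (such as the 50M file used during verification) is also accepted rather than rejected.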

Comment 3 Fabio Massimo Di Nitto 2011-02-04 15:53:50 UTC
One more side note: I have no idea _how_ I got a zero-length file there, but it was there.

Comment 6 Steven Dake 2011-02-22 19:20:43 UTC
Created attachment 480217
upstream submitted patch to resolve this issue

Comment 10 Jaroslav Kortus 2011-03-02 15:34:25 UTC
Verified with corosync-1.2.3-28.el6.x86_64.
corosync starts correctly when the old file is present, when a new zero-length file is present, or when a large file (50M) is present instead.

Comment 14 errata-xmlrpc 2011-05-19 14:24:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0764.html

