Bug 733424 - Relax-NG errors reported after cman_tool join; cman_tool leave is interrupted
Summary: Relax-NG errors reported after cman_tool join; cman_tool leave is interrupted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Fabio Massimo Di Nitto
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-08-25 16:51 UTC by Jaroslav Kortus
Modified: 2011-12-06 14:53 UTC (History)

Fixed In Version: cluster-3.0.12.1-18.el6
Doc Type: Bug Fix
Doc Text:
Do not document.
Clone Of:
Environment:
Last Closed: 2011-12-06 14:53:11 UTC


Attachments
corrupted /var/lib/cluster (27.90 KB, application/x-gzip)
2011-08-26 09:39 UTC, Jaroslav Kortus


Links
System: Red Hat Product Errata
ID: RHBA-2011:1516
Priority: normal
Status: SHIPPED_LIVE
Summary: cluster and gfs2-utils bug fix update
Last Updated: 2011-12-06 00:51:09 UTC

Description Jaroslav Kortus 2011-08-25 16:51:30 UTC
Description of problem:
# cman_tool join
Relax-NG parser error : Reference SERVICE has no matching definition

Version-Release number of selected component (if applicable):
cman-3.0.12.1-14.el6.x86_64

How reproducible:
sometimes

Steps to Reproduce (cluster-wide):
1. run while true; do cman_tool join; cman_tool leave; done
2. feel lucky
3. hit ctrl-c a couple of times
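
The loop from step 1 can be wrapped in a small function so it is easier to start and interrupt on each node. This is only a sketch: the CMAN_TOOL variable and the join_leave_loop function are hypothetical names introduced here (the variable allows a dry run against a stub command), and the optional iteration limit is not part of the original steps.

```shell
#!/bin/bash
# Sketch of the reproducer loop from step 1.
# CMAN_TOOL and join_leave_loop are illustrative names, not part of cman.
CMAN_TOOL="${CMAN_TOOL:-cman_tool}"

# join_leave_loop [max_iterations]
# Runs join/leave repeatedly; with no argument (or 0) it loops until interrupted.
join_leave_loop() {
    local max="${1:-0}"
    local count=0
    while [ "$max" -eq 0 ] || [ "$count" -lt "$max" ]; do
        # Failures are expected while stressing the cluster; keep looping.
        "$CMAN_TOOL" join || true
        "$CMAN_TOOL" leave || true
        count=$((count + 1))
    done
    echo "completed $count join/leave iterations"
}
```

Calling join_leave_loop with no argument matches the endless loop in step 1; a small limit such as join_leave_loop 3 is only useful for checking the script itself.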
  
Actual results:
The regenerated schema may get corrupted, which disables cluster operations on that node until the error is resolved.

Expected results:
one of:
* no corruption happens at all
* a message tells users about ccs_update_schema
* the cache is removed automatically

Additional info:

Comment 5 Jaroslav Kortus 2011-08-26 09:39:48 UTC
Created attachment 520048
corrupted /var/lib/cluster

Comment 9 Fabio Massimo Di Nitto 2011-09-03 09:20:50 UTC
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=e236677a5c238673a94923c34dcd4f83f4c9374b

The fix simply invalidates the ccs_update_schema cache if any trap is received, forcing a schema regeneration in the next run.
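
The idea can be sketched in shell roughly as follows. This is not the code from the commit above; CACHE_DIR, schema.rng, generate_schema and invalidate_cache are hypothetical names used only to illustrate discarding a cache when a signal arrives mid-write.

```shell
#!/bin/bash
# Sketch of "invalidate the cache if a signal is trapped".
# All names and paths here are illustrative assumptions, not the real patch.
CACHE_DIR="${CACHE_DIR:-/tmp/schema-cache-demo}"

invalidate_cache() {
    # Drop everything cached so the next run regenerates from scratch.
    rm -rf "$CACHE_DIR"
}

generate_schema() {
    mkdir -p "$CACHE_DIR"
    # If interrupted while writing, discard the partial result and exit.
    trap 'invalidate_cache; exit 1' INT TERM HUP
    printf '<grammar/>\n' > "$CACHE_DIR/schema.rng"
    # Generation finished cleanly; restore default signal handling.
    trap - INT TERM HUP
}
```

With this shape an interrupted run leaves no stale file behind, so the next call to generate_schema rebuilds the schema instead of reusing a half-written one.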

I am unable to reproduce the issue reported after applying this patch.

Jaroslav, I left the manually patched version of ccs_update_schema on the marathon cluster (all 5 nodes) for you to stress some more, and I should be able to provide a build by Monday.

Comment 10 Jaroslav Kortus 2011-09-05 11:51:22 UTC
Works well with the patched version, I'm not able to reproduce it any more.

The easiest way to reproduce this is to start the cman_tool join / cman_tool leave loop on all nodes and kill -9 corosync on one of them, wait for messages like "cman_tool: Error leaving cluster: Operation already in progress" to appear rapidly, and then try to stop the loop (ctrl-c, ctrl-q). Eventually cman won't be able to start again.

Thanks Fabio, I'll wait for an official build.

Comment 11 Jaroslav Kortus 2011-09-12 13:44:01 UTC
Can't reproduce it any longer with cman-3.0.12.1-19.el6.x86_64

Comment 12 Lon Hohberger 2011-10-26 22:36:22 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Do not document.

Comment 13 errata-xmlrpc 2011-12-06 14:53:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1516.html

