Description of problem:

# cman_tool join
Relax-NG parser error : Reference SERVICE has no matching definition

Version-Release number of selected component (if applicable):
cman-3.0.12.1-14.el6.x86_64

How reproducible:
sometimes

Steps to Reproduce (cluster-wide):
1. run while true; do cman_tool join; cman_tool leave; done
2. feel lucky
3. hit ctrl-c couple of times

Actual results:
it may happen that the regenerated schema gets corrupted and cluster operations on that node are disabled until the error is resolved.

Expected results:
one of:
* nothing like corruption happening
* message telling users about ccs_update_schema
* removing the cache automatically

Additional info:
Created attachment 520048: corrupted /var/lib/cluster
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=e236677a5c238673a94923c34dcd4f83f4c9374b

The fix simply invalidates the ccs_update_schema cache if any trap is received, forcing a schema regeneration in the next run. I am unable to reproduce the issue reported after applying this patch.

Jaroslav, I left the manually patched version of ccs_update_schema on the marathon cluster (all 5 nodes), for you to stress some more, and should be able to provide a build by Monday.
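For readers who want a feel for the approach, here is a minimal sketch of trap-based cache invalidation in a shell script. It is not the actual patch (see the commit linked above for that); the cache directory, the file names and the functions `invalidate_cache` and `regenerate_schema` are all hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical sketch only: NOT the real ccs_update_schema code.
# CACHE_DIR and the file names below are illustrative assumptions,
# not the real /var/lib/cluster layout.
CACHE_DIR="${CACHE_DIR:-$(mktemp -d)}"
SCHEMA="$CACHE_DIR/schema.rng"
CHECKSUM="$CACHE_DIR/schema.checksum"

invalidate_cache() {
    # Remove the schema, its checksum and any partial file so that
    # the next run has to regenerate everything from scratch.
    rm -f "$SCHEMA" "$SCHEMA.tmp" "$CHECKSUM"
}

regenerate_schema() {
    # If a signal arrives while the schema is being rebuilt,
    # throw the cache away instead of leaving a half-written file.
    trap 'invalidate_cache; exit 1' INT TERM HUP QUIT

    # Stand-in for the real schema generation step.
    printf '<grammar/>\n' > "$SCHEMA.tmp"
    mv "$SCHEMA.tmp" "$SCHEMA"
    cksum "$SCHEMA" > "$CHECKSUM"

    # Regeneration finished cleanly; restore default signal handling.
    trap - INT TERM HUP QUIT
}
```

The idea is that an interrupted run never leaves a cache that looks valid: either regeneration completes and the checksum is written, or the handler removes everything and the next run starts over.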
Works well with the patched version, I'm not able to reproduce it any more. The easiest way to reproduce this is to start the cman_tool join / cman_tool leave loop on all nodes, kill -9 corosync on one of them, wait for messages like "cman_tool: Error leaving cluster: Operation already in progress" to start appearing rapidly, and then try to stop the loop (ctrl-c, ctrl-q). Eventually it won't be able to start again. Thanks Fabio, I'll wait for an official build.
Can't reproduce it any longer with cman-3.0.12.1-19.el6.x86_64
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Do not document.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1516.html