Bug 924261

Summary: cfg service: When shutdown cannot be processed immediately, it's not possible to repeat
Product: Red Hat Enterprise Linux 6 Reporter: Jan Friesse <jfriesse>
Component: corosync Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 6.4 CC: cluster-maint, jkortus, sdake, slevine, sradvan
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: corosync-1.4.1-16.el6 Doc Type: Bug Fix
Doc Text:
Cause: In very rare conditions, typically on a heavily loaded cluster, the shutdown request issued by corosync-cfgtool -H cannot be processed immediately. Consequence: corosync-cfgtool -H returns error 6 (CS_ERR_TRY_AGAIN). When the command is called again, an error (CS_ERR_EXISTS) is always returned. Fix: corosync-cfgtool now retries automatically, so error 6 (CS_ERR_TRY_AGAIN) is no longer reported. The cfg library now allows the shutdown function to be called multiple times. Result: corosync-cfgtool -H works every time.
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-21 04:33:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 907894, 960054    
Attachments:
Description Flags
Proposed patch - part 1 - When send_shutdown fails, clear shutdown_con none
Proposed patch - part 2 - cfgtool Retry shutdown on CS_ERR_TRY_AGAIN none

Description Jan Friesse 2013-03-21 12:57:12 UTC
Created attachment 713828
Proposed patch - part 1 - When send_shutdown fails, clear shutdown_con

Description of problem:
When send_shutdown fails (usually with CS_ERR_TRY_AGAIN), shutdown_con remains set. The next call sees that shutdown_con is set and refuses to shut down. In addition, corosync-cfgtool should repeat the shutdown request when err == CS_ERR_TRY_AGAIN.
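
The state problem described above can be illustrated with a small standalone model. This is only a sketch, not the corosync source: the struct, the functions, and the model_* names are hypothetical, the error enum is a stand-in for the real error codes (only the value 6 for "try again" comes from this report), and the clear_on_failure switch exists purely to compare the behaviour before and after the idea in patch part 1.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the error codes named in this report.
 * Only the value 6 (try again) is taken from the report; the other
 * values are arbitrary and chosen for this model. */
typedef enum {
    MODEL_OK = 1,
    MODEL_ERR_TRY_AGAIN = 6,
    MODEL_ERR_EXISTS = 14
} model_error_t;

/* Toy model of the service state. shutdown_con mirrors the name used
 * in the report; everything else is invented for illustration. */
struct model_cfg {
    const void *shutdown_con; /* connection that owns a pending shutdown */
    int sends_to_fail;        /* upcoming sends that report "try again" */
};

/* Simulated send: reports "try again" while sends_to_fail > 0. */
model_error_t model_send_shutdown(struct model_cfg *cfg)
{
    if (cfg->sends_to_fail > 0) {
        cfg->sends_to_fail--;
        return MODEL_ERR_TRY_AGAIN;
    }
    return MODEL_OK;
}

/* Request a shutdown on behalf of conn.
 * clear_on_failure == 0 models the reported behaviour,
 * clear_on_failure == 1 models clearing shutdown_con after a failed send. */
model_error_t model_try_shutdown(struct model_cfg *cfg, const void *conn,
                                 int clear_on_failure)
{
    model_error_t err;

    if (cfg->shutdown_con != NULL) {
        return MODEL_ERR_EXISTS; /* a shutdown is already registered */
    }

    cfg->shutdown_con = conn;
    err = model_send_shutdown(cfg);
    if (err != MODEL_OK && clear_on_failure) {
        cfg->shutdown_con = NULL; /* forget the failed request */
    }
    return err;
}
```

In this model, a first call that hits a failing send returns the "try again" code in both modes. With clear_on_failure set to 0 the second call is rejected with the "exists" code because shutdown_con is still set; with clear_on_failure set to 1 the second call goes through.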

Version-Release number of selected component (if applicable):
Any

How reproducible:
0.000001% with blackbox testing

Steps to Reproduce:
1. https://github.com/jfriesse/csts/blob/master/tests/start-cfgstop-one-by-one-with-load.sh
2. From time to time, the result is "Can't shutdown, error 6" (error 6 is CS_ERR_TRY_AGAIN). Calling the function again (with patch 2 applied) returns CS_ERR_EXISTS.

Actual results:
corosync-cfgtool -H does not work reliably on a heavily loaded cluster

Expected results:
corosync-cfgtool -H always works

Additional info:

Comment 1 Jan Friesse 2013-03-21 13:03:41 UTC
Created attachment 713830
Proposed patch - part 2 - cfgtool Retry shutdown on CS_ERR_TRY_AGAIN
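
The client-side retry idea from this attachment title can be sketched as a generic loop. This is not the attached patch: the demo_* names, the callback type, and the fake shutdown function are hypothetical, and the attempt limit is an assumption added here so the example always terminates (the report does not say whether the tool bounds its retries).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the error codes; only the value 6
 * (try again) is taken from the report. */
typedef enum {
    DEMO_OK = 1,
    DEMO_ERR_TRY_AGAIN = 6
} demo_error_t;

/* Any function that attempts a shutdown and reports a status. */
typedef demo_error_t (*demo_shutdown_fn)(void *ctx);

/* Call fn until it returns something other than "try again" or until
 * max_attempts calls have been made. The number of calls is stored in
 * *attempts_made when that pointer is not NULL. */
demo_error_t demo_shutdown_with_retry(demo_shutdown_fn fn, void *ctx,
                                      int max_attempts, int *attempts_made)
{
    demo_error_t err = DEMO_ERR_TRY_AGAIN;
    int attempts = 0;

    while (attempts < max_attempts) {
        err = fn(ctx);
        attempts++;
        if (err != DEMO_ERR_TRY_AGAIN) {
            break; /* success or a different error: stop retrying */
        }
        /* A real tool would likely pause briefly here before retrying. */
    }

    if (attempts_made != NULL) {
        *attempts_made = attempts;
    }
    return err;
}

/* Fake shutdown call for demonstration: ctx points to an int holding
 * the number of "try again" replies still to be given. */
demo_error_t demo_busy_then_ok(void *ctx)
{
    int *busy_left = ctx;

    if (*busy_left > 0) {
        (*busy_left)--;
        return DEMO_ERR_TRY_AGAIN;
    }
    return DEMO_OK;
}
```

Retrying only helps if a repeated request is accepted by the service, which is why the two parts of the fix belong together.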

Comment 11 errata-xmlrpc 2013-11-21 04:33:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1531.html