Red Hat Bugzilla – Bug 712188
corosync-objctl -t run multiple times with heavy load deadlocks corosync
Last modified: 2016-04-26 10:05:50 EDT
Description of problem:
During customer investigation of a different issue, a scenario was found in which it is possible to lock up corosync. This use model could occur in the typical cluster suite "reload" operation.

Version-Release number of selected component (if applicable):
corosync-1.2.3-36.el6

How reproducible:
100%

Steps to Reproduce:
1. run corosync (or cluster suite)
2. run testa one time
3. run corosync-objctl -t test
4. run testall (make sure test is on the system in the cwd)

Actual results:
corosync deadlocks and stops processing and participating in cluster membership

Expected results:
corosync shouldn't deadlock

Additional info:
Honza, make sure to add a test case for this scenario to the automated test suite.
Step 3 should be run at least 2 times to generate the deadlock.
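A minimal sketch of how the repeated step 3 might be scripted, for anyone trying to reproduce this. It only covers the `corosync-objctl -t test` invocations from the steps above. The helper names `tracker_commands` and `start_trackers` are hypothetical, and starting the trackers concurrently in the background is an assumption: the report says only that the step must be run at least 2 times. The script defaults to a dry run that prints the commands without touching a cluster.

```python
import subprocess


def tracker_commands(count=2, obj="test"):
    """Build `count` copies of the step 3 command (hypothetical helper)."""
    return [["corosync-objctl", "-t", obj] for _ in range(count)]


def start_trackers(count=2, obj="test", dry_run=True):
    """Start the trackers, or just print them when dry_run is True.

    Assumption: the trackers are started concurrently in the background.
    The report does not say whether the runs overlap.
    """
    procs = []
    for cmd in tracker_commands(count, obj):
        if dry_run:
            # Show what would be executed without running anything.
            print(" ".join(cmd))
        else:
            # Popen returns immediately, so the trackers run side by side.
            procs.append(subprocess.Popen(cmd))
    return procs


if __name__ == "__main__":
    start_trackers(count=2)
```

With `dry_run=False` the caller is responsible for terminating the returned processes once the test (step 4) has finished.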
Created attachment 505074: Proposed patch. Patch sent to ML.
Created attachment 505267: Second version of proposed patch. Patch sent to ML. It should address all reviewer notes.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents:
Cause: A race condition when using the tracking functionality of the internal object database.
Consequence: Corosync would lock up under heavy load with contrived test cases.
Fix: Resolved race condition.
Result: Corosync no longer locks up when corosync-objctl -t is run multiple times under heavy objdb load.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1515.html