Description of problem:
While investigating a different customer issue, a scenario was found in which corosync can lock up. This usage pattern could occur during a typical cluster suite "reload" operation.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. run corosync (or cluster suite)
2. run testa one time
3. run corosync-objctl -t test
4. run testall (make sure test is on the system in the cwd)
Actual results:
corosync deadlocks, stops processing, and stops participating in cluster membership
Expected results:
corosync shouldn't deadlock
Make sure to add a test case to the automated test suite for this scenario.
Step 3 should be run at least 2 times to trigger the deadlock.
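For the requested automated test case, the reproduction steps could be scripted roughly as follows. This is only a sketch under stated assumptions, not the test that was added to the suite: build_plan and run_plan are hypothetical helper names, the program names testa and testall are taken verbatim from the steps (they may need a "./" prefix if they live in the cwd), the trackers are assumed to stay running until killed, and a timeout on the load command is only a rough proxy for "corosync has deadlocked".

```python
import subprocess


def build_plan(trackers=2, track_object="test",
               setup_cmd="testa", load_cmd="testall"):
    """Return the reproduction steps as (kind, argv) pairs. Nothing is executed.

    Step 1 (starting corosync) is assumed to have been done already.
    """
    if trackers < 2:
        # The report notes that step 3 must be run at least 2 times.
        raise ValueError("need at least 2 trackers to trigger the deadlock")
    plan = [("wait", [setup_cmd])]                        # step 2
    plan += [("background", ["corosync-objctl", "-t", track_object])
             for _ in range(trackers)]                    # step 3, repeated
    plan.append(("wait", [load_cmd]))                     # step 4
    return plan


def run_plan(plan, timeout=60):
    """Execute a plan. Return False if a 'wait' command exceeds the timeout."""
    background = []
    try:
        for kind, argv in plan:
            if kind == "background":
                # Assumption: the tracker keeps running until it is killed.
                background.append(
                    subprocess.Popen(argv, stdin=subprocess.PIPE))
            else:
                subprocess.run(argv, timeout=timeout, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False
    finally:
        # Always clean up the trackers, whether or not the run hung.
        for proc in background:
            proc.kill()
            proc.wait()
```

On a node where corosync is already running, run_plan(build_plan()) would return False if the load step does not finish within the timeout. A real test would also want to check that corosync still responds afterwards, which this sketch does not do.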
Created attachment 505074 [details]
Patch sent to ML
Created attachment 505267 [details]
Second version of proposed patch
Patch sent to ML. It should address all reviewer comments.
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Cause: A race condition when using the tracking functionality of the internal object database.
Consequence: Corosync would lock up under heavy load with contrived test cases.
Fix: Resolved race condition.
Result: Corosync no longer locks up when corosync-objctl -t is run multiple times under heavy objdb load.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.