Bug 712115

Summary: corosync confdb connection can cause segfault
Product: Red Hat Enterprise Linux 6
Reporter: Jan Friesse <jfriesse>
Component: corosync
Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: high
Version: 6.1
CC: cluster-maint, djansa, jkortus, sdake
Target Milestone: rc
Target Release: 6.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: corosync-1.4.0-1.el6
Doc Type: Bug Fix
Doc Text:
Cause: The internal confdb data storage system had incorrect mutual exclusion, which led to a race condition.
Consequence: Corosync would segfault under rare and contrived circumstances.
Fix: The race condition was fixed.
Result: Corosync no longer segfaults.
Last Closed: 2011-12-06 11:51:01 UTC
Attachments:
- Patch for first problem (flags: none)
- Patch for second problem (flags: none)
- test-confdb patch which checks first problem in valgrind (flags: none)
- First patch backported to current RHEL 6 package (flags: none)

Description Jan Friesse 2011-06-09 14:32:59 UTC
Created attachment 503906
Patch for first problem

Description of problem:
Problem 1:
In confdb_object_iter, the result of object_find_create is now properly
checked. object_find_create can return -1 if the object doesn't exist.
Without this check, an invalid handle (uninitialized memory) was passed
directly to object_find_next.
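The check described above can be illustrated with a minimal, self-contained C sketch. It does not use the real corosync objdb or confdb code: `obj_handle_t`, `find_create`, `find_next` and `count_children` are hypothetical stand-ins that only mirror the contract stated in the report. The create call may return -1, and in that case its output is left uninitialized, so the caller must test the return value before handing the iterator to the next call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle type and object table; not the corosync objdb API.
 * Handle 0 is treated as the root, which always exists. */
typedef uint64_t obj_handle_t;

struct obj {
    obj_handle_t handle;
    obj_handle_t parent;
};

static const struct obj objects[] = { { 1, 0 }, { 2, 0 }, { 3, 1 } };
#define N_OBJECTS (sizeof(objects) / sizeof(objects[0]))

/* Hypothetical iterator over the children of one parent. */
struct find_iter {
    obj_handle_t parent;
    size_t pos;
};

static int object_exists(obj_handle_t h)
{
    if (h == 0)
        return 1;
    for (size_t i = 0; i < N_OBJECTS; i++)
        if (objects[i].handle == h)
            return 1;
    return 0;
}

/* Returns -1 if the parent does not exist; *it is then left untouched. */
static int find_create(obj_handle_t parent, struct find_iter *it)
{
    assert(it != NULL);
    if (!object_exists(parent))
        return -1;
    it->parent = parent;
    it->pos = 0;
    return 0;
}

/* Stores the next child in *out and returns 0, or returns -1 when done. */
static int find_next(struct find_iter *it, obj_handle_t *out)
{
    while (it->pos < N_OBJECTS) {
        const struct obj *o = &objects[it->pos++];
        if (o->parent == it->parent) {
            *out = o->handle;
            return 0;
        }
    }
    return -1;
}

/* Counts the children of parent, or returns -1 if parent does not exist. */
static int count_children(obj_handle_t parent)
{
    struct find_iter it; /* uninitialized until find_create succeeds */
    obj_handle_t child;
    int count = 0;

    /* The Problem 1 fix: check the result before using the iterator. */
    if (find_create(parent, &it) == -1)
        return -1;

    while (find_next(&it, &child) == 0)
        count++;
    return count;
}
```

Removing the `== -1` check from `count_children` would let it read an uninitialized `struct find_iter`, which is the kind of uninitialized read that valgrind's Memcheck tool is designed to report.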

Problem 2:
The following sequence could happen:
- process 1 creates a find handle through confdb
- process 1 calls the find iteration once
- a different process 2 deletes the object that process 1's iterator points to
- process 1 calls the iteration again ->
  object_find_instance->find_child_list is an invalid pointer

-> segfault

Now object_find_create builds an array of matching object handles, and
object_find_next walks that array and also checks each object's name. This
prevents the situation where, between steps 2 and 3, a new object is created
with a different name but, unfortunately, with the same handle.
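The fix for Problem 2 can be sketched as follows, again as a hypothetical single-process model rather than the actual corosync patch. The iterator copies the handles of matching objects into an array when it is created, instead of keeping a pointer into the live child list, and each step looks the handle up again and compares the object's name. All names (`table`, `obj_create`, `obj_delete`, `find_create`, `find_next`, `MAX_OBJECTS`, `MAX_NAME`) are assumptions for illustration; locking and IPC between processes are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical object store; not the corosync objdb API. A deleted
 * object's handle may later be reused by a new object. */
typedef uint64_t obj_handle_t;

#define MAX_OBJECTS 16
#define MAX_NAME 32

struct obj {
    int in_use;
    obj_handle_t handle;
    obj_handle_t parent;
    char name[MAX_NAME];
};

static struct obj table[MAX_OBJECTS];

static struct obj *lookup(obj_handle_t h)
{
    for (size_t i = 0; i < MAX_OBJECTS; i++)
        if (table[i].in_use && table[i].handle == h)
            return &table[i];
    return NULL;
}

static int obj_create(obj_handle_t h, obj_handle_t parent, const char *name)
{
    size_t len = strlen(name);

    if (lookup(h) != NULL || len >= MAX_NAME)
        return -1;
    for (size_t i = 0; i < MAX_OBJECTS; i++) {
        if (!table[i].in_use) {
            table[i].in_use = 1;
            table[i].handle = h;
            table[i].parent = parent;
            memcpy(table[i].name, name, len + 1);
            return 0;
        }
    }
    return -1; /* table full */
}

static void obj_delete(obj_handle_t h)
{
    struct obj *o = lookup(h);
    if (o != NULL)
        o->in_use = 0;
}

/* The iterator owns a snapshot of matching handles, never a pointer into
 * the live child list, so a delete between two find_next calls cannot
 * leave it dangling. At most MAX_OBJECTS handles can match. */
struct find_iter {
    obj_handle_t handles[MAX_OBJECTS];
    size_t count;
    size_t pos;
    char name[MAX_NAME];
};

static int find_create(obj_handle_t parent, const char *name,
                       struct find_iter *it)
{
    assert(it != NULL && name != NULL);
    size_t len = strlen(name);

    if (len >= MAX_NAME)
        return -1;
    memcpy(it->name, name, len + 1);
    it->count = 0;
    it->pos = 0;
    for (size_t i = 0; i < MAX_OBJECTS; i++) {
        const struct obj *o = &table[i];
        if (o->in_use && o->parent == parent && strcmp(o->name, name) == 0)
            it->handles[it->count++] = o->handle;
    }
    return 0;
}

/* Skips handles that were deleted, or reused by an object with a
 * different name, after the snapshot was taken. */
static int find_next(struct find_iter *it, obj_handle_t *out)
{
    while (it->pos < it->count) {
        obj_handle_t h = it->handles[it->pos++];
        const struct obj *o = lookup(h);

        if (o != NULL && strcmp(o->name, it->name) == 0) {
            *out = h;
            return 0;
        }
    }
    return -1;
}
```

For example, if the iterator is created over two objects named `interface` with handles 10 and 11, and handle 11 is then deleted and reused for an object named `logging`, `find_next` returns 10 and then reports the end of the iteration instead of following a stale pointer or returning the wrong object.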

Version-Release number of selected component (if applicable):
Corosync master

How reproducible:
Often, but it is a race, so it depends on the hardware, etc. Problem 1 is visible in valgrind.

Steps to Reproduce:
On a single node, run:
# for i in `seq 1 5`; do (while true; do corosync-objctl -a | grep closed; done) & done
# corosync -f

Actual results:
segfault

Expected results:
no segfault

Additional info:

Comment 1 Jan Friesse 2011-06-09 14:33:43 UTC
Created attachment 503907
Patch for second problem

Comment 2 Jan Friesse 2011-06-09 14:35:39 UTC
Created attachment 503909
test-confdb patch which checks first problem in valgrind

Corosync must be running under valgrind.

Comment 3 Jan Friesse 2011-06-09 14:36:01 UTC
Patches posted to the mailing list.

Comment 5 Jan Friesse 2011-06-10 12:15:28 UTC
Created attachment 504088
First patch backprted to current RHEL 6 package

Comment 10 Steven Dake 2011-10-27 18:47:37 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: The internal confdb data storage system had incorrect mutual exclusion, which led to a race condition.
Consequence: Corosync would segfault under rare and contrived circumstances.
Fix: The race condition was fixed.
Result: Corosync no longer segfaults.

Comment 11 errata-xmlrpc 2011-12-06 11:51:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1515.html