Bug 712115
Summary: corosync confdb connection can cause segfault

| Field | Value |
|---|---|
| Product: | Red Hat Enterprise Linux 6 |
| Reporter: | Jan Friesse <jfriesse> |
| Component: | corosync |
| Assignee: | Jan Friesse <jfriesse> |
| Status: | CLOSED ERRATA |
| QA Contact: | Cluster QE <mspqa-list> |
| Severity: | high |
| Priority: | high |
| Version: | 6.1 |
| CC: | cluster-maint, djansa, jkortus, sdake |
| Target Milestone: | rc |
| Target Release: | 6.2 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Fixed In Version: | corosync-1.4.0-1.el6 |
| Doc Type: | Bug Fix |
Doc Text:
Cause: A race condition in the internal confdb data storage system resulted in incorrect mutual exclusion.
Consequence: Corosync could segfault under rare and contrived circumstances.
Fix: The race condition was fixed.
Result: Corosync no longer segfaults.
| Last Closed: | 2011-12-06 11:51:01 UTC |
Created attachment 503907 [details]
Patch for second problem

Created attachment 503909 [details]
test-confdb patch which checks the first problem in valgrind

Corosync must be running through valgrind.

Patches posted to ML.

Created attachment 504088 [details]
First patch backported to the current RHEL 6 package
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: A race condition in the internal confdb data storage system resulted in incorrect mutual exclusion.
Consequence: Corosync could segfault under rare and contrived circumstances.
Fix: The race condition was fixed.
Result: Corosync no longer segfaults.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1515.html
Created attachment 503906 [details]
Patch for first problem

Description of problem:

Problem 1: In confdb_object_iter, the result of object_find_create is now properly checked. object_find_create can return -1 if the object does not exist. Without this check, an invalid handle (memory garbage) was passed directly to object_find_next.

Problem 2: The following situation could happen:
- process 1, through confdb, creates a find handle
- process 1 calls find iteration once
- a different process 2 deletes the object pointed to by process 1's iterator
- process 1 calls iteration again -> object_find_instance->find_child_list is an invalid pointer -> segfault

Now object_find_create creates an array of matching object handles, and object_find_next uses that array together with a check of the object name. This prevents the situation where, between steps 2 and 3, a new object is created with a different name but, unfortunately, the same handle.

Version-Release number of selected component (if applicable):
Corosync master

How reproducible:
Often, but it is a race, so it depends on hardware, etc.

Problem 1 is visible in valgrind.

Steps to Reproduce:
One node.
# for i in `seq 1 5`;do (while true;do corosync-objctl -a | grep closed;done)& done
# corosync -f

Actual results:
segfault

Expected results:
no segfault

Additional info: