Bug 185455

Summary: kernel dm snapshots: replace siblings list
Product: Red Hat Enterprise Linux 4 Reporter: Alasdair Kergon <agk>
Component: kernelAssignee: Alasdair Kergon <agk>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0CC: jbaron, mbroz
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2006-0575 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-08-10 22:43:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 181409    

Description Alasdair Kergon 2006-03-14 21:52:45 UTC
The siblings "list" is used unsafely at the moment.
                                                                               
                                                     
Firstly, only the element on the list being changed gets locked (via the
snapshot lock), not the next and previous elements which have pointers that
are also being changed.
                                                                               
                                                     
Secondly, if you have two or more snapshots and write to the same chunk a
second time before every snapshot has finished making its private copy of the
data, if you're unlucky, _origin_write() could attempt its list_merge() and
dereference a 'last' pointer to a pending_exception structure that has just
been freed.
                                                                               
                                                     
Analysis reveals that the list is actually only there for reference counting.
If 5 pending_exceptions are needed in origin_write, then the 5 are joined
together into a 5-element list - without a separate list head because there's
nowhere suitable to store it.  As the pending_exceptions complete, they are
removed from the list one-by-one and any contents of origin_bios get moved
across to one of the remaining pending_exceptions on the list.  Whichever one
is last is detected because list_empty() is then true and the origin_bios get
submitted.
                                                                               
                                                     
The fix here uses an alternative reference counting mechanism by choosing
one of the pending_exceptions as primary and maintaining an atomic counter
there.

Comment 2 Jason Baron 2006-04-24 19:18:14 UTC
committed in 34.9

Comment 6 Red Hat Bugzilla 2006-08-10 22:43:10 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html