If you read from the snapshot at the same time as the origin is being written
to, the data you get back may be contaminated by what is being written to the
origin.
Neither the origin nor the snapshot is actually corrupted on disk by this bug.
Rather it is read() that can return the wrong blocks of data, so anything
reading the snapshot device (such as a backup program) may see corruption.
The correct read path is fast compared to the time taken by the code in the path
that can race against it - I think this explains why occurrences of the problem
are fortunately rare and hard to reproduce. (If you re-read the corrupt data,
it'll always be correct - the race can't happen twice on the same data.)
This might be partly responsible for bug 174742 and others.
	/*
	 * FIXME: this read path scares me because we
	 * always use the origin when we have a pending
	 * exception.  However I can't think of a
	 * situation where this is wrong - ejt.
	 */

	/* Do reads */

	/* See if it has been remapped */
	e = lookup_exception(&s->complete, chunk);
	if (e)
		remap_exception(s, e, bio);
	else
		bio->bi_bdev = s->origin->bdev;
As the FIXME suggests, there are situations where what it does is wrong.
1. A write to the origin creates a pending exception (pe).
2. A read from the snapshot gets mapped to the origin.
3. The pe processing completes and the write to the origin completes.
4. The snapshot read happens, but it sees what was just written to the origin.
A new dm target, rwsplit, was written to make it straightforward to reproduce
this bug.
This target sends reads to one device and writes to another. The test script
attached uses this to suspend reads to a device whilst allowing writes to
proceed. The script takes 3 copies of the snapshot device. They should all be
identical as nothing is writing to the snapshot. But the middle one gets
contaminated with the data written to the origin. This happens every time on my
test machine.
A fix will have to add code to track the I/O through the snapshot target -
either directly (as elsewhere in dm targets) or possibly by extending the
pending exception code.
Created attachment 122288 [details]
snapshot flaw demonstration script
Clearing flags for 4.6...
re-asserting pm-ack for 4.7
Mikulas is working on a solution. Unclear as yet whether or not this will have
to be kABI-breaking.
Tom Coughlan: Fix for upstream exists (and doesn't break kABI). I don't know if
Alasdair wants to backport it to 4.7 so soon. Probably defer to 4.8.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
Updating PM score.
Created attachment 326893 [details]
A patch for this bug
A patch for this bug, backported from upstream and RHEL 5.
Created attachment 326898 [details]
An updated patch.
The previous patch was wrong: it was only half of the patch. This is the updated, correct patch.
*** Bug 444049 has been marked as a duplicate of this bug. ***
Committed in 81.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.