Bug 175830 - dm-snap.c: Data read from snapshot may be corrupt if origin is being written to simultaneously
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.3
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Mikuláš Patočka
QA Contact: Brian Brock
Duplicates: 444049
Depends On:
Blocks: 176344 430698 459337 461304
Reported: 2005-12-15 16:18 UTC by Alasdair Kergon
Modified: 2009-05-18 19:32 UTC
12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2009-05-18 19:32:26 UTC
Target Upstream Version:

Attachments
snapshot flaw demonstration script (1.34 KB, text/plain)
2005-12-15 16:18 UTC, Alasdair Kergon
update for detection script (3.90 KB, text/plain)
2005-12-15 22:59 UTC, Jonathan Earl Brassow
A patch for this bug (6.06 KB, patch)
2008-12-15 00:57 UTC, Mikuláš Patočka
An updated patch. (7.07 KB, patch)
2008-12-15 02:15 UTC, Mikuláš Patočka

System: Red Hat Product Errata
ID: RHSA-2009:1024
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 4.8 kernel security and bug fix update
Last Updated: 2009-05-18 14:57:26 UTC

Description Alasdair Kergon 2005-12-15 16:18:52 UTC
If you read from the snapshot at the same time as the origin is being written
to, the data you get back may be contaminated by what is being written to the
origin.

Neither the origin nor the snapshot is actually corrupted on disk by this bug.
Rather, it is read() that can return the wrong blocks of data, so anything
reading the snapshot device (such as a backup program) may see corruption.

The correct read path is fast compared with the code path that can race
against it; I think this explains why occurrences of the problem are
fortunately rare and hard to reproduce.  (If you re-read the corrupt data,
it will always be correct: the race can't happen twice on the same data.)

This might be partly responsible for bug 174742 and others.


                /*
                 * FIXME: this read path scares me because we
                 * always use the origin when we have a pending
                 * exception.  However I can't think of a
                 * situation where this is wrong - ejt.
                 */

                /* Do reads */
                /* See if it has been remapped */
                e = lookup_exception(&s->complete, chunk);
                if (e)
                        remap_exception(s, e, bio);
                else
                        bio->bi_bdev = s->origin->bdev;

As the FIXME suggests, there are situations where this is wrong:

  A write to the origin creates a pending exception (pe).
  A read from the snapshot gets mapped to the origin.
  The pe processing completes and the write to the origin completes.
  The snapshot read is then issued, but sees what was just written to the origin.
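The four steps above can be replayed deterministically in a small single-chunk model. This is a sketch under stated assumptions, not dm-snap.c code: struct snap_model, struct mapped_read, origin_write_start(), snapshot_read_map(), pending_exception_complete() and snapshot_read_issue() are hypothetical names, each chunk's contents are reduced to a single byte, and a "mapped but not yet issued" read stands in for a bio that has been remapped but has not yet reached the disk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-chunk model of the race; not dm-snap.c structures. */
struct snap_model {
	char origin;     /* current contents of the origin chunk */
	char cow;        /* contents of the COW copy, once it exists */
	bool pending;    /* a pending exception exists for the chunk */
	bool complete;   /* a completed exception exists for the chunk */
	char held_write; /* origin write waiting for the copy to finish */
};

/* A snapshot read that has been mapped but not yet issued to a device. */
struct mapped_read {
	const char *device; /* points at either origin or cow in the model */
};

/* Step 1: a write to the origin creates a pending exception and is held
 * until the old chunk contents have been copied to the COW device. */
void origin_write_start(struct snap_model *m, char data)
{
	m->pending = true;
	m->held_write = data;
}

/* Step 2: map a snapshot read the way the quoted read path does: only a
 * completed exception redirects it; a pending one is ignored. */
struct mapped_read snapshot_read_map(const struct snap_model *m)
{
	struct mapped_read r;

	r.device = m->complete ? &m->cow : &m->origin;
	return r;
}

/* Step 3: the copy finishes, the exception becomes complete and the held
 * origin write is released. */
void pending_exception_complete(struct snap_model *m)
{
	assert(m->pending);
	m->cow = m->origin;        /* preserve the snapshot's view */
	m->pending = false;
	m->complete = true;
	m->origin = m->held_write; /* the held write now reaches the origin */
}

/* Step 4: the delayed read is issued against whichever device was chosen
 * at map time, which may no longer hold the snapshot's data. */
char snapshot_read_issue(struct mapped_read r)
{
	return *r.device;
}
```

Replaying the steps in order (origin_write_start(), snapshot_read_map(), pending_exception_complete(), snapshot_read_issue()) returns the byte newly written to the origin instead of the snapshot's original byte. Mapping and issuing the read a second time returns the preserved byte from the COW copy, which matches the observation that re-reading the corrupt data always gives the correct result.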

A new dm target, rwsplit, was written to make this scenario straightforward to
reproduce.


This target sends reads to one device and writes to another.  The attached test
script uses it to suspend reads to a device whilst allowing writes to proceed.
The script takes three copies of the snapshot device.  They should all be
identical, because nothing is writing to the snapshot, but the middle one gets
contaminated with the data written to the origin.  This happens every time on
my test machine.

A fix will have to add code to track the I/O through the snapshot target,
either directly (as other dm targets do) or possibly by extending the pending
exception code.
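One way to picture the first option (tracking the I/O directly) is a per-chunk count of snapshot reads that were sent to the origin and have not finished yet, with completion of the pending exception held back until that count drops to zero. This is a hypothetical single-threaded sketch, not the attached patch: NR_CHUNKS, struct read_tracker, track_origin_read(), untrack_origin_read() and may_complete_exception() are illustrative names, and a kernel implementation would also need locking (so the mapping decision and the tracking happen atomically) and a hash table keyed by chunk rather than a fixed array.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CHUNKS 8 /* hypothetical, tiny snapshot */

/* Hypothetical per-chunk count of snapshot reads that were mapped to the
 * origin and have not completed yet. */
struct read_tracker {
	unsigned int in_flight[NR_CHUNKS];
};

/* Read path: called when a snapshot read of @chunk is sent to the origin. */
void track_origin_read(struct read_tracker *t, unsigned int chunk)
{
	assert(chunk < NR_CHUNKS);
	t->in_flight[chunk]++;
}

/* Read completion: the origin read of @chunk has finished. */
void untrack_origin_read(struct read_tracker *t, unsigned int chunk)
{
	assert(chunk < NR_CHUNKS && t->in_flight[chunk] > 0);
	t->in_flight[chunk]--;
}

/* Exception completion: releasing the held origin write is only safe when
 * no snapshot read of the same chunk is still outstanding on the origin;
 * while this returns false the caller must retry later. */
bool may_complete_exception(const struct read_tracker *t, unsigned int chunk)
{
	assert(chunk < NR_CHUNKS);
	return t->in_flight[chunk] == 0;
}
```

In this model the read path calls track_origin_read() when it maps a read to the origin and untrack_origin_read() when that read completes, while the exception-completion path retries as long as may_complete_exception() returns false, so the held origin write can never overtake a snapshot read of the same chunk.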

Comment 1 Alasdair Kergon 2005-12-15 16:18:52 UTC
Created attachment 122288 [details]
snapshot flaw demonstration script

Comment 19 Linda Wang 2007-01-25 14:57:16 UTC
Clearing flags for 4.6...

Comment 22 Rob Kenna 2007-08-23 18:29:20 UTC
re-asserting pm-ack for 4.7

Comment 23 Alasdair Kergon 2008-02-28 23:11:44 UTC
Mikulas is working on a solution.  Unclear as yet whether or not this will have
to be kABI-breaking.

Comment 25 Mikuláš Patočka 2008-03-25 21:00:00 UTC
Tom Coughlan: A fix exists upstream (and doesn't break kABI). I don't know if
Alasdair wants to backport it to 4.7 so soon; probably defer to 4.8.

Comment 26 RHEL Program Management 2008-03-26 14:39:12 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 28 RHEL Program Management 2008-09-03 13:15:01 UTC
Updating PM score.

Comment 29 Mikuláš Patočka 2008-12-15 00:57:27 UTC
Created attachment 326893 [details]
A patch for this bug

A patch for this bug, backported from upstream and RHEL 5.

Comment 30 Mikuláš Patočka 2008-12-15 02:15:52 UTC
Created attachment 326898 [details]
An updated patch.

The previous patch was wrong: it contained only half of the change. This is the corrected patch.

Comment 31 Mikuláš Patočka 2009-01-08 09:47:13 UTC
*** Bug 444049 has been marked as a duplicate of this bug. ***

Comment 32 Vivek Goyal 2009-02-12 15:34:45 UTC
Committed in 81.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 36 errata-xmlrpc 2009-05-18 19:32:26 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

