If you read from the snapshot at the same time as the origin is being written
to, the data you get back may be contaminated by what is being written to the
origin.
Neither the origin nor the snapshot is actually corrupted on disk by this bug.
Rather it is read() that can return the wrong blocks of data, so anything
reading the snapshot device (such as a backup program) may see corruption.
The correct read path is fast compared to the time taken by the code in the path
that can race against it - I think this explains why occurrences of the problem
are fortunately rare and hard to reproduce. (If you re-read the corrupt data,
it'll always be correct - the race can't happen twice on the same data.)
This might be partly responsible for bug 174742 and others.
	/*
	 * FIXME: this read path scares me because we
	 * always use the origin when we have a pending
	 * exception.  However I can't think of a
	 * situation where this is wrong - ejt.
	 */

	/* Do reads */

	/* See if it has been remapped */
	e = lookup_exception(&s->complete, chunk);
	if (e)
		remap_exception(s, e, bio);
	else
		bio->bi_bdev = s->origin->bdev;
As the FIXME suggests, there are situations where what it does is wrong.
1. A write to the origin creates a pending exception (pe).
2. A read from the snapshot gets mapped to the origin.
3. The pe processing completes and the write to the origin completes.
4. The snapshot read happens, but it sees what was just written to the origin.
A new dm target, rwsplit, was written to make it straightforward to reproduce
this bug.
This target sends reads to one device and writes to another. The test script
attached uses this to suspend reads to a device whilst allowing writes to
proceed. The script takes 3 copies of the snapshot device. They should all be
identical as nothing is writing to the snapshot. But the middle one gets
contaminated with the data written to the origin. This happens every time on my
test machine.
A fix will have to add code to track the I/O through the snapshot target -
either directly (as elsewhere in dm targets) or possibly by extending the
pending exception code.
Created attachment 122288 [details]
snapshot flaw demonstration script
Clearing flags for 4.6...
re-asserting pm-ack for 4.7
Mikulas is working on a solution. Unclear as yet whether or not this will have
to be kABI-breaking.
Tom Coughlan: Fix for upstream exists (and doesn't break kABI). I don't know if
Alasdair wants to backport it to 4.7 so soon. Probably defer to 4.8.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
Updating PM score.
Created attachment 326893 [details]
A patch for this bug
A patch for this bug, backported from upstream and RHEL 5.
Created attachment 326898 [details]
An updated patch.
The previous patch was wrong: it was only half of the patch. This is the updated, correct patch.
*** Bug 444049 has been marked as a duplicate of this bug. ***
Committed in 81.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.