Bug 444049 - kernel dm snapshot: corruption detected after write to origin volumes
Status: CLOSED DUPLICATE of bug 175830
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6.z
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Mikulas Patocka
QA Contact: Corey Marthaler
Keywords: Regression
Depends On:
Blocks: 461297
Reported: 2008-04-24 15:06 EDT by Corey Marthaler
Modified: 2009-01-08 04:47 EST (History)

Doc Type: Bug Fix
Last Closed: 2009-01-08 04:47:13 EST


Attachments
Fix read-realloc race in RHEL 4 (6.06 KB, patch)
2008-05-01 12:17 EDT, Mikulas Patocka

Description Corey Marthaler 2008-04-24 15:06:53 EDT
Description of problem:
I have been able to reproduce this issue quite a few times now while running
the 4.6.Z regression tests.

I create an origin and a few snapshot volumes. Then, with the tool
b_iogen/b_doio, I read the snap, write to the origin, and verify that the
snap didn't change. However, I've been seeing that it does change.
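
For readers without b_iogen/b_doio, the check described above can be sketched roughly as follows. This is only an illustration, not how b_doio works internally: the function names are hypothetical, it uses ordinary buffered I/O rather than the direct I/O used in the test run, and it checks a single region instead of a randomized workload.

```python
import hashlib
import os


def region_digest(path, offset, length):
    """Return the SHA-256 hex digest of `length` bytes at `offset` in `path`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.sha256(f.read(length)).hexdigest()


def snapshot_stable_across_write(snap_path, origin_path, offset, payload):
    """Read the snapshot, write to the origin, then re-read the snapshot.

    Returns True if the snapshot region is byte-for-byte unchanged by the
    write to the origin, which is what a correct snapshot should guarantee.
    """
    before = region_digest(snap_path, offset, len(payload))
    with open(origin_path, "r+b") as f:
        f.seek(offset)
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push the write out before re-reading the snap
    after = region_digest(snap_path, offset, len(payload))
    return before == after
```

On a real system the two paths would be something like /dev/snapper/block_snap256 and /dev/snapper/origin. Note that the write overwrites data on the origin, so this should only be pointed at scratch volumes.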

The snaps all have the same 32k chunksizes even though their names imply that is
not so.

  block_snap128 snapper    swi-a-  3.50G origin  13.82
  block_snap16  snapper    swi-a-  3.50G origin  23.31
  block_snap256 snapper    swi-a-  3.50G origin   8.59
  block_snap32  snapper    swi-a-  3.50G origin  20.27
  block_snap64  snapper    swi-a-  3.50G origin  17.32
  origin        snapper    owi-a-  4.00G


[root@taft-01 ~]# /usr/tests/sts-rhel4.6/bin/b_iogen -o -m random -f direct -i 15s -s write,writev -t1000b -T10000b -d /dev/snapper/origin:/dev/snapper/block_snap256 | /usr/tests/sts-rhel4.6/bin/b_doio -v
b_iogen starting up with the following:

Iterations:      15s
Seed:            31265
Offset-mode:     random
Single Pass:     off
Overlap Flag:    on
Mintrans:        512000
Maxtrans:        5120000
Syscalls:        write  writev
Flags:          direct

Test Devices:

Path                                                      Size
                                                        (bytes)
---------------------------------------------------------------
/dev/snapper/origin                                        4294967296
        Snap Devices:
                /dev/snapper/block_snap256
*** DATA COMPARISON ERROR /dev/snapper/block_snap256  (Offset 785754112) ***
Started at byte 55296
Expected 0x66(f) 0x74(t) 0x2d(-) 0x30(0) 0x31(1) 0x3a(:) 0x62(b) 0x5f(_) 0x64(d) 0x6f(o)  ...
Got 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K)  ...
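
A quick way to relate the error position to the snapshot chunks is sketched below. It rests on two assumptions that the output above does not confirm: that "Started at byte 55296" is relative to the transfer offset 785754112, and that the 32k chunk size means 32 KiB (32768 bytes). The helper name `locate` is made up for illustration.

```python
CHUNK_SIZE = 32 * 1024  # assumption: "32k" chunk size means 32768 bytes


def locate(transfer_offset, mismatch_start, chunk_size=CHUNK_SIZE):
    """Map a mismatch position to (absolute byte, chunk index, offset in chunk).

    Assumes `mismatch_start` is counted from `transfer_offset`.
    """
    absolute = transfer_offset + mismatch_start
    return absolute, absolute // chunk_size, absolute % chunk_size


absolute, chunk, within = locate(785754112, 55296)
print(absolute, chunk, within)  # 785809408 23981 0
```

Under those assumptions the mismatch would begin exactly at the start of chunk 23981. That would fit a chunk-granular problem, but it is not proof of one.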


Version-Release number of selected component (if applicable):
2.6.9-67.0.15.ELsmp
lvm2-2.02.27-2.el4_6.1

How reproducible:
Often
Comment 1 Mikulas Patocka 2008-04-24 16:21:22 EDT
Is the corruption permanent or transient? After the test finishes and there is no more I/O, do the snapshots contain correct data? Transient corruption is a known problem, and there is already a fix upstream. I am not aware of any permanent corruption of snapshots with the same chunk size.
Comment 2 Mikulas Patocka 2008-05-01 12:17:28 EDT
Created attachment 304319 [details]
Fix read-realloc race in RHEL 4

Try this patch and see whether it fixes the test. It is a backport of the
pending upstream patch that fixes the read-fs-realloc race condition.
Comment 8 RHEL Product and Program Management 2008-05-22 15:00:21 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 10 RHEL Product and Program Management 2008-05-22 15:24:01 EDT
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.
Comment 12 RHEL Product and Program Management 2008-09-03 08:52:01 EDT
Updating PM score.
Comment 13 Mikulas Patocka 2009-01-08 04:47:13 EST
Resolving as a duplicate of bug #175830. It could be a duplicate, but it is impossible to say for certain without reproducing this bug and testing further. If anyone sees this bug after the fix for #175830 has been applied, reopen it.

*** This bug has been marked as a duplicate of bug 175830 ***
