Bug 444049

Summary: kernel dm snapshot: corruption detected after write to origin volumes
Product: Red Hat Enterprise Linux 4 Reporter: Corey Marthaler <cmarthal>
Component: kernel Assignee: Mikuláš Patočka <mpatocka>
Status: CLOSED DUPLICATE QA Contact: Corey Marthaler <cmarthal>
Severity: high Docs Contact:
Priority: high    
Version: 4.6.z CC: agk, dwysocha, edamato, heinzm, jbrassow, mbroz, mpatocka, prockai
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-01-08 09:47:13 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 461297    
Attachments:
Description Flags
Fix read-realloc race in RHEL 4 none

Description Corey Marthaler 2008-04-24 19:06:53 UTC
Description of problem:
I have been able to reproduce this issue quite a few times now while running
the 4.6.Z regression tests.

I create an origin and a few snapshot volumes. Then, with the tool
b_iogen/b_doio, I read the snap, write to the origin, and verify that the
snap didn't change. However, I've been seeing that it has changed.
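
The check described above can be sketched roughly as follows. This is only an
illustration of the read-snap, write-origin, re-read-snap sequence, not the
b_iogen/b_doio implementation. The function names snapshot_unchanged and demo
are hypothetical, ordinary files stand in for the block devices, and buffered
I/O is used rather than the direct I/O used by the real test.

```python
import os
import shutil
import tempfile


def snapshot_unchanged(snap_path, origin_path, offset, payload):
    """Read a region of the snapshot, overwrite the same region of the
    origin, then re-read the snapshot and report whether it still matches."""
    with open(snap_path, "rb") as snap:
        snap.seek(offset)
        before = snap.read(len(payload))

    # Write to the origin and push the data out before re-reading the snap.
    with open(origin_path, "r+b") as origin:
        origin.seek(offset)
        origin.write(payload)
        origin.flush()
        os.fsync(origin.fileno())

    with open(snap_path, "rb") as snap:
        snap.seek(offset)
        after = snap.read(len(payload))

    return before == after


def demo():
    """Hypothetical stand-in: two independent files play origin and snapshot,
    so a write to the 'origin' can never leak into the 'snapshot'."""
    with tempfile.TemporaryDirectory() as tmp:
        origin = os.path.join(tmp, "origin")
        snap = os.path.join(tmp, "snap")
        with open(origin, "wb") as f:
            f.write(b"A" * 65536)
        shutil.copyfile(origin, snap)
        return snapshot_unchanged(snap, origin, 4096, b"K" * 512)


if __name__ == "__main__":
    print("snapshot unchanged:", demo())
```

Against real devices the paths would be something like /dev/snapper/block_snap256
and /dev/snapper/origin, and a False result would correspond to the data
comparison error reported below.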

All the snaps have the same 32k chunk size, even though their names imply
otherwise.

  block_snap128 snapper    swi-a-  3.50G origin  13.82
  block_snap16  snapper    swi-a-  3.50G origin  23.31
  block_snap256 snapper    swi-a-  3.50G origin   8.59
  block_snap32  snapper    swi-a-  3.50G origin  20.27
  block_snap64  snapper    swi-a-  3.50G origin  17.32
  origin        snapper    owi-a-  4.00G


[root@taft-01 ~]# /usr/tests/sts-rhel4.6/bin/b_iogen -o -m random -f direct \
    -i 15s -s write,writev -t1000b -T10000b \
    -d /dev/snapper/origin:/dev/snapper/block_snap256 | \
    /usr/tests/sts-rhel4.6/bin/b_doio -v
b_iogen starting up with the following:

Iterations:      15s
Seed:            31265
Offset-mode:     random
Single Pass:     off
Overlap Flag:    on
Mintrans:        512000
Maxtrans:        5120000
Syscalls:        write  writev
Flags:          direct

Test Devices:

Path                                                      Size
                                                        (bytes)
---------------------------------------------------------------
/dev/snapper/origin                                        4294967296
        Snap Devices:
                /dev/snapper/block_snap256
*** DATA COMPARISON ERROR /dev/snapper/block_snap256  (Offset 785754112) ***
Started at byte 55296
Expected 0x66(f) 0x74(t) 0x2d(-) 0x30(0) 0x31(1) 0x3a(:) 0x62(b) 0x5f(_) 0x64(d)
0x6f(o)  ...
Got 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K) 0x4b(K)
0x4b(K)  ...
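
For reference, the position of the mismatch can be related to the snapshot
chunk size with a little arithmetic. This is a sketch under two assumptions
that the output itself does not confirm: that "Started at byte 55296" is
relative to the reported Offset, and that "32k" means 32768 bytes. The function
name locate is hypothetical.

```python
CHUNK_SIZE = 32 * 1024  # assumption: "32k" chunk size means 32768 bytes


def locate(transfer_offset, mismatch_start, chunk_size=CHUNK_SIZE):
    """Return (absolute byte offset, chunk index, offset within chunk)
    for the first mismatching byte."""
    absolute = transfer_offset + mismatch_start
    return absolute, absolute // chunk_size, absolute % chunk_size


# Values taken from the DATA COMPARISON ERROR output above.
absolute, chunk, within = locate(785754112, 55296)
print(absolute, chunk, within)  # prints: 785809408 23981 0
```

Under those assumptions the first bad byte lands at offset 0 of chunk 23981,
that is, exactly on a chunk boundary.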


Version-Release number of selected component (if applicable):
2.6.9-67.0.15.ELsmp
lvm2-2.02.27-2.el4_6.1

How reproducible:
Often

Comment 1 Mikuláš Patočka 2008-04-24 20:21:22 UTC
Is the corruption permanent or transient? After the test finishes and there is no more I/O, do the snapshots contain correct data? Transient corruption is a known problem, and a fix for upstream already exists. I don't know of any permanent corruption of snapshots with the same chunk size.

Comment 2 Mikuláš Patočka 2008-05-01 16:17:28 UTC
Created attachment 304319 [details]
Fix read-realloc race in RHEL 4

Try this patch and see whether it fixes the test. It is a backport of the
pending upstream patch that fixes the read-fs-realloc race condition.

Comment 8 RHEL Program Management 2008-05-22 19:00:21 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 RHEL Program Management 2008-05-22 19:24:01 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 12 RHEL Program Management 2008-09-03 12:52:01 UTC
Updating PM score.

Comment 13 Mikuláš Patočka 2009-01-08 09:47:13 UTC
Resolving as a duplicate of bug #175830. It could be a duplicate, but it is impossible to say for certain without reproducing this bug and testing further. If anyone sees this bug after the fix for #175830 has been applied, reopen it.

*** This bug has been marked as a duplicate of bug 175830 ***