Bug 806570

Summary: EXT4 Data Validation error
Product: Red Hat Enterprise Linux 6 Reporter: jack
Component: kernelAssignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.0CC: dchinner, esandeen, jbacik, lczerner, rwheeler
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-07-10 15:22:14 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description jack 2012-03-24 21:34:12 UTC
Description of problem:

Vdbench detects data validation error with ext4 filesystem.

Version-Release number of selected component (if applicable):

# uname -a
Linux VBIT-R415-60 2.6.32-71.el6.x86_64 #1 SMP Wed Sep 1 01:33:01 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.0 (Santiago)

# mount | grep sdb1
/dev/sdb1 on /mnt/sdb1 type ext4 (rw)

# df -h | grep sdb1
/dev/sdb1             197G   30G  158G  16% /mnt/sdb1

How reproducible:


Steps to Reproduce:
1. Create a clean 197G ext4 filesystem on /dev/sdb1.  Mount to /mnt/sdb1.
   # mkfs.ext4 /dev/sdb1
   # mount /dev/sdb1 /mnt/sdb1
2. Create a 10G file /mnt/sdb1/2 using dd.
   # dd if=/dev/zero of=/mnt/sdb1/2 bs=1M count=1000; sync
3. Run 'vdbench' with the following parameter file and the validation option "-vr".
   # vdbench -f ext4-conf -vr

[ext4-conf]
sd=sd1,lun=/mnt/sdb1/2,openflags=o_direct,hitarea=40M,align=4096
wd=wd1,sd=sd*,rdpct=50,rhpct=20,xfersize=9k,seekpct=20 
rd=run1,wd=wd1,iorate=8000,elapsed=100000,interval=1,forthreads=(30) 

4. Withing minutes, vdbench complains data validation errors.
  
Actual results:

<pre>
16:16:00.059 input argument scanned: '-fext4-conf'
16:16:00.061 input argument scanned: '-vr'
16:16:00.061 Vdbench distribution: vdbench502
16:16:00.061
16:16:00.064 java.vendor                   Oracle Corporation
16:16:00.064 java.home                     /usr/java/jre1.7.0_02
16:16:00.064 java.vm.specification.version 1.7
16:16:00.064 java.vm.version               22.0-b10
16:16:00.064 java.vm.vendor                Oracle Corporation
16:16:00.064 java.specification.version    1.7
16:16:00.065 java.class.version            51.0
16:16:00.065 user.name                     root
16:16:00.065 user.dir                      /root/vdbench
16:16:00.065 java.class.path               /home/jackh/svn/phast/utils/vdbench/:/home/jackh/svn/phast/utils/vdbench/classes:/home/jackh/svn/phast/utils/vdbench/vdbench.jar
16:16:00.065 os.name                       Linux
16:16:00.065 os.arch                       amd64
16:16:00.065 sun.arch.data.model           64
16:16:00.065 os.version                    2.6.32-71.el6.x86_64
16:16:00.113
16:16:00.113
16:16:00.146 Starting slave: /home/jackh/svn/phast/utils/vdbench/vdbench SlaveJvm -m localhost -n localhost-10-120324-16.16.00.036 -l localhost-0 -p 5570
16:16:00.147 Setting shared library to: /home/jackh/svn/phast/utils/vdbench/linux/linux64.so
16:16:00.454 Slave localhost-0 connected
16:16:00.500 All slaves are now connected
16:16:00.596 sd=sd1,lun=/mnt/sdb1/2 lun size: 10485760000 bytes; 9.7656 GB (1024**3); 10.4858 GB (1000**3)

Link to Run Definitions:         <A HREF="#_2028358271">run1 For loops: threads=30.0</A>

16:16:00.640 Sending to localhost-0  wd=wd1,sd=sd1,lun=/mnt/sdb1/2
16:16:00.692 SlaveList.printWorkForSlaves()
16:16:00.692 slv=localhost-0 rd=run1   wd=wd1    sd=sd1    rd= 50 sk= 20 sw=100.00 xf=   9216 rh= 20 hi=41943040 th=30
16:16:00.692
16:16:00.693 Waiting for synchronization of all slaves
16:16:01.094 Synchronization of all slaves complete
16:16:02.002 Starting RD=run1; I/O rate: 8000; elapsed=100000; For loops: threads=30.0

<a name="_2028358271"></a><i><b>16:16:02.002 Starting RD=run1; I/O rate: 8000; elapsed=100000; For loops: threads=30.0</b></i>


Mar 24, 2012  interval        i/o   MB/sec   bytes   read     resp     resp     resp    cpu%  cpu%
                             rate  1024**2     i/o    pct     time      max   stddev sys+usr   sys
16:16:03.047         1     798.00     7.01    9216  64.66   33.585  268.157   52.092     3.9   0.7
16:16:04.051         2     646.00     5.68    9216  66.10   47.907 1023.302   92.697     2.0   0.5
16:16:05.050         3     695.00     6.11    9216  67.05   42.071  658.773   86.608     1.2   0.2
16:16:06.049         4     661.00     5.81    9216  67.17   45.698  589.837   91.253     3.5   0.6
16:16:07.049         5     633.00     5.56    9216  66.19   47.130  826.167   92.532     0.9   0.3
16:16:08.048         6     325.00     2.86    9216  68.31   93.471  790.425  203.780     1.3   0.4
16:16:09.048         7     761.00     6.69    9216  66.10   37.443  576.405   81.603     0.8   0.2
16:16:10.048         8     673.00     5.92    9216  65.23   44.593  543.782   92.037     1.7   0.5
16:16:11.049         9     707.00     6.21    9216  66.20   45.951  665.933   94.492     0.6   0.2
16:16:12.049        10     653.00     5.74    9216  65.08   41.806  545.097   76.255     1.1   0.5
16:16:13.047        11     375.00     3.30    9216  65.33   83.685  838.302  205.073     3.0   0.3
16:16:14.047        12     808.00     7.10    9216  67.08   35.722  647.399   72.863     1.3   0.5
16:16:15.047        13     681.00     5.99    9216  65.93   48.958  698.005  114.482     1.0   0.5
16:16:16.047        14     623.00     5.48    9216  65.65   38.505  445.840   75.733     1.8   0.3
16:16:17.046        15     689.00     6.06    9216  65.75   47.685  774.575  107.320     0.5   0.2
16:16:18.046        16     358.00     3.15    9216  69.55   88.520  793.665  202.905     1.9   0.5
16:16:19.046        17     714.00     6.28    9216  68.63   39.860  477.947   75.923     1.8   0.3
16:16:20.046        18     686.00     6.03    9216  67.06   42.047  444.021   75.931     0.8   0.4
16:16:21.045        19     637.00     5.60    9216  66.72   50.795  866.817  108.397     1.2   0.3
16:16:22.045        20     667.00     5.86    9216  66.72   44.259  841.352   95.014     0.6   0.4
16:16:23.045        21     478.00     4.20    9216  66.11   64.765  732.467  154.815     0.4   0.2
16:16:23.740 localhost-0:
16:16:23.740 localhost-0: At least one Data Validation error detected.
16:16:23.740 localhost-0: All data in the disk block (assumed to be 512 bytes long) will
16:16:23.740 localhost-0: be printed in the errorlog; lines that do not compare will be
16:16:23.740 localhost-0: marked with '*'.
16:16:23.740 localhost-0:
16:16:23.740 localhost-0: Byte 0x00 -  0x07: Byte offset of this disk block
16:16:23.740 localhost-0: Byte 0x08 -  0x0f: Timestamp: number of microseconds since 1/1/1970
16:16:23.740 localhost-0: Byte 0x10        : Data Validation key from 1 - 126
16:16:23.740 localhost-0: Byte 0x11        : Checksum of timestamp
16:16:23.740 localhost-0: Byte 0x12 -  0x13: Reserved
16:16:23.740 localhost-0: Byte 0x14 -  0x1b: SD or FSD name in ASCII hexadecimal
16:16:23.740 localhost-0: Byte 0x1c -  0x1f: Reserved
16:16:23.740 localhost-0: Byte 0x20 - 0x1ff: 480 bytes of data
16:16:23.741 localhost-0:
16:16:23.741 localhost-0: On the left: the data that was expected ('.' marks unknown value).
16:16:23.741 localhost-0: On the right: the data that was found.
16:16:23.741 localhost-0: (When the data compare of the 480 data bytes results in a mismatch ONLY because of a wrong
16:16:23.741 localhost-0:  Data Validation key and/or wrong lba the reporting of that data will be suppressed.)
16:16:23.741 localhost-0:
16:16:23.741 localhost-0: dvpost: /mnt/sdb1/2 sd1 garbage  0x00000000 0x107edf000 9216 0xa 0x0 0x1d 0x1 0x0 0x0
16:16:23.741 localhost-0:
16:16:23.741 localhost-0:         Data Validation error for sd=sd1,lun=/mnt/sdb1/2; block lba: 0x107ee0400; xfersize: 9216; relative sector in block: 0x0a (10)
16:16:23.741 localhost-0:         ===> Data Validation Key miscompare.
16:16:23.741 localhost-0:         ===> Logical byte address miscompare.
16:16:23.741 localhost-0:         ===> SD or FSD name miscompare.
16:16:23.741 localhost-0:         ===> Data miscompare.
16:16:23.741 localhost-0:         The sector below was written Wednesday, December 31, 1969 19:00:00.000
16:16:23.742 localhost-0: 0x000*  00000001 07ee0400 ........ ........   00000000 00000000 00000000 00000000
16:16:23.742 localhost-0: 0x010*  01..0000 73643120 20202020 00000000   00000000 00000000 00000000 00000000
16:16:23.742 localhost-0: 0x020*  033b3780 019d9bc0 00cecde0 006766f0   00000000 00000000 00000000 00000000
16:16:23.742 localhost-0: 0x030*  0033b378 0019d9bc 000cecde 0006766f   00000000 00000000 00000000 00000000
16:16:23.742 localhost-0: 0x040*  8b0155ef ce82c42f ec430ccf fd23e8bf   00000000 00000000 00000000 00000000
16:16:23.742 localhost-0: 0x050*  f5939a87 f1cba39b f3e7bf15 f2f1b152   00000000 00000000 00000000 00000000
16:16:23.742 localhost-0: 0x060*  7978d8a9 b7be028c 5bdf0146 2def80a3   00000000 00000000 00000000 00000000
16:16:23.742 localhost-0: 0x070*  9df5ae89 c5f8b99c 62fc5cce 317e2e67   00000000 00000000 00000000 00000000
16:16:23.742 localhost-0: 0x080*  93bd79eb c2dcd22d ea6c07ce 753603e7   00000000 00000000 00000000 00000000
16:16:23.742 localhost-0: 0x090*  b1996f2b d3ced94d e2e5027e 7172813f   00000000 00000000 00000000 00000000
16:16:23.742 localhost-0: 0x0a0*  b3bb2e47 d2dff9fb e26d9225 fa34a7ca   00000000 00000000 00000000 00000000
16:16:23.742 localhost-0: 0x0b0*  7d1a53e5 b58f472a 5ac7a395 a661bf12   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x0c0*  5330df89 a29a011c 514d008e 28a68047   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x0d0*  9f512efb c4aaf9a5 e957120a 74ab8905   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x0e0*  b157aa5a 58abd52d a757844e 53abc227   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x0f0*  a2d78fcb da69a93d e636ba46 731b5d23   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x100*  b28fc049 d2458efc 6922c77e 349163bf   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x110*  914adf07 c3a7015b ead1ee75 fe6a99e2   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x120*  7f354cf1 b498c8a0 5a4c6450 2d263228   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x130*  16931914 0b498c8a 05a4c645 89d00dfa   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x140*  44e806fd a9766da6 54bb36d3 a15ff5b1   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x150*  dbad9400 6dd6ca00 36eb6500 1b75b280   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x160*  0dbad940 06dd6ca0 036eb650 01b75b28   00000000 00000000 00000000 00000000
16:16:23.743 localhost-0: 0x170*  00dbad94 006dd6ca 0036eb65 8b191b6a   00000000 00000000 00000000 00000000
16:16:23.744 localhost-0: 0x180*  458c8db5 a9c42802 54e21401 a17364d8   00000000 00000000 00000000 00000000
16:16:23.744 localhost-0: 0x190*  50b9b26c 285cd936 142e6c9b 81155895   00000000 00000000 00000000 00000000
16:16:23.744 localhost-0: 0x1a0*  cb88c292 65c46149 b9e05e7c 5cf02f3e   00000000 00000000 00000000 00000000
16:16:23.744 localhost-0: 0x1b0*  2e78179f 9c3e6517 c51d5c53 e98cc0f1   00000000 00000000 00000000 00000000
16:16:23.744 localhost-0: 0x1c0*  ffc40ea0 7fe20750 3ff103a8 1ff881d4   00000000 00000000 00000000 00000000
16:16:23.744 localhost-0: 0x1d0*  0ffc40ea 07fe2075 88fd7ee2 447ebf71   00000000 00000000 00000000 00000000
16:16:23.744 localhost-0: 0x1e0*  a93d3160 549e98b0 2a4f4c58 1527a62c   00000000 00000000 00000000 00000000
16:16:23.744 localhost-0: 0x1f0*  0a93d316 0549e98b 89a69a1d cfd123d6   00000000 00000000 00000000 00000000
. . . (errors repeat) . . .

Expected results:

No data validation error.

Additional info:

Key factors seem to be : 
    sd=sd1,...,openflags=o_direct,...,align=4096
    wd=wd1,...,xfersize=9k,...

Comment 2 jack 2012-03-25 01:49:35 UTC
How reproducible:

100%

Comment 3 Ondrej Vasik 2012-03-25 06:08:26 UTC
Thanks for report, but it has nothing to do with filesystem component - which just creates and owns set of the basic system directories. Ext4 filesystem issues in most cases belongs to kernel - reassigning.

Comment 4 Ric Wheeler 2012-03-27 19:25:52 UTC
Hi Jack,

Please work issues through your Red Hat support team first. They help open the BZ and gather any needed data for us.

If you are not a Red Hat customer, you should report the issues to the appropriate upstream mailing list (linux-ext4 for this one).

Thanks!