Red Hat Bugzilla – Bug 806570
EXT4 Data Validation error
Last modified: 2012-07-10 11:22:14 EDT
Description of problem: Vdbench detects data validation error with ext4 filesystem. Version-Release number of selected component (if applicable): # uname -a Linux VBIT-R415-60 2.6.32-71.el6.x86_64 #1 SMP Wed Sep 1 01:33:01 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux # cat /etc/redhat-release Red Hat Enterprise Linux Server release 6.0 (Santiago) # mount | grep sdb1 /dev/sdb1 on /mnt/sdb1 type ext4 (rw) # df -h | grep sdb1 /dev/sdb1 197G 30G 158G 16% /mnt/sdb1 How reproducible: Steps to Reproduce: 1. Create a clean 197G ext4 filesystem on /dev/sdb1. Mount to /mnt/sdb1. # mkfs.ext4 /dev/sdb1 # mount /dev/sdb1 /mnt/sdb1 2. Create a 10G file /mnt/sdb1/2 using dd. # dd if=/dev/zero of=/mnt/sdb1/2 bs=1M count=1000; sync 3. Run 'vdbench' with the following parameter file and the validation option "-vr". # vdbench -f ext4-conf -vr [ext4-conf] sd=sd1,lun=/mnt/sdb1/2,openflags=o_direct,hitarea=40M,align=4096 wd=wd1,sd=sd*,rdpct=50,rhpct=20,xfersize=9k,seekpct=20 rd=run1,wd=wd1,iorate=8000,elapsed=100000,interval=1,forthreads=(30) 4. Withing minutes, vdbench complains data validation errors. Actual results: <pre> 16:16:00.059 input argument scanned: '-fext4-conf' 16:16:00.061 input argument scanned: '-vr' 16:16:00.061 Vdbench distribution: vdbench502 16:16:00.061 16:16:00.064 java.vendor Oracle Corporation 16:16:00.064 java.home /usr/java/jre1.7.0_02 16:16:00.064 java.vm.specification.version 1.7 16:16:00.064 java.vm.version 22.0-b10 16:16:00.064 java.vm.vendor Oracle Corporation 16:16:00.064 java.specification.version 1.7 16:16:00.065 java.class.version 51.0 16:16:00.065 user.name root 16:16:00.065 user.dir /root/vdbench 16:16:00.065 java.class.path /home/jackh/svn/phast/utils/vdbench/:/home/jackh/svn/phast/utils/vdbench/classes:/home/jackh/svn/phast/utils/vdbench/vdbench.jar 16:16:00.065 os.name Linux 16:16:00.065 os.arch amd64 16:16:00.065 sun.arch.data.model 64 16:16:00.065 os.version 2.6.32-71.el6.x86_64 16:16:00.113 16:16:00.113 16:16:00.146 Starting slave: /home/jackh/svn/phast/utils/vdbench/vdbench SlaveJvm -m localhost -n localhost-10-120324-16.16.00.036 -l localhost-0 -p 5570 16:16:00.147 Setting shared library to: /home/jackh/svn/phast/utils/vdbench/linux/linux64.so 16:16:00.454 Slave localhost-0 connected 16:16:00.500 All slaves are now connected 16:16:00.596 sd=sd1,lun=/mnt/sdb1/2 lun size: 10485760000 bytes; 9.7656 GB (1024**3); 10.4858 GB (1000**3) Link to Run Definitions: <A HREF="#_2028358271">run1 For loops: threads=30.0</A> 16:16:00.640 Sending to localhost-0 wd=wd1,sd=sd1,lun=/mnt/sdb1/2 16:16:00.692 SlaveList.printWorkForSlaves() 16:16:00.692 slv=localhost-0 rd=run1 wd=wd1 sd=sd1 rd= 50 sk= 20 sw=100.00 xf= 9216 rh= 20 hi=41943040 th=30 16:16:00.692 16:16:00.693 Waiting for synchronization of all slaves 16:16:01.094 Synchronization of all slaves complete 16:16:02.002 Starting RD=run1; I/O rate: 8000; elapsed=100000; For loops: threads=30.0 <a name="_2028358271"></a><i><b>16:16:02.002 Starting RD=run1; I/O rate: 8000; elapsed=100000; For loops: threads=30.0</b></i> Mar 24, 2012 interval i/o MB/sec bytes read resp resp resp cpu% cpu% rate 1024**2 i/o pct time max stddev sys+usr sys 16:16:03.047 1 798.00 7.01 9216 64.66 33.585 268.157 52.092 3.9 0.7 16:16:04.051 2 646.00 5.68 9216 66.10 47.907 1023.302 92.697 2.0 0.5 16:16:05.050 3 695.00 6.11 9216 67.05 42.071 658.773 86.608 1.2 0.2 16:16:06.049 4 661.00 5.81 9216 67.17 45.698 589.837 91.253 3.5 0.6 16:16:07.049 5 633.00 5.56 9216 66.19 47.130 826.167 92.532 0.9 0.3 16:16:08.048 6 325.00 2.86 9216 68.31 93.471 790.425 203.780 1.3 0.4 16:16:09.048 7 761.00 6.69 9216 66.10 37.443 576.405 81.603 0.8 0.2 16:16:10.048 8 673.00 5.92 9216 65.23 44.593 543.782 92.037 1.7 0.5 16:16:11.049 9 707.00 6.21 9216 66.20 45.951 665.933 94.492 0.6 0.2 16:16:12.049 10 653.00 5.74 9216 65.08 41.806 545.097 76.255 1.1 0.5 16:16:13.047 11 375.00 3.30 9216 65.33 83.685 838.302 205.073 3.0 0.3 16:16:14.047 12 808.00 7.10 9216 67.08 35.722 647.399 72.863 1.3 0.5 16:16:15.047 13 681.00 5.99 9216 65.93 48.958 698.005 114.482 1.0 0.5 16:16:16.047 14 623.00 5.48 9216 65.65 38.505 445.840 75.733 1.8 0.3 16:16:17.046 15 689.00 6.06 9216 65.75 47.685 774.575 107.320 0.5 0.2 16:16:18.046 16 358.00 3.15 9216 69.55 88.520 793.665 202.905 1.9 0.5 16:16:19.046 17 714.00 6.28 9216 68.63 39.860 477.947 75.923 1.8 0.3 16:16:20.046 18 686.00 6.03 9216 67.06 42.047 444.021 75.931 0.8 0.4 16:16:21.045 19 637.00 5.60 9216 66.72 50.795 866.817 108.397 1.2 0.3 16:16:22.045 20 667.00 5.86 9216 66.72 44.259 841.352 95.014 0.6 0.4 16:16:23.045 21 478.00 4.20 9216 66.11 64.765 732.467 154.815 0.4 0.2 16:16:23.740 localhost-0: 16:16:23.740 localhost-0: At least one Data Validation error detected. 16:16:23.740 localhost-0: All data in the disk block (assumed to be 512 bytes long) will 16:16:23.740 localhost-0: be printed in the errorlog; lines that do not compare will be 16:16:23.740 localhost-0: marked with '*'. 16:16:23.740 localhost-0: 16:16:23.740 localhost-0: Byte 0x00 - 0x07: Byte offset of this disk block 16:16:23.740 localhost-0: Byte 0x08 - 0x0f: Timestamp: number of microseconds since 1/1/1970 16:16:23.740 localhost-0: Byte 0x10 : Data Validation key from 1 - 126 16:16:23.740 localhost-0: Byte 0x11 : Checksum of timestamp 16:16:23.740 localhost-0: Byte 0x12 - 0x13: Reserved 16:16:23.740 localhost-0: Byte 0x14 - 0x1b: SD or FSD name in ASCII hexadecimal 16:16:23.740 localhost-0: Byte 0x1c - 0x1f: Reserved 16:16:23.740 localhost-0: Byte 0x20 - 0x1ff: 480 bytes of data 16:16:23.741 localhost-0: 16:16:23.741 localhost-0: On the left: the data that was expected ('.' marks unknown value). 16:16:23.741 localhost-0: On the right: the data that was found. 16:16:23.741 localhost-0: (When the data compare of the 480 data bytes results in a mismatch ONLY because of a wrong 16:16:23.741 localhost-0: Data Validation key and/or wrong lba the reporting of that data will be suppressed.) 16:16:23.741 localhost-0: 16:16:23.741 localhost-0: dvpost: /mnt/sdb1/2 sd1 garbage 0x00000000 0x107edf000 9216 0xa 0x0 0x1d 0x1 0x0 0x0 16:16:23.741 localhost-0: 16:16:23.741 localhost-0: Data Validation error for sd=sd1,lun=/mnt/sdb1/2; block lba: 0x107ee0400; xfersize: 9216; relative sector in block: 0x0a (10) 16:16:23.741 localhost-0: ===> Data Validation Key miscompare. 16:16:23.741 localhost-0: ===> Logical byte address miscompare. 16:16:23.741 localhost-0: ===> SD or FSD name miscompare. 16:16:23.741 localhost-0: ===> Data miscompare. 16:16:23.741 localhost-0: The sector below was written Wednesday, December 31, 1969 19:00:00.000 16:16:23.742 localhost-0: 0x000* 00000001 07ee0400 ........ ........ 00000000 00000000 00000000 00000000 16:16:23.742 localhost-0: 0x010* 01..0000 73643120 20202020 00000000 00000000 00000000 00000000 00000000 16:16:23.742 localhost-0: 0x020* 033b3780 019d9bc0 00cecde0 006766f0 00000000 00000000 00000000 00000000 16:16:23.742 localhost-0: 0x030* 0033b378 0019d9bc 000cecde 0006766f 00000000 00000000 00000000 00000000 16:16:23.742 localhost-0: 0x040* 8b0155ef ce82c42f ec430ccf fd23e8bf 00000000 00000000 00000000 00000000 16:16:23.742 localhost-0: 0x050* f5939a87 f1cba39b f3e7bf15 f2f1b152 00000000 00000000 00000000 00000000 16:16:23.742 localhost-0: 0x060* 7978d8a9 b7be028c 5bdf0146 2def80a3 00000000 00000000 00000000 00000000 16:16:23.742 localhost-0: 0x070* 9df5ae89 c5f8b99c 62fc5cce 317e2e67 00000000 00000000 00000000 00000000 16:16:23.742 localhost-0: 0x080* 93bd79eb c2dcd22d ea6c07ce 753603e7 00000000 00000000 00000000 00000000 16:16:23.742 localhost-0: 0x090* b1996f2b d3ced94d e2e5027e 7172813f 00000000 00000000 00000000 00000000 16:16:23.742 localhost-0: 0x0a0* b3bb2e47 d2dff9fb e26d9225 fa34a7ca 00000000 00000000 00000000 00000000 16:16:23.742 localhost-0: 0x0b0* 7d1a53e5 b58f472a 5ac7a395 a661bf12 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x0c0* 5330df89 a29a011c 514d008e 28a68047 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x0d0* 9f512efb c4aaf9a5 e957120a 74ab8905 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x0e0* b157aa5a 58abd52d a757844e 53abc227 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x0f0* a2d78fcb da69a93d e636ba46 731b5d23 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x100* b28fc049 d2458efc 6922c77e 349163bf 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x110* 914adf07 c3a7015b ead1ee75 fe6a99e2 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x120* 7f354cf1 b498c8a0 5a4c6450 2d263228 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x130* 16931914 0b498c8a 05a4c645 89d00dfa 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x140* 44e806fd a9766da6 54bb36d3 a15ff5b1 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x150* dbad9400 6dd6ca00 36eb6500 1b75b280 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x160* 0dbad940 06dd6ca0 036eb650 01b75b28 00000000 00000000 00000000 00000000 16:16:23.743 localhost-0: 0x170* 00dbad94 006dd6ca 0036eb65 8b191b6a 00000000 00000000 00000000 00000000 16:16:23.744 localhost-0: 0x180* 458c8db5 a9c42802 54e21401 a17364d8 00000000 00000000 00000000 00000000 16:16:23.744 localhost-0: 0x190* 50b9b26c 285cd936 142e6c9b 81155895 00000000 00000000 00000000 00000000 16:16:23.744 localhost-0: 0x1a0* cb88c292 65c46149 b9e05e7c 5cf02f3e 00000000 00000000 00000000 00000000 16:16:23.744 localhost-0: 0x1b0* 2e78179f 9c3e6517 c51d5c53 e98cc0f1 00000000 00000000 00000000 00000000 16:16:23.744 localhost-0: 0x1c0* ffc40ea0 7fe20750 3ff103a8 1ff881d4 00000000 00000000 00000000 00000000 16:16:23.744 localhost-0: 0x1d0* 0ffc40ea 07fe2075 88fd7ee2 447ebf71 00000000 00000000 00000000 00000000 16:16:23.744 localhost-0: 0x1e0* a93d3160 549e98b0 2a4f4c58 1527a62c 00000000 00000000 00000000 00000000 16:16:23.744 localhost-0: 0x1f0* 0a93d316 0549e98b 89a69a1d cfd123d6 00000000 00000000 00000000 00000000 . . . (errors repeat) . . . Expected results: No data validation error. Additional info: Key factors seem to be : sd=sd1,...,openflags=o_direct,...,align=4096 wd=wd1,...,xfersize=9k,...
How reproducible: 100%
Thanks for report, but it has nothing to do with filesystem component - which just creates and owns set of the basic system directories. Ext4 filesystem issues in most cases belongs to kernel - reassigning.
Hi Jack, Please work issues through your Red Hat support team first. They help open the BZ and gather any needed data for us. If you are not a Red Hat customer, you should report the issues to the appropriate upstream mailing list (linux-ext4 for this one). Thanks!