Description of problem: blkrawverify announces bad events when blktrace is executed with -a complete, fs, queue or read parameter. The interesting part is that the problem appears on i386/i686 machines only. Version-Release number of selected component (if applicable): blktrace-1.0.0-6.el5 kernel-2.6.18-333.el5 kernel-devel-2.6.18-333.el5 kernel-headers-2.6.18-333.el5 How reproducible: always Steps to Reproduce: # echo 1 > /proc/sys/vm/drop_caches # mount -t debugfs debugfs /sys/kernel/debug/ # dd if=/dev/hda of=/dev/null bs=1M & [1] 12980 # blktrace -w 1 -d /dev/hda -a complete Device: /dev/hda CPU 0: 0 events, 18 KiB data Total: 0 events (dropped 0), 18 KiB data # blkrawverify hda Verifying hda CPU 0 -- 24 bad Wrote output to hda.verify.out # blkrawverify hda Verifying hda CPU 0 -- 24 bad Wrote output to hda.verify.out # killall dd [1]+ Terminated dd if=/dev/hda of=/dev/null bs=1M # umount /sys/kernel/debug/ #
Following packages were installed during the time I tested the above-mentioned scenario. # rpm -qa | grep -e kernel -e blktrace | sort blktrace-1.0.0-6.el5 kernel-2.6.18-333.el5 kernel-2.6.18-336.el5 kernel-devel-2.6.18-333.el5 kernel-devel-2.6.18-336.el5 kernel-headers-2.6.18-336.el5 # The problem appeared when 2.6.18-333.el5 kernel was loaded in memory. Once I rebooted the machine and the newer 2.6.18-336.el5 kernel was loaded in memory the problem disappeared. Is it possible that blkrawverify used some information from kernel-headers package which was inconsistent with the running kernel at that time?
These packages were updated today. The problem disappeared after the reboot. But before the reboot the problem was visible, I swear. Aug 21 10:27:07 Updated: kernel-headers-2.6.18-336.el5.i386 Aug 21 10:28:51 Updated: glibc-common-2.5-102.i386 Aug 21 10:30:04 Installed: kernel-2.6.18-336.el5.i686 Aug 21 10:31:39 Installed: kernel-devel-2.6.18-336.el5.i686 Aug 21 10:32:10 Updated: glibc-2.5-102.i686 Aug 21 10:32:20 Updated: quagga-0.98.6-11.el5.i386 Aug 21 10:32:23 Updated: nscd-2.5-102.i386 Aug 21 10:32:27 Updated: 1:autofs-5.0.1-0.rc2.170.el5.i386 Aug 21 10:32:38 Updated: glibc-headers-2.5-102.i386 Aug 21 10:32:42 Updated: glibc-devel-2.5-102.i386 Aug 21 10:32:51 Updated: ricci-0.12.2-60.el5.i386
kernel headers should have no effect at all. > Wrote output to hda.verify.out Ok, do you have that file? What errors did it find?
Created attachment 606162 [details] output of blktrace
Created attachment 606164 [details] bzip2 compressed output of blkrawverify
The blktrace output file does not look right--it has all events, not just completion events. Are you sure you attached the right one? I was unable to reproduce the problem. What are you using for storage? Is this a virtual machine?
blkrawverify does the following: else if (bit->time <= save_time) { INC_BAD("time regression"); nbad_time++; continue; } All of the reported errors are of the type, "time regression." Given that this is difficult to reproduce, it may be that the system in question did not have a very accurate clock, or that some bug was present in the platform time-keeping code that got fixed between -333 and -336. Looking more closely at the trace, it appears that the clock granularity is .1usec, which is indeed pretty coarse. Milos, you're not running on Hyper-V by any chance, are you? [virt] hv: add the Hyper-V virtual clocksource driver That's the only changelog entry that looked potentially relevant to me.
I'm closing this bugzilla. If you run into this problem again, you can feel free to reopen this bug, or file a new bug against the kernel, since it's very likely to be a problem somewhere in the timekeeping code.