Bug 849957 - blkrawverify announces bad events on i386 only
blkrawverify announces bad events on i386 only
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: blktrace (Show other bugs)
i386 Linux
unspecified Severity medium
: rc
: ---
Assigned To: Jeff Moyer
BaseOS QE Security Team
Depends On:
  Show dependency treegraph
Reported: 2012-08-21 06:19 EDT by Milos Malik
Modified: 2013-02-25 13:57 EST (History)
1 user (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2013-02-25 13:57:18 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
output of blktrace (734.26 KB, application/octet-stream)
2012-08-22 04:16 EDT, Milos Malik
no flags Details
bzip2 compressed output of blkrawverify (98.53 KB, application/x-bzip)
2012-08-22 04:17 EDT, Milos Malik
no flags Details

  None (edit)
Description Milos Malik 2012-08-21 06:19:45 EDT
Description of problem:
blkrawverify announces bad events when blktrace is executed with -a complete, fs, queue or read parameter. The interesting part is that the problem appears on i386/i686 machines only.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
# echo 1 > /proc/sys/vm/drop_caches
# mount -t debugfs debugfs /sys/kernel/debug/
# dd if=/dev/hda of=/dev/null bs=1M &
[1] 12980
# blktrace -w 1 -d /dev/hda -a complete
Device: /dev/hda
  CPU  0:                    0 events,       18 KiB data
  Total:                     0 events (dropped 0),       18 KiB data
# blkrawverify hda
Verifying hda
    CPU 0 -- 24 bad
Wrote output to hda.verify.out
# blkrawverify hda
Verifying hda
    CPU 0 -- 24 bad
Wrote output to hda.verify.out
# killall dd
[1]+  Terminated              dd if=/dev/hda of=/dev/null bs=1M
# umount /sys/kernel/debug/
Comment 1 Milos Malik 2012-08-21 06:30:29 EDT
Following packages were installed during the time I tested the above-mentioned scenario.

# rpm -qa | grep -e kernel -e blktrace | sort

The problem appeared when 2.6.18-333.el5 kernel was loaded in memory. Once I rebooted the machine and the newer 2.6.18-336.el5 kernel was loaded in memory the problem disappeared. Is it possible that blkrawverify used some information from kernel-headers package which was inconsistent with the running kernel at that time?
Comment 2 Milos Malik 2012-08-21 06:40:05 EDT
These packages were updated today. The problem disappeared after the reboot. But before the reboot the problem was visible, I swear.

Aug 21 10:27:07 Updated: kernel-headers-2.6.18-336.el5.i386
Aug 21 10:28:51 Updated: glibc-common-2.5-102.i386
Aug 21 10:30:04 Installed: kernel-2.6.18-336.el5.i686
Aug 21 10:31:39 Installed: kernel-devel-2.6.18-336.el5.i686
Aug 21 10:32:10 Updated: glibc-2.5-102.i686
Aug 21 10:32:20 Updated: quagga-0.98.6-11.el5.i386
Aug 21 10:32:23 Updated: nscd-2.5-102.i386
Aug 21 10:32:27 Updated: 1:autofs-5.0.1-0.rc2.170.el5.i386
Aug 21 10:32:38 Updated: glibc-headers-2.5-102.i386
Aug 21 10:32:42 Updated: glibc-devel-2.5-102.i386
Aug 21 10:32:51 Updated: ricci-0.12.2-60.el5.i386
Comment 3 Eric Sandeen 2012-08-21 15:23:43 EDT
kernel headers should have no effect at all.

> Wrote output to hda.verify.out

Ok, do you have that file?  What errors did it find?
Comment 4 Milos Malik 2012-08-22 04:16:11 EDT
Created attachment 606162 [details]
output of blktrace
Comment 5 Milos Malik 2012-08-22 04:17:10 EDT
Created attachment 606164 [details]
bzip2 compressed output of blkrawverify
Comment 6 Jeff Moyer 2013-01-02 18:06:02 EST
The blktrace output file does not look right--it has all events, not just completion events.  Are you sure you attached the right one?

I was unable to reproduce the problem.  What are you using for storage?  Is this a virtual machine?
Comment 7 Jeff Moyer 2013-01-02 18:40:29 EST
blkrawverify does the following:

                        else if (bit->time <= save_time) {
                                INC_BAD("time regression");

All of the reported errors are of the type, "time regression."  Given that this is difficult to reproduce, it may be that the system in question did not have a very accurate clock, or that some bug was present in the platform time-keeping code that got fixed between -333 and -336.  Looking more closely at the trace, it appears that the clock granularity is .1usec, which is indeed pretty coarse.  Milos, you're not running on Hyper-V by any chance, are you?

[virt] hv: add the Hyper-V virtual clocksource driver

That's the only changelog entry that looked potentially relevant to me.
Comment 8 Jeff Moyer 2013-02-25 13:57:18 EST
I'm closing this bugzilla.  If you run into this problem again, you can feel free to reopen this bug, or file a new bug against the kernel, since it's very likely to be a problem somewhere in the timekeeping code.

Note You need to log in before you can comment on or make changes to this bug.