Bug 849957 - blkrawverify announces bad events on i386 only
Summary: blkrawverify announces bad events on i386 only
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: blktrace
Version: 5.9
Hardware: i386
OS: Linux
unspecified
medium
Target Milestone: rc
: ---
Assignee: Jeff Moyer
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-08-21 10:19 UTC by Milos Malik
Modified: 2013-02-25 18:57 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-25 18:57:18 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
output of blktrace (734.26 KB, application/octet-stream)
2012-08-22 08:16 UTC, Milos Malik
no flags Details
bzip2 compressed output of blkrawverify (98.53 KB, application/x-bzip)
2012-08-22 08:17 UTC, Milos Malik
no flags Details

Description Milos Malik 2012-08-21 10:19:45 UTC
Description of problem:
blkrawverify announces bad events when blktrace is executed with -a complete, fs, queue or read parameter. The interesting part is that the problem appears on i386/i686 machines only.

Version-Release number of selected component (if applicable):
blktrace-1.0.0-6.el5
kernel-2.6.18-333.el5
kernel-devel-2.6.18-333.el5
kernel-headers-2.6.18-333.el5

How reproducible:
always

Steps to Reproduce:
# echo 1 > /proc/sys/vm/drop_caches
# mount -t debugfs debugfs /sys/kernel/debug/
# dd if=/dev/hda of=/dev/null bs=1M &
[1] 12980
# blktrace -w 1 -d /dev/hda -a complete
Device: /dev/hda
  CPU  0:                    0 events,       18 KiB data
  Total:                     0 events (dropped 0),       18 KiB data
# blkrawverify hda
Verifying hda
    CPU 0 -- 24 bad
Wrote output to hda.verify.out
# blkrawverify hda
Verifying hda
    CPU 0 -- 24 bad
Wrote output to hda.verify.out
# killall dd
[1]+  Terminated              dd if=/dev/hda of=/dev/null bs=1M
# umount /sys/kernel/debug/
#

Comment 1 Milos Malik 2012-08-21 10:30:29 UTC
Following packages were installed during the time I tested the above-mentioned scenario.

# rpm -qa | grep -e kernel -e blktrace | sort
blktrace-1.0.0-6.el5
kernel-2.6.18-333.el5
kernel-2.6.18-336.el5
kernel-devel-2.6.18-333.el5
kernel-devel-2.6.18-336.el5
kernel-headers-2.6.18-336.el5
#

The problem appeared when 2.6.18-333.el5 kernel was loaded in memory. Once I rebooted the machine and the newer 2.6.18-336.el5 kernel was loaded in memory the problem disappeared. Is it possible that blkrawverify used some information from kernel-headers package which was inconsistent with the running kernel at that time?

Comment 2 Milos Malik 2012-08-21 10:40:05 UTC
These packages were updated today. The problem disappeared after the reboot. But before the reboot the problem was visible, I swear.

Aug 21 10:27:07 Updated: kernel-headers-2.6.18-336.el5.i386
Aug 21 10:28:51 Updated: glibc-common-2.5-102.i386
Aug 21 10:30:04 Installed: kernel-2.6.18-336.el5.i686
Aug 21 10:31:39 Installed: kernel-devel-2.6.18-336.el5.i686
Aug 21 10:32:10 Updated: glibc-2.5-102.i686
Aug 21 10:32:20 Updated: quagga-0.98.6-11.el5.i386
Aug 21 10:32:23 Updated: nscd-2.5-102.i386
Aug 21 10:32:27 Updated: 1:autofs-5.0.1-0.rc2.170.el5.i386
Aug 21 10:32:38 Updated: glibc-headers-2.5-102.i386
Aug 21 10:32:42 Updated: glibc-devel-2.5-102.i386
Aug 21 10:32:51 Updated: ricci-0.12.2-60.el5.i386

Comment 3 Eric Sandeen 2012-08-21 19:23:43 UTC
kernel headers should have no effect at all.

> Wrote output to hda.verify.out

Ok, do you have that file?  What errors did it find?

Comment 4 Milos Malik 2012-08-22 08:16:11 UTC
Created attachment 606162 [details]
output of blktrace

Comment 5 Milos Malik 2012-08-22 08:17:10 UTC
Created attachment 606164 [details]
bzip2 compressed output of blkrawverify

Comment 6 Jeff Moyer 2013-01-02 23:06:02 UTC
The blktrace output file does not look right--it has all events, not just completion events.  Are you sure you attached the right one?

I was unable to reproduce the problem.  What are you using for storage?  Is this a virtual machine?

Comment 7 Jeff Moyer 2013-01-02 23:40:29 UTC
blkrawverify does the following:

                        else if (bit->time <= save_time) {
                                INC_BAD("time regression");
                                nbad_time++;
                                continue;
                        }

All of the reported errors are of the type, "time regression."  Given that this is difficult to reproduce, it may be that the system in question did not have a very accurate clock, or that some bug was present in the platform time-keeping code that got fixed between -333 and -336.  Looking more closely at the trace, it appears that the clock granularity is .1usec, which is indeed pretty coarse.  Milos, you're not running on Hyper-V by any chance, are you?

[virt] hv: add the Hyper-V virtual clocksource driver

That's the only changelog entry that looked potentially relevant to me.

Comment 8 Jeff Moyer 2013-02-25 18:57:18 UTC
I'm closing this bugzilla.  If you run into this problem again, you can feel free to reopen this bug, or file a new bug against the kernel, since it's very likely to be a problem somewhere in the timekeeping code.


Note You need to log in before you can comment on or make changes to this bug.