849957 – blkrawverify announces bad events on i386 only

Bug 849957 - blkrawverify announces bad events on i386 only

Summary: blkrawverify announces bad events on i386 only

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	blktrace
Sub Component:
Version:	5.9
Hardware:	i386
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Jeff Moyer
QA Contact:	BaseOS QE Security Team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2012-08-21 10:19 UTC by Milos Malik
Modified:	2013-02-25 18:57 UTC (History)
CC List:	1 user (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-02-25 18:57:18 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
output of blktrace (734.26 KB, application/octet-stream) 2012-08-22 08:16 UTC, Milos Malik	no flags	Details
bzip2 compressed output of blkrawverify (98.53 KB, application/x-bzip) 2012-08-22 08:17 UTC, Milos Malik	no flags	Details
View All

Description Milos Malik 2012-08-21 10:19:45 UTC

Description of problem:
blkrawverify announces bad events when blktrace is executed with -a complete, fs, queue or read parameter. The interesting part is that the problem appears on i386/i686 machines only.

Version-Release number of selected component (if applicable):
blktrace-1.0.0-6.el5
kernel-2.6.18-333.el5
kernel-devel-2.6.18-333.el5
kernel-headers-2.6.18-333.el5

How reproducible:
always

Steps to Reproduce:
# echo 1 > /proc/sys/vm/drop_caches
# mount -t debugfs debugfs /sys/kernel/debug/
# dd if=/dev/hda of=/dev/null bs=1M &
[1] 12980
# blktrace -w 1 -d /dev/hda -a complete
Device: /dev/hda
  CPU  0:                    0 events,       18 KiB data
  Total:                     0 events (dropped 0),       18 KiB data
# blkrawverify hda
Verifying hda
    CPU 0 -- 24 bad
Wrote output to hda.verify.out
# blkrawverify hda
Verifying hda
    CPU 0 -- 24 bad
Wrote output to hda.verify.out
# killall dd
[1]+  Terminated              dd if=/dev/hda of=/dev/null bs=1M
# umount /sys/kernel/debug/
#

Comment 1 Milos Malik 2012-08-21 10:30:29 UTC

Following packages were installed during the time I tested the above-mentioned scenario.

# rpm -qa | grep -e kernel -e blktrace | sort
blktrace-1.0.0-6.el5
kernel-2.6.18-333.el5
kernel-2.6.18-336.el5
kernel-devel-2.6.18-333.el5
kernel-devel-2.6.18-336.el5
kernel-headers-2.6.18-336.el5
#

The problem appeared when 2.6.18-333.el5 kernel was loaded in memory. Once I rebooted the machine and the newer 2.6.18-336.el5 kernel was loaded in memory the problem disappeared. Is it possible that blkrawverify used some information from kernel-headers package which was inconsistent with the running kernel at that time?

Comment 2 Milos Malik 2012-08-21 10:40:05 UTC

These packages were updated today. The problem disappeared after the reboot. But before the reboot the problem was visible, I swear.

Aug 21 10:27:07 Updated: kernel-headers-2.6.18-336.el5.i386
Aug 21 10:28:51 Updated: glibc-common-2.5-102.i386
Aug 21 10:30:04 Installed: kernel-2.6.18-336.el5.i686
Aug 21 10:31:39 Installed: kernel-devel-2.6.18-336.el5.i686
Aug 21 10:32:10 Updated: glibc-2.5-102.i686
Aug 21 10:32:20 Updated: quagga-0.98.6-11.el5.i386
Aug 21 10:32:23 Updated: nscd-2.5-102.i386
Aug 21 10:32:27 Updated: 1:autofs-5.0.1-0.rc2.170.el5.i386
Aug 21 10:32:38 Updated: glibc-headers-2.5-102.i386
Aug 21 10:32:42 Updated: glibc-devel-2.5-102.i386
Aug 21 10:32:51 Updated: ricci-0.12.2-60.el5.i386

Comment 3 Eric Sandeen 2012-08-21 19:23:43 UTC

kernel headers should have no effect at all.

> Wrote output to hda.verify.out

Ok, do you have that file?  What errors did it find?

Comment 4 Milos Malik 2012-08-22 08:16:11 UTC

Created attachment 606162 [details]
output of blktrace

Comment 5 Milos Malik 2012-08-22 08:17:10 UTC

Created attachment 606164 [details]
bzip2 compressed output of blkrawverify

Comment 6 Jeff Moyer 2013-01-02 23:06:02 UTC

The blktrace output file does not look right--it has all events, not just completion events.  Are you sure you attached the right one?

I was unable to reproduce the problem.  What are you using for storage?  Is this a virtual machine?

Comment 7 Jeff Moyer 2013-01-02 23:40:29 UTC

blkrawverify does the following:

                        else if (bit->time <= save_time) {
                                INC_BAD("time regression");
                                nbad_time++;
                                continue;
                        }

All of the reported errors are of the type, "time regression."  Given that this is difficult to reproduce, it may be that the system in question did not have a very accurate clock, or that some bug was present in the platform time-keeping code that got fixed between -333 and -336.  Looking more closely at the trace, it appears that the clock granularity is .1usec, which is indeed pretty coarse.  Milos, you're not running on Hyper-V by any chance, are you?

[virt] hv: add the Hyper-V virtual clocksource driver

That's the only changelog entry that looked potentially relevant to me.

Comment 8 Jeff Moyer 2013-02-25 18:57:18 UTC

I'm closing this bugzilla.  If you run into this problem again, you can feel free to reopen this bug, or file a new bug against the kernel, since it's very likely to be a problem somewhere in the timekeeping code.

Note You need to log in before you can comment on or make changes to this bug.