60036 – sard-patch time statistics in /proc/partitions wrap too soon

Bug 60036 - sard-patch time statistics in /proc/partitions wrap too soon

Summary: sard-patch time statistics in /proc/partitions wrap too soon

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	low
Target Milestone:	---
Assignee:	Stephen Tweedie
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2002-02-19 07:48 UTC by Need Real Name
Modified:	2007-04-18 16:40 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2003-06-08 01:30:11 UTC
Embargoed:

Attachments	(Terms of Use)

Description Need Real Name 2002-02-19 07:48:18 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20011226

Description of problem:
The sard patch uses the following formula for conversion from ticks to
milliseconds in linux/drivers/block/genhd.c:

#define MSEC(x) ((x) * 1000 / HZ)

Normally, a millisecond counter would wrap over after ~1200 hours,
but this formula wraps after just ~12 hours of I/O time.  I easily
achieve that on my home computer with less than two weeks' uptime.

I propose a "smarter" definition:

#if 1000 % HZ == 0
#define MSEC(x) ((x) * (1000 / HZ))
#elif HZ % 1000 == 0
#define MSEC(x) ((x) / (HZ / 1000))
#else
#define MSEC(x) (((x) / HZ) * 1000 + ((x) % HZ) * 1000 / HZ)
#endif

Another option would be not to convert to milliseconds at all, but use
hz_to_std() and leave it up to the userland.  iostat currently expects ticks in
/proc/partitions anyway (see bug 60029).

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1.Create a disk I/O load.
2.Wait enough time.
3.cat /proc/partitions
	

Actual Results:  major minor  #blocks  name     rio rmerge rsect ruse wio wmerge
wsect wuse running use aveq

   3     0   30015216 hda 955889 1994254 15902124 9372400 1452474 1668133
22339116 35636184 0 4976800 2835871


Expected Results:  Obviously, "wuse" and "aveq" have overflowed: they don't
divide by 10 (as a result of x * 1000 / 100 would), and "aveq" is even less than
"use," which is impossible.

Additional info:

Comment 1 Stephen Tweedie 2002-02-19 10:54:00 UTC

That's not a bug.  The sard stats were never designed to track total IO, but
rather to support interval IO tracking.  "iostat" will take two successive
snapshots of these stats and will display the ongoing IO statistics from them by
taking the difference, and that should still work even when the counters wrap.

Comment 2 Stephen Tweedie 2002-02-19 11:13:00 UTC

btw, most of the hz_to_std routines have the same overflow property.  I'll mull
this over --- it may well be worth cleaning this up somewhat.

Comment 3 Need Real Name 2002-02-19 14:28:07 UTC

iostat works fine when counters wrap at 2^32, but these ones wrap at 2^32 / 100.
Here's an output of iostat for an interval where aveq wrapped over, resulting in
bogus avgqu-sz:

Device:  rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s avgrq-sz avgqu-sz   await 
svctm  %util
hda       27.96   4.19 13.78  4.79  318.40   69.63    20.89 70921.74 2921.21 
43.84   8.14

Comment 4 Stephen Tweedie 2002-02-19 15:53:56 UTC

Yes, I realised that just after the previous reply.  I'm currently testing a
kernel with

#define MSEC(x) (((x) / HZ) * 1000 + ((x) % HZ) * 1000 / HZ)

defined unconditionally --- the compiler seems to create decent code for that. 
(gcc has a number of very nice optimisations for such expressions.)

Comment 5 Stephen Tweedie 2002-02-19 17:05:07 UTC

Fix checked in.

Note You need to log in before you can comment on or make changes to this bug.