Bug 63977

Summary: iostat -d -x on devices as opposed to partitions reports crap
Product: [Retired] Red Hat Linux
Reporter: Michael Schwendt <bugs.michael>
Component: kernel
Assignee: Stephen Tweedie <sct>
Status: CLOSED UPSTREAM
Severity: medium
Priority: medium
Version: 9
CC: jim.laverty, lev_makhlis
Hardware: i386
OS: Linux
Last Closed: 2004-09-10 11:54:36 UTC
Attachments: cat /proc/partitions (flags: none)

Description Michael Schwendt 2002-04-22 22:03:57 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020408

Description of problem:
Running iostat in periodic extended device statistics mode gives questionable
results, such as values that appear to be negative or incorrectly converted.


Version-Release number of selected component (if applicable):
4.0.3-2

How reproducible:
Always

Steps to Reproduce:
1. iostat -d 2 -x /dev/hda


Actual Results:  # iostat -d 2 -x /dev/hda
Linux 2.4.18-0.22 (localhost)      22/04/02

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda    14.14   1.74  0.88  1.47   43.85   25.73    21.93    12.86    29.64
    0.76  435.69 4253.49  99.84

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     0.00   5.50  0.00  1.00    0.00   52.00     0.00    26.00    52.00
21474746.48    0.00 10000.00 100.00

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
21474746.48    0.00   0.00 100.00


Expected Results:  The first entry might make sense. I question the subsequent
entries, especially the "average queue length" of 21474746.48 (hex 7fffdcd8) and
the service time of 10000.00.

Since option -d 2 prints reports at two-second intervals, a reported service
time of 10 seconds (10000 ms) is questionable.


Additional info:

Field "aveq" for /dev/hda in "cat /proc/partitions" counts down.

iostat -d 1 -x /dev/hda prints even more questionable values, e.g.

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     0.00  11.00  0.00  2.00    0.00  104.00     0.00    52.00    52.00
42949582.96    0.00 5000.00 100.00

42949582.96 (hex ffffdcd8) looks pretty much like an overflow, something
uninitialized, or something incorrectly converted.

Reports on partitions (instead of devices) give more realistic values.
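
For what it's worth, the odd numbers decode cleanly if one assumes a signed
32-bit counter that has drifted below zero and is later treated as unsigned
and divided by the interval. A minimal sketch of that arithmetic (illustration
only, with an assumed counter value of -9000; this is not the actual iostat or
kernel source):

#include <stdio.h>

int main(void)
{
    /* Assumption for illustration: the per-device "aveq" counter has
     * counted down to about -9000 and is stored in a 32-bit field.  */
    int aveq = -9000;

    /* Read back as unsigned, -9000 becomes 0xffffdcd8 = 4294958296. */
    unsigned int raw = (unsigned int)aveq;

    printf("0x%08x\n", raw);                        /* 0xffffdcd8    */
    printf("%.2f\n", raw / 100.0);                  /* 42949582.96   */
    printf("%.2f\n", (raw & 0x7fffffffu) / 100.0);  /* 21474746.48   */
    return 0;
}

The second printf reproduces the avgqu-sz shown by the 1-second report, and
the third (the same low bits with the sign bit cleared) reproduces the
21474746.48 from the 2-second report, which is consistent with an overflowed
or negative counter rather than random garbage.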

Comment 1 Trond Eivind Glomsrød 2002-06-17 21:17:28 UTC
Please give sysstat 4.0.5-1 from http://people.redhat.com/teg/sysstat/ a try.

Comment 2 Michael Schwendt 2002-06-18 10:12:57 UTC
4.0.5-1? You offer 4.0.5-0.7 on your web space.

$ rpm -q sysstat
sysstat-4.0.5-0.7

$ iostat -d 1 -x /dev/hda
Linux 2.4.18-4 (localhost)        18/06/02

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     8.39   4.75  3.76  1.41   57.24   23.65    28.62    11.82    15.62
   53.46   60.93 192.09  99.47

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
42949652.96    0.00   0.00 100.00

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
42949652.96    0.00   0.00 100.00

^C


$ iostat -d 2 -x /dev/hda
Linux 2.4.18-4 (localhost)        18/06/02

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     8.33   4.72  3.74  1.41   56.81   23.54    28.40    11.77    15.62
   52.90   60.88 193.40  99.47

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
21474816.48    0.00   0.00 100.00

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     0.00   7.00  0.50  2.00    1.00   42.00     0.50    21.00    17.20
21474816.98   20.00 390.00  97.50

^C

That suspiciously high average queue size is still there in the second and
subsequent reports of each run, even for plain "iostat -d 1 -x", which,
according to the man page, prints stats for every device in /proc/partitions.

A negative number in /proc/partitions could be the cause (reproducible on other
machines). See attachment.

Comment 3 Michael Schwendt 2002-06-18 10:13:36 UTC
Created attachment 61384 [details]
cat /proc/partitions

Comment 4 Trond Eivind Glomsrød 2002-06-18 14:25:12 UTC
Yes, I meant 4.0.5-0.7 (same package, but built on RHL 7.3). Reassigning to kernel.

Comment 5 Michael Schwendt 2002-12-28 12:25:04 UTC
Reproducible with Phoebe beta.


Comment 6 Jim Laverty 2003-03-31 15:58:01 UTC
We are experiencing the same behavior with the 2.4.18-26.7.xsmp kernel.  I'm
hoping to test it with Red Hat 9 tonight.

iostat -x 1 5
Linux 2.4.18-26.7.xsmp (marvin)        03/31/2003

avg-cpu:  %user   %nice    %sys   %idle
          17.27    0.00    6.24   76.49

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/sda     6.36   1.73  1.10  0.97   59.68   21.66    29.84    10.83    39.33
    0.05   29.51  19.79   0.41
/dev/sda1    0.01   0.30  0.03  0.50    0.25    6.49     0.13     3.24    12.77
    0.04   22.37  62.82   0.33
/dev/sda2    6.32   1.29  0.90  0.36   57.76   13.27    28.88     6.64    56.56
    0.01   67.68  26.30   0.33
/dev/sda3    0.03   0.12  0.17  0.10    1.63    1.72     0.82     0.86    12.38
    0.03   68.26 171.11   0.46
/dev/sda5    0.00   0.01  0.00  0.01    0.00    0.13     0.00     0.07    16.50
    0.00   23.75  11.55   0.00
/dev/sda6    0.00   0.00  0.00  0.00    0.03    0.04     0.02     0.02    14.99
    0.00   50.30  14.67   0.00
/dev/sda7    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00    11.14
    0.00  133.04  46.80   0.00
/dev/sdb   386.36 472.26 33.85 30.11  558.29 1216.19   279.14   608.10    27.74
    0.03    0.94   0.18   0.12
/dev/sdb1  386.36 472.26 33.85 30.11  558.29 1216.19   279.14   608.10    27.74
    0.05    0.95   0.31   0.20

avg-cpu:  %user   %nice    %sys   %idle
           0.00    0.00    0.50   99.50

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/sda     0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
42949662.96    0.00   0.00 100.00
/dev/sda1    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
   10.00    0.00   0.00 100.00
/dev/sda2    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
   30.00    0.00   0.00 100.00
/dev/sda3    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/sda5    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/sda6    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/sda7    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/sdb     0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
42945922.96    0.00   0.00 100.00
/dev/sdb1    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
 1360.00    0.00   0.00 100.00


Comment 7 Jim Laverty 2003-04-04 14:42:25 UTC
The problem exists in Red Hat 9 also.

[root@hades root]# iostat -v
sysstat version 4.0.7

[root@hades root]# iostat -x 1 5
Linux 2.4.20-8 (hades.kernel.org)  04/03/2003

avg-cpu:  %user   %nice    %sys   %idle
           0.42    0.19    0.37   99.02

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     0.40   1.05  0.34  0.63    5.96   13.50     2.98     6.75    20.01
    0.46   53.23 172.44  16.76
/dev/hda1    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     5.41
    0.00   81.76  77.65   0.00
/dev/hda2    0.40   1.03  0.34  0.62    5.95   13.33     2.98     6.66    19.96
    0.42   43.85   4.51   0.44
/dev/hda3    0.00   0.02  0.00  0.00    0.00    0.17     0.00     0.08    31.89
    0.09 1745.85 1699.63   0.91

avg-cpu:  %user   %nice    %sys   %idle
           1.00    0.00    0.00   99.00

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     0.00  10.00  0.00  3.00    0.00  104.00     0.00    52.00    34.67
42949622.96    0.00 333.33 100.00
/dev/hda1    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/hda2    0.00  10.00  0.00  3.00    0.00  104.00     0.00    52.00    34.67
    0.00    0.00   0.00   0.00
/dev/hda3    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00


Comment 8 Rhett Butler 2003-06-19 21:32:54 UTC
We have several NFS servers that are exhibiting unexpected behavior in
iostat, /proc/partitions, and /proc/stat.

Like the other servers, this one is busy.  It's been up for just shy of
14 days, and has served over 268 million NFS operations in that time.
About 230 million of these operations are reads of one form or another
(attribute or data).

NFS exports are on both sda and sdb, but sdb is hit much much harder.

The machine is a dual P3, running Red Hat 7.3 with the following Red Hat 8
(or later) RPMs to fix some bogus NFS / UDP problems that were causing
horrible UDP fragment storms:

glibc-2.2.93-5.i686.rpm
glibc-common-2.2.93-5.i386.rpm
kernel-2.4.18-19.7.x.i686.rpm
kernel-smp-2.4.18-19.7.x.i686.rpm
lvm-1.0.3-9.i386.rpm
mkinitrd-3.4.28-1.i386.rpm
modutils-2.4.18-2.i386.rpm
sysstat-4.0.7-1.i386.rpm

Sysstat was also upgraded to fix some bad values in iostat, but we still
have some values that need attention, and they don't seem to be related to
iostat itself; it appears to be pulling them straight out of kernel structures.

mach6> uname -a
Linux mach6 2.4.18-19.7.xsmp #1 SMP Thu Dec 12 07:56:58 EST 2002 i686 unknown
mach6> uptime
  1:57pm  up 13 days, 20:50,  5 users,  load average: 0.00, 0.00, 0.00
mach6>

When the machine is idle it reports strange disk IO.  According to /proc/stat
there were a total of 2 IOs during a five second sample (0 read, 2 write) on
sda (8,0).  sdb (8,1) reported no IOs whatsoever in /proc/stat.

/proc/partitions agrees with these values in the "rio" and "wio" fields.
For sda, "wuse" reports a gain of 14 while "use" increases by 5010 for both
sda and sdb.

I can understand a little discrepancy since my data gathering is sloppy, but an
increase of 5000 seems like more than an acceptable error for my methods.
I thought ruse, wuse, and use were in jiffies, but the difference in "use"
in these samples makes it look more like milliseconds (a 5-second sleep is
roughly 500 jiffies at HZ=100, but 5000 milliseconds).

sda had two write operations, so I could believe they took the whole 5 seconds,
but sdb didn't have any operations running, so it shouldn't have increased
at all.

For "running", I have 1 IO for sda, and -1221(!) for sdb.  It seems to
increment "use" while "running" is non-zero. 

mach6> grep "disk_io" /proc/stat; grep "sd[ab] " /proc/partitions; sleep 5; grep
"disk_io" /proc/stat; grep "sd[ab] " /proc/partitions
disk_io: (8,0):(804840,371978,11425276,432862,14600600)
(8,1):(62536736,53942187,795268900,8594549,270634368)  
   8     0   17783239 sda 370981 1056267 11425276 2384935 421918 1392216
14600600 8099602 1 6220290 6322142
   8    16   35566479 sdb 53334583 46204376 795268900 3116846 8551503 25295176
270634368 2372618 -1221 5318048 889488

disk_io: (8,0):(804842,371978,11425276,432864,14600672)
(8,1):(62536736,53942187,795268900,8594549,270634368) 
   8     0   17783239 sda 370981 1056267 11425276 2384935 421920 1392223
14600672 8099616 1 6225300 6327165
   8    16   35566479 sdb 53334583 46204376 795268900 3116846 8551503 25295176
270634368 2372618 -1221 5323058 3161172
mach6>

During a similar period of idleness I can see a difference in behavior for
partitions with 0 in the "running" column.  sda2 is idle, and reports no
increase in "running".  sda1 is also idle, but has a small increase that
I think is understandable.  sda3 has 1 IO in flight, and increases by 5008.
This is reflected in sda as one would expect. (For what it's worth, sda3
is swap.)

Aside from use and aveq, none of the other numbers change in this output.

mach6> grep sda /proc/partitions ; sleep 5; grep sda /proc/partitions
   8     0   17783239 sda 371169 1056827 11431260 2386005 422411 1392864
14609760 8110319 1 7602849 7716488
   8     1    2048256 sda1 223390 353390 4617762 1082439 374176 507142 7088144
3297501 0 3045056 4380343
   8     2   14699475 sda2 36714 703428 5924978 1025425 32876 646425 5435240
6777636 0 840640 7803421 
   8     3    1020127 sda3 111063 3 888504 278142 15359 239297 2086376 6423841 1
5751292 3921321

   8     0   17783239 sda 371169 1056827 11431260 2386005 422413 1392872
14609840 8110332 1 7607857 7721509
   8     1    2048256 sda1 223390 353390 4617762 1082439 374178 507150 7088224
3297515 0 3045070 4380356
   8     2   14699475 sda2 36714 703428 5924978 1025425 32876 646425 5435240
6777636 0 840640 7803421
   8     3    1020127 sda3 111063 3 888504 278142 15359 239297 2086376 6423841 1
5756300 3926329
mach6>

sdb is interesting at this time as well, as it reports roughly two orders of
magnitude more activity on sdb than on sdb1 (the only partition).
I would expect the activity on these devices to be similar, as is the case
with the reads, but the writes are very different.  This may be due to a
misunderstanding on my part about what the disk-level statistics (sda vs.
sda[0-9]) in /proc/partitions represent.

In any case, the 'if running is non-zero, increment use' concept is evident
here as well.

mach6> grep sdb /proc/partitions ; sleep 5; grep sdb /proc/partitions
   8    16   35566479 sdb 53634284 46523100 800239900 4620914 8551983 25296232
270647048 2387630 -1226 8103025 6490532
   8    17   35559846 sdb1 53634283 46523097 800239892 4624661 8551983 25296232
270647048 2389048 -69 6234560 2511029
   8    16   35566479 sdb 53634284 46523100 800239900 4620914 8551983 25296232
270647048 2387630 -1226 8108033 350954
   8    17   35559846 sdb1 53634283 46523097 800239892 4624661 8551983 25296232
270647048 2389048 -69 6239568 2165490
mach6>

Those statistics were during a time of very little activity according to
vmstat (r/b/w of 0, cpu 100% idle), although two NFS packets were received.

Looking at the same machine when sda is very busy (dd if=../junk of=./junk2,
where . and .. are on sda1 and ../junk is about a gig) we see 432 IOs
on sda (8,0) in /proc/stat (190 read, 242 write), and again, no activity
on sdb (8,1).

And again /proc/partitions agrees with the /proc/stat values in its own
"rio" and "wio" fields.

For sda, "ruse" reports a gain of 5027, "wuse" increases by 94572, 
and "use" increases by 5123.  (The wuse leaves me bewildered.)
sdb has no activity, but its "use" increases by 5123.  This agrees with sda.

mach6> grep "disk_io" /proc/stat; grep "sd[ab] " /proc/partitions; sleep 5; grep
"disk_io" /proc/stat; grep "sd[ab] " /proc/partitions
disk_io: (8,0):(806601,372543,11541444,434058,14624824)
(8,1):(62840434,54245143,800242540,8595291,270649944) 
   8     0   17783239 sda 371545 1070223 11541444 2394982 423099 1394048
14624824 8125229 2 1517397 1654940
   8    16   35566479 sdb 53634589 46523125 800242540 4621811 8552196 25296381
270649944 2400388 -1226 615155 1106701

disk_io: (8,0):(807033,372733,11713476,434300,14822640)
(8,1):(62840434,54245143,800242540,8595291,270649944)
   8     0   17783239 sda 371735 1091537 11713476 2400019 423341 1418533
14822640 8219801 2 1522520 1760106
   8    16   35566479 sdb 53634589 46523125 800242540 4621811 8552196 25296381
270649944 2400388 -1226 620278 3214454
mach6>

In no case does ruse + wuse = use.  This leads me to believe that they are
not related in the manner that I'm expecting, or that as independent
counters, ruse, wuse, and use are wrapping at different times.

On other similarly busy NFS servers (same OS, RPMs, hardware, etc.) I see
the same behavior.  On client machines (identical hardware, but comparatively
very little disk access), I have nice clean data in /proc/partitions.  No
negative numbers, and expected differences in "use" over similar busy
and idle samples.  iostat also reports believable values.

There seems to be a counter somewhere that is either overflowing,
underflowing, or otherwise not being updated properly.  Since the problem
is exhibited in /proc, I think this is a kernel bug and not an
iostat bug.
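
The accounting these samples suggest would look roughly like the sketch below
(hypothetical names and my assumption about the logic, not the literal 2.4
ll_rw_blk.c code): on every update the elapsed time is multiplied by the
in-flight count and added to "aveq", and "use" grows whenever the in-flight
count is non-zero.

#include <stdio.h>

struct disk_acct {
    int running;                /* requests currently in flight            */
    unsigned int use;           /* accumulated "busy" time                 */
    unsigned int aveq;          /* time-weighted queue-length accumulator  */
    unsigned long last_update;  /* timestamp of the previous update        */
};

static void round_stats(struct disk_acct *d, unsigned long now)
{
    unsigned long dt = now - d->last_update;

    d->aveq += d->running * dt;  /* a negative "running" drives aveq down  */
    if (d->running)              /* ...yet still counts as busy time,      */
        d->use += dt;            /* so "use" keeps growing on an idle disk */
    d->last_update = now;
}

int main(void)
{
    /* Seed with the sdb values from the idle 5-second sample above.       */
    struct disk_acct d = { .running = -1226, .use = 8103025,
                           .aveq = 6490532, .last_update = 0 };

    round_stats(&d, 5008);                 /* one idle ~5-second interval  */
    printf("use=%u aveq=%u\n", d.use, d.aveq);
    /* Prints use=8108033 (exactly the observed value) and aveq=350724
     * (close to the observed 350954, which reflects several smaller
     * intermediate updates).                                              */
    return 0;
}

If a completion is ever counted twice (or a start missed), "running" underflows
and never recovers; from then on every idle interval still bumps "use" (hence
the constant 100% %util) and subtracts from "aveq" (hence the queue figures
that count down or blow up).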

This problem is apparently not limited to SCSI, as per Bug #81393, although
in the IDE CD case it is not reported to require a great deal of IO.

Other related bugs: (AS2.1) Bug #56602 and (RH9) Bug #88695


Comment 9 Rhett Butler 2003-07-31 22:16:48 UTC
As a litmus test, I've found that when my machines have more than 20 million rios
or wios, the values iostat reports for avgqu-sz, svctm and %util will not be
reliable.  These bad values are possibly only indirectly related to the number
of ios, though.

In any case, I upgraded one of my file servers to kernel-smp-2.4.20-18.7 and
waited for its activity to build up; at this point I think the problem
is fixed.

mach6:/> cat /proc/partitions
major minor  #blocks  name     rio rmerge rsect ruse wio wmerge wsect
wuse running use aveq
                                                                                
   8     0   17783239 sda 2076618 5502265 60911140 3104747 790108
5293279 48753640 39311221 2 20673560 42668168
   8     1    2048256 sda1 76889 83970 1287562 411780 354609 414902
6187400 12851970 0 3294280 13266270
   8     2   14699475 sda2 1914559 5414493 58911874 2177297 381398
4690722 40612544 41907144 2 5042750 1215578
   8     3    1020127 sda3 85168 3796 711688 515930 54101 187655 1953696
27501910 0 13937990 28186640
   8    16   35566479 sdb 20214779 34107940 436172730 39801150 3861011
21490324 202962416 21299352 0 41095040 18538369
   8    17   35559846 sdb1 20214778 34107937 436172722 39804780 3861011
21490324 202962416 21300052 0 41095050 18535279

mach6:/> iostat -x 5
Linux 2.4.20-18.7smp (mach6)        07/31/03
                                                                                
avg-cpu:  %user   %nice    %sys   %idle
           0.52    0.00    2.47   97.01
                                                                                
Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
/dev/sda     6.24   5.99  2.35  0.89   69.05   55.15    34.52    27.57
38.25     0.48   14.80   7.20   2.34
/dev/sda1    0.09   0.47  0.09  0.40    1.46    7.00     0.73     3.50
17.32     0.15   30.74   7.64   0.37
/dev/sda2    6.14   5.31  2.17  0.43   66.79   45.94    33.39    22.97
43.33     0.01   19.20   2.20   0.57
/dev/sda3    0.00   0.21  0.10  0.06    0.80    2.21     0.40     1.10
19.14     0.32  201.16 100.07   1.58
/dev/sdb    38.57  24.30 22.86  4.37  493.27  229.54   246.64   114.77
26.55     0.21    2.54   1.71   4.65
/dev/sdb1   38.57  24.30 22.86  4.37  493.27  229.54   246.64   114.77
26.55     0.21    2.54   1.71   4.65

avg-cpu:  %user   %nice    %sys   %idle
           0.10    0.00   30.70   69.20
                                                                                
Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
/dev/sda   580.00 130.20 135.20 143.40 5728.00 2203.20  2864.00  1101.60
28.47 8504177.43   51.83   2.90  80.80
/dev/sda1    0.60   3.20  0.40  0.60    8.00   30.40     4.00    15.20
38.40     0.62   82.00  40.00   4.00
/dev/sda2  579.40 117.80 133.20 124.80 5707.20 1955.20  2853.60   977.60
29.70   135.36   53.25   3.09  79.60
/dev/sda3    0.00   9.20  1.60 18.00   12.80  217.60     6.40   108.80
11.76     6.20   31.63  16.53  32.40
/dev/sdb   4184.40  11.40 602.20  7.20 38294.40  142.40 19147.20
71.20    63.07    50.62    8.49   1.35  82.20
/dev/sdb1  4184.40  11.40 602.20  7.20 38294.40  142.40 19147.20
71.20    63.07    50.62    8.49   1.35  82.20

avg-cpu:  %user   %nice    %sys   %idle
           0.30    0.00   33.80   65.90
                                                                                
Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
/dev/sda     3.60  11.00  5.20 19.20   70.40  246.40    35.20   123.20
12.98    12.94   53.03   4.84  11.80
/dev/sda1    0.00   0.60  0.60  0.60    4.80    9.60     2.40     4.80
12.00     0.32   26.67  26.67   3.20
/dev/sda2    0.00  10.40  0.00 12.60    0.00  188.80     0.00    94.40
14.98    11.82   93.81   5.71   7.20
/dev/sda3    3.60   0.00  4.60  6.00   65.60   48.00    32.80    24.00
10.72     0.80    7.55   3.58   3.80
/dev/sdb     0.00 12239.40  0.80 143.00    6.40 99216.00     3.20
49608.00   690.00  1201.60  731.39   6.95 100.00
/dev/sdb1    0.00 12239.40  0.80 143.00    6.40 99216.00     3.20
49608.00   690.00  1201.60  731.39   6.95 100.00

avg-cpu:  %user   %nice    %sys   %idle
           0.00    0.00   29.20   70.80
                                                                                
Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
/dev/sda    19.60 108.80  4.60 20.80  193.60 1038.40    96.80   519.20
48.50     6.34   24.88   4.65  11.80
/dev/sda1    0.00   0.40  0.00  0.80    0.00    9.60     0.00     4.80
12.00     0.26   32.50  22.50   1.80
/dev/sda2    1.20  68.40  1.60 14.80   22.40  667.20    11.20   333.60
42.05     5.08   30.85   4.88   8.00
/dev/sda3   18.40  40.00  3.00  5.20  171.20  361.60    85.60   180.80
64.98     1.00   12.20   5.61   4.60
/dev/sdb     0.00 7572.40  2.80 138.00   20.80 61457.60    10.40
30728.80   436.64  1239.46 1016.89   7.09  99.80
/dev/sdb1    0.00 7572.40  2.80 138.00   20.80 61457.60    10.40
30728.80   436.64  1239.46 1016.89   7.09  99.80

These values were sampled under load conditions similar to those in my previous
comments.

Since I upgraded the kernel, my performance monitoring scripts have been
gathering data that really looks valid.  I do occasionally get %util values
that exceed 100%, but I can live with that.

Additionally, I've noticed a marked SCSI performance boost from upgrading to this
kernel revision.  Under the same load conditions as the previous samples, I see a
30% improvement in await and a 20-fold decrease in svctm.  Those values may be
skewed by whatever was causing the previous kernel to report bad values,
but my time-critical applications are able to complete in 9-16% less time when
their NFS server is running the 2.4.20-18.7 kernel.


Comment 10 Michael Schwendt 2003-07-31 23:45:15 UTC
The original report was against "Valhalla"; bouncing the version to "Shrike".
Will try it on "Severn", too.  For the symptoms this bug is about, see the top
of the page or comment #2.

Comment 11 Jim Laverty 2003-08-25 17:58:34 UTC
Still seeing extreme avgqu-sz values with the Linux 2.4.20-20.9smp kernel (Dell
450 Xeon SMP).

iostat -x 5
Linux 2.4.20-20.9smp (firewalker)       08/25/2003

avg-cpu:  %user   %nice    %sys   %idle
           0.29    0.00    0.10   99.61

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     1.52   0.76  0.63  0.76   17.02   12.22     8.51     6.11    21.03
    8.48    6.36 718.64  99.92
/dev/hda1    0.15   0.48  0.10  0.40    1.97    7.03     0.99     3.52    18.00
    0.02    3.08   1.47   0.07
/dev/hda2    0.76   0.11  0.32  0.06    8.62    1.36     4.31     0.68    25.99
    0.05   11.80   3.51   0.13
/dev/hda3    0.60   0.17  0.20  0.30    6.35    3.69     3.18     1.84    20.38
    0.02    4.72   1.67   0.08
/dev/hda5    0.01   0.01  0.00  0.01    0.04    0.14     0.02     0.07    17.19
    0.00   41.54   9.55   0.01
/dev/hda6    0.00   0.00  0.00  0.00    0.01    0.00     0.01     0.00    12.92
    0.00    5.38   5.38   0.00
/dev/hda7    0.01   0.00  0.00  0.00    0.02    0.00     0.01     0.00    11.68
    0.00    7.10   7.10   0.00

avg-cpu:  %user   %nice    %sys   %idle
           0.00    0.00    0.00  100.00

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     0.00   5.80  0.20  1.20    1.60   56.00     0.80    28.00    41.14
8589914.67    5.71 714.29 100.00
/dev/hda1    0.00   4.20  0.00  0.60    0.00   38.40     0.00    19.20    64.00
    0.04    6.67   3.33   0.20
/dev/hda2    0.00   1.60  0.20  0.60    1.60   17.60     0.80     8.80    24.00
    0.04    5.00   2.50   0.20
/dev/hda3    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/hda5    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/hda6    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/hda7    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice    %sys   %idle
           0.05    0.00    0.15   99.80

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
/dev/hda     0.00   0.40  0.00  0.40    0.00    6.40     0.00     3.20    16.00
8589914.59    0.00 2500.00 100.00
/dev/hda1    0.00   0.40  0.00  0.40    0.00    6.40     0.00     3.20    16.00
    0.00    0.00   0.00   0.00
/dev/hda2    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/hda3    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/hda5    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/hda6    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00
/dev/hda7    0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00
    0.00    0.00   0.00   0.00

Comment 12 Stephen Tweedie 2004-09-10 11:54:36 UTC
Can this be reproduced on a 2.6 kernel at all?


Comment 13 Michael Schwendt 2004-09-10 15:06:26 UTC
Red Hat Linux 9 didn't have a 2.6 kernel. ;o)

Can't reproduce it with Fedora Core 2.

Comment 14 Lev Makhlis 2004-09-10 21:31:59 UTC
2.6 uses "%u" when formatting /proc/diskstats, so it'll never print
negative numbers.  Other than that, I believe the problem stems from
the conversion between jiffies (in kernel structures) and milliseconds
(in procfs).  2.6 has a more robust conversion routine than 2.4, and I
think it will always work fine with the default HZ settings (1000 or 100).  I
suspect the problem may be partially reproducible if one rebuilds the
2.6 kernel with HZ=1024, for instance.
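
A small illustration of the overflow this points at (my assumption about the
arithmetic; not the literal kernel conversion helpers): a multiply-then-divide
conversion overflows a 32-bit intermediate once the tick count gets large,
while for HZ values that divide 1000 evenly (100 or 1000) the scale factor
1000/HZ is exact and no oversized intermediate is needed.

#include <stdio.h>

int main(void)
{
    unsigned int hz    = 1024;               /* assumed non-default tick rate */
    unsigned int ticks = 10u * 1000 * 1000;  /* an example tick count         */

    /* Naive conversion: ticks * 1000 overflows 32 bits and wraps,
     * giving a far-too-small result (should be 9765625 ms at HZ=1024). */
    printf("HZ=1024, naive: %u ms\n", ticks * 1000 / hz);

    /* With HZ=100, 1000/HZ == 10 exactly, so the conversion can scale
     * directly and never forms the overflowing intermediate.           */
    unsigned int hz100 = 100;
    printf("HZ=100, exact:  %u ms\n", ticks * (1000 / hz100));
    return 0;
}

With 32-bit fields behind /proc, a wrap like this could show up as exactly the
kind of implausible avgqu-sz/svctm values reported above, so rebuilding a 2.6
kernel with HZ=1024 as suggested would be a reasonable way to probe for it.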