Bug 19153

Summary: "iostat" seems broken
Product: [Retired] Red Hat Linux
Reporter: Chris Evans <chris>
Component: sysstat
Assignee: Preston Brown <pbrown>
Status: CLOSED RAWHIDE
QA Contact: David Lawrence <dkl>
Severity: medium
Priority: medium
Version: 7.0
CC: sct, sebastien.godard, sysadmin
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2001-02-16 00:10:32 UTC

Description Chris Evans 2000-10-15 22:29:49 UTC
(Using the latest RH7.0 update sysstat RPM)

iostat seems broken.
The command "iostat 1" yields some very strange results.

1) The %iowait field seems inverted. That is to say, when my disk is totally
idle, this registers at 100%. When largely idle (playing an mp3), it registers
at about 98%. I would expect figures of 0% and 2% respectively, to indicate the
system is not heavily waiting on I/O!!

2) Here is an output fragment

Disks:         tps    Kb_read/s    Kb_wrtn/s    Kb_read    Kb_wrtn
hdisk0        0.00         0.00         0.00          0          0
hdisk1        0.00         0.00         0.00          0          0
hdisk2        0.00         0.00         0.00          0          0
hdisk3        0.00         0.00         0.00          0          0

(I only have one physical disk)
When I load up the disk, for example with the command "find /", the "tps"
field registers figures of around 40. Worryingly, though, the other four
fields remain at 0.00 or 0.


One more comment: the kernel disk accounting patch exposes "average disk
queue depth"; it would be very nice if the iostat program were also able to
report it.


cc: to Stephen because this could be a missing or incorrect version of the
userland iostat patch. The kernel patch seems fine, looking at
/proc/partitions.
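
For reference, a short sketch of reading those per-disk counters out of /proc/partitions. The column layout assumed here (rio rmerge rsect ruse wio wmerge wsect wuse running use aveq, after the usual major/minor/#blocks/name columns) is what the 2.4-era disk accounting patch is understood to add, so treat it as an assumption:

```python
# Sketch: parse the extended per-disk counters that the disk accounting
# patch adds to /proc/partitions.  Field names below are assumptions
# based on the 2.4-era patched kernel, not taken from this bug report.

FIELDS = ("rio", "rmerge", "rsect", "ruse",
          "wio", "wmerge", "wsect", "wuse",
          "running", "use", "aveq")

def parse_partitions(text):
    """Return {device_name: {counter: int}} for lines carrying extended stats."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, blank lines, and unpatched 4-column lines.
        if len(parts) != 4 + len(FIELDS) or not parts[0].isdigit():
            continue
        name = parts[3]
        stats[name] = dict(zip(FIELDS, map(int, parts[4:])))
    return stats

sample = """major minor  #blocks  name rio rmerge rsect ruse wio wmerge wsect wuse running use aveq

   3     0   19938240 hda 107 12 13624 3920 0 0 0 0 0 3900 4100
"""
hda = parse_partitions(sample)["hda"]
print(hda["rio"], hda["rsect"])   # -> 107 13624
```

In practice you would pass the contents of /proc/partitions instead of the synthetic sample string.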

I'm happy to test things as always.

Comment 1 Chris Evans 2000-10-16 19:12:25 UTC
Hmm, I just found another version of iostat, hidden at
ftp://ftp.uk.linux.org/pub/linux/sct/fs/profiling

It seems to be totally different?

BUT, it seems to work correctly and offers the following beautiful statistic:
"average request service time".
example:
          hda          hda1          hda2          hda3          hda4          hda5          cpu
  k/s t/s serv  k/s t/s serv  k/s t/s serv  k/s t/s serv  k/s t/s serv  k/s t/s serv   us  sy  id
13624 107 36.6    0   0  0.0    0   0  0.0    0   0  0.0    0   0  0.0    0   0  0.0    2  21  76
14590 116 37.0    0   0  0.0   16   0  0.0    0   0  0.0    0   0  0.0    0   0  0.0    3  19  78

nifty, eh? That's 14Mb/sec and ~100 requests per second at ~40ms service time
each.
It's generated by "dd" from /dev/hda to /dev/null with 1024Mb blocksize.

Ideally, I'd like to see an iostat which:
a) Works
b) Offers the above "average service time in ms" statistic
c) Offers the %iowait statistic
d) Ideally would offer the average queue depth statistic
e) Offers the standard "kb/sec" and "req/sec" statistics

Unfortunately, that would seem to require a combination of the two different
iostat programs.

Comment 2 Chris Evans 2000-10-17 00:44:29 UTC
OK!! These are indeed two different "iostat" programs.
Unfortunately, it seems that RH7.0 ships with the wrong one.
Playing with the one I mention above at ftp.uk.linux.org, it _does_ seem to
satisfy all the requirements I list above!

Stephen - can you point me to where the iostat.c file came from? I'd like to fix
a few bugs/uglies, and I'd like to base my work on the most recent version!

I'd suggest that this might warrant an update once the proper iostat.c has
been prettified.

Comment 3 Stephen Tweedie 2000-10-17 14:09:26 UTC
I've already fixed a couple of the iostat/sysstat versions out there for the
cleaned-up sard output, and I'll do the necessary for this one once I'm back in
the UK next week.

Comment 4 Chris Evans 2000-10-17 15:56:06 UTC
Can I volunteer to review the fixed packages?

Comment 5 Derek Tattersall 2001-01-10 20:47:23 UTC
iostat from sysstat-3.3.3-2 from the RHL7.1 beta2 displays no IO activity for
the following command: dd if=/dev/sda5 of=/dev/null bs=72k. In fact, iostat
freezes and no longer updates the display.

Comment 6 Chris Evans 2001-01-10 21:59:43 UTC
The problem is now twofold:
1) The 2.4 kernel hasn't been patched with the enhanced I/O statistics patch
from 2.2 yet.
- This needs doing, or you've screwed people relying on the RH7.0 advanced
statistics.
2) The default iostat program does not expose the cool enhanced statistics
available.
- The alternative iostat I quote above is better in this regard.

Comment 7 Preston Brown 2001-01-17 17:59:56 UTC
the new iostat is not broken anymore, but it doesn't do as much as the iostat
you reference.  However, it is maintained and works well in other ways.

I have forwarded on the iostat.c file you referenced to the maintainer of the
version we are currently shipping so that he may merge the two.

Comment 8 Chris Evans 2001-02-16 00:10:28 UTC
I just spotted something very interesting on
comp.os.linux.announce. It's a new iostat version.
In the author's own words:
---
There are two interesting things coming with sysstat-3.3.5:
1) The iostat command has been greatly improved and now takes full
advantage of Stephen Tweedie's kernel patch to display extended I/O
statistics.
---

However, also note (also from the author):
---
Please note that version 3.3.5 is a development release. The latest
stable version is still 3.2.4.
---
ftp://metalab.unc.edu/pub/Linux/system/status/
80kB  sysstat-3.3.5.tar.gz



Comment 9 Preston Brown 2001-03-05 21:24:36 UTC
we are up to this version in rawhide, and it appears very stable.  I've been
cooperating with the author.

Comment 10 Chris Evans 2001-03-08 20:24:14 UTC
Nice one. Wolverine has this version, it seems.

One nitpick: the %util value is scaled incorrectly - it ranges from 0% to 1000%,
i.e. a factor of 10 out.
I'm not re-opening the bug for such a minor point, but it would be nice to
get it fixed.

Comment 11 Stephen Tweedie 2001-03-09 17:53:55 UTC
The ticks output in current sard patches is biased to output 1000 ticks per
second: in other words, it is no longer dependent on "HZ".  This means that the
same parser will work correctly for both Intel and for architectures such as
Alpha where HZ=1000.
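
A minimal sketch of what those HZ-independent, millisecond-based ticks mean for a consumer of the counters (the function and variable names here are illustrative, not iostat's own):

```python
# Sketch: with the sard patch emitting busy-time ticks at a fixed 1000 per
# second (i.e. milliseconds), utilisation is just busy time over wall time,
# with no dependence on the kernel's HZ.  Names are illustrative.

def percent_util(busy_ms_delta, interval_ms):
    """Percentage of the sample interval the device was busy."""
    return 100.0 * busy_ms_delta / interval_ms

# A disk busy for 0.5 s out of a 1 s sample interval:
print(percent_util(500, 1000))   # -> 50.0
```

Multiplying such a value by a further factor of 10 would produce exactly the 0%-1000% range reported in the nitpick above.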


Comment 12 Need Real Name 2002-04-06 19:19:04 UTC
A guy pointed out to me that the average wait times and service times as
displayed by 'iostat -x' were wrong. On his RedHat systems running
'iostat -x 10', he gets values of about 200-400 ms and higher. That seems too
high by an order of magnitude: the SCSI/FC controllers in his Compaq and IBM
machines are much faster, there was no I/O load contention, etc.

He sent me a very small patch (below) to fix this, but I am unable to
integrate it into sysstat because I lack knowledge of the way the kernel and
sct's patch work.
Could you tell me if this patch is acceptable and if I can apply it to sysstat?

--- sysstat-4.0.3-orig/iostat.c Fri Feb 11 14:15:19 2002
+++ sysstat-4.0.3/iostat.c      Thu Feb 14 11:30:05 2002
@@ -372,7 +372,8 @@
               tput   = nr_ios * HZ / itv;
               util   = ((double) current.ticks) / itv;
               svctm  = tput ? util / tput : 0.0;
-              await  = nr_ios ? (current.rd_ticks + current.wr_ticks) / nr_ios * 1000.0 / HZ : 0.0;
+              /* kernel gives ticks already in milliseconds for all platforms -> no need for further scaling */
+              await  = nr_ios ? (current.rd_ticks + current.wr_ticks) / nr_ios : 0.0;
               arqsz  = nr_ios ? (current.rd_sectors + current.wr_sectors) / nr_ios : 0.0;
 
               printf("/dev/%-5s", disk_hdr_stats[disk_index].name);
@@ -387,7 +388,8 @@
                      arqsz,
                      ((double) current.aveq) / itv,
                      await,
-                     svctm * 1000.0,
+                     /* again: ticks in milliseconds */
+                     svctm * 100.0,
                      /* NB: the ticks output in current sard patches is biased
                         to output 1000 ticks per second */
                      util * 10.0);
            }

The problem concerns every platform with a recent RedHat and sysstat installed.
Thx a lot for your help.
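
For illustration, a small sketch (not part of the patch above) of why the old formula inflates await tenfold on an HZ=100 kernel once the counters are already in milliseconds; the helper names are made up for this example:

```python
# Sketch of the await scaling bug: the old formula treated the per-request
# wait counters as HZ-based jiffies, but the patched kernel already reports
# them in milliseconds on every platform.

HZ = 100  # typical x86 kernel tick rate at the time

def await_old(total_ticks, nr_ios):
    # Old formula: converts "jiffies" to ms, i.e. multiplies by 1000/HZ = 10.
    return total_ticks / nr_ios * 1000.0 / HZ if nr_ios else 0.0

def await_new(total_ticks, nr_ios):
    # Patched formula: the counters are already milliseconds, so just average.
    return total_ticks / nr_ios if nr_ios else 0.0

# 100 I/Os that accumulated 3000 ms of wait time in total (30 ms each):
print(await_old(3000, 100))   # -> 300.0 (an order of magnitude too high)
print(await_new(3000, 100))   # -> 30.0
```

That factor of 10 matches the 200-400 ms readings reported above for hardware that should have been serving requests in tens of milliseconds.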