Bug 590148 - sysstat: iostat -x reports device util% > 100.00
Summary: sysstat: iostat -x reports device util% > 100.00
Keywords:
Status: CLOSED DUPLICATE of bug 445676
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: sysstat
Version: 5.5
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Ivana Varekova
QA Contact: BaseOS QE - Apps
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-05-07 20:31 UTC by John Sobecki
Modified: 2010-06-28 09:44 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-06-28 09:44:07 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
util > 100.00% patch (723 bytes, patch)
2010-05-07 20:33 UTC, John Sobecki
no flags Details | Diff

Description John Sobecki 2010-05-07 20:31:00 UTC
Description of problem:

Well this is kind of a silly problem, but it's been reported hundreds of 
times, and so here is a one-liner patch that keeps the end users of iostat happy.  

Under Heavy disk I/O load, the util% will exceed 100.00%, causing many monitoring tools (Oracle EM, etc) to get very upset at receiving an invalid percentage.

Occurs from RHEL 4.4 thru RHEL 5.5, example:

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
hda        13313.86   0.00 438.55  0.00 110024.10    0.00 55012.05     0.00  
250.88     1.14    2.59   2.58 112.95
dm-0         0.00   0.00 13753.01  0.00 110024.10    0.00 55012.05     0.00  
  8.00    35.42    2.57   0.08 113.25


Version-Release number of selected component (if applicable): sysstat-7.0.2-3

How reproducible: 100% 

Kick off 4 dd commands against one disk:

  dd if=/dev/sdb1 of=/dev/null bs=1M & 

Monitor with:

  iostat -x /dev/sdb1 5 


Steps to Reproduce:
1. Start multiple dd commands
2. Monitor with iostat -x
3. See util > 100.00%
  
Actual results:

Adding some debugging shows the following:

common.h:
#define S_VALUE(m,n,p)     (((double) ((n) - (m))) / (p) * HZ)

ostat.c:
 util = S_VALUE(ioj->tot_ticks, ioi->tot_ticks, itv);
 .....
 util = util / 10.0;

Instrumenting the code & reproducing shows the problem with an example 
100.10% overflow as follows:

   WARN: util: 100.10, ioj: 13639634, ioi: 13640635, itv: 100

Then, applying the S_VALUE macro from above:

 >>> n=13640635
 >>> m=13639634
 >>> p=100
 >>> HZ=100
 >>> x = ((((n) - (m))) / (p) * HZ)
 >>> print x
 1001.0

Now do the final divide by 10 like the code above..

 >>> print x/10.0
 100.1

Expected results:

Util% should never be reported greater than 100.00.    

Additional info:  Patch to be uploaded.

Comment 1 John Sobecki 2010-05-07 20:33:15 UTC
Created attachment 412433 [details]
util > 100.00% patch

Comment 2 Ivana Varekova 2010-05-10 10:58:19 UTC
Hello, 
if the problem is only in the range of +-2% then it is the similar as in bug https://bugzilla.redhat.com/show_bug.cgi?id=445676
the problem is caused by non-atomicity of read operations (The reason why we are not going to fix this issue (below 2%) is described in the mentioned bug). 
If the %util  value is greater then  102% that may be the different problem. In the example in bug description there are the values: 112.95 and  113.25. how do you get these values are they statistics relevant to some short interval (~ 1s) or are they relevant to some long delay (the first value iostat display)? Can you reproduce the problem?

Comment 3 John Sobecki 2010-05-24 13:34:21 UTC
In my testcase, it usually never exceeds 101%.  

Just kick off 4 dd commands like I show above and you should see 
the overflow. 

Thanks, John

Comment 4 Ivana Varekova 2010-05-26 05:34:56 UTC
I try to reproduce the problem with 4 dd command but everything works ok the value  was << 100%.
Please do the value ever exceeds 101%, if no then this bug is duplicate of 445676 and I will close it.

Comment 5 John Sobecki 2010-06-02 14:15:58 UTC
It's just like BZ 445676 and closing it really doesn't help our customers, or the applications that are effected by this invalid util% display.  

I just did this and got 100.17% util:

1) Run 4 copies of:

  dd if=/dev/mapper/VolGroup00-LogVol00 of=/dev/null bs=1M &

2) iostat -x /dev/hda 3 

Results:

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda            9591.64     0.00 1002.34  0.00 84623.41     0.00    84.43     5.86    5.84   1.00 100.17

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   18.27   81.73    0.00    0.00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda            9480.40     2.33 1002.33  0.66 83835.22    23.92    83.61     5.88    5.87   0.99  99.27

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   19.40   80.60    0.00    0.00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda            9990.97     0.00 1042.81  0.00 88246.15     0.00    84.62     5.75    5.52   0.95  98.80

Comment 6 Ivana Varekova 2010-06-28 09:44:07 UTC
As explained in comment #2 and in bug #445676 this behavior is acceptable. Upstream rejected proposed patch and recommended to not to fix this issue. Since Red Hat's policy is to closely cooperate with upstream, we do not plan to fix this bug in RHEL 5.

If you need to escalate this issue, contact your support representative, please.

*** This bug has been marked as a duplicate of bug 445676 ***


Note You need to log in before you can comment on or make changes to this bug.