Description of problem:
This is a minor problem, but it has been reported hundreds of times, so here is a one-liner patch that keeps the end users of iostat happy. Under heavy disk I/O load, %util will exceed 100.00%, causing many monitoring tools (Oracle EM, etc.) to get very upset at receiving an invalid percentage. Occurs from RHEL 4.4 through RHEL 5.5. Example:

Device:    rrqm/s  wrqm/s      r/s    w/s    rsec/s  wsec/s     rkB/s  wkB/s avgrq-sz avgqu-sz  await  svctm  %util
hda      13313.86    0.00   438.55   0.00 110024.10    0.00  55012.05   0.00   250.88     1.14   2.59   2.58 112.95
dm-0         0.00    0.00 13753.01   0.00 110024.10    0.00  55012.05   0.00     8.00    35.42   2.57   0.08 113.25

Version-Release number of selected component (if applicable):
sysstat-7.0.2-3

How reproducible:
100%. Kick off 4 dd commands against one disk:

  dd if=/dev/sdb1 of=/dev/null bs=1M &

Monitor with:

  iostat -x /dev/sdb1 5

Steps to Reproduce:
1. Start multiple dd commands
2. Monitor with iostat -x
3. See %util > 100.00

Actual results:
Adding some debugging shows the following.

common.h:
  #define S_VALUE(m,n,p) (((double) ((n) - (m))) / (p) * HZ)

iostat.c:
  util = S_VALUE(ioj->tot_ticks, ioi->tot_ticks, itv);
  .....
  util = util / 10.0;

Instrumenting the code and reproducing shows the problem, with an example 100.10% overflow as follows:

  WARN: util: 100.10, ioj: 13639634, ioi: 13640635, itv: 100

Then, applying the S_VALUE macro from above:

  >>> n=13640635
  >>> m=13639634
  >>> p=100
  >>> HZ=100
  >>> x = ((((n) - (m))) / (p) * HZ)
  >>> print x
  1001.0

Now do the final divide by 10 like the code above:

  >>> print x/10.0
  100.1

Expected results:
%util should never be reported greater than 100.00.

Additional info:
Patch to be uploaded.
Created attachment 412433 [details] util > 100.00% patch
Hello, if the problem is only in the range of +-2%, then it is similar to bug https://bugzilla.redhat.com/show_bug.cgi?id=445676: the problem is caused by the non-atomicity of the read operations. (The reason why we are not going to fix this issue, for overshoots below 2%, is described in the mentioned bug.) If the %util value is greater than 102%, that may be a different problem. The example in the bug description shows the values 112.95 and 113.25. How did you get these values? Are they statistics for some short interval (~1 s), or for some long delay (the first values iostat displays)? Can you reproduce the problem?
In my test case, it never exceeds 101%. Just kick off 4 dd commands as I show above and you should see the overflow. Thanks, John
I tried to reproduce the problem with 4 dd commands, but everything worked fine; the value stayed well below 100%. Does the value ever exceed 101%? If not, then this bug is a duplicate of bug 445676 and I will close it.
It's just like BZ 445676, and closing it really doesn't help our customers, or the applications that are affected by this invalid %util display. I just did this and got 100.17% util:

1) Run 4 copies of:

  dd if=/dev/mapper/VolGroup00-LogVol00 of=/dev/null bs=1M &

2) iostat -x /dev/hda 3

Results:

Device:   rrqm/s  wrqm/s     r/s    w/s    rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
hda      9591.64    0.00 1002.34   0.00  84623.41    0.00    84.43     5.86   5.84   1.00 100.17

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   18.27   81.73    0.00    0.00

Device:   rrqm/s  wrqm/s     r/s    w/s    rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
hda      9480.40    2.33 1002.33   0.66  83835.22   23.92    83.61     5.88   5.87   0.99  99.27

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   19.40   80.60    0.00    0.00

Device:   rrqm/s  wrqm/s     r/s    w/s    rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
hda      9990.97    0.00 1042.81   0.00  88246.15    0.00    84.62     5.75   5.52   0.95  98.80
As explained in comment #2 and in bug #445676, this behavior is acceptable. Upstream rejected the proposed patch and recommended not fixing this issue. Since Red Hat's policy is to cooperate closely with upstream, we do not plan to fix this bug in RHEL 5. If you need to escalate this issue, please contact your support representative.

*** This bug has been marked as a duplicate of bug 445676 ***