Bug 443190 - sar reports spurious values for some disk I/O snapshots
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: i386 Linux
Priority: low  Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
Reported: 2008-04-19 00:55 EDT by Tim Mooney
Modified: 2016-04-04 16:32 EDT (History)
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2014-06-03 08:28:12 EDT
Flags: mooney: needinfo-


Attachments
sa data collected using /usr/lib/sa/sa1 -d 1 1 (1.47 MB, application/octet-stream)
2008-04-21 16:23 EDT, Tim Mooney

Description Tim Mooney 2008-04-19 00:55:31 EDT
Description of problem:

I just finished taking the RH442 "Enterprise Performance Tuning and System
Monitoring" class yesterday.  During the class, it was pointed out that the
"sa1" cron job installed with "sysstat" doesn't record disk I/O stats.  I've
modified the cron job to run more frequently (every 3 minutes) and to also
record disk I/O stats.

That's working, but I've noticed that sar occasionally reports very large
values for some of the fields.  This is, I believe, either an error in the
data recorded by the kernel or an error in how sar reports it.  Either way,
it's an error.

Version-Release number of selected component (if applicable):

I'm observing this issue on several RHEL 5.1 systems that are fully up to date
with patches.  That currently means sysstat-7.0.0-3.el5 is installed.


How reproducible:

We get several of these clearly invalid data points every day.

Steps to Reproduce:
1. Install sysstat on RHEL 5.1 on a system that has relatively busy I/O.
2. Modify /etc/cron.d/sysstat so that "sa1" is passed the -d flag before the
   "1 1" arguments.  Additionally, change the schedule from */10 to */3 so
   the job runs more often (see the example entry after these steps).
3. Wait a couple of days so that you have sar data that includes I/O
   statistics, then run "sar -p -d -f /var/log/sa/saNN" against one of your
   data files.
4. You'll see some entries for avgqu-sz or await that are clearly incorrect.
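
For reference, here's a sketch of the modified cron entry from step 2.  The
stock /etc/cron.d/sysstat shipped with sysstat-7.0.0 may differ slightly, so
treat this as illustrative rather than a verbatim copy:

# Collect one snapshot every 3 minutes, recording disk stats (-d).
# (The stock entry runs "/usr/lib/sa/sa1 1 1" every 10 minutes.)
*/3 * * * * root /usr/lib/sa/sa1 -d 1 1
# Daily summary report, unchanged from the stock file.
53 23 * * * root /usr/lib/sa/sa2 -A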

Actual results:

Here's an example:

00:00:01          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:03:01          sda     29.54    538.02    393.43     31.54      0.47     15.92      3.94     11.63
00:03:01          sdb     29.46    485.12    393.43     29.82      0.46     15.52      3.94     11.61
00:03:01          md2     40.35    367.36     72.52     10.90      0.00      0.00      0.00      0.00
00:03:01          md5     46.81    655.77    295.12     20.31      0.00      0.00      0.00      0.00
00:03:01          md1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:03:01          md0      0.01      0.01      0.00      2.00      0.00      0.00      0.00      0.00
00:03:01          sdc     53.54   4055.17    579.59     86.57      2.23     41.72      5.16     27.65
00:03:01          sdd     49.63   4653.40    579.59    105.45      2.54     51.15      6.74     33.45
00:03:01          md7    191.20   8710.05    533.45     48.34      0.00      0.00      0.00      0.00
00:03:01        nodev    191.20   8710.05    533.45     48.34  23949.90    144.53      2.10     40.24
00:03:01        nodev      1.65      0.00     13.23      8.00      0.03     16.17      5.21      0.86
00:03:01        nodev     44.06    644.10    278.26     20.93      1.24     28.26      1.99      8.76
00:03:01        nodev      0.75      8.99      3.58     16.65      0.02     20.13     11.26      0.85
00:03:01        nodev      0.34      2.68      0.00      8.00      0.00      2.67      2.67      0.09
00:06:01          sda     48.52    552.44    531.53     22.34      0.79     16.20      4.59     22.27
00:06:01          sdb     48.40    490.45    531.53     21.11      0.81     16.71      4.62     22.38
00:06:01          md2     61.64    349.61    189.18      8.74      0.00      0.00      0.00      0.00
00:06:01          md5     55.90    693.09    310.68     17.96      0.00      0.00      0.00      0.00
00:06:01          md1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:06:01          md0      0.10      0.19      0.00      2.00      0.00      0.00      0.00      0.00
00:06:01          sdc     98.37   6806.56    474.27     74.01      6.70     68.07      7.27     71.55
00:06:01          sdd     79.32   5847.83    474.27     79.70      7.25     91.35      9.80     77.70
00:06:01          md7    247.95  12654.34    411.92     52.70      0.00      0.00      0.00      0.00
00:06:01        nodev    247.95  12654.34    411.92     52.70     24.54  97324.61      3.80     94.32
00:06:01        nodev      1.80      0.18     14.19      8.00      0.03     16.24      4.32      0.78

Notice the 23949.90 avgqu-sz reading and the 97324.61 await reading.

Expected results:

Numbers more in line with the surrounding readings.

Additional info:
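For what it's worth, sar derives await and avgqu-sz from deltas of the
per-device counters the kernel exports in /proc/diskstats, so a counter that
jumps forward (or runs backwards) between two snapshots produces exactly this
kind of outlier.  Here's a rough sketch of the await calculation (illustrative
shell/awk only, not the actual sysstat source; field positions assume the
14-field 2.6 diskstats layout: $4=reads completed, $7=ms spent reading,
$8=writes completed, $11=ms spent writing):

cat /proc/diskstats > /tmp/ds.1
sleep 180    # match the 3-minute sampling interval
cat /proc/diskstats > /tmp/ds.2
awk 'NR == FNR { if (NF >= 14) { rio[$3]=$4; rtk[$3]=$7; wio[$3]=$8; wtk[$3]=$11 }
                 next }
     NF >= 14 {
       dio = ($4 - rio[$3]) + ($8 - wio[$3])    # I/Os completed this interval
       dtk = ($7 - rtk[$3]) + ($11 - wtk[$3])   # ms spent on those I/Os
       if (dio > 0) printf "%-10s await = %.2f ms\n", $3, dtk / dio
     }' /tmp/ds.1 /tmp/ds.2

If the kernel misreports one of those counters between two snapshots, the
delta explodes and the computed await lands in the tens of thousands of
milliseconds, exactly like the 97324.61 reading above.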
Comment 1 Ivana Varekova 2008-04-21 06:17:08 EDT
Thanks for the really detailed description. Could you please send me the
/var/log/sa/saNN file that produces this output?
Comment 2 Tim Mooney 2008-04-21 15:43:59 EDT
I'm attaching the file now.  Be advised that because of the combined effects
of bug #429054 and bug #430984, we're forced to run a custom kernel on these
boxes.  As soon as an official Red Hat kernel is released that fixes one (or
hopefully both) of those issues, we intend to switch back to the stock
RHEL 5.x kernel.
Comment 3 Tim Mooney 2008-04-21 16:23:28 EDT
Created attachment 303199 [details]
sa data collected using /usr/lib/sa/sa1 -d 1 1
Comment 4 Ivana Varekova 2008-04-30 07:44:19 EDT
Thanks for the data. sysstat is correct; this is a kernel problem, so I'm
reassigning to the kernel component.
Comment 5 RHEL Product and Program Management 2014-03-07 07:48:50 EST
This bug/component is not included in the scope of RHEL 5.11.0, which is the last RHEL 5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of the RHEL 5.11 development phase, April 22, 2014). Please contact your account manager or support representative if you need to escalate this bug.
Comment 6 RHEL Product and Program Management 2014-06-03 08:28:12 EDT
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request but are unable to include it in the RHEL 5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).
