Bug 517490 - The 'sar -d ' command outputs invalid data
Summary: The 'sar -d ' command outputs invalid data
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: sysstat
Version: 5.3
Hardware: All
OS: Linux
urgent
medium
Target Milestone: rc
: ---
Assignee: Ivana Varekova
QA Contact: BaseOS QE
URL:
Whiteboard:
Depends On:
Blocks: 502912 600275 609824
TreeView+ depends on / blocked
 
Reported: 2009-08-14 11:28 UTC by Issue Tracker
Modified: 2018-10-27 12:24 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The "sar -d" command used to output invalid data about block devices. With this update, the sar utility recognizes disk registration and disk overflow statistics properly, and only correct and relevant data are now displayed.
Clone Of:
: 600275 (view as bug list)
Environment:
Last Closed: 2011-07-21 10:40:40 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
customer proposed patch to cope with wraparound (688 bytes, patch)
2009-08-14 11:35 UTC, Martin Poole
no flags Details | Diff
adjusted sar datafile with wraparound (1.25 MB, application/x-zip-compressed)
2009-08-14 11:36 UTC, Martin Poole
no flags Details
Alternate patch for counter wrap checks (1.08 KB, application/octet-stream)
2009-08-14 12:09 UTC, Martin Poole
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:1005 0 normal SHIPPED_LIVE Low: sysstat security, bug fix, and enhancement update 2011-07-21 10:39:28 UTC

Description Issue Tracker 2009-08-14 11:28:26 UTC
Escalated to Bugzilla from IssueTracker

Comment 1 Issue Tracker 2009-08-14 11:28:28 UTC
Event posted on 2009-08-13 05:30 BST by T.Miki

(1) Category
	Defect Report
(2) Abstract
	The "sar -d " command outputs invalid data.
(3) Symptom
	"sar -d -f xxxx" outputs invalid data.
         The xxxx file includes unsigned long-long lapped around data of
         nr_ios,rd_sect or wr_sect.
         Please see my test result bleow.

# sar -d -f el5_sadc_log_ffff_rd0
Linux 2.6.18-128.el5PAE (rhel5u3x86-vm)         08/11/09

13:36:00          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
13:36:03       dev8-0      1.00      0.00     37.21     37.33      0.00      1.00      1.00      0.10
13:36:03       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:03       dev8-2  20401.66 461404.32  97706.31     27.41    335.34     16.44      6.92  14122.82 << invalid
13:36:03     dev253-0      4.65      0.00     37.21      8.00      0.00      0.21      0.21      0.10
13:36:03     dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:06       dev8-0      1.00      0.00     24.00     24.00      0.00      1.00      1.00      0.10
13:36:06       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:06       dev8-2      1.00      0.00     24.00     24.00      0.00      1.00      1.00      0.10
13:36:06     dev253-0      3.00      0.00     24.00      8.00      0.00      0.56      0.33      0.10
13:36:06     dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:09       dev8-0      1.00      0.00     10.67     10.67      0.00      0.33      0.33      0.03
13:36:09       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:09       dev8-2      1.00      0.00     10.67     10.67      0.00      0.33      0.33      0.03
13:36:09     dev253-0      1.33      0.00     10.67      8.00      0.00      0.25      0.25      0.03
13:36:09     dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:12       dev8-0      1.00      0.00     18.60     18.67      0.00      0.33      0.33      0.03
13:36:12       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:12       dev8-2      1.00      0.00     18.60     18.67      0.00      0.33      0.33      0.03
13:36:12     dev253-0      2.33      0.00     18.60      8.00      0.00      0.29      0.14      0.03
13:36:12     dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:       dev8-0      1.00      0.00     22.63     22.67      0.00      0.67      0.67      0.07
Average:       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:       dev8-2   5109.65 115543.01  24480.53     27.40     83.97     16.43      6.92   3536.62
Average:     dev253-0      2.83      0.00     22.63      8.00      0.00      0.32      0.24      0.07
Average:     dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

--- el5_sadc_log_ffff_rd0
contents of  disk_stats[ ]  for dev8-2

00000780: 0800 0000 0100 0000 1b31 15ff ffff ffff
00000790: ffff ffff ffff ffff 607c 0400 0000 0000
000007a0: 0000 0000 0000 0000 c8ab 0800 0000 0000
000007b0: 0ebb 0600 0000 0000 867c 0600 0000 0000
000007c0: d666 0f00 0000 0000 deef 0000 0000 0000 
000007d0: 0800 0000 0200 0000 c22d 1500 0000 0000

00001000: 1b31 1500 0000 0000 0000 0000 0000 0000.
00001010: d07c 0400 0000 0000 0000 0000 0000 0000
00001020: c8ab 0800 0000 0000 11bb 0600 0000 0000.
00001030: 897c 0600 0000 0000 d966 0f00 0000 0000
00001040: e1ef 0000 0000 0000 0800 0000 0200 0000


(4) Environment
	RHEL5.3
	sysstat-7.0.2-3.el5

(5) Recreation Steps

Please execute the sar command below.

# sar -d -f  el5_sadc_log_ffff_rd0

the "el5_sadc_log_ffff_rd0" is attatched file.

Regarding the "el5_sadc_log_ffff_rd0" file,
I created this file by "/usr/lib/sa/sadc -d 3 4 el5_sadc_log_ffff_rd0".
And I have modified this file to recreate the reported problem.

00000780: 0800 0000 0100 0000 1b31 1500 0000 0000 
00000790: 0000 0000 0000 0000 607c 0400 0000 0000  
000007a0: 0000 0000 0000 0000 c8ab 0800 0000 0000
000007b0: 0ebb 0600 0000 0000 867c 0600 0000 0000
000007c0: d666 0f00 0000 0000 deef 0000 0000 0000
000007d0: 0800 0000 0200 0000 c22d 1500 0000 0000 
  ↓
00000780: 0800 0000 0100 0000 1b31 15ff ffff ffff    <<<<<< changed
00000790: ffff ffff ffff ffff 607c 0400 0000 0000        <<<<<< changed
000007a0: 0000 0000 0000 0000 c8ab 0800 0000 0000
000007b0: 0ebb 0600 0000 0000 867c 0600 0000 0000
000007c0: d666 0f00 0000 0000 deef 0000 0000 0000
000007d0: 0800 0000 0200 0000 c22d 1500 0000 0000 


(6) Investigation
I have verified that check_disk_reg( ) function of sa_common.c 
clears the disk_stats[ ] in line 523-530.
If the nr_ios,rd_sect or wr_sect reaches near the unsigned long-long lapped around,
disk_stats[ ] struct will be broken by this logic.
Following is my verified information by gdb.

Breakpoint 1, check_disk_reg (file_hdr=0x8055380, st_disk=0x805424c, curr=1, ref=0, pos=2) at sa_common.c:510
510	   st_disk_i = st_disk[curr] + pos;
(gdb) s
506	{
(gdb) 
510	   st_disk_i = st_disk[curr] + pos;

(gdb) 
512	   while (index < file_hdr->sa_nr_disk) {
(gdb) 
513	      st_disk_j = st_disk[ref] + index;
(gdb) 
514	      if ((st_disk_i->major == st_disk_j->major) &&
(gdb) p/x st_disk_i->major
$5 = 0x8
(gdb) p/x st_disk_j->major
$6 = 0x8
(gdb) p/x st_disk_i->minor
$7 = 0x2
(gdb) p/x st_disk_j->minor
$8 = 0x2
(gdb) p/x st_disk_i->nr_ios
$9 = 0xefe1
(gdb) p/x st_disk_j->nr_ios
$10 = 0xefde
(gdb)p/x st_disk_i->rd_sect
$11 = 0x15311b
(gdb) p/x rd_sect_j->rd_sect
$12 = 0xffffffffff15311b
(gdb) p/x st_disk_j->rd_sect                  st_disk_i->wr_sect
$13 = 0x47cd0
(gdb) p/x st_disk_j->wr_sect
$14 = 0x47c60
(gdb) s
523		 if ((st_disk_i->nr_ios < st_disk_j->nr_ios) || ...   <<<<<<<<<<<<<<<<<<<< bug!!
(gdb) 
warning: Source file is more recent than executable.
96	  return __builtin___memset_chk (__dest, __ch, __len, __bos0 (__dest));
(gdb) s
553	   st_disk_j->major = st_disk_i->major;
(gdb) s
554	   st_disk_j->minor = st_disk_i->minor;
(gdb) s
557	}
(gdb) p/x st_disk_i->major
$16 = 0x8
(gdb) p/x st_disk_j->major
$17 = 0x8
(gdb) p/x st_disk_i->minor
$18 = 0x2
(gdb) p/x st_disk_j->minor
$19 = 0x2
(gdb) p/x st_disk_i->nr_ios
$20 = 0xefe1
(gdb) p/x st_disk_j->nr_ios
$21 = 0x0
(gdb) p/x st_disk_i->rd_sect
$22 = 0x15311b
(gdb) p/x st_disk_j->rd_sect
$23 = 0x0
(gdb) p/x st_disk_i->wr_sect
$24 = 0x47cd0
(gdb) p/x st_disk_j->wr_sect
$25 = 0x0


Following is my patch data for sa_common.c.

--- snip ---
--- sa_common.c_ORG     2006-10-22 18:23:45.000000000 +0900
+++ sa_common.c 2009-08-11 14:44:59.000000000 +0900
@@ -520,14 +520,15 @@
          * NB: AFAIK, such a device cannot be unregistered with current
          * kernels.
          */
+/*
         if ((st_disk_i->nr_ios < st_disk_j->nr_ios) ||
             (st_disk_i->rd_sect < st_disk_j->rd_sect) ||
             (st_disk_i->wr_sect < st_disk_j->wr_sect)) {
-
            memset(st_disk_j, 0, DISK_STATS_SIZE);
            st_disk_j->major = st_disk_i->major;
            st_disk_j->minor = st_disk_i->minor;
         }
+*/
         return index;
       }
       index++;


Following is the output of the sar command updated by above patch. 

# /usr/src/redhat/BUILD/sysstat-7.0.2/sar -d -f el5_sadc_log_ffff_rd0 
Linux 2.6.18-128.el5PAE (rhel5u3x86-vm) 	08/11/09

13:36:00          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
13:36:03       dev8-0      1.00      0.00     37.21     37.33      0.00      1.00      1.00      0.10
13:36:03       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:03       dev8-2      1.00 5573825.91     37.21 5592442.67      0.00      1.00      1.00      0.10    <<<<< corrected
13:36:03     dev253-0      4.65      0.00     37.21      8.00      0.00      0.21      0.21      0.10
13:36:03     dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:06       dev8-0      1.00      0.00     24.00     24.00      0.00      1.00      1.00      0.10
13:36:06       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:06       dev8-2      1.00      0.00     24.00     24.00      0.00      1.00      1.00      0.10
13:36:06     dev253-0      3.00      0.00     24.00      8.00      0.00      0.56      0.33      0.10
13:36:06     dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:09       dev8-0      1.00      0.00     10.67     10.67      0.00      0.33      0.33      0.03
13:36:09       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:09       dev8-2      1.00      0.00     10.67     10.67      0.00      0.33      0.33      0.03
13:36:09     dev253-0      1.33      0.00     10.67      8.00      0.00      0.25      0.25      0.03
13:36:09     dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:12       dev8-0      1.00      0.00     18.60     18.67      0.00      0.33      0.33      0.03
13:36:12       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
13:36:12       dev8-2      1.00      0.00     18.60     18.67      0.00      0.33      0.33      0.03
13:36:12     dev253-0      2.33      0.00     18.60      8.00      0.00      0.29      0.14      0.03
13:36:12     dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:       dev8-0      1.00      0.00     22.63     22.67      0.00      0.67      0.67      0.07
Average:       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:       dev8-2      1.00 1395775.04     22.63 1398124.00      0.00      0.67      0.67      0.07
Average:     dev253-0      2.83      0.00     22.63      8.00      0.00      0.32      0.24      0.07
Average:     dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
# 

(7) Related Documentation/Related Bugzilla #
none

(8) Attachments
Following files are attached in this issue.
 - test file for recreation this problem(sadc -d 3 4) : el5_sadc_log_ffff_rd0
 - recreated package : sysstat-7.0.2-3.el5.i386.rpm
 - recreated package source : sysstat-7.0.2-3.el5.src.rpm
 - recreated package debuginfo :sysstat-debuginfo-7.0.2-3.el5.i386.rpm
 - sosreport archive of recreated system : sosreport-rhel5u3x86-vm-629023-09de2b.tar.bz2
 - md5 sum data of sosreport archive  : sosreport-rhel5u3x86-vm-629023-09de2b.tar.bz2.md5

(9) Business Impacts
The customer executes the sar command to monitor their system(*1) performance every 10 minutes .
 (*1) They are using this system on business.
If this problem occurs, the performance data will show very serious.
And the customer has to analyse system performance.
This needless man power cost is so heavy loss for the customer.

(10) Requests
I would like you to investigate and fix this problem at 5.5.
This event sent from IssueTracker by mpoole  [Support Engineering Group]
 issue 329782

Comment 2 Martin Poole 2009-08-14 11:35:20 UTC
Created attachment 357439 [details]
customer proposed patch to cope with wraparound

Comment 3 Martin Poole 2009-08-14 11:36:18 UTC
Created attachment 357440 [details]
adjusted sar datafile with wraparound

Comment 4 Martin Poole 2009-08-14 12:05:44 UTC
The customer provided patch does raise some questions.

In the comment prior to the code that is proposed to be removed it notes

          * Disk found.
          * If a counter has decreased, then we may assume that the
          * corresponding device was unregistered, then registered again.
          * NB: AFAIK, such a device cannot be unregistered with current
          * kernels.

And the check tests whether any of the new values for the device are lower than the old ones.

The problem is that the comment regarding devices being unregistered is incorrect. Hot-swap of disks, USB storage, etc can all cause a device to disappear and reappear with corresponding re-use of major & minor numbers.  Since any change like this invalidates the statistics for the given device it is appropriate to zero values when this occurs.  On the other hand the customer patch assumes this is never the case and lets the values wrap.

This unfortunately gives us two conflicting use cases.

It seems more prudent to make the assumption that if only one or two values wrap (and appear reduced) then zeroing is not applicable. If, on the other hand all of the counters appear to have reduced then it is more likely that this is a new device and old values should be assumed as 0.

I have produced a patch based on these assumptions.

Comment 5 Martin Poole 2009-08-14 12:09:12 UTC
Created attachment 357445 [details]
Alternate patch for counter wrap checks

Comment 8 RHEL Program Management 2009-11-06 19:09:38 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 18 Tomas Capek 2011-06-29 08:26:23 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The "sar -d" command used to output invalid data about block devices. With this update, the sar utility recognizes disk registration and disk overflow statistics properly, and only correct and relevant data are now displayed.

Comment 19 errata-xmlrpc 2011-07-21 10:40:40 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1005.html


Note You need to log in before you can comment on or make changes to this bug.