Bug 2228949

Summary: The `sar -B` reports '%vmeff' beyond 100
Product: Red Hat Enterprise Linux 8 Reporter: Abdul Rehman Quadri <aquadri>
Component: sysstat Assignee: Lukáš Zaoral <lzaoral>
Status: CLOSED MIGRATED QA Contact: qe-baseos-daemons
Severity: low Docs Contact: Šárka Jana <sjanderk>
Priority: unspecified    
Version: --- CC: juha.laiho, sjanderk, vagrawal
Target Milestone: rc Keywords: MigratedToJIRA
Target Release: --- Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
.The `%vmeff` metric from the `sysstat` package displays incorrect values

The `sysstat` package provides the `%vmeff` metric to measure the page reclaim efficiency. The values of the `%vmeff` column returned by the `sar -B` command are incorrect because `sysstat` does not parse all relevant `/proc/vmstat` values provided by later kernel versions. To work around this problem, you can calculate the `%vmeff` value manually from the `/proc/vmstat` file. For details, see link:https://access.redhat.com/solutions/7027076[Why the `sar(1)` tool reports `%vmeff` values beyond 100 % in RHEL 8 and RHEL 9?]
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-10-06 16:36:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Abdul Rehman Quadri 2023-08-03 17:41:36 UTC
Description:
The `sar -B` command reports paging statistics, and its output includes
the `%vmeff` column.
According to the man page, `%vmeff` is calculated as [pgsteal / pgscan].
 
However, the sysstat package shipped with RHEL 8 reports `%vmeff` values
above 100. It appears that the number of pages stolen exceeds the number
of pages scanned, which pushes the percentage beyond 100.
 
 
Version-Release number of selected component (if applicable):
sysstat-11.7.3
 
 
How reproducible:
Always
 
 
Steps to Reproduce:
1. Put the system under memory pressure so that the cache is reclaimed and swapping occurs.
2. sar -B
3. Check the `%vmeff` column.
 
 
Actual results:
# sar -B 1
Linux 4.18.0-477.13.1.el8_8.x86_64 (8rhel)      08/02/2023      _x86_64_        (2 CPU)
 
11:20:59 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
 
11:22:33 PM   2392.00  59816.00 267663.00     49.00   7591.00  72128.00 221023.00  13772.00      4.70
11:22:34 PM      0.00  44344.00  10756.00      0.00  11213.00   9747.00  12186.00  21048.00     95.96
11:22:35 PM      0.00  56852.00   4489.00      0.00   6734.00  11552.00   8879.00  12422.00     60.80
11:22:36 PM      0.00 145852.00      0.00      0.00   5919.00  40928.00  33272.00   9052.00     12.20
11:22:37 PM      0.00      0.00      0.00      0.00    451.00      0.00      0.00      0.00      0.00
11:22:38 PM      0.00 107164.71  74018.63      0.00  72985.29  34425.49  60266.67 135484.31    143.08  <<---------------
11:22:39 PM      0.00 661857.43 168127.72      0.00 171582.18 136464.36 273825.74 337794.06     82.33
11:22:40 PM   6345.16 396100.00  97660.48    238.71 101485.48 164530.65 189000.81 199556.45     56.45
11:22:41 PM 221642.11 420126.32 106085.53   8265.79 108259.21 179317.11  33044.74 208792.11     98.32
11:22:42 PM 188316.00 194596.00  50992.00   7308.00  53776.00 121291.00      0.00 100698.00     83.02
11:22:44 PM 126050.60 120689.16  37369.28   4750.00  34137.95 205112.65      0.00  62277.11     30.36
11:22:45 PM 167085.11 156646.81  46920.21   6481.91  42398.94  81422.34      0.00  78972.34     96.99
11:22:46 PM   3776.00    324.00    446.00     77.00   4999.00   4992.00      0.00   9760.00    195.51  <<---------------
11:22:47 PM 101304.00  91288.00  29284.00   4021.00  24876.00  46793.00      0.00  44660.00     95.44
11:22:48 PM  64076.00  57072.00  17995.00   2567.00  15425.00  29758.00      0.00  28138.00     94.56
11:22:49 PM 185700.00 177892.00  51774.00   7266.00  47268.00  91999.00      0.00  89062.00     96.81
11:22:50 PM  20348.00  35032.00   6457.00    734.00   8275.00  16884.00      0.00  15244.00     90.29
11:22:51 PM  52048.00  64032.00  16016.00   1857.00  16275.00  32504.00      0.00  30404.00     93.54
11:22:52 PM 128552.00 118348.00  36891.00   4865.00  33756.00  61833.00      0.00  61324.00     99.18
11:22:53 PM  50684.00  57192.00  15486.00   1878.00  11881.00  25835.00      0.00  21800.00     84.38
11:22:54 PM   1524.00     48.00    273.00     38.00   4130.00   3665.00      0.00   7256.00    197.98  <<---------------
 
 
Expected results:
The value of %vmeff should be <=100%
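
The flagged rows can be checked against the man-page formula by hand. The short Python sketch below recomputes `%vmeff` for a few rows copied from the output above, assuming that pgscan is the sum of the pgscank/s and pgscand/s columns (the man page only says pgsteal / pgscan; the function name `vmeff` is made up for this illustration).

```python
# Rows copied from the `sar -B 1` output above:
# (time, pgscank/s, pgscand/s, pgsteal/s, reported %vmeff)
ROWS = [
    ("11:22:33 PM", 72128.00, 221023.00, 13772.00, 4.70),
    ("11:22:38 PM", 34425.49, 60266.67, 135484.31, 143.08),
    ("11:22:46 PM", 4992.00, 0.00, 9760.00, 195.51),
    ("11:22:54 PM", 3665.00, 0.00, 7256.00, 197.98),
]


def vmeff(pgscank, pgscand, pgsteal):
    """Return pgsteal / pgscan as a percentage (0.0 if nothing was scanned).

    Assumption: pgscan = pgscank + pgscand.
    """
    pgscan = pgscank + pgscand
    return 100.0 * pgsteal / pgscan if pgscan else 0.0


for time_of_day, pgscank, pgscand, pgsteal, reported in ROWS:
    computed = vmeff(pgscank, pgscand, pgsteal)
    print(f"{time_of_day}  computed={computed:7.2f}  reported={reported:7.2f}")
```

Under that assumption the recomputed values agree with the reported column to within rounding, so the percentages above 100 follow directly from the pgsteal/s values that sar displays being larger than the scanned totals, not from a separate arithmetic error in the last column.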


When the same test is run on RHEL 7 with the same amount of memory and the same swappiness value, %vmeff stays below 100:
# sar -B 1
Linux 3.10.0-1160.88.1.el7.x86_64 (7rhel)       08/02/2023      _x86_64_        (2 CPU)
 
11:21:06 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
 
11:21:39 PM    714.43 209566.17  51124.38     23.38   1208.96   1022.39    905.47    957.71     49.68
11:21:41 PM    223.33 102963.33  55827.50     20.00  72180.00  71380.83   5201.67  71120.00     92.87
11:21:42 PM    125.00 234335.00  73311.25      6.25  92026.25  88120.00  28610.00  89993.75     77.10
11:21:43 PM      0.00 235892.75  68480.43      0.00  68121.01  31860.87  85258.70  67514.49     57.65
11:21:44 PM     39.67 174941.32  39371.90      1.65  39785.95  17560.33  64147.93  39501.65     48.34
11:21:45 PM    423.81 625538.10 247097.62     38.10 438609.52  85995.24  18441.67  73275.00     70.16
11:21:46 PM    182.61  63239.13  59243.48     18.48  43644.57  46348.91  12268.48  43234.78     73.76
11:21:47 PM   3107.69 327111.54 175599.04    121.15 188900.96 108803.85  55619.23 105988.46     64.46
11:21:48 PM    878.79 260593.94  62313.64    101.52  62625.76  35265.15  80795.45  61692.42     53.16
11:21:49 PM     49.06 237984.91  58366.04     11.32  58866.98  32422.64 117079.25  58439.62     39.09


Additional info:
Something similar was reported on Github:
https://github.com/sysstat/sysstat/issues/343
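
The Doc Text for this bug suggests calculating `%vmeff` manually from `/proc/vmstat` as a workaround. The following is only a rough Python sketch of that idea, not the procedure from the linked solution article. It assumes the relevant counters are named `pgscan_kswapd`, `pgscan_direct`, `pgsteal_kswapd` and `pgsteal_direct`; check which names your kernel actually exposes. All function names here are hypothetical.

```python
import time

# Assumed counter names; verify them against your kernel's /proc/vmstat.
SCAN_KEYS = ("pgscan_kswapd", "pgscan_direct")
STEAL_KEYS = ("pgsteal_kswapd", "pgsteal_direct")


def parse_vmstat(text):
    """Parse 'name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def vmeff_between(before, after):
    """Reclaim efficiency (percent) between two counter snapshots."""
    scanned = sum(after.get(k, 0) - before.get(k, 0) for k in SCAN_KEYS)
    stolen = sum(after.get(k, 0) - before.get(k, 0) for k in STEAL_KEYS)
    return 100.0 * stolen / scanned if scanned > 0 else 0.0


def sample_vmeff(interval=1.0, path="/proc/vmstat"):
    """Take two snapshots `interval` seconds apart and return the percentage."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return vmeff_between(before, after)
```

On a Linux system, `print(sample_vmeff(1.0))` would print the value for a one-second window. Only the explicitly listed counters are summed, so any other `pgscan_*` or `pgsteal_*` lines in the file are ignored.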

Comment 6 Lukáš Zaoral 2023-08-21 12:28:36 UTC
Upstream has decided to remove the %vmeff column altogether, with the following explanation in commit https://github.com/sysstat/sysstat/commit/7a78deaab97687bda9ca4bfb3eb98ceba2f87343:

> Remove %vmeff metric displayed by sar -B (paging statistics).
> With recent kernels, this metric was wrongly calculated. Decision was
> made to remove it as it was more a kernel metric than a system one.

Comment 7 Lukáš Zaoral 2023-08-21 12:58:51 UTC
Related manual page changes: https://github.com/sysstat/sysstat/commit/c52e38145bedb9ab6c352c2e60e04fdb6607b046

Comment 10 RHEL Program Management 2023-10-06 16:35:47 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 11 RHEL Program Management 2023-10-06 16:36:09 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.