Bug 1730107
| Summary: | Hugepage data is incorrect | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Charles Haithcock <chaithco> |
| Component: | pcp | Assignee: | Nathan Scott <nathans> |
| Status: | CLOSED ERRATA | QA Contact: | Jan Kurik <jkurik> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.7 | CC: | agerstmayr, jkurik, lmiksik, mgoodwin, nathans, patrickm |
| Target Milestone: | rc | Keywords: | Bugfix, Triaged |
| Target Release: | 7.9 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | pcp-4.3.4 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-09-29 19:24:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1782202 | | |
Thanks for the detailed analysis Charles. Yes, the values are off by a factor of 1024 as a result of an assumption in proc_meminfo.c::refresh_proc_meminfo around line 115, which assumes all values there need to be converted from kbytes to bytes (not so for the hugepages metrics). I'll work on a fix today, get it resolved upstream and then propose it for 7.8.

commit 29092aa58df23fcc43f813d54b16a733d19f770c
Author: Nathan Scott <nathans>
Date: Tue Jul 16 12:28:26 2019 +1000
pmdalinux: fix hugepage metric value calculations
The values for some of the Linux kernel hugepage metrics were
being incorrectly multiplied by 1024. This was due to a code
assumption in proc_meminfo.c::refresh_proc_meminfo around line
115, which was assuming all meminfo values needed conversion
from kbytes to bytes - not so for these metrics.
The fix involves removing this assumption (for all values from
/proc/meminfo) and individually applying unit conversion where
needed only. Updated calculations are now reflected in qa/821.
Resolves Red Hat BZ #1730107.
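A minimal sketch of the approach the commit describes (not the actual pmdalinux patch; the standalone parser below is an assumption for illustration only): convert kbytes to bytes only for /proc/meminfo lines that carry the kernel's "kB" suffix, so plain counts such as HugePages_Total pass through unchanged.

```c
/*
 * Illustrative only -- not the real refresh_proc_meminfo() code.
 * Parse /proc/meminfo and convert kB -> bytes only where the kernel
 * reports a "kB" unit; hugepage counts have no unit and stay as-is.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp = fopen("/proc/meminfo", "r");
    char line[256];

    if (fp == NULL)
        return 1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        char name[64], unit[16];
        unsigned long long value;
        int n = sscanf(line, "%63[^:]: %llu %15s", name, &value, unit);

        if (n < 2)
            continue;              /* not a "Name: value" line */
        if (n == 3 && strcmp(unit, "kB") == 0)
            value *= 1024;         /* kbytes -> bytes, only when the kernel says kB */
        printf("%s = %llu\n", name, value);
    }
    fclose(fp);
    return 0;
}
```

On the 250-hugepage system from the report below, this prints HugePages_Total = 250 rather than the 256000 that the blanket conversion produced.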
(In reply to Nathan Scott from comment #2)
> Thanks for the detailed analysis Charles. ...
> I'll work on a fix today, get it resolved upstream and then propose it for
> 7.8.

You are so welcome, and thank you for the quick turnaround on this!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Low: pcp security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3869
Description of problem:

pmrep is showing incorrect values for hugepages. From a customer's sosreport and pcp data:

    $ grep HugePages_Total sosreport-<HOSTNAME>-02415112-2019-07-11-nflredr/proc/meminfo
    HugePages_Total:   49152

    $ pmrep mem.util.{hugepagesTotal,hugepagesFree,hugepagesRsvd,hugepagesSurp,hugepagesTotalBytes} mem.vmstat.nr_shmem_hugepages -a pcp/pmlogger/<HOSTNAME>/20190710.0.xz | head
      m.u.hugepagesTotal  m.u.hugepagesFree  m.u.hugepagesRsvd  m.u.hugepagesSurp  m.u.hugepagesTotalBytes  m.v.nr_shmem_hugepages
                   count              count              count              count                     byte                   count
                     N/A                N/A                N/A                N/A                      N/A                     N/A
                50331648           26900480           26900480                  0          105553116266496                     N/A
                50331648           26900480           26900480                  0          105553116266496                     N/A
                50331648           26900480           26900480                  0          105553116266496                     N/A

From my own system:

    r7 # grep ^Huge /proc/meminfo
    HugePages_Total:     250
    HugePages_Free:      250
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB

    r7 # pmrep mem.util.{hugepagesTotal,hugepagesFree,hugepagesRsvd,hugepagesSurp,hugepagesTotalBytes} mem.vmstat.nr_shmem_hugepages
      m.u.hugepagesTotal  m.u.hugepagesFree  m.u.hugepagesRsvd  m.u.hugepagesSurp  m.u.hugepagesTotalBytes  m.v.nr_shmem_hugepages
                   count              count              count              count                     byte                   count
                  256000             256000                  0                  0             536870912000                     N/A
                  256000             256000                  0                  0             536870912000                     N/A
                  256000             256000                  0                  0             536870912000                     N/A
                  256000             256000                  0                  0             536870912000                     N/A

Version-Release number of selected component (if applicable):

From customer's sosreport:

    $ grep pcp sosreport-e1eep2dmkldn17-02415112-2019-07-11-nflredr/installed-rpms
    pcp-4.1.0-5.el7_6.x86_64                   Wed Jul 10 14:55:52 2019
    pcp-conf-4.1.0-5.el7_6.x86_64              Wed Jul 10 14:55:38 2019
    pcp-doc-4.1.0-5.el7_6.noarch               Wed Jul 10 14:55:55 2019
    pcp-libs-4.1.0-5.el7_6.x86_64              Wed Jul 10 14:55:38 2019
    pcp-pmda-dm-4.1.0-5.el7_6.x86_64           Wed Jul 10 14:55:39 2019
    pcp-pmda-nfsclient-4.1.0-5.el7_6.x86_64    Wed Jul 10 14:55:39 2019
    pcp-selinux-4.1.0-5.el7_6.x86_64           Wed Jul 10 14:55:39 2019
    pcp-system-tools-4.1.0-5.el7_6.x86_64      Wed Jul 10 14:55:54 2019
    pcp-zeroconf-4.1.0-5.el7_6.x86_64          Wed Jul 10 14:55:55 2019
    python-pcp-4.1.0-5.el7_6.x86_64            Wed Jul 10 14:55:54 2019

And my own system:

    r7 # rpm -q pcp
    pcp-4.1.0-5.el7_6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Run a pmrep or pminfo -dtf command against the hugepage metrics
2.
3.

Actual results:

The hugepage count seems to be 1024x larger than it is supposed to be:

    50331648 / 1024 = 49152.00   <--- expected from customer sosreport
    256000 / 1024   = 250.00     <--- expected from my system

Expected results:

Correct values

Additional info:

- This is seen in pmrep and pminfo, so maybe something with the pmda?
- The mem.numa.util versions seem to be correct:

    $ pmrep mem.numa.util.hugepagesTotal -a pcp/pmlogger/<HOSTNAME>.ffm.cms/20190710.0.xz | head
      m.n.u.hugepagesTotal
                     node0
                     count
                       N/A
                     49152   <--- expected
                     49152
                     49152

    r7 # pminfo mem.numa.util.hugepagesTotal -dtf
    mem.numa.util.hugepagesTotal [per-node total count of hugepages]
        Data Type: 64-bit unsigned int  InDom: 60.19 0xf000013
        Semantics: instant  Units: count
        inst [0 or "node0"] value 250   <--- expected
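As a quick cross-check of the numbers above, here is a small helper (hypothetical, not part of PCP) that reads HugePages_Total and Hugepagesize directly from /proc/meminfo and prints the values pmrep would be expected to report for mem.util.hugepagesTotal (a plain count) and mem.util.hugepagesTotalBytes (count multiplied by the hugepage size in bytes):

```c
/*
 * Hypothetical cross-check helper, not part of PCP: compute the
 * expected hugepage count and total bytes straight from /proc/meminfo.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp = fopen("/proc/meminfo", "r");
    char line[256];
    unsigned long long total = 0, pagesize_kb = 0;

    if (fp == NULL)
        return 1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, "HugePages_Total:", 16) == 0)
            sscanf(line + 16, "%llu", &total);
        else if (strncmp(line, "Hugepagesize:", 13) == 0)
            sscanf(line + 13, "%llu", &pagesize_kb);   /* reported in kB */
    }
    fclose(fp);

    printf("expected hugepagesTotal      = %llu\n", total);
    printf("expected hugepagesTotalBytes = %llu\n", total * pagesize_kb * 1024);
    return 0;
}
```

On the 250 x 2048 kB system above this prints 250 and 524288000, exactly 1024 times smaller than the 256000 and 536870912000 that pmrep reported, which matches the off-by-1024 analysis in comment #2.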