Bug 1730107

Summary: Hugepage data is incorrect
Product: Red Hat Enterprise Linux 7
Reporter: Charles Haithcock <chaithco>
Component: pcp
Assignee: Nathan Scott <nathans>
Status: CLOSED ERRATA
QA Contact: Jan Kurik <jkurik>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.7
CC: agerstmayr, jkurik, lmiksik, mgoodwin, nathans, patrickm
Target Milestone: rc
Keywords: Bugfix, Triaged
Target Release: 7.9
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: pcp-4.3.4
Doc Type: No Doc Update
Last Closed: 2020-09-29 19:24:55 UTC
Type: Bug
Bug Blocks: 1782202    

Description Charles Haithcock 2019-07-15 20:48:41 UTC
Description of problem:

pmrep is showing incorrect values for hugepages. 

From a customer's sosreport and pcp data: 

 $ grep HugePages_Total sosreport-<HOSTNAME>-02415112-2019-07-11-nflredr/proc/meminfo 
HugePages_Total:   49152

 $ pmrep mem.util.{hugepagesTotal,hugepagesFree,hugepagesRsvd,hugepagesSurp,hugepagesTotalBytes} mem.vmstat.nr_shmem_hugepages -a pcp/pmlogger/<HOSTNAME>/20190710.0.xz | head
  m.u.hugepagesTotal  m.u.hugepagesFree  m.u.hugepagesRsvd  m.u.hugepagesSurp  m.u.hugepagesTotalBytes  m.v.nr_shmem_hugepages
               count              count              count              count                     byte                   count
                 N/A                N/A                N/A                N/A                      N/A                     N/A
            50331648           26900480           26900480                  0          105553116266496                     N/A
            50331648           26900480           26900480                  0          105553116266496                     N/A
            50331648           26900480           26900480                  0          105553116266496                     N/A


From my own system:

 r7 # grep ^Huge /proc/meminfo 
HugePages_Total:     250
HugePages_Free:      250
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

 r7 # pmrep mem.util.{hugepagesTotal,hugepagesFree,hugepagesRsvd,hugepagesSurp,hugepagesTotalBytes} mem.vmstat.nr_shmem_hugepages 
  m.u.hugepagesTotal  m.u.hugepagesFree  m.u.hugepagesRsvd  m.u.hugepagesSurp  m.u.hugepagesTotalBytes  m.v.nr_shmem_hugepages
               count              count              count              count                     byte                   count
              256000             256000                  0                  0             536870912000                     N/A
              256000             256000                  0                  0             536870912000                     N/A
              256000             256000                  0                  0             536870912000                     N/A
              256000             256000                  0                  0             536870912000                     N/A



Version-Release number of selected component (if applicable):

From customer's sosreport: 

 $ grep pcp sosreport-e1eep2dmkldn17-02415112-2019-07-11-nflredr/installed-rpms 
pcp-4.1.0-5.el7_6.x86_64                                    Wed Jul 10 14:55:52 2019
pcp-conf-4.1.0-5.el7_6.x86_64                               Wed Jul 10 14:55:38 2019
pcp-doc-4.1.0-5.el7_6.noarch                                Wed Jul 10 14:55:55 2019
pcp-libs-4.1.0-5.el7_6.x86_64                               Wed Jul 10 14:55:38 2019
pcp-pmda-dm-4.1.0-5.el7_6.x86_64                            Wed Jul 10 14:55:39 2019
pcp-pmda-nfsclient-4.1.0-5.el7_6.x86_64                     Wed Jul 10 14:55:39 2019
pcp-selinux-4.1.0-5.el7_6.x86_64                            Wed Jul 10 14:55:39 2019
pcp-system-tools-4.1.0-5.el7_6.x86_64                       Wed Jul 10 14:55:54 2019
pcp-zeroconf-4.1.0-5.el7_6.x86_64                           Wed Jul 10 14:55:55 2019
python-pcp-4.1.0-5.el7_6.x86_64                             Wed Jul 10 14:55:54 2019


And my own system:

 r7 # rpm -q pcp
pcp-4.1.0-5.el7_6.x86_64


How reproducible:

100%


Steps to Reproduce:
1. Run a pmrep or pminfo -dtf command against the hugepage metrics (for example mem.util.hugepagesTotal) and compare the result with /proc/meminfo.

Actual results:

The hugepage count seems to be 1024x larger than it is supposed to be: 


50331648 / 1024
49152.00     <--- expected from customer sosreport


256000/1024
250.00       <--- expected from my system
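
The same check can be scripted. This is a minimal sketch that uses only the r7 figures quoted above; the helper name inflation_factor is made up for illustration. It also suggests that mem.util.hugepagesTotalBytes is inflated by the same factor as the counts:

```python
# Figures copied from the r7 output above.
MEMINFO_TOTAL = 250                 # HugePages_Total in /proc/meminfo
HUGEPAGE_SIZE_KB = 2048             # Hugepagesize in /proc/meminfo
PMREP_TOTAL = 256000                # mem.util.hugepagesTotal from pmrep
PMREP_TOTAL_BYTES = 536870912000    # mem.util.hugepagesTotalBytes from pmrep


def inflation_factor(reported, expected):
    """Return reported/expected as an int, or None if it does not divide evenly."""
    quotient, remainder = divmod(reported, expected)
    return quotient if remainder == 0 else None


# Expected size in bytes: page count times page size (kB converted to bytes).
expected_bytes = MEMINFO_TOTAL * HUGEPAGE_SIZE_KB * 1024

count_factor = inflation_factor(PMREP_TOTAL, MEMINFO_TOTAL)
bytes_factor = inflation_factor(PMREP_TOTAL_BYTES, expected_bytes)
print(count_factor, bytes_factor)   # prints: 1024 1024
```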


Expected results:

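The mem.util.hugepages* counts match the HugePages_* values in /proc/meminfo (49152 and 250 in the examples above).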


Additional info:

- This is seen in pmrep and pminfo so maybe something with the pmda?
- The mem.numa.util versions seem to be correct: 


 $ pmrep mem.numa.util.hugepagesTotal -a pcp/pmlogger/<HOSTNAME>.ffm.cms/20190710.0.xz | head
  m.n.u.hugepagesTotal
                 node0
                 count
                   N/A
                 49152        <--- expected
                 49152
                 49152


 r7 # pminfo mem.numa.util.hugepagesTotal -dtf

mem.numa.util.hugepagesTotal [per-node total count of hugepages]
    Data Type: 64-bit unsigned int  InDom: 60.19 0xf000013
    Semantics: instant  Units: count
    inst [0 or "node0"] value 250           <--- expected

Comment 2 Nathan Scott 2019-07-16 02:19:18 UTC
Thanks for the detailed analysis, Charles.  Yes, the values are off by a factor of 1024 because of an assumption in proc_meminfo.c::refresh_proc_meminfo, around line 115, that every value read there needs to be converted from kbytes to bytes (not so for the hugepages metrics).

I'll work on a fix today, get it resolved upstream and then propose it for 7.8.
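
To make the failure mode concrete, here is a rough sketch of a parser that applies a blanket kbytes-to-bytes conversion. It is not the pmdalinux code (which is C); the function name parse_blanket is hypothetical, and the input is the r7 /proc/meminfo excerpt from the description:

```python
SAMPLE = """\
HugePages_Total:     250
HugePages_Free:      250
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""


def parse_blanket(text):
    """Hypothetical parser: scales EVERY value by 1024 (the faulty assumption)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        # The first token after the colon is the number; any "kB" suffix is ignored.
        values[name] = int(rest.split()[0]) * 1024
    return values


blanket = parse_blanket(SAMPLE)
print(blanket["HugePages_Total"])   # prints: 256000
print(blanket["Hugepagesize"])      # prints: 2097152
```

The Hugepagesize line really is in kB, so scaling it is right; the HugePages_* lines are plain page counts, and scaling them yields the 256000 seen in pmrep.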

Comment 3 Nathan Scott 2019-07-16 04:02:51 UTC
commit 29092aa58df23fcc43f813d54b16a733d19f770c
Author: Nathan Scott <nathans>
Date:   Tue Jul 16 12:28:26 2019 +1000

    pmdalinux: fix hugepage metric value calculations
    
    The values for some of the Linux kernel hugepage metrics were
    being incorrectly multiplied by 1024.  This was due to a code
    assumption in proc_meminfo.c::refresh_proc_meminfo around line
    115, which was assuming all meminfo values needed conversion
    from kbytes to bytes - not so for these metrics.
    
    The fix involves removing this assumption (for all values from
    /proc/meminfo) and individually applying unit conversion where
    needed only.  Updated calculations are now reflected in qa/821.
    
    Resolves Red Hat BZ #1730107.
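
The idea of applying unit conversion per value can be sketched as follows. This is an illustration in Python under stated assumptions, not the actual commit (the real change is in C in proc_meminfo.c); FIELDS_IN_KB and parse_meminfo are hypothetical names, and the table lists only the one kB field that appears in this report, whereas a real /proc/meminfo has many more:

```python
KB = 1024

# Hypothetical per-field table: fields listed here are sizes in kB and are
# converted to bytes; every other field is kept as a plain count.
FIELDS_IN_KB = {"Hugepagesize"}

SAMPLE = """\
HugePages_Total:     250
HugePages_Free:      250
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""


def parse_meminfo(text):
    """Parse 'Name: value [kB]' lines, converting only the fields known to be in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip lines that are not 'Name: value'
        raw = int(rest.split()[0])
        values[name] = raw * KB if name in FIELDS_IN_KB else raw
    return values


info = parse_meminfo(SAMPLE)
total_bytes = info["HugePages_Total"] * info["Hugepagesize"]
print(info["HugePages_Total"], total_bytes)   # prints: 250 524288000
```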

Comment 4 Charles Haithcock 2019-07-16 15:36:45 UTC
(In reply to Nathan Scott from comment #2)
> Thanks for the detailed analysis Charles.  
... 
> I'll work on a fix today, get it resolved upstream and then propose it for
> 7.8.

You are so welcome, and thank you for the quick turnaround on this!

Comment 9 errata-xmlrpc 2020-09-29 19:24:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: pcp security, bug fix, and
enhancement update), and where to find the updated files, follow
the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3869