Bug 1730107

Summary:	Hugepage data is incorrect
Product:	Red Hat Enterprise Linux 7	Reporter:	Charles Haithcock <chaithco>
Component:	pcp	Assignee:	Nathan Scott <nathans>
Status:	CLOSED ERRATA	QA Contact:	Jan Kurik <jkurik>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	7.7	CC:	agerstmayr, jkurik, lmiksik, mgoodwin, nathans, patrickm
Target Milestone:	rc	Keywords:	Bugfix, Triaged
Target Release:	7.9
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	pcp-4.3.4	Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-09-29 19:24:55 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1782202

Description Charles Haithcock 2019-07-15 20:48:41 UTC

Description of problem:

pmrep is showing incorrect values for hugepages. 

From a customer's sosreport and pcp data: 

 $ grep HugePages_Total sosreport-<HOSTNAME>-02415112-2019-07-11-nflredr/proc/meminfo 
HugePages_Total:   49152

 $ pmrep mem.util.{hugepagesTotal,hugepagesFree,hugepagesRsvd,hugepagesSurp,hugepagesTotalBytes} mem.vmstat.nr_shmem_hugepages -a pcp/pmlogger/<HOSTNAME>/20190710.0.xz | head
  m.u.hugepagesTotal  m.u.hugepagesFree  m.u.hugepagesRsvd  m.u.hugepagesSurp  m.u.hugepagesTotalBytes  m.v.nr_shmem_hugepages
               count              count              count              count                     byte                   count
                 N/A                N/A                N/A                N/A                      N/A                     N/A
            50331648           26900480           26900480                  0          105553116266496                     N/A
            50331648           26900480           26900480                  0          105553116266496                     N/A
            50331648           26900480           26900480                  0          105553116266496                     N/A


From my own system:

 r7 # grep ^Huge /proc/meminfo 
HugePages_Total:     250
HugePages_Free:      250
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

 r7 # pmrep mem.util.{hugepagesTotal,hugepagesFree,hugepagesRsvd,hugepagesSurp,hugepagesTotalBytes} mem.vmstat.nr_shmem_hugepages 
  m.u.hugepagesTotal  m.u.hugepagesFree  m.u.hugepagesRsvd  m.u.hugepagesSurp  m.u.hugepagesTotalBytes  m.v.nr_shmem_hugepages
               count              count              count              count                     byte                   count
              256000             256000                  0                  0             536870912000                     N/A
              256000             256000                  0                  0             536870912000                     N/A
              256000             256000                  0                  0             536870912000                     N/A
              256000             256000                  0                  0             536870912000                     N/A



Version-Release number of selected component (if applicable):

From customer's sosreport: 

 $ grep pcp sosreport-e1eep2dmkldn17-02415112-2019-07-11-nflredr/installed-rpms 
pcp-4.1.0-5.el7_6.x86_64                                    Wed Jul 10 14:55:52 2019
pcp-conf-4.1.0-5.el7_6.x86_64                               Wed Jul 10 14:55:38 2019
pcp-doc-4.1.0-5.el7_6.noarch                                Wed Jul 10 14:55:55 2019
pcp-libs-4.1.0-5.el7_6.x86_64                               Wed Jul 10 14:55:38 2019
pcp-pmda-dm-4.1.0-5.el7_6.x86_64                            Wed Jul 10 14:55:39 2019
pcp-pmda-nfsclient-4.1.0-5.el7_6.x86_64                     Wed Jul 10 14:55:39 2019
pcp-selinux-4.1.0-5.el7_6.x86_64                            Wed Jul 10 14:55:39 2019
pcp-system-tools-4.1.0-5.el7_6.x86_64                       Wed Jul 10 14:55:54 2019
pcp-zeroconf-4.1.0-5.el7_6.x86_64                           Wed Jul 10 14:55:55 2019
python-pcp-4.1.0-5.el7_6.x86_64                             Wed Jul 10 14:55:54 2019


And from my own system:

 r7 # rpm -q pcp
pcp-4.1.0-5.el7_6.x86_64


How reproducible:

100%


Steps to Reproduce:
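1. Run a pmrep or pminfo -dtf command against the mem.util hugepage metrics (for example, mem.util.hugepagesTotal).
2. Compare the reported values with the HugePages_* lines in /proc/meminfo.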

Actual results:

The hugepage count seems to be 1024x larger than it is supposed to be: 


50331648 / 1024
49152.00     <--- expected from customer sosreport


256000/1024
250.00       <--- expected from my system
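
The same check can be scripted. The sketch below is only illustrative: the numbers are copied from the pmrep and /proc/meminfo output in this report, and the helper name inflation_factor is made up for the example. It also checks mem.util.hugepagesTotalBytes on the r7 system, assuming the expected byte total is HugePages_Total multiplied by Hugepagesize.

```python
def inflation_factor(reported, expected):
    """Return (quotient, remainder) of reported / expected."""
    return divmod(reported, expected)


# (value reported by pmrep, value expected from /proc/meminfo), from this report
page_counts = {
    "customer": (50331648, 49152),
    "r7": (256000, 250),
}

for host, (reported, expected) in page_counts.items():
    factor, remainder = inflation_factor(reported, expected)
    print(f"{host}: hugepagesTotal is {factor}x too large (remainder {remainder})")

# r7: 250 pages of 2048 kB each, expressed in bytes (assumed definition)
expected_total_bytes = 250 * 2048 * 1024
factor, remainder = inflation_factor(536870912000, expected_total_bytes)
print(f"r7: hugepagesTotalBytes is {factor}x too large (remainder {remainder})")
```

All three ratios come out to exactly 1024 with no remainder, which points to a kbytes-to-bytes conversion being applied to values that are not in kbytes.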


Expected results:

Correct values


Additional info:

- This is seen in both pmrep and pminfo, so the problem may be in the pmda rather than in the client tools.
- The mem.numa.util versions seem to be correct: 


 $ pmrep mem.numa.util.hugepagesTotal -a pcp/pmlogger/<HOSTNAME>.ffm.cms/20190710.0.xz | head
  m.n.u.hugepagesTotal
                 node0
                 count
                   N/A
                 49152        <--- expected
                 49152
                 49152


 r7 # pminfo mem.numa.util.hugepagesTotal -dtf

mem.numa.util.hugepagesTotal [per-node total count of hugepages]
    Data Type: 64-bit unsigned int  InDom: 60.19 0xf000013
    Semantics: instant  Units: count
    inst [0 or "node0"] value 250           <--- expected

Comment 2 Nathan Scott 2019-07-16 02:19:18 UTC

Thanks for the detailed analysis Charles.  Yes, the values are off by a factor of 1024.  This is the result of an assumption in proc_meminfo.c::refresh_proc_meminfo, around line 115, that all values read there need to be converted from kbytes to bytes (not so for the hugepages metrics).

I'll work on a fix today, get it resolved upstream and then propose it for 7.8.

Comment 3 Nathan Scott 2019-07-16 04:02:51 UTC

commit 29092aa58df23fcc43f813d54b16a733d19f770c
Author: Nathan Scott <nathans>
Date:   Tue Jul 16 12:28:26 2019 +1000

    pmdalinux: fix hugepage metric value calculations
    
    The values for some of the Linux kernel hugepage metrics were
    being incorrectly multiplied by 1024.  This was due to a code
    assumption in proc_meminfo.c::refresh_proc_meminfo around line
    115, which was assuming all meminfo values needed conversion
    from kbytes to bytes - not so for these metrics.
    
    The fix involves removing this assumption (for all values from
    /proc/meminfo) and individually applying unit conversion where
    needed only.  Updated calculations are now reflected in qa/821.
    
    Resolves Red Hat BZ #1730107.
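
For readers who want to see the effect of the assumption, here is a minimal Python sketch. It is not the pmdalinux C code: the function name parse_meminfo and the convert_all flag are hypothetical, and the sketch decides whether to convert by looking for the "kB" suffix, whereas the commit applies the conversion individually per metric. The byte total is assumed to be the page count multiplied by the page size.

```python
def parse_meminfo(text, convert_all=False):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Values with a "kB" suffix are converted to bytes. With convert_all=True,
    every value is multiplied by 1024, mimicking the blanket assumption
    described in the commit message.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip lines that are not "Name: value [unit]"
        value = int(parts[0])
        has_kb_unit = len(parts) > 1 and parts[1] == "kB"
        if convert_all or has_kb_unit:
            value *= 1024
        values[name.strip()] = value
    return values


# Sample taken from the r7 system in the bug description
SAMPLE = """\
HugePages_Total:     250
HugePages_Free:      250
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

fixed = parse_meminfo(SAMPLE)
buggy = parse_meminfo(SAMPLE, convert_all=True)

print(fixed["HugePages_Total"], fixed["HugePages_Total"] * fixed["Hugepagesize"])
# prints: 250 524288000
print(buggy["HugePages_Total"], buggy["HugePages_Total"] * buggy["Hugepagesize"])
# prints: 256000 536870912000
```

The convert_all=True results match the mem.util.hugepagesTotal and mem.util.hugepagesTotalBytes values pmrep showed on the r7 system before the fix.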

Comment 4 Charles Haithcock 2019-07-16 15:36:45 UTC

(In reply to Nathan Scott from comment #2)
> Thanks for the detailed analysis Charles.  
... 
> I'll work on a fix today, get it resolved upstream and then propose it for
> 7.8.

You are so welcome, and thank you for the quick turnaround on this!

Comment 9 errata-xmlrpc 2020-09-29 19:24:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: pcp security, bug fix, and
enhancement update), and where to find the updated files, follow
the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3869