Bug 1730107 - Hugepage data is incorrect
Summary: Hugepage data is incorrect
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcp
Version: 7.7
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 7.9
Assignee: Nathan Scott
QA Contact: Jan Kurik
URL:
Whiteboard:
Depends On:
Blocks: 1782202
Reported: 2019-07-15 20:48 UTC by Charles Haithcock
Modified: 2020-02-02 14:47 UTC (History)

Fixed In Version: pcp-4.3.4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description Charles Haithcock 2019-07-15 20:48:41 UTC
Description of problem:

pmrep is showing incorrect values for hugepages. 

From a customer's sosreport and pcp data: 

 $ grep HugePages_Total sosreport-<HOSTNAME>-02415112-2019-07-11-nflredr/proc/meminfo 
HugePages_Total:   49152

 $ pmrep mem.util.{hugepagesTotal,hugepagesFree,hugepagesRsvd,hugepagesSurp,hugepagesTotalBytes} mem.vmstat.nr_shmem_hugepages -a pcp/pmlogger/<HOSTNAME>/20190710.0.xz | head
  m.u.hugepagesTotal  m.u.hugepagesFree  m.u.hugepagesRsvd  m.u.hugepagesSurp  m.u.hugepagesTotalBytes  m.v.nr_shmem_hugepages
               count              count              count              count                     byte                   count
                 N/A                N/A                N/A                N/A                      N/A                     N/A
            50331648           26900480           26900480                  0          105553116266496                     N/A
            50331648           26900480           26900480                  0          105553116266496                     N/A
            50331648           26900480           26900480                  0          105553116266496                     N/A


From my own system:

 r7 # grep ^Huge /proc/meminfo 
HugePages_Total:     250
HugePages_Free:      250
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

 r7 # pmrep mem.util.{hugepagesTotal,hugepagesFree,hugepagesRsvd,hugepagesSurp,hugepagesTotalBytes} mem.vmstat.nr_shmem_hugepages 
  m.u.hugepagesTotal  m.u.hugepagesFree  m.u.hugepagesRsvd  m.u.hugepagesSurp  m.u.hugepagesTotalBytes  m.v.nr_shmem_hugepages
               count              count              count              count                     byte                   count
              256000             256000                  0                  0             536870912000                     N/A
              256000             256000                  0                  0             536870912000                     N/A
              256000             256000                  0                  0             536870912000                     N/A
              256000             256000                  0                  0             536870912000                     N/A



Version-Release number of selected component (if applicable):

From customer's sosreport: 

 $ grep pcp sosreport-e1eep2dmkldn17-02415112-2019-07-11-nflredr/installed-rpms 
pcp-4.1.0-5.el7_6.x86_64                                    Wed Jul 10 14:55:52 2019
pcp-conf-4.1.0-5.el7_6.x86_64                               Wed Jul 10 14:55:38 2019
pcp-doc-4.1.0-5.el7_6.noarch                                Wed Jul 10 14:55:55 2019
pcp-libs-4.1.0-5.el7_6.x86_64                               Wed Jul 10 14:55:38 2019
pcp-pmda-dm-4.1.0-5.el7_6.x86_64                            Wed Jul 10 14:55:39 2019
pcp-pmda-nfsclient-4.1.0-5.el7_6.x86_64                     Wed Jul 10 14:55:39 2019
pcp-selinux-4.1.0-5.el7_6.x86_64                            Wed Jul 10 14:55:39 2019
pcp-system-tools-4.1.0-5.el7_6.x86_64                       Wed Jul 10 14:55:54 2019
pcp-zeroconf-4.1.0-5.el7_6.x86_64                           Wed Jul 10 14:55:55 2019
python-pcp-4.1.0-5.el7_6.x86_64                             Wed Jul 10 14:55:54 2019


And my own system :

 r7 # rpm -q pcp
pcp-4.1.0-5.el7_6.x86_64


How reproducible:

100%


Steps to Reproduce:
1. Run pmrep or pminfo -dtf against the hugepage metrics (for example, mem.util.hugepagesTotal) and compare the output with /proc/meminfo.

Actual results:

The hugepage count seems to be 1024x larger than it is supposed to be: 


50331648 / 1024
49152.00     <--- expected from customer sosreport


256000/1024
250.00       <--- expected from my system
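
The arithmetic above can be double-checked with a short script. This is only a sketch using the numbers quoted in this report; the names (observations, inflation_factor, HUGEPAGE_BYTES) are made up for illustration and are not part of pcp. It also checks the mem.util.hugepagesTotalBytes value from the r7 system, which appears to be computed from the already inflated page count.

```python
# Values copied from this report:
# (pmrep mem.util.hugepagesTotal, HugePages_Total from /proc/meminfo)
observations = {
    "customer sosreport": (50331648, 49152),
    "r7 test system": (256000, 250),
}


def inflation_factor(reported, expected):
    """Return reported/expected as an exact integer, or raise if it is not one."""
    quotient, remainder = divmod(reported, expected)
    if remainder:
        raise ValueError("reported value is not an integer multiple of expected")
    return quotient


factors = {name: inflation_factor(r, e) for name, (r, e) in observations.items()}
for name, factor in factors.items():
    print(f"{name}: pmrep count is {factor}x the /proc/meminfo count")

# Hugepagesize on the r7 system is 2048 kB; convert it to bytes.
HUGEPAGE_BYTES = 2048 * 1024

# mem.util.hugepagesTotalBytes as reported by pmrep on the r7 system.
reported_total_bytes = 536870912000

# What the byte total should be for 250 pages of 2048 kB each.
correct_total_bytes = 250 * HUGEPAGE_BYTES
print(f"reported bytes / correct bytes = {reported_total_bytes // correct_total_bytes}")
```

Both systems show the same factor, which points at a unit conversion (kB to bytes) being applied to a value that is a plain page count.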


Expected results:

pmrep and pminfo report hugepage counts that match /proc/meminfo (49152 for the customer system, 250 for mine).


Additional info:

- This is seen in pmrep and pminfo so maybe something with the pmda?
- The mem.numa.util versions seem to be correct: 


 $ pmrep mem.numa.util.hugepagesTotal -a pcp/pmlogger/<HOSTNAME>.ffm.cms/20190710.0.xz | head
  m.n.u.hugepagesTotal
                 node0
                 count
                   N/A
                 49152        <--- expected
                 49152
                 49152


 r7 # pminfo mem.numa.util.hugepagesTotal -dtf

mem.numa.util.hugepagesTotal [per-node total count of hugepages]
    Data Type: 64-bit unsigned int  InDom: 60.19 0xf000013
    Semantics: instant  Units: count
    inst [0 or "node0"] value 250           <--- expected

Comment 2 Nathan Scott 2019-07-16 02:19:18 UTC
Thanks for the detailed analysis Charles.  Yes, the values are off by a factor of 1024. This is the result of an assumption in proc_meminfo.c::refresh_proc_meminfo, around line 115, that every value read from /proc/meminfo needs to be converted from kbytes to bytes. That is not so for the hugepages metrics, which are page counts.

I'll work on a fix today, get it resolved upstream and then propose it for 7.8.

Comment 3 Nathan Scott 2019-07-16 04:02:51 UTC
commit 29092aa58df23fcc43f813d54b16a733d19f770c
Author: Nathan Scott <nathans@redhat.com>
Date:   Tue Jul 16 12:28:26 2019 +1000

    pmdalinux: fix hugepage metric value calculations
    
    The values for some of the Linux kernel hugepage metrics were
    being incorrectly multiplied by 1024.  This was due to a code
    assumption in proc_meminfo.c::refresh_proc_meminfo around line
    115, which was assuming all meminfo values needed conversion
    from kbytes to bytes - not so for these metrics.
    
    The fix involves removing this assumption (for all values from
    /proc/meminfo) and individually applying unit conversion where
    needed only.  Updated calculations are now reflected in qa/821.
    
    Resolves Red Hat BZ #1730107.
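
For readers following along, the idea of converting units per value rather than for every value can be sketched as below. This is not the pmdalinux code (the real fix is in C, in proc_meminfo.c) and it uses a different rule from the one the commit describes: it assumes, for illustration only, that a value is in kbytes exactly when its /proc/meminfo line carries a "kB" suffix. The function name parse_meminfo and the sample text are hypothetical; the sample values are taken from the r7 system in the description.

```python
KB = 1024


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict.

    Values with a "kB" suffix are converted to bytes. Values without a
    suffix (such as the HugePages_* page counts) are kept as plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if not parts:
            continue  # field present but no value
        number = int(parts[0])
        # Assumption for this sketch: only "kB"-suffixed values need scaling.
        if len(parts) > 1 and parts[1] == "kB":
            number *= KB
        values[name.strip()] = number
    return values


sample = """\
HugePages_Total:     250
HugePages_Free:      250
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

info = parse_meminfo(sample)
total_bytes = info["HugePages_Total"] * info["Hugepagesize"]
print(info["HugePages_Total"], "pages,", total_bytes, "bytes")
```

With this rule the page counts stay at 250 and only Hugepagesize is scaled, so the derived byte total is based on the real page count.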

Comment 4 Charles Haithcock 2019-07-16 15:36:45 UTC
(In reply to Nathan Scott from comment #2)
> Thanks for the detailed analysis Charles.  
... 
> I'll work on a fix today, get it resolved upstream and then propose it for
> 7.8.

You are so welcome, and thank you for the quick turnaround on this!

