Bug 704511 - RHEL6.1 mm: hugepages can cause negative commitlimit
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Andrea Arcangeli
QA Contact: Petr Beňas
URL:
Whiteboard:
Duplicates: 522574
Depends On:
Blocks: 580953 688933
Reported: 2011-05-13 13:59 UTC by Russ Anderson
Modified: 2015-01-04 23:00 UTC

Fixed In Version: kernel-2.6.32-169.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 13:31:13 UTC
Target Upstream Version:



Links
System ID: Red Hat Product Errata RHSA-2011:1530
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update
Last Updated: 2011-12-06 01:45:35 UTC

Description Russ Anderson 2011-05-13 13:59:45 UTC
Description of problem:

If the total size of hugepages allocated on a system is over half of the total memory size, commitlimit becomes a negative number. The same calculation is used in __vm_enough_memory() in mm/mmap.c.  This is currently broken in the community kernel as well.

Version-Release number of selected component (if applicable):

RHEL6.1 (and community)

How reproducible:

100%

Steps to Reproduce:
1. Allocate enough hugepages to consume over half of available memory.
   For example, put "default_hugepagesz=1G hugepagesz=1G hugepages=64" on the
   Linux kernel boot command line.
2. After the system is booted, "cat /proc/meminfo".  Look at the "CommitLimit:"
   entry.  If it is a huge number (e.g. "CommitLimit:    737869762947802600 kB"),
   the system has hit the problem.
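
The check in step 2 can be automated. The following is a minimal sketch, not part of any patch: meminfo_kb() and commit_limit_is_sane() are hypothetical helper names, and the test "CommitLimit <= MemTotal + SwapTotal" assumes vm.overcommit_ratio is 100 or less (with a larger ratio, a bigger CommitLimit would be legitimate).

```c
#include <stdio.h>
#include <string.h>

/* Find "Key:   <value> kB" in /proc/meminfo-style text.
 * Returns 1 and stores the value on success, 0 if the key is absent. */
int meminfo_kb(const char *text, const char *key, unsigned long long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%llu", out) == 1;
        line = strchr(line, '\n');
        if (line)
            line++;                 /* step past the newline */
    }
    return 0;
}

/* Hypothetical sanity check based on "Expected results" in this report.
 * Returns 1 if CommitLimit <= MemTotal + SwapTotal, 0 otherwise
 * (including when any of the three fields is missing). */
int commit_limit_is_sane(const char *text)
{
    unsigned long long limit, mem, swap;

    if (!meminfo_kb(text, "CommitLimit", &limit) ||
        !meminfo_kb(text, "MemTotal", &mem) ||
        !meminfo_kb(text, "SwapTotal", &swap))
        return 0;
    return limit <= mem + swap;
}
```

Fed the contents of /proc/meminfo from an affected system (MemTotal 32395508 kB, SwapTotal 0 kB, CommitLimit 737869762947802600 kB), commit_limit_is_sane() returns 0.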
  
Actual results:

CommitLimit:    737869762947802600 kB

Expected results:

A CommitLimit number less than the size of memory.

Additional info:

This is also broken in the community kernel.  Reported upstream at http://marc.info/?l=linux-kernel&m=130515303205772&w=2

If the total size of hugepages allocated on a system is over half of the total memory size, commitlimit becomes a negative number.
                    
What happens in fs/proc/meminfo.c is this calculation: 

        allowed = ((totalram_pages - hugetlb_total_pages())
                * sysctl_overcommit_ratio / 100) + total_swap_pages; 
          
The problem is that hugetlb_total_pages() is larger than totalram_pages, so the subtraction yields a negative number.  Since allowed is an unsigned long, the
negative value wraps around and shows up as a very large number.
                       
A similar calculation occurs in __vm_enough_memory() in mm/mmap.c.
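
The wraparound can be reproduced in user space. The sketch below mirrors the quoted formula; the function names are made up for illustration, uint64_t stands in for the kernel's unsigned long on x86_64, and 4 kB base pages are assumed. The overcommit ratio is not stated in this report; 50 (the usual default) is an assumption.

```c
#include <stdint.h>

#define PAGE_KB 4u   /* assumption: 4 kB base pages, as on x86_64 */

/* Same arithmetic as the formula quoted from fs/proc/meminfo.c, in pages.
 * The unsigned subtraction wraps modulo 2^64 when hugetlb_pages is the
 * larger operand, and the multiplication wraps again. */
uint64_t commit_limit_pages(uint64_t totalram_pages,
                            uint64_t hugetlb_pages,
                            uint64_t overcommit_ratio,
                            uint64_t total_swap_pages)
{
    return (totalram_pages - hugetlb_pages) * overcommit_ratio / 100
           + total_swap_pages;
}

/* Convert the page count to kB, the unit /proc/meminfo prints. */
uint64_t commit_limit_kb(uint64_t totalram_pages,
                         uint64_t hugetlb_pages,
                         uint64_t overcommit_ratio,
                         uint64_t total_swap_pages)
{
    return commit_limit_pages(totalram_pages, hugetlb_pages,
                              overcommit_ratio, total_swap_pages) * PAGE_KB;
}
```

With the values from the /proc/meminfo output in this report (MemTotal 32395508 kB = 8098877 pages, 32 hugepages of 1048576 kB = 8388608 pages, no swap), commit_limit_kb(8098877, 8388608, 50, 0) returns 737869762947802600, which is exactly the CommitLimit shown.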
                    
A symptom of this problem is that /proc/meminfo prints a very large 
CommitLimit number.  
                    
CommitLimit:    737869762947802600 kB 
                          
To reproduce the problem reserve over half of memory as hugepages.  For 
example "default_hugepagesz=1G hugepagesz=1G hugepages=64". Then look at /proc/meminfo .

uv1-sys:~ # cat /proc/meminfo
MemTotal:       32395508 kB
MemFree:        32029276 kB
Buffers:            8656 kB
Cached:            89548 kB 
SwapCached:            0 kB 
Active:            55336 kB 
Inactive:          73916 kB 
Active(anon):      31220 kB 
Inactive(anon):       36 kB
Active(file):      24116 kB  
Inactive(file):    73880 kB 
Unevictable:           0 kB 
Mlocked:               0 kB  
SwapTotal:             0 kB  
SwapFree:              0 kB 
Dirty:              1692 kB 
Writeback:             0 kB 
AnonPages:         31132 kB 
Mapped:            15668 kB 
Shmem:               152 kB  
Slab:              70256 kB 
SReclaimable:      17148 kB 
SUnreclaim:        53108 kB 
KernelStack:        6536 kB 
PageTables:         3704 kB  
NFS_Unstable:          0 kB 
Bounce:                0 kB
WritebackTmp:          0 kB            
CommitLimit:    737869762947802600 kB
Committed_AS:     394044 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      713960 kB 
VmallocChunk:   34325764204 kB 
HardwareCorrupted:     0 kB 
HugePages_Total:      32 
HugePages_Free:       32  
HugePages_Rsvd:        0 
HugePages_Surp:        0 
Hugepagesize:    1048576 kB 
DirectMap4k:       16384 kB
DirectMap2M:     2064384 kB  
DirectMap1G:    65011712 kB

Comment 2 Russ Anderson 2011-05-18 15:38:16 UTC
It turns out that when hugepages are allocated, totalram_pages gets decremented,
so there is no need to subtract hugetlb_total_pages().

Sent a fix to the community.

http://marc.info/?l=linux-mm&m=130573288915598&w=2
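
For illustration only, here is the arithmetic with the hugetlb subtraction dropped, as this comment suggests. This is a user-space sketch with a made-up function name, not the patch that was merged; uint64_t stands in for unsigned long, and 4 kB pages and an overcommit ratio of 50 are assumptions.

```c
#include <stdint.h>

/* Illustration only: the CommitLimit formula without subtracting
 * hugetlb_total_pages(). Returns the limit in kB. */
uint64_t commit_limit_kb_no_subtract(uint64_t totalram_pages,
                                     uint64_t overcommit_ratio,
                                     uint64_t total_swap_pages)
{
    uint64_t allowed = totalram_pages * overcommit_ratio / 100
                       + total_swap_pages;

    return allowed * 4;   /* pages to kB, assuming 4 kB pages */
}
```

For the system in the description (MemTotal 32395508 kB = 8098877 pages, no swap), commit_limit_kb_no_subtract(8098877, 50, 0) returns 16197752, which is below MemTotal as expected.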

Comment 4 Russ Anderson 2011-06-03 12:00:14 UTC
Rafael Aquini has proposed an alternative solution that works.
http://marc.info/?l=linux-kernel&m=130706986010785&w=2

Comment 5 Andrea Arcangeli 2011-06-03 15:51:14 UTC
I thought the first solution was not ok; the second looks better.

Comment 6 RHEL Program Management 2011-06-07 18:19:54 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 7 Dor Laor 2011-06-13 07:02:42 UTC
*** Bug 522574 has been marked as a duplicate of this bug. ***

Comment 8 Aristeu Rozanski 2011-07-18 15:28:01 UTC
Patch(es) available on kernel-2.6.32-169.el6

Comment 14 Rafael Aquini 2011-09-23 12:20:40 UTC
Howdy Peter,

It seems that your attempt to set up gigantic hugepages is being 'rejected', as your hugepage size is still reported as the ordinary 2MB:

> Hugepagesize:       2048 kB

Check if the processor you're using for your tests supports 1GB pages. If the CPU supports 1GB pages it has the PDPE1GB flag.
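
On Linux the flag is listed in lowercase (pdpe1gb) on the "flags" line of /proc/cpuinfo. A minimal sketch of that check follows; cpu_has_flag() is a hypothetical helper, and it assumes the flags line fits in the 8 kB read buffer.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole word on a "flags" line of the
 * cpuinfo-style file at `path`, 0 if it does not, and -1 if the file
 * cannot be opened. */
int cpu_has_flag(const char *path, const char *flag)
{
    char line[8192];
    FILE *f = fopen(path, "r");
    int found = 0;

    if (!f)
        return -1;

    while (!found && fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) != 0)
            continue;                       /* not a flags line */

        char *p = strchr(line, ':');
        if (!p)
            continue;

        /* Walk the space-separated flag names after the colon. */
        for (char *tok = strtok(p + 1, " \t\n"); tok;
             tok = strtok(NULL, " \t\n")) {
            if (strcmp(tok, flag) == 0) {
                found = 1;
                break;
            }
        }
    }
    fclose(f);
    return found;
}
```

Calling cpu_has_flag("/proc/cpuinfo", "pdpe1gb") on the test machine returns 1 if the CPU advertises 1GB page support.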

Cheers!
--aquini

Comment 16 Petr Beňas 2011-09-26 08:31:22 UTC
Reproduced in 2.6.32-168.el6.x86_64 and verified in 2.6.32-169.el6.x86_64.

Comment 17 errata-xmlrpc 2011-12-06 13:31:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html

