Bug 206113 - [PATCH][RHEL4U4] Fix misestimation (e820 memory holes and numnodes) of available_memory on x86_64
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 206111
Depends On:
Blocks:
 
Reported: 2006-09-12 07:24 UTC by Masaki MAENO
Modified: 2008-07-24 19:11 UTC
CC: 2 users

Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 19:11:51 UTC
Target Upstream Version:
Embargoed:


Attachments
numnodes patch (fix potential nodes to online nodes) (502 bytes, patch)
2006-09-12 07:24 UTC, Masaki MAENO
e820 memory_holes patch (4.24 KB, patch)
2006-09-12 07:34 UTC, Masaki MAENO


Links:
Red Hat Product Errata RHSA-2008:0665 (SHIPPED_LIVE): Moderate: Updated kernel packages for Red Hat Enterprise Linux 4.7 (last updated 2008-07-24 16:41:06 UTC)

Description Masaki MAENO 2006-09-12 07:24:24 UTC
Description of problem:
On x86_64, the kernel overestimates the amount of memory available to it.
As a result, kernel parameters such as vm.dirty_ratio do not take effect
as configured.
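
To illustrate why an inflated estimate breaks vm.dirty_ratio, here is a
minimal userspace sketch (a simplified model, not the actual RHEL4
get_dirty_limits() code; the 2x inflation factor is illustrative):
-----------------------------------------------------------------------------
#include <stdio.h>

/* Simplified model: writeback turns vm.dirty_ratio into an absolute
 * page count against the kernel's estimate of available memory.  If
 * that estimate also counts e820 holes or memoryless potential nodes,
 * the threshold is larger than intended and throttling starts too
 * late. */
static long dirty_threshold_pages(long available_pages, int dirty_ratio)
{
    return available_pages * dirty_ratio / 100;
}

int main(void)
{
    long real_pages = 2080521;        /* pages the machine really has  */
    long inflated = 2 * real_pages;   /* holes/empty nodes counted in  */

    printf("intended: throttle at %ld pages (20%% of real RAM)\n",
           dirty_threshold_pages(real_pages, 20));
    printf("actual:   throttle at %ld pages (~%ld%% of real RAM)\n",
           dirty_threshold_pages(inflated, 20),
           dirty_threshold_pages(inflated, 20) * 100 / real_pages);
    return 0;
}
-----------------------------------------------------------------------------
In the "Actual results" below, Dirty/MemTotal stabilizes near 41% with
dirty_ratio=20, which is consistent with available memory being counted
roughly twice (4 potential nodes versus 2 online nodes).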

Though "'[PATCH] Fix NUMA node sizing in nr_free_zone_pages (detail: 
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-
2.6.git;a=commitdiff;h=e310fd43256b3cf4d37f6447b8f7413ca744657a;hp=5fa918b451f62
5870cd4275ca908b2392ee86a51 )' and 'Fix node-info from ACPI'" was taken with 
RHEL4U4, the bug still exists.

Therefore, I offer two patches.
(1) numnodes patch
 mm/page_alloc.c, kernel/sched.c and other memory-initialization code
currently uses the number of potential nodes where the correct value is
the number of online nodes; this patch corrects that.
 - Because the vanilla-kernel patches have too large an impact, I made a
patch for RHEL4U4 that is confined to x86_64.
    filename: linux-2.6.9-numnodes.patch

(2) e820 memory_holes patch
 This patch stops e820 memory holes from being misidentified as
available_memory.
 detail: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=485761bd6a72d33b3d4fa884927b2b0d983b701e
 - I adapted the vanilla-kernel patch to RHEL4U4.
     filename: linux-2.6.13-e820.patch
 (A toy model of both accounting errors is sketched after the next
paragraph.)

I verified kernel-2.6.9-42.EL with these two patches applied, and the
test passed. I strongly hope this fix will be adopted in an RHEL4U4
update or in RHEL4U5.
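
As an illustration, here is a small userspace toy model of the two
accounting errors (the node counts mirror the dmesg output in the
additional info below; the e820 ranges are made-up values, not a real
memory map):
-----------------------------------------------------------------------------
#include <stdio.h>

#define POTENTIAL_NODES 4   /* node slots wired on the board     */
#define ONLINE_NODES    2   /* nodes actually populated with RAM */

/* Usable RAM ranges (in pages) on one node, as a firmware e820 map
 * might report them; the gap between the ranges is a memory hole. */
struct range { unsigned long start, end; };

static const struct range e820_ram[] = {
    { 0,    640     },
    { 4096, 1048576 },
};

int main(void)
{
    unsigned long span = 1048576;  /* end_pfn - start_pfn of the node */
    unsigned long usable = 0;
    unsigned int i;

    for (i = 0; i < sizeof(e820_ram) / sizeof(e820_ram[0]); i++)
        usable += e820_ram[i].end - e820_ram[i].start;

    /* Old accounting: every potential node contributes, and each
     * node is sized by its full PFN span, holes included. */
    printf("before: %lu pages, %d zonelists\n",
           span * POTENTIAL_NODES, POTENTIAL_NODES);

    /* Fixed accounting: online nodes only, e820-usable pages only. */
    printf("after:  %lu pages, %d zonelists\n",
           usable * ONLINE_NODES, ONLINE_NODES);
    return 0;
}
-----------------------------------------------------------------------------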


Version-Release number of selected component (if applicable):
RHEL4U4: kernel-2.6.9-42.EL


Steps to Reproduce:
1. Set vm.dirty_ratio to 10, 20, 30, etc. (keeping dirty_background_ratio=10).
2. Keep creating a lot of dirty pages
   (example: 
       # dd if=/dev/zero of=/tmp/test01.bin & 
       # dd if=/dev/zero of=/tmp/test02.bin & ... [5 - 10 processes] )
3. While (2) runs, keep printing dirty-pages/total-pages (%)
   (example: Dirty/MemTotal x 100 from /proc/meminfo; see the sketch below)
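
The output_dirty-memtotal.sh script itself is not attached; the
following is a minimal C stand-in (assuming the usual "Key: value kB"
layout of /proc/meminfo) that prints Dirty/MemTotal x 100 once per
second:
-----------------------------------------------------------------------------
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char line[128];

    for (;;) {
        unsigned long dirty = 0, total = 0;
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
            return 1;
        while (fgets(line, sizeof(line), f)) {
            /* Each /proc/meminfo line looks like "Key:   N kB";
             * sscanf leaves the values untouched on non-matches. */
            sscanf(line, "MemTotal: %lu kB", &total);
            sscanf(line, "Dirty: %lu kB", &dirty);
        }
        fclose(f);
        if (total)
            printf("%ld: %lu\n", (long)time(NULL),
                   dirty * 100 / total);
        fflush(stdout);
        sleep(1);
    }
}
-----------------------------------------------------------------------------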

Actual results (before applying the e820-holes/numnodes patches):
Dirty/MemTotal x 100 does not settle at the value set in vm.dirty_ratio.

# echo "20" > /proc/sys/vm/dirty_ratio
# ./output_dirty-memtotal.sh
# dd ... & x 10 
    ...
2006/09/12_14:57:42: 15
2006/09/12_14:57:43: 21
2006/09/12_14:57:44: 28
2006/09/12_14:57:46: 35
2006/09/12_14:57:47: 41
2006/09/12_14:57:48: 41
2006/09/12_14:57:49: 41
2006/09/12_14:57:50: 41
2006/09/12_14:57:51: 41
2006/09/12_14:57:52: 41
2006/09/12_14:57:53: 41
2006/09/12_14:57:54: 41
2006/09/12_14:57:55: 41
2006/09/12_14:57:56: 41
    ...
#

Expected results (after applying the e820-holes/numnodes patches):
Dirty/MemTotal x 100 settles at the value set in vm.dirty_ratio.

# echo "20" > /proc/sys/vm/dirty_ratio
# ./output_dirty-memtotal.sh
# dd ... & x 10 
    ...
2006/09/12_14:51:38: 0
2006/09/12_14:51:42: 12
2006/09/12_14:51:45: 15
2006/09/12_14:51:46: 15
2006/09/12_14:51:47: 17
2006/09/12_14:51:48: 17
2006/09/12_14:51:49: 18
2006/09/12_14:51:50: 18
2006/09/12_14:51:51: 20
2006/09/12_14:51:52: 20
2006/09/12_14:51:53: 20
2006/09/12_14:51:54: 20
2006/09/12_14:51:55: 20
2006/09/12_14:51:56: 20
    ...
#

Additional info:
By the way, the e820 memory hole was very small (about 70MB) on my
machine (the Actual/Expected results above). But there are also machines
whose memory holes are much larger (about ?MB - ?GB).

- Before applying the patches (default kernel-2.6.9-42.EL):
# dmesg
    ...
On node 0 totalpages: 1048575                     <<<<<<<<<< memory_holes
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 1044479 pages, LIFO batch:16   
  HighMem zone: 0 pages, LIFO batch:1
On node 1 totalpages: 1048575
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 1048575 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
    ...
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Built 4 zonelists                                 <<<<<<<<<< numnodes
    ...

- After applying the patches (e820-holes/numnodes patched kernel-2.6.9-42.EL):
# dmesg
    ...
On node 0 totalpages: 1031946                     <<<<<<<<<< memory_holes
  DMA zone: 3994 pages, LIFO batch:1
  Normal zone: 1027952 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
On node 1 totalpages: 1048575
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 1048575 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
    ...
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Built 2 zonelists                                 <<<<<<<<<< numnodes
    ...
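
For reference, the node 0 delta above is 1,048,575 - 1,031,946 = 16,629
pages; at 4 KB per page that is roughly 65 MB of e820 holes, matching
the "about 70MB" estimate earlier in this comment. The zonelist count
likewise drops from 4 (potential nodes) to 2 (online nodes).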

Comment 1 Masaki MAENO 2006-09-12 07:24:25 UTC
Created attachment 136054 [details]
numnodes patch  (fix potential nodes to online nodes)

Comment 2 Masaki MAENO 2006-09-12 07:34:59 UTC
Created attachment 136056 [details]
e820 memory_holes patch

- numnodes patch (linux-2.6.9-numnodes.patch)
- e820 memory_holes patch (linux-2.6.13-e820.patch)

Comment 3 Linda Wang 2006-12-04 22:42:51 UTC
*** Bug 206111 has been marked as a duplicate of this bug. ***

Comment 4 RHEL Program Management 2007-05-09 09:39:33 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 RHEL Program Management 2007-09-07 19:43:36 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 6 Masaki MAENO 2007-09-10 06:52:39 UTC
I understand Red Hat's evaluation and the status of this request.

I agree that this problem is not critical. Still, I hope you will take
it into RHEL4U7, because I believe these patches are safe.


Comment 7 Larry Woodman 2007-09-14 18:11:07 UTC
Actually, these patches should no longer be necessary because I made these
changes to get_dirty_limits in RHEL4-U6.

-----------------------------------------------------------------------------

+       /*
+        * Arbitrarily assume that 10% of the slab
+        * is reclaimable.  2.6.19 and beyond actually
+        * track the amount of slab which is reclaimable,
+        * the statistic we really need here.  In the absence
+        * of that we prefer to be conservative here.
+        */
+       available_memory = read_page_state(nr_slab) / 10;
+
+       for_each_zone(zone) {
+#ifdef CONFIG_HIGHMEM
+               if (is_highmem(zone) && mapping &&
+                   !(mapping_gfp_mask(mapping) & __GFP_HIGHMEM)) {
+                       no_highmem++;
+                       continue;
+               }
+#endif
+               available_memory += zone->nr_active;
+               available_memory += zone->nr_inactive;
+               available_memory += zone->free_pages;
+       }
--------------------------------------------------------------------------

Can someone verify that the latest RHEL4-U6 does the right thing in these cases?

Thanks, Larry Woodman


Comment 8 Masaki MAENO 2007-09-18 04:31:08 UTC
I understand that "available_memory" is now calculated not from
"total_pages" but from an estimate of the memory that is free or
reclaimable at that moment (= Slab * 10% + Active + Inactive + Free).
(My impression: "Active * about 50%" would be better than "Active".)

I think the problem will be improved by this correction.
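
As a worked example (the numbers are hypothetical): with Slab = 100,000
pages, Active = 500,000, Inactive = 300,000 and Free = 200,000, the new
estimate is 100,000/10 + 500,000 + 300,000 + 200,000 = 1,010,000 pages,
so vm.dirty_ratio=20 would throttle at about 202,000 dirty pages
(roughly 789 MB with 4 KB pages).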


However, I cannot currently prepare the machine (CPU: AMD64 Opteron 939
x 2 (NUMA: potential nodes = 4, active nodes = 1)) needed for the
verification. I am sorry.
Please verify that the latest RHEL4.6 beta does the right thing in these
cases.


Comment 9 Masaki MAENO 2007-11-14 05:52:38 UTC
Larry Woodman,

Sorry, I now understand the patch.

I confirmed that vm.dirty_ratio works as expected thanks to the
get_dirty_limits() change from linux-2.6.9-vm-balance.patch in
kernel-2.6.9-55.0.6.EL.

Thank you very much.


Comment 11 Vivek Goyal 2008-03-14 23:38:00 UTC
Committed in 68.21.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 15 errata-xmlrpc 2008-07-24 19:11:51 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html

