Bug 2104445

Summary: RHEL9.1: in low memory conditions, page_frag_alloc may corrupt the memory.
Product: Red Hat Enterprise Linux 9
Reporter: Maurizio Lombardi <mlombard>
Component: kernel
Assignee: Maurizio Lombardi <mlombard>
kernel sub component: Memory Management
QA Contact: Li Wang <liwan>
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
CC: chuhu, ddutile, liwan
Version: 9.1
Keywords: Bugfix, Patch, Triaged
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-5.14.0-178.el9
Last Closed: 2023-05-09 07:58:06 UTC
Type: Bug
Attachments:
test kernel module (flags: none)

Description Maurizio Lombardi 2022-07-06 09:48:02 UTC
Created attachment 1894892
test kernel module

Description of problem:

Calling page_frag_alloc() with a fragsize > 4096 (on x86_64) corrupts memory when the system is in OOM conditions, and the kernel later crashes when page_frag_free() is called.

I was not able to make the kernel crash with fragsize <= 4096.

Steps to Reproduce:

I prepared a simple kernel module that takes two parameters: memory_size_gb, the total amount of memory to allocate with page_frag_alloc(), and fragsize, the size of each fragment in bytes.

I tested it on a machine with ~7 GB of free memory.

Example output:

3 GB of memory with 1024-byte fragments. No issues:

#insmod oomk.ko memory_size_gb=3 fragsize=1024

[  177.875107] Test begins, memory size = 3 fragsize = 1024
[  177.974538] Test completed!

10 GB of memory with 1024-byte fragments. A page allocation failure occurs, but the kernel handles it and does not crash:

#insmod oomk.ko memory_size_gb=10 fragsize=1024

[  215.104801] Test begins, memory size = 10 fragsize = 1024
[  215.227854] insmod: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
[  215.230231] CPU: 1 PID: 1738 Comm: insmod Kdump: loaded Tainted: G           OE    --------- ---  5.14.0-124.kpq0.el9.x86_64 #1
[  215.232344] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  215.233523] Call Trace:
[  215.234001]  dump_stack_lvl+0x34/0x44
[  215.234894]  warn_alloc+0x134/0x160
[  215.235592]  __alloc_pages_slowpath.constprop.0+0x809/0x840
[  215.236687]  ? get_page_from_freelist+0xc6/0x500
[  215.237569]  __alloc_pages+0x1fa/0x230
[  215.238381]  page_frag_alloc_align+0x16c/0x1a0
[...]
[  215.315722] allocation number 7379888 failed!
[  215.426227] Test completed!
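A quick sanity check on the numbers in this run, assuming the module's "GB" means GiB (2^30 bytes): requesting 10 GiB in 1024-byte fragments needs about 10.5 million calls, and the failure at allocation 7379888 corresponds to roughly 7 GiB handed out, which is consistent with the ~7 GB of free memory on the test machine.

$$
\frac{10 \times 2^{30}\ \text{B}}{1024\ \text{B}} = 10\,485\,760 \text{ allocations},
\qquad
7\,379\,888 \times 1024\ \text{B} = 7\,557\,005\,312\ \text{B} \approx 7.04\ \text{GiB}
$$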

4 GB with 4097-byte fragments. No issues:

#insmod oomk.ko memory_size_gb=4 fragsize=4097
[  417.268821] Test begins, memory size = 4 fragsize = 4097
[  417.343840] Test completed!

10 GB with 4097-byte fragments. The kernel crashes:

#insmod oomk.ko memory_size_gb=10 fragsize=4097
[  623.461505] BUG: Bad page state in process insmod  pfn:10a80c
[  623.462634] page:000000000654dc14 refcount:0 mapcount:0 mapping:000000007a56d6cd index:0x0 pfn:0x10a80c
[  623.464401] memcg:ffff900343a5b501
[  623.465058] aops:0xffff9003409e5d38 with invalid host inode 00003524480055f0
[  623.466394] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[  623.467632] raw: 0017ffffc0000000 dead000000000100 dead000000000122 ffff900346cf2900
[  623.469069] raw: 0000000000000000 0000000000100010 00000000ffffffff ffff900343a5b501
[  623.470521] page dumped because: page still charged to cgroup

[...]

[  626.632838] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI
[  626.633913] ------------[ cut here ]------------
[  626.639981] CPU: 0 PID: 722 Comm: agetty Kdump: loaded Tainted: G    B      OE    --------- ---  5.14.0-124.kpq0.el9.x86_64 #1
[  626.640923] WARNING: CPU: 1 PID: 22 at mm/slub.c:4566 __ksize+0xc4/0xe0
[  626.645018] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  626.645021] RIP: 0010:___slab_alloc+0x1b7/0x5c0

Comment 1 Maurizio Lombardi 2022-07-06 10:50:01 UTC
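I may have found the issue: in __page_frag_cache_refill(), if the order-3 page allocation fails, it retries with order 0, allocating a single 4096-byte cache page.

If fragsize is > 4096, this corrupts memory: page_frag_alloc_align() computes the fragment offset as size - fragsz, which becomes negative, so the returned pointer starts before the cache page.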
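To make the failure mode concrete, here is a small userspace model (not kernel code) of the refill fallback and the offset arithmetic. It assumes x86_64 values, PAGE_SIZE = 4096 and an order-3 cache of 32768 bytes; the names model_refill_size() and model_first_offset() are hypothetical. Fragments are carved from the end of the cache page downwards, so with the order-0 fallback and fragsz = 4097 the offset comes out as -1.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86_64 values; illustrative only. */
#define MODEL_PAGE_SIZE           4096u
#define MODEL_FRAG_CACHE_MAX_SIZE 32768u /* order-3 allocation */

/*
 * Hypothetical model of __page_frag_cache_refill(): try an order-3
 * page first; if that fails under memory pressure, fall back to a
 * single order-0 page. Returns the resulting cache size in bytes.
 */
static unsigned int model_refill_size(bool order3_succeeds)
{
	return order3_succeeds ? MODEL_FRAG_CACHE_MAX_SIZE : MODEL_PAGE_SIZE;
}

/*
 * Hypothetical model of the offset of the first fragment in a freshly
 * refilled cache: fragments are handed out from the end of the page,
 * so the first one starts at size - fragsz. A negative result means
 * the fragment would begin before the start of the page.
 */
static int model_first_offset(unsigned int size, unsigned int fragsz)
{
	return (int)size - (int)fragsz;
}
```

With fragsz <= 4096 the offset never goes negative, even after the order-0 fallback, which matches the observation that the crash only appears above 4096 bytes.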

In general, page_frag_alloc() appears to be unsafe for fragsize > PAGE_SIZE; I wonder why this condition is not enforced in the code.

Comment 2 Maurizio Lombardi 2022-07-06 10:53:03 UTC
I will work on a patch right now. I think the solution is to check that nc->size (the cache size) is big enough for fragsize; otherwise, page_frag_alloc() should return NULL to prevent memory corruption.
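The proposed check can be sketched as a simple predicate: a fragment can only be carved out of the cache if the cache is at least fragsz bytes long. In this hypothetical userspace helper, cache_size stands in for nc->size and cache_va for nc->va; neither function exists in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical guard: a fragment of fragsz bytes fits in a freshly
 * refilled cache only if the cache is at least that large.
 */
static bool frag_fits_in_cache(unsigned int cache_size, unsigned int fragsz)
{
	return fragsz <= cache_size;
}

/*
 * Returns a pointer to the fragment, or NULL if it cannot fit,
 * instead of a pointer that starts before the cache buffer.
 */
static char *carve_fragment(char *cache_va, unsigned int cache_size,
			    unsigned int fragsz)
{
	if (!frag_fits_in_cache(cache_size, fragsz))
		return NULL;
	/* Fragments are taken from the end of the cache buffer. */
	return cache_va + (cache_size - fragsz);
}
```

For example, a 4097-byte request fits a 32768-byte cache but is refused (NULL) when only a 4096-byte page could be allocated.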

Comment 3 Maurizio Lombardi 2022-07-06 12:36:24 UTC
This patch should solve the problem:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4dc0d333279f..fdd8d671876a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5550,6 +5550,8 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
                /* reset page count bias and offset to start of new frag */
                nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
                offset = size - fragsz;
+               if (unlikely(offset < 0))
+                       return NULL;
        }
 
        nc->pagecnt_bias--;

Comment 4 Maurizio Lombardi 2022-07-06 14:35:07 UTC
(In reply to Maurizio Lombardi from comment #3)
> This patch should solve the problem:

Tested; it seems to work. I improved it to avoid leaking cache pages.

This is the current version:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4dc0d333279f..c6b40b85c55d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5544,12 +5544,17 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
                /* if size can vary use size else just use PAGE_SIZE */
                size = nc->size;
 #endif
-               /* OK, page count is 0, we can safely set it */
-               set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
-
                /* reset page count bias and offset to start of new frag */
                nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
                offset = size - fragsz;
+               if (unlikely(offset < 0)) {
+                       free_the_page(page, compound_order(page));
+                       nc->va = NULL;
+                       return NULL;
+               }
+
+               /* OK, page count is 0, we can safely set it */
+               set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
        }
 
        nc->pagecnt_bias--;
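
The leak-avoidance idea in this version can be modelled in userspace as follows. This is a sketch under assumptions, not the kernel code: malloc()/free() stand in for page allocation and free_the_page(), and struct model_frag_cache and model_refill_and_alloc() are hypothetical. When the freshly refilled buffer is too small, it is released and va is cleared, so the next call refills instead of reusing or leaking it. The patch also moves set_page_count() below the check, presumably so that a page released on this path still has a zero reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE           4096u
#define MODEL_FRAG_CACHE_MAX_SIZE 32768u

/* Hypothetical stand-in for struct page_frag_cache. */
struct model_frag_cache {
	char *va;          /* current cache buffer, NULL if empty */
	unsigned int size; /* size of the current buffer in bytes */
	int offset;        /* next fragment is carved just below this */
};

/* Hypothetical stand-in for __page_frag_cache_refill(). */
static bool model_refill(struct model_frag_cache *nc, bool order3_succeeds)
{
	nc->size = order3_succeeds ? MODEL_FRAG_CACHE_MAX_SIZE : MODEL_PAGE_SIZE;
	nc->va = malloc(nc->size);
	nc->offset = (int)nc->size;
	return nc->va != NULL;
}

/*
 * Models the refill path after the improved patch.
 * Precondition: the cache is empty (nc->va == NULL).
 */
static void *model_refill_and_alloc(struct model_frag_cache *nc,
				    unsigned int fragsz, bool order3_succeeds)
{
	assert(nc->va == NULL);
	if (!model_refill(nc, order3_succeeds))
		return NULL;

	int offset = (int)nc->size - (int)fragsz;
	if (offset < 0) {
		/* Too small: release the buffer instead of leaking it,
		 * and mark the cache empty so the next call refills. */
		free(nc->va);
		nc->va = NULL;
		return NULL;
	}
	nc->offset = offset;
	return nc->va + offset;
}
```

In the model, a 4097-byte request after an order-0 fallback returns NULL and leaves the cache empty; a later call that gets an order-3 buffer succeeds normally.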

Comment 8 Maurizio Lombardi 2022-07-29 13:22:56 UTC
Patch picked up by Andrew Morton

https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable

Comment 9 Maurizio Lombardi 2022-07-29 13:23:41 UTC
(In reply to Maurizio Lombardi from comment #8)
> Patch picked up by Andrew Morton
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable

Correct link: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=6309f8daaef315140c8ffdd2492563973e8d42d5

Comment 24 errata-xmlrpc 2023-05-09 07:58:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2458