Bug 2104445 - RHEL9.1: in low memory conditions, page_frag_alloc may corrupt the memory.
Summary: RHEL9.1: in low memory conditions, page_frag_alloc may corrupt the memory.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Maurizio Lombardi
QA Contact: Li Wang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-06 09:48 UTC by Maurizio Lombardi
Modified: 2023-05-09 09:35 UTC

Fixed In Version: kernel-5.14.0-178.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 07:58:06 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
test kernel module (10.00 KB, application/x-tar)
2022-07-06 09:48 UTC, Maurizio Lombardi


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | redhat/centos-stream/src/kernel centos-stream-9 merge_requests 1317 | 0 | None | opened | mm: prevent page_frag_alloc() from corrupting the memory | 2022-10-04 19:21:07 UTC
Red Hat Issue Tracker | RHELPLAN-127076 | 0 | None | None | None | 2022-07-06 10:19:41 UTC
Red Hat Product Errata | RHSA-2023:2458 | 0 | None | None | None | 2023-05-09 07:58:38 UTC

Description Maurizio Lombardi 2022-07-06 09:48:02 UTC
Created attachment 1894892
test kernel module

Description of problem:

Calling page_frag_alloc() with a fragsize > 4096 (on x86_64) corrupts memory when the system is in OOM conditions; the kernel then crashes when page_frag_free() is called.

I was not able to make the kernel crash with fragsize <= 4096.

Steps to Reproduce:

I prepared a simple kernel module. It requires two parameters: the first is the amount of memory to allocate with page_frag_alloc(), the second is the size of each fragment.

I tested it on a machine with ~7 GB of free memory.

Example of output:

3 GB of memory with a 1024-byte fragment size. No issue:

#insmod oomk.ko memory_size_gb=3 fragsize=1024

[  177.875107] Test begins, memory size = 3 fragsize = 1024
[  177.974538] Test completed!

10 GB of memory, 1024-byte fragments. Page allocation fails, but the kernel handles it and doesn't crash:

#insmod oomk.ko memory_size_gb=10 fragsize=1024

[  215.104801] Test begins, memory size = 10 fragsize = 1024
[  215.227854] insmod: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
[  215.230231] CPU: 1 PID: 1738 Comm: insmod Kdump: loaded Tainted: G           OE    --------- ---  5.14.0-124.kpq0.el9.x86_64 #1
[  215.232344] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  215.233523] Call Trace:
[  215.234001]  dump_stack_lvl+0x34/0x44
[  215.234894]  warn_alloc+0x134/0x160
[  215.235592]  __alloc_pages_slowpath.constprop.0+0x809/0x840
[  215.236687]  ? get_page_from_freelist+0xc6/0x500
[  215.237569]  __alloc_pages+0x1fa/0x230
[  215.238381]  page_frag_alloc_align+0x16c/0x1a0
[...]
[  215.315722] allocation number 7379888 failed!
[  215.426227] Test completed!
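
As a sanity check on the log above, the index of the failed allocation can be turned back into a byte count. This is only a rough sketch: the helper names are hypothetical, and it assumes that each successful allocation consumes exactly fragsize bytes (ignoring cache overhead) and that memory_size_gb is interpreted as GiB, neither of which is confirmed by the attached module.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes handed out after `count` successful
 * allocations of `fragsz` bytes each (cache overhead is ignored). */
static uint64_t model_bytes_allocated(uint64_t count, uint64_t fragsz)
{
	assert(fragsz > 0);
	return count * fragsz;
}

/* Hypothetical helper: number of fragments needed to cover `gib` GiB.
 * Treating the module parameter as GiB is an assumption. */
static uint64_t model_frags_requested(uint64_t gib, uint64_t fragsz)
{
	assert(fragsz > 0);
	return (gib << 30) / fragsz;
}
```

Under these assumptions, model_frags_requested(10, 1024) is 10485760 and model_bytes_allocated(7379888, 1024) is 7557005312 bytes, roughly 7.04 GiB, which is consistent with the ~7 GB of free memory on the test machine.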

4 GB, 4097-byte fragments. No issues:

#insmod oomk.ko memory_size_gb=4 fragsize=4097
[  417.268821] Test begins, memory size = 4 fragsize = 4097
[  417.343840] Test completed!

10 GB, 4097-byte fragments. Kernel crashes:

#insmod oomk.ko memory_size_gb=10 fragsize=4097
[  623.461505] BUG: Bad page state in process insmod  pfn:10a80c
[  623.462634] page:000000000654dc14 refcount:0 mapcount:0 mapping:000000007a56d6cd index:0x0 pfn:0x10a80c
[  623.464401] memcg:ffff900343a5b501
[  623.465058] aops:0xffff9003409e5d38 with invalid host inode 00003524480055f0
[  623.466394] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[  623.467632] raw: 0017ffffc0000000 dead000000000100 dead000000000122 ffff900346cf2900
[  623.469069] raw: 0000000000000000 0000000000100010 00000000ffffffff ffff900343a5b501
[  623.470521] page dumped because: page still charged to cgroup

[...]

[  626.632838] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI
[  626.633913] ------------[ cut here ]------------
[  626.639981] CPU: 0 PID: 722 Comm: agetty Kdump: loaded Tainted: G    B      OE    --------- ---  5.14.0-124.kpq0.el9.x86_64 #1
[  626.640923] WARNING: CPU: 1 PID: 22 at mm/slub.c:4566 __ksize+0xc4/0xe0
[  626.645018] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  626.645021] RIP: 0010:___slab_alloc+0x1b7/0x5c0

Comment 1 Maurizio Lombardi 2022-07-06 10:50:01 UTC
Maybe I found the issue: in __page_frag_cache_refill(), if the page allocation with order=3 fails, it retries with
order=0, thus allocating a 4096-byte cache page.

If fragsize is > 4096, this will corrupt the memory.

It looks like page_frag_alloc() is in general unsafe for fragsize > PAGE_SIZE;
I wonder why this condition is not enforced in the code.
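
The arithmetic behind this can be sketched in a small userspace model. It is not the kernel code: the function and macro names are hypothetical, the 32768-byte figure assumes an order-3 allocation of 4096-byte pages, and the only thing modelled is the "size - fragsz" offset computed right after a refill.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE      4096L   /* x86_64 base page size (order 0) */
#define MODEL_CACHE_MAX_SIZE 32768L  /* assumed order-3 size: 8 * 4096 */

/*
 * Hypothetical model of the refill: use the order-3 size when it is
 * available, otherwise fall back to a single order-0 page.
 */
static long model_refill_size(bool order3_available)
{
	return order3_available ? MODEL_CACHE_MAX_SIZE : MODEL_PAGE_SIZE;
}

/*
 * Offset of the first fragment taken from a freshly refilled cache.
 * Fragments are carved from the end of the cache downwards, so the
 * first offset is size - fragsz. A negative result means the fragment
 * would start before the beginning of the cache page.
 */
static long model_first_offset(bool order3_available, long fragsz)
{
	assert(fragsz > 0);
	return model_refill_size(order3_available) - fragsz;
}
```

With fragsz = 4097, the model gives an offset of 28671 when the order-3 allocation succeeds and -1 after the order-0 fallback, which matches the observation that the crash only appears under memory pressure.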

Comment 2 Maurizio Lombardi 2022-07-06 10:53:03 UTC
I will work on a patch right now.
I guess the solution is to check that nc->size (the cache size) is big enough for fragsize; otherwise page_frag_alloc() should return NULL to prevent memory corruption.

Comment 3 Maurizio Lombardi 2022-07-06 12:36:24 UTC
This patch should solve the problem:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4dc0d333279f..fdd8d671876a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5550,6 +5550,8 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
                /* reset page count bias and offset to start of new frag */
                nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
                offset = size - fragsz;
+               if (unlikely(offset < 0))
+                       return NULL;
        }
 
        nc->pagecnt_bias--;

Comment 4 Maurizio Lombardi 2022-07-06 14:35:07 UTC
(In reply to Maurizio Lombardi from comment #3)
> This patch should solve the problem:

Tested; it seems to work. I improved it to avoid leaking cache pages.

This is the current version:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4dc0d333279f..c6b40b85c55d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5544,12 +5544,17 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
                /* if size can vary use size else just use PAGE_SIZE */
                size = nc->size;
 #endif
-               /* OK, page count is 0, we can safely set it */
-               set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
-
                /* reset page count bias and offset to start of new frag */
                nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
                offset = size - fragsz;
+               if (unlikely(offset < 0)) {
+                       free_the_page(page, compound_order(page));
+                       nc->va = NULL;
+                       return NULL;
+               }
+
+               /* OK, page count is 0, we can safely set it */
+               set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
        }
 
        nc->pagecnt_bias--;
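
The control flow of this version can be illustrated with a userspace sketch. The struct and function below are hypothetical stand-ins, not the kernel's page_frag_cache API; the model only covers the first fragment taken after a refill and drops the buffer where the patch calls free_the_page().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fragment cache state. */
struct model_frag_cache {
	char *va;     /* start of the current cache buffer, NULL if none */
	long size;    /* size of the cache buffer in bytes */
	long offset;  /* offset of the most recently returned fragment */
};

/*
 * Take the first fragment from a freshly refilled cache buffer.
 * Mirrors the check in the patch: if the fragment does not fit,
 * give up the buffer and return NULL instead of a pointer that
 * lies before the start of the buffer.
 */
static void *model_frag_alloc_first(struct model_frag_cache *nc,
				    char *buf, long size, long fragsz)
{
	long offset;

	assert(nc != NULL && buf != NULL);

	nc->va = buf;
	nc->size = size;

	offset = size - fragsz;
	if (offset < 0) {
		/* The kernel patch frees the page here; the model only drops it. */
		nc->va = NULL;
		return NULL;
	}

	nc->offset = offset;
	return nc->va + offset;
}
```

In this model, a 4097-byte request against a 4096-byte buffer returns NULL and leaves nc->va cleared, while a 1024-byte request against the same buffer returns a pointer 3072 bytes into it.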

Comment 8 Maurizio Lombardi 2022-07-29 13:22:56 UTC
Patch picked up by Andrew Morton

https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable

Comment 9 Maurizio Lombardi 2022-07-29 13:23:41 UTC
(In reply to Maurizio Lombardi from comment #8)
> Patch picked up by Andrew Morton
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable

Correct link: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-unstable&id=6309f8daaef315140c8ffdd2492563973e8d42d5

Comment 24 errata-xmlrpc 2023-05-09 07:58:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2458

