Bug 1726896 - mm: fix race on soft-offlining free huge pages
Summary: mm: fix race on soft-offlining free huge pages
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.5
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: pre-dev-freeze
Target Release: 7.9
Assignee: Artem Savkov
QA Contact: Li Wang
URL:
Whiteboard:
Depends On: 1726983
Blocks: 1729246
 
Reported: 2019-07-04 03:38 UTC by Li Wang
Modified: 2023-08-08 02:45 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-19 00:34:45 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System: IBM Linux Technology Center    ID: 182884    Last Updated: 2019-12-11 09:49:07 UTC

Description Li Wang 2019-07-04 03:38:57 UTC
Description of problem:

commit 6bc9b56433b76e40d11099338d27fbc5cd2935ca
Author: Naoya Horiguchi <n-horiguchi.nec.com>
Date:   Thu Aug 23 17:00:38 2018 -0700

    mm: fix race on soft-offlining free huge pages
    
    Patch series "mm: soft-offline: fix race against page allocation".
    
    Xishi recently reported an issue about a race on reusing the target pages
    of soft offlining.  Discussion and analysis showed that we need to make
    sure that setting PG_hwpoison is done in the right place under
    zone->lock for soft offline.  1/2 handles the free hugepage case, and 2/2
    handles the free buddy page case.


Without the above patch, ltp/move_pages12 fails on RHEL7 (3.10.0-1059.el7.x86_64.debug) as follows:

# ./move_pages12 
tst_test.c:1100: INFO: Timeout per run is 0h 05m 00s
move_pages12.c:235: INFO: Free RAM 129844088 kB
move_pages12.c:253: INFO: Increasing 2048kB hugepages pool on node 0 to 12
move_pages12.c:263: INFO: Increasing 2048kB hugepages pool on node 1 to 12
move_pages12.c:179: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:179: INFO: Allocating and freeing 4 hugepages on node 1
move_pages12.c:169: PASS: Bug not reproduced
tst_test.c:1145: BROK: Test killed by SIGBUS!

reproducer: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/move_pages/move_pages12.c

Comment 1 Rafael Aquini 2019-07-24 15:22:43 UTC
May I ask how you are sure that this particular issue is solved by the patch you pointed out?

Also, a couple more questions on this case:
 a) do you have the console logs for the registered failure;
 b) how consistent is your reproducer; and 
 c) when did it start happening? At kernel-3.10.0-1059.el7 or at an earlier build?

Thanks in advance!
-- Rafael

Comment 2 Li Wang 2019-07-25 05:27:41 UTC
(In reply to Rafael Aquini from comment #1)
> May I ask how you are sure that this particular issue is solved by the
> patch you pointed out?

Test #2 (in move_pages12) simulates the race condition where move_pages() and soft offline are called concurrently on a single hugetlb page. However, when testing on the upstream v5.2 kernel, soft-offlining the hugepage being moved returns EBUSY and the test reports FAIL.
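
To make the scenario concrete, here is a minimal sketch of that interleaving. It is not the actual LTP code; the 2048kB hugepage size, the node numbers, and the single-shot calls are assumptions for illustration, and the real test is more elaborate.

/*
 * Minimal sketch of the race described above -- NOT the actual LTP code.
 * One thread migrates a hugetlb page with move_pages(2) while the main
 * thread soft-offlines it with madvise(MADV_SOFT_OFFLINE).
 * Assumptions: 2048kB hugepages are available on nodes 0 and 1, the
 * process runs as root, and libnuma is linked in (-lnuma).
 */
#define _GNU_SOURCE
#include <errno.h>
#include <numaif.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_SOFT_OFFLINE
# define MADV_SOFT_OFFLINE 101          /* from <linux/mman.h> */
#endif

#define HPAGE_SIZE (2UL * 1024 * 1024)  /* assumed 2048kB hugepage size */

static void *hpage;

static void *mover(void *arg)
{
        void *pages[1] = { hpage };
        int node = 1, status = -1;

        (void)arg;
        /* Migrate the hugepage to node 1, racing with the soft offline. */
        if (move_pages(0, 1, pages, &node, &status, MPOL_MF_MOVE_ALL))
                perror("move_pages");
        return NULL;
}

int main(void)
{
        pthread_t t;

        hpage = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (hpage == MAP_FAILED) {
                perror("mmap(MAP_HUGETLB)");
                return 1;
        }
        memset(hpage, 0, HPAGE_SIZE);   /* fault the hugepage in */

        pthread_create(&t, NULL, mover, NULL);

        /* Racing soft offline on the same page. */
        if (madvise(hpage, HPAGE_SIZE, MADV_SOFT_OFFLINE))
                fprintf(stderr, "madvise(MADV_SOFT_OFFLINE): %s\n",
                        strerror(errno));

        pthread_join(t, NULL);
        munmap(hpage, HPAGE_SIZE);
        return 0;
}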

I confirmed with Naoya Horiguchi, and he pointed out that the new fix, commit 6bc9b56433b7 ("mm: fix race on soft-offlining free huge pages"), changes the return value of madvise(MADV_SOFT_OFFLINE): we now see -EBUSY when hugepage migration succeeded but error containment failed. The test treats this EBUSY as an error, but for an application it is actually useful feedback.
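
To illustrate that distinction (a hedged sketch only; the helper name try_soft_offline is made up and is not part of the test or the kernel), an application issuing the madvise call could treat -EBUSY as "the page was migrated but nothing was poisoned" rather than as a hard failure:

#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_SOFT_OFFLINE
# define MADV_SOFT_OFFLINE 101          /* from <linux/mman.h> */
#endif

/*
 * Hypothetical helper illustrating the return-value semantics described
 * above:
 *   0  - page soft-offlined and contained
 *   1  - lost the race (EBUSY): migration happened but containment did
 *        not, so nothing was poisoned; the caller may retry or ignore it
 *  -1  - real failure (EPERM, EIO, ...)
 */
int try_soft_offline(void *addr, size_t len)
{
        if (madvise(addr, len, MADV_SOFT_OFFLINE) == 0)
                return 0;
        if (errno == EBUSY)
                return 1;
        return -1;
}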

He also mentioned that this patch fixes another BZ: a race condition between soft offline and hugetlb_fault which causes unexpected SIGBUS killing of the process and/or hugetlb allocation failure. So I tried it on RHEL7 and got a failure like this:

err_log:
tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
move_pages12.c:236: INFO: Free RAM 119568 kB
move_pages12.c:254: INFO: Increasing 2048kB hugepages pool on node 0 to 83
move_pages12.c:264: INFO: Increasing 2048kB hugepages pool on node 1 to 94
move_pages12.c:180: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:180: INFO: Allocating and freeing 4 hugepages on node 1
move_pages12.c:170: PASS: Bug not reproduced
tst_test.c:1141: BROK: Test killed by SIGBUS!
move_pages12.c:114: FAIL: move_pages failed: ESRCH

dmesg
[ 9868.180669] MCE: Killing move_pages12:29616 due to hardware memory corruption fault at 2aaaaac00018
[ 9990.049875] Soft offlining page 50e00 at 2aaaaac00000
[ 9990.052218] Soft offlining page 50c00 at 2aaaaae00000
[ 9990.060395] Soft offlining page 51000 at 2aaaaac00000


This patch changes the soft offline semantics so that the PageHWPoison flag is set only after containment of the error page completes successfully.


-               if (PageHuge(page))
-                       dissolve_free_huge_page(page);
+               /*
+                * We set PG_hwpoison only when the migration source hugepage
+                * was successfully dissolved, because otherwise hwpoisoned
+                * hugepage remains on free hugepage list, then userspace will
+                * find it as SIGBUS by allocation failure. That's not expected
+                * in soft-offlining.
+                */
+               ret = dissolve_free_huge_page(page);
+               if (!ret) {
+                       if (set_hwpoison_free_buddy_page(page))
+                               num_poisoned_pages_inc();
+               }


> 
> Also, a couple more questions on this case:
>  a) do you have the console logs for the registered failure;

see above.

>  b) how consistent is your reproducer; and 
>  c) when did it start happening? At kernel-3.10.0-1059.el7 or at an earlier
> build?

Not sure; this reproducer is newly ported to LTP, and I have only run it on RHEL7.7 (kernel-3.10.0-1059.el7) and the mainline v5.2 kernel. I guess RHEL8 needs this fix as well.

If I got anything wrong, feel free to correct me.


Li Wang

Comment 3 Li Wang 2019-07-25 05:32:13 UTC
By the way, here is the original discussion on the LTP mailing list:
  http://lists.linux.it/pipermail/ltp/2019-June/012299.html

Comment 4 Rafael Aquini 2019-07-25 13:17:07 UTC
(In reply to Li Wang from comment #2)
> [...]
> If I got anything wrong, feel free to correct me.

Nope, nothing wrong. I'm just double-checking the facts to be sure that (a) we're really hitting the condition described in the patch fix; and (b) we know whether this is a regression or something that has always been there.

Thanks for the information, Li.

-- Rafael

Comment 7 Jan Stancek 2019-10-24 09:02:45 UTC
Also strange: I'm unable to release most of the hugepages. I ran the move_pages12 test a couple of times and the hugepage count just keeps increasing:

# cat /sys/devices/system/node/node{0,1}/hugepages/hugepages-2048kB/nr_hugepages
38
24

# echo 0 > /proc/sys/vm/nr_hugepages
# echo 0 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# echo 0 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

# cat /sys/devices/system/node/node{0,1}/hugepages/hugepages-2048kB/nr_hugepages
36
24

# cat /proc/meminfo  | grep Huge
AnonHugePages:      6144 kB
HugePages_Total:      60
HugePages_Free:       16
HugePages_Rsvd:        0
HugePages_Surp:       24
Hugepagesize:       2048 kB

This doesn't look like a recent issue; I see the same behavior with the 7.7 GA and 7.6 GA kernels.

Comment 8 IBM Bug Proxy 2019-12-11 16:40:31 UTC
------- Comment From mbringm.com 2019-12-11 11:32 EDT-------
Aneesh:
Please take a look at this.

Comment 9 IBM Bug Proxy 2020-01-20 10:20:46 UTC
------- Comment From sadas034.com 2020-01-20 05:12 EDT-------
(In reply to comment #4)
> Also strange: I'm unable to release most of the hugepages. I ran the
> move_pages12 test a couple of times and the hugepage count just keeps
> increasing:
> [...]
> This doesn't look like a recent issue; I see the same behavior with the
> 7.7 GA and 7.6 GA kernels.

With ppc64, this test triggers a kernel crash on 7.7 and older GA kernels as observed in BZ178206. The fix for that is now included but I don't see any issues with freeing huge pages even after running this test several times.

Comment 10 IBM Bug Proxy 2020-03-10 16:52:51 UTC
------- Comment From mbringm.com 2020-03-10 12:43 EDT-------
RedHat: Any updates on this one?

