Description of problem:
=======================
We have a daemon that fork/execs scripts periodically. We are presently
switching to hugepage allocations for shared memory segments. This works,
but the dmesg log is now flooded with messages like

mm/memory.c:118: bad pmd ffff8102d3ab8fb0(80000003974000e7).
mm/memory.c:118: bad pmd ffff8102d3ab8fb8(80000003976000e7).
mm/memory.c:118: bad pmd ffff8102d3ab8fc0(80000003970000e7).
mm/memory.c:118: bad pmd ffff8102d3ab8fc8(80000003972000e7).

whenever fork() is executed by the daemon. We would use posix_spawn(),
but in Linux this is apparently just a library function that performs a
fork/exec. The problem appears on both an Opteron and a Core 2 server.

Version-Release number of selected component (if applicable):
=============================================================
kernel 2.6.18-128.1.6.el5

How reproducible:
=================
Allocate hugepage shared memory segments and attach them. Then issue
fork().

Actual results:
"bad pmd" messages from the kernel

Expected results:
No errors.

Additional info:
No error messages under RHEL4 kernel 2.6.9-78.0.13.ELsmp.
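For illustration, here is a minimal C sketch of those reproduction steps
(a hypothetical program, not the attached testcase; the segment size and
permission bits are assumptions):

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* assumed segment size; must be a multiple of the hugepage size */
#define SEG_SIZE (256UL * 1024 * 1024)

int main(void)
{
	int shmid;
	char *addr;
	pid_t pid;

	/* SHM_HUGETLB requests hugepage backing for the segment */
	shmid = shmget(IPC_PRIVATE, SEG_SIZE,
		       SHM_HUGETLB | IPC_CREAT | 0600);
	if (shmid < 0) {
		perror("shmget");
		return 1;
	}

	addr = shmat(shmid, NULL, 0);
	if (addr == (char *)-1) {
		perror("shmat");
		return 1;
	}
	memset(addr, 0, SEG_SIZE);	/* touch the pages so they are faulted in */

	pid = fork();	/* on affected kernels, this floods dmesg with "bad pmd" */
	if (pid == 0)
		_exit(0);		/* child does nothing */
	if (pid > 0)
		waitpid(pid, NULL, 0);

	shmdt(addr);
	shmctl(shmid, IPC_RMID, NULL);
	return 0;
}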
Tried vfork(), and this works around the issue. It is not a great
solution, since the effective user ID of the child cannot be changed to
non-root when using vfork() rather than fork(). It is also bad that the
(multi-threaded) parent daemon is blocked until the exec() call is
issued. The best solution would be for Linux to implement a proper
posix_spawn() system call.
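A minimal sketch of the vfork()/exec() workaround described above
(hypothetical code, not the daemon's actual implementation; the helper
name and script path are assumptions):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* hypothetical helper: spawn a script via vfork()/exec() */
static void spawn_script(const char *path)
{
	pid_t pid = vfork();	/* the calling thread blocks until the child execs or exits */

	if (pid == 0) {
		/*
		 * The vfork child borrows the parent's address space, so it
		 * may only call exec*() or _exit() here; this is why the
		 * comment above notes that the child's effective UID cannot
		 * safely be dropped before the exec.
		 */
		execl(path, path, (char *)NULL);
		_exit(127);	/* exec failed; must _exit(), never return */
	}
	/* parent resumes here once the child has exec'd (or failed) */
}

int main(void)
{
	spawn_script("/usr/local/bin/periodic-task.sh");	/* assumed path */
	wait(NULL);
	return 0;
}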
Created attachment 344083 [details]
testcase

At least on the DL160, this testcase wrecks kernel 2.6.18-128.1.6.el5
100% of the time. The hugepages=2048 boot parameter should be set.
Upstream reports and patches:

http://bugzilla.kernel.org/show_bug.cgi?id=13302 [current activity and patches all here]
http://bugzilla.kernel.org/show_bug.cgi?id=13192
http://bugzilla.kernel.org/show_bug.cgi?id=12134
The attached patch, which was posted to rhkernel-list, fixes this problem:

--- linux-2.6.18.x86_64/arch/i386/mm/hugetlbpage.c.orig	2010-06-09 10:01:41.000000000 -0400
+++ linux-2.6.18.x86_64/arch/i386/mm/hugetlbpage.c	2010-06-09 10:02:27.000000000 -0400
@@ -26,12 +26,15 @@
 	unsigned long sbase = saddr & PUD_MASK;
 	unsigned long s_end = sbase + PUD_SIZE;
 
+	/* allow segments to share if only one is marked locked */
+	unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
+	unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
 	/*
 	 * match the virtual addresses, permission and the alignment of the
 	 * page table page.
 	 */
 	if (pmd_index(addr) != pmd_index(saddr) ||
-	    vma->vm_flags != svma->vm_flags ||
+	    vm_flags != svm_flags ||
 	    sbase < svma->vm_start || svma->vm_end < s_end)
 		return 0;

Larry Woodman
Created attachment 516255 [details]
the reproducer program

I updated the reproducer. Before running it, you may need to set
nr_hugepages and overcommit_memory, like this:

# echo 2048 > /proc/sys/vm/nr_hugepages
# echo 1 > /proc/sys/vm/overcommit_memory

I reproduced the problem on kernel-2.6.18-128.1.6.el5 and
kernel-2.6.18-164.el5.

Thanks.
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
This is a serious system corruption bug with an upstream fix.
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.8, and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative if you need to escalate this bug.
(In reply to comment #4)
> The attached patch that was posted to rhkernel-list fixes this problem:
>
> --- linux-2.6.18.x86_64/arch/i386/mm/hugetlbpage.c.orig	2010-06-09 10:01:41.000000000 -0400
> +++ linux-2.6.18.x86_64/arch/i386/mm/hugetlbpage.c	2010-06-09 10:02:27.000000000 -0400
> @@ -26,12 +26,15 @@
>  	unsigned long sbase = saddr & PUD_MASK;
>  	unsigned long s_end = sbase + PUD_SIZE;
>
> +	/* allow segments to share if only one is marked locked */
> +	unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
> +	unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
>  	/*
>  	 * match the virtual addresses, permission and the alignment of the
>  	 * page table page.
>  	 */
>  	if (pmd_index(addr) != pmd_index(saddr) ||
> -	    vma->vm_flags != svma->vm_flags ||
> +	    vm_flags != svm_flags ||
>  	    sbase < svma->vm_start || svma->vm_end < s_end)
>  		return 0;
>
> Larry Woodman

Larry, I found that this patch is already included in RHEL5.6 (2.6.18-238)
and later, but the changelog does not include this BZ#, and this bug is
still in "Assigned" status. Would you mind double-checking whether this
symptom is fixed?
Masaki, yes, this problem is fixed by the patch in Comment #4, and the fix is in RHEL5.6. Do you know if anyone has seen this "bad pmd" message while running RHEL5.6 or later? Larry
Can re-test if it would be helpful. Currently running 2.6.18-308.el5.
Larry, thanks for the confirmation. One of my customers encountered this symptom on RHEL5.3. When I was looking for a solution, I found this BZ and was confused by its status.
I am the original reporter. We eventually figured out that the Linux implementation of vfork() only blocks the calling thread and allows modification of the user ID, group ID, and other process attributes (unlike traditional UNIX vfork), so we never went back and tested fork()--especially as Red Hat management declared that the bug was not and would not be fixed. However, a simple environment variable tweak will switch us back to fork(), so I am willing to re-test if it makes a difference to anyone.
After thorough deliberation, this bugzilla is not planned to be addressed in the Red Hat Enterprise Linux 5 time frame. Current efforts are focused on Red Hat Enterprise Linux 6 and future major releases.
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.