Bug 497653
Summary: | get "bad pmd" when forking process with hugepage shared memory segments | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | starlight | ||||||
Component: | kernel | Assignee: | Larry Woodman <lwoodman> | ||||||
Status: | CLOSED WONTFIX | QA Contact: | Zhouping Liu <zliu> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | medium | ||||||||
Version: | 5.3 | CC: | anton, aquini, ccui, masanari_iida, mfuruta, nobody+295318, qcai, syeghiay | ||||||
Target Milestone: | rc | Keywords: | Reopened | ||||||
Target Release: | --- | ||||||||
Hardware: | All | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2013-11-11 20:03:52 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
Description
starlight
2009-04-25 19:20:37 UTC
Tried vfork() and this works around the issue. It is not a great solution, since the effective user ID of the child cannot be changed to non-root when using vfork() rather than fork(). It is also bad that the (multi-threaded) parent daemon is blocked until the exec() call is issued. The best solution would be for Linux to implement a proper posix_spawn() system call.

Created attachment 344083 [details]
testcase

At least on the DL160, this testcase wrecks kernel 2.6.18-128.1.6.el5 100% of the time. hugepages=2048 should be set.
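As a rough illustration of the two launch strategies mentioned in the description, the sketch below starts a child program once with vfork() plus execv() and once with posix_spawn(), which glibc provides as a library function. The helper names `wait_for_exit`, `vfork_exec_and_wait` and `spawn_and_wait` are hypothetical, and this is not the attached testcase. Whether either approach avoids the "bad pmd" message on a given kernel would have to be checked on that kernel.

```c
/* Assumption: glibc on Linux; _DEFAULT_SOURCE exposes vfork() in strict modes. */
#define _DEFAULT_SOURCE
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Wait for `pid`; return its exit status, or -1 if it did not exit normally. */
static int wait_for_exit(pid_t pid)
{
    int status;

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;          /* real waitpid() failure */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: launch `path` using vfork() followed by execv(). */
int vfork_exec_and_wait(const char *path, char *const argv[])
{
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: do nothing except exec or _exit. */
        execv(path, argv);
        _exit(127);             /* exec failed */
    }
    return wait_for_exit(pid);
}

/* Hypothetical helper: launch `path` using posix_spawn(). */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;

    /* posix_spawn() returns 0 on success or an error number on failure. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;
    return wait_for_exit(pid);
}
```

Both helpers take an ordinary NULL-terminated argument vector, so a caller can switch between them without changing anything else.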
Upstream reports and patches:
http://bugzilla.kernel.org/show_bug.cgi?id=13302 [current activity and patches all here]
http://bugzilla.kernel.org/show_bug.cgi?id=13192
http://bugzilla.kernel.org/show_bug.cgi?id=12134

The attached patch that was posted to rhkernel-list fixes this problem:

--- linux-2.6.18.x86_64/arch/i386/mm/hugetlbpage.c.orig 2010-06-09 10:01:41.000000000 -0400
+++ linux-2.6.18.x86_64/arch/i386/mm/hugetlbpage.c 2010-06-09 10:02:27.000000000 -0400
@@ -26,12 +26,15 @@
     unsigned long sbase = saddr & PUD_MASK;
     unsigned long s_end = sbase + PUD_SIZE;
 
+    /* allow segments to share if only one is marked locked */
+    unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
+    unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
     /*
      * match the virtual addresses, permission and the alignment of the
      * page table page.
      */
     if (pmd_index(addr) != pmd_index(saddr) ||
-        vma->vm_flags != svma->vm_flags ||
+        vm_flags != svm_flags ||
         sbase < svma->vm_start || svma->vm_end < s_end)
         return 0;

Larry Woodman

Created attachment 516255 [details]
the reproducer program
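The change made by the patch in comment #4 can be illustrated outside the kernel. The sketch below is a user-space model only, not kernel code: it contrasts the original strict comparison of the two flag words with the relaxed comparison that masks out the locked bit first. The function names and the `SKETCH_VM_LOCKED` constant are hypothetical, and the constant's value is an assumption chosen for illustration.

```c
#include <stdbool.h>

/* Assumed bit value for illustration; the real VM_LOCKED lives in kernel headers. */
#define SKETCH_VM_LOCKED 0x00002000UL

/* Pre-patch behaviour: the two flag words must be identical. */
bool flags_match_strict(unsigned long vm_flags, unsigned long svm_flags)
{
    return vm_flags == svm_flags;
}

/* Post-patch behaviour: ignore the locked bit on both sides before comparing. */
bool flags_match_ignoring_locked(unsigned long vm_flags, unsigned long svm_flags)
{
    unsigned long a = vm_flags & ~SKETCH_VM_LOCKED;
    unsigned long b = svm_flags & ~SKETCH_VM_LOCKED;

    return a == b;
}
```

With these definitions, two mappings whose flags differ only in the locked bit fail the strict check but pass the relaxed one. Mappings that differ in any other bit fail both.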
I updated the reproducer.
Before running the reproducer, you may need to
set overcommit_memory and hugepages, like this:
# echo 2048 > /proc/sys/vm/nr_hugepages
# echo 1 > /proc/sys/vm/overcommit_memory
I reproduced it on kernel-2.6.18-128.1.6.el5 and kernel-2.6.18-164.el5.
Thanks.
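Before running a reproducer, it may help to confirm that the two settings above actually took effect, since the kernel can end up with fewer huge pages than requested if it cannot allocate them all. The sketch below reads both /proc files and compares them with the values used in this report. The helper names are hypothetical, and the thresholds (2048 and 1) are simply the ones quoted above.

```c
#include <stdio.h>

/* Read one integer from a file such as /proc/sys/vm/nr_hugepages.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_long_setting(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns 1 if both settings match the values from the report,
 * 0 if they do not, and -1 if either file could not be read. */
int reproducer_settings_ok(const char *hugepages_path,
                           const char *overcommit_path)
{
    long hugepages, overcommit;

    if (read_long_setting(hugepages_path, &hugepages) != 0 ||
        read_long_setting(overcommit_path, &overcommit) != 0)
        return -1;
    return (hugepages >= 2048 && overcommit == 1) ? 1 : 0;
}
```

A caller would normally pass `/proc/sys/vm/nr_hugepages` and `/proc/sys/vm/overcommit_memory`. The paths are parameters only so that the check can be exercised against ordinary files.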
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

This is a serious system corruption bug with an upstream fix.

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.8, and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.

(In reply to comment #4)
> The attached patch that was posted to rhkernel-list fixes this problem:
>
> --- linux-2.6.18.x86_64/arch/i386/mm/hugetlbpage.c.orig 2010-06-09 10:01:41.000000000 -0400
> +++ linux-2.6.18.x86_64/arch/i386/mm/hugetlbpage.c 2010-06-09 10:02:27.000000000 -0400
> @@ -26,12 +26,15 @@
>      unsigned long sbase = saddr & PUD_MASK;
>      unsigned long s_end = sbase + PUD_SIZE;
>
> +    /* allow segments to share if only one is marked locked */
> +    unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
> +    unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
>      /*
>       * match the virtual addresses, permission and the alignment of the
>       * page table page.
>       */
>      if (pmd_index(addr) != pmd_index(saddr) ||
> -        vma->vm_flags != svma->vm_flags ||
> +        vm_flags != svm_flags ||
>          sbase < svma->vm_start || svma->vm_end < s_end)
>          return 0;
>
> Larry Woodman

Larry,
I found that this patch is already included in RHEL5.6 (2.6.18-238) and later. But the changelog doesn't include this BZ#, and this case is in "Assigned" status. Would you mind double-checking whether this symptom is fixed?

Masaki, yes, this problem is fixed by the patch in Comment #4, and it is in RHEL5.6. Do you know if anyone has seen this "bad pmd" message while running RHEL5.6 or later?

Larry

Can re-test if it would be helpful. Currently running 2.6.18-308.el5.

Larry,
Thanks for the confirmation. One of my customers encountered this symptom on RHEL5.3. When I was looking for a solution, I found this BZ and got confused.

I am the original reporter. Eventually figured out that the Linux implementation of vfork() only blocks the calling thread and allows modification of user ID, group ID and other process attributes (unlike traditional UNIX vfork), so we never went back and tested fork(), especially as RH management declared that it was not and would not be fixed. However, a simple environment variable tweak will put fork() back, so I am willing to re-test it if it makes a difference to anyone.

After thorough deliberation, this bugzilla is not planned on being addressed in the Red Hat Enterprise Linux 5 time frame. Current efforts are focused on Red Hat Enterprise Linux 6 and future major releases.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days