Bug 106939 - vm hang if MAP_LOCKED is set
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686 Linux
Priority: medium Severity: medium
Assigned To: Larry Woodman
QA Contact: Brian Brock
Depends On:
Blocks: 170417 186960
Reported: 2003-10-13 16:56 EDT by Larry Troan
Modified: 2016-04-18 05:43 EDT

Doc Type: Bug Fix
Last Closed: 2006-03-30 16:16:47 EST


Attachments
Successful RHL9 strace (20.44 KB, text/plain)
2003-10-26 16:20 EST, Larry Troan
Erroneous RHEL3 (EL3) strace (8.17 KB, text/plain)
2003-10-26 16:21 EST, Larry Troan
Test Case (11.00 KB, application/octet-stream)
2004-06-07 11:46 EDT, Frank Hirtz

Description Larry Troan 2003-10-13 16:56:40 EDT
A g++ application that depends on its own lib*.so shared objects fails to load
them, even though /etc/ld.so.conf lists the directory that contains them.
This works on AS 2.1, RH 8.0, and RH 9.
The application and its shared objects are rebuilt on each system, so they
match the installed glibc.

The application cannot load the lib*.so at all in Beta2, and hangs while trying
to load it in RC1.
----------
Action by: andrewrcress
Issue Registered
----------
Action by: arjanv
This report needs more information, such as any printed error messages, and
maybe even a reproducer.


Category set to: Applications
Status set to: Waiting on Client

----------
Action by: andrewrcress
I can't attach the app source, but here is what gdb shows.  I'll need to go back
and step through _dl_open more, but that's where it is failing.
Very strange, since the shared object file is there in the path specified in
ld.so.conf.

(gdb) step
64              h_biosUpdateProtocol = LOAD_LIB(BIOSUPDATE_DLL_FNAME);
(gdb)
0x0021d92c in _dl_open () from /lib/tls/libc.so.6
(gdb)
Single stepping until exit from function _dl_open,
which has no line number information.
0x00f0a456 in _dlerror_run () from /lib/libdl.so.2
(gdb)
Single stepping until exit from function _dlerror_run,
which has no line number information.
0x00f09f94 in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2
(gdb)
Single stepping until exit from function dlopen@@GLIBC_2.1,
which has no line number information.
CreateInterface(_EFI_BIOSUPDATE_INTERFACE*&, void*&) (
    p_biosUpdateIf=@0x80529f4, h_biosUpdateProtocol=@0x80529f8)
    at BiosUpdateInit.cpp:66
66              if ( h_biosUpdateProtocol )


----------
Action by: arjanv
Your application isn't statically linked, right?


----------
Action by: ltroan
Escalated to Bugzilla

ISSUE TRACKER 28334 opened as sev 2
(frigin tool won't let me select g++ component ;-()
Comment 1 Larry Troan 2003-10-13 17:05:12 EDT
FROM ISSUE TRACKER
Event posted 10-10-2003 03:30pm by andrewrcress with duration of 0.00
It is dynamically linked.
I can make it work in Beta2 by moving the libraries to a path that it does
search, but it hangs in RC1.
The application is rebuilt from source on each target system, so the libs match.

Here is the LD_DEBUG output.
*********  on Beta2:
# uname -r
2.4.21-1.1931.2.399.entsmp
# cat /etc/ld.so.conf
/usr/kerberos/lib
/usr/X11R6/lib
/usr/lib/qt-3.1/lib
/usr/lib/sane
/usr/local/flashupdt/lib
# ldconfig
# export LD_DEBUG=libs
# ./flashupdt -h      (should produce help/usage output)
[... snip ...]
    14259:     find library=libbiosupdate.so; searching
    14259:      search cache=/etc/ld.so.cache
    14259:      search path=/lib/tls/i686/mmx:/lib/tls/i686:/lib/tls/mmx:/lib/tls:/lib/i686/mmx:/lib/i686:/lib/mmx:/lib:/usr/lib/tls/i686/mmx:/usr/lib/tls/i686:/usr/lib/tls/mmx:/usr/lib/tls:/usr/lib/i686/mmx:/usr/lib/i686:/usr/lib/mmx:/usr/lib  (system search path)
    14259:       trying file=/lib/tls/i686/mmx/libbiosupdate.so
    14259:       trying file=/lib/tls/i686/libbiosupdate.so
    14259:       trying file=/lib/tls/mmx/libbiosupdate.so
    14259:       trying file=/lib/tls/libbiosupdate.so
    14259:       trying file=/lib/i686/mmx/libbiosupdate.so
    14259:       trying file=/lib/i686/libbiosupdate.so
    14259:       trying file=/lib/mmx/libbiosupdate.so
    14259:       trying file=/lib/libbiosupdate.so
    14259:       trying file=/usr/lib/tls/i686/mmx/libbiosupdate.so
    14259:       trying file=/usr/lib/tls/i686/libbiosupdate.so
    14259:       trying file=/usr/lib/tls/mmx/libbiosupdate.so
    14259:       trying file=/usr/lib/tls/libbiosupdate.so
    14259:       trying file=/usr/lib/i686/mmx/libbiosupdate.so
    14259:       trying file=/usr/lib/i686/libbiosupdate.so
    14259:       trying file=/usr/lib/mmx/libbiosupdate.so
    14259:       trying file=/usr/lib/libbiosupdate.so
    14259:

*ERROR*  Unable to load the shared object "libbiosupdate.so".
[... snip ...]
#
NOTE: If I copy the lib*.so files into /lib/tls/i686/ (or /usr/lib/i686/), it works.
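
The Beta2 trace above shows the loader probing one candidate file per directory of a colon-separated search path. A minimal sketch of that enumeration follows. It is not glibc's implementation (the real loader also consults /etc/ld.so.cache and hardware-capability subdirectories), and `candidate_path()` is a hypothetical helper introduced only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the idx-th candidate "<dir>/<name>" from a
 * colon-separated search path into out. Returns 0 on success, -1 if idx
 * is past the last entry or the result does not fit in out. */
int candidate_path(const char *search_path, const char *name,
                   size_t idx, char *out, size_t out_size)
{
    const char *p = search_path;

    for (;;) {
        size_t dir_len = strcspn(p, ":");   /* length of current entry */

        if (idx == 0) {
            int n = snprintf(out, out_size, "%.*s/%s",
                             (int)dir_len, p, name);
            return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
        }
        if (p[dir_len] == '\0')
            return -1;                      /* ran out of entries */
        p += dir_len + 1;                   /* skip past the ':' */
        idx--;
    }
}
```

Calling it with increasing `idx` until it returns -1 yields the same sequence of paths as the "trying file=" lines, which makes it easy to see that /usr/local/flashupdt/lib never appears in the Beta2 search path.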

****** on EL3.0 RC1:
# uname -r
2.4.21-3.ELsmp
# export LD_DEBUG=libs
# ./flashupdt -h
    17497:     find library=libdl.so.2; searching
    17497:      search cache=/etc/ld.so.cache
    17497:       trying file=/lib/libdl.so.2
    17497:
    17497:     find library=libstdc++.so.5; searching
    17497:      search cache=/etc/ld.so.cache
    17497:       trying file=/usr/lib/libstdc++.so.5
    17497:
    17497:     find library=libm.so.6; searching
    17497:      search cache=/etc/ld.so.cache
    17497:       trying file=/lib/tls/libm.so.6
    17497:
    17497:     find library=libc.so.6; searching
    17497:      search cache=/etc/ld.so.cache
    17497:       trying file=/lib/tls/libc.so.6
    17497:
    17497:     find library=libgcc_s.so.1; searching
    17497:      search cache=/etc/ld.so.cache
    17497:       trying file=/lib/libgcc_s.so.1
    17497:
    17497:
    17497:     calling init: /lib/tls/libc.so.6
    17497:
    17497:
    17497:     calling init: /lib/libgcc_s.so.1
    17497:
    17497:
    17497:     calling init: /lib/tls/libm.so.6
    17497:
    17497:
    17497:     calling init: /usr/lib/libstdc++.so.5
    17497:
    17497:
    17497:     calling init: /lib/libdl.so.2
    17497:
    17497:
    17497:     initialize program: ./flashupdt
    17497:
    17497:
    17497:     transferring control: ./flashupdt
    17497:
One-Boot Flash Update Utility Ver 1.0
Copyright (c) Intel Corporation 2003

    17497:     find library=libbiosupdate.so; searching
    17497:      search cache=/etc/ld.so.cache
    17497:       trying file=/usr/local/flashupdt/lib/libbiosupdate.so
    17497:
    17497:
    17497:     calling init: /usr/local/flashupdt/lib/libbiosupdate.so
    17497:
    17497:     find library=libbud.so; searching
    17497:      search cache=/etc/ld.so.cache
    17497:       trying file=/usr/local/flashupdt/lib/libbud.so
    17497:
    17497:
    17497:     calling init: /usr/local/flashupdt/lib/libbud.so
    17497:
(Now process is hung, kill -9 won't stop it, and ps -ef hangs too. However, I
can start a new ssh/login session after this.)
Comment 2 Bill Nottingham 2003-10-14 00:27:31 EDT
Does your program have a DT_RPATH?
Comment 4 Larry Troan 2003-10-26 16:20:07 EST
FROM ISSUE TRACKER
Event posted 10-22-2003 02:13pm by andrewrcress with duration of 0.00        
No, I didn't have DT_RPATH set in the environment.  I never had to set it before
with AS 2.1, RH 8.0, or RH 9. When I do set it, it seems to help; it gets
farther.  Now the Beta2 & RC1 systems have the same symptoms.
Both hang trying to load the 2nd library (libbud.so) during init.  

It hangs in a system call to mmap (mmap2) that looks the same as it did under
RH 9. Is there something different about mmap2 in RH EL3?
I have attached a successful strace from RH9, and the one that hangs on EL3.

This shouldn't really be a library problem any more, I guess, but an mmap2
problem now.

Status set to: Waiting on Tech    

-----------------------------------------------
Event posted 10-22-2003 02:14pm by andrewrcress with duration of 0.00
strace-rh9.dbg
Attaching the successful strace with RH9

File uploaded: strace-rh9.dbg    

-----------------------------------------------
Event posted 10-22-2003 02:15pm by andrewrcress with duration of 0.00
strace-EL3.dbg
Attaching the strace of the same app hanging on EL3 RC1.

File uploaded: strace-EL3.dbg

------------------------------------------------
Event posted 10-24-2003 03:18pm by andrewrcress with duration of 0.00        
I noticed that the driver on EL3 sees an extra 0x2000 bit set in vm_flags, even
though the utility doesn't pass anything different to it.
 0x2000 = VM_LOCKED.
 mmap converts MAP_LOCKED to VM_LOCKED. I see that this code is new in EL3.
mm/mmap.c:
if (flags & MAP_LOCKED)
        vm_flags |= VM_LOCKED;

When I remove the VM_LOCKED bit from the vm_flags in the driver, before calling
remap_page_range, it works.  This is the problem.
Since the nature of the problem report is now different, should this be a
separate issue#?
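
The flag handling described above can be modelled in user-space C as follows. This is an illustrative sketch only, not the RHEL3 kernel or driver source: `VM_LOCKED_BIT` takes the 0x2000 value quoted in the comment, `MAP_LOCKED_BIT` assumes the i386 value of MAP_LOCKED (also 0x2000, to be checked against the target headers), and `translate_map_flags()` and `driver_clear_locked()` are hypothetical names.

```c
/* Illustrative user-space model; constants and names are assumptions. */
#define MAP_LOCKED_BIT 0x2000UL /* assumed i386 value of MAP_LOCKED */
#define VM_LOCKED_BIT  0x2000UL /* VM_LOCKED value quoted in the comment */

/* Models the two lines quoted from mm/mmap.c: MAP_LOCKED in the mmap()
 * flags becomes VM_LOCKED in the vma flags. */
unsigned long translate_map_flags(unsigned long map_flags,
                                  unsigned long vm_flags)
{
    if (map_flags & MAP_LOCKED_BIT)
        vm_flags |= VM_LOCKED_BIT;
    return vm_flags;
}

/* Models the driver-side workaround: drop VM_LOCKED before the
 * remapping call, leaving every other bit untouched. */
unsigned long driver_clear_locked(unsigned long vm_flags)
{
    return vm_flags & ~VM_LOCKED_BIT;
}
```

In a real driver the second step would be applied to the vma's flags inside the mmap handler, before remap_page_range is called, as the comment describes.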
Comment 5 Larry Troan 2003-10-26 16:20:51 EST
Created attachment 95492 [details]
Successful RHL9 strace
Comment 6 Larry Troan 2003-10-26 16:21:39 EST
Created attachment 95493 [details]
Erroneous RHEL3 (EL3) strace
Comment 7 Larry Troan 2003-12-15 10:44:18 EST
FROM ISSUE TRACKER
Event posted 12-08-2003 04:48am by arjanv with duration of 0.00      
> When I remove the VM_LOCKED bit from the vm_flags in the driver, before
> calling remap_page_range, it works.  This is the problem.

Which driver? Are the sources available so that we can check it for
bugs and incompatibilities with RHEL3?
Comment 8 Larry Troan 2004-01-12 12:01:19 EST
FROM ISSUE TRACKER...
Event posted 01-09-2004 12:30pm by andrewrcress with duration of 0.00
This is an Intel firmware update driver (ofu = One-Boot
Firmware Update).  It is not open source, yet anyway, but I can send
the source to specific individual(s) for review under CNDA 72897
between Intel and Red Hat.  Just treat it as confidential for now.
I'll email the source to Larry [T]Roan.
Comment 9 Larry Troan 2004-01-12 12:05:15 EST
Created attachment 96899
Comment 10 Frank Hirtz 2004-04-07 09:51:32 EDT
Logged against the beta, but is still an issue on the GA release.
Changing assignment to reflect current reality.
Comment 14 Frank Hirtz 2004-06-07 11:45:49 EDT
A version of this driver has now been released with the BSD license,
and it is attached as ofu-bsd.tar.gz.
In flash_ud_module.c, line 353 was added for EL3 to work around
this problem. If that line is commented out or removed,
the problem occurs.
Comment 15 Frank Hirtz 2004-06-07 11:46:52 EDT
Created attachment 100923 [details]
Test Case
Comment 16 Andrew Cress 2004-08-19 16:08:50 EDT
The RH EL3 mmap man page says that MAP_LOCKED is not supported until 
Linux kernel 2.5.37 and later.
So, the EL3 kernel source should be changed in mm/mmap.c to delete 
lines 544-545:
if (flags & MAP_LOCKED)
      vm_flags |= VM_LOCKED;
These two lines were added by RedHat in EL3 and are not present in 
the kernel.org source base. 

This change will avoid the vm hang condition that still exists in EL3 
U2, and in EL3 U3 beta (2.4.21-17.ELsmp).

BTW, the summary line should be changed to "vm hang if MAP_LOCKED is 
set".  
Comment 18 Larry Woodman 2005-07-29 15:59:23 EDT
 
It's the make_pages_present() call that causes the problem; this wires the pages
into the address space.  When the virtual size exceeds the amount of RAM, it
exhausts memory, and that memory is not reclaimable.

---------------------------------------------------------
       if (vm_flags & VM_LOCKED) {
                mm->locked_vm += len >> PAGE_SHIFT;
                make_pages_present(addr, addr + len);
---------------------------------------------------------

This won't result in a hang unless the process is root.  Do you agree?

Larry Woodman
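
The accounting in the excerpt above can be illustrated with a small user-space calculation. This is a sketch under stated assumptions, not kernel code: `PAGE_SHIFT_ASSUMED` is 12 (4 KB pages, the usual i686 value), and `locked_pages()` and `exceeds_ram()` are hypothetical helpers.

```c
#define PAGE_SHIFT_ASSUMED 12 /* 4 KB pages; an assumption for i686 */

/* Number of pages added to locked_vm for a mapping of len bytes,
 * mirroring "len >> PAGE_SHIFT" in the excerpt. */
unsigned long long locked_pages(unsigned long long len)
{
    return len >> PAGE_SHIFT_ASSUMED;
}

/* Returns 1 if wiring len bytes would need more pages than ram_bytes
 * of physical memory can hold, 0 otherwise. */
int exceeds_ram(unsigned long long len, unsigned long long ram_bytes)
{
    return locked_pages(len) > (ram_bytes >> PAGE_SHIFT_ASSUMED);
}
```

For example, a 1 MB mapping accounts for 256 locked pages, and a 2 GB locked mapping cannot be wired on a machine with 1 GB of RAM.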


Comment 19 Andrew Cress 2005-07-29 16:22:53 EDT
Larry,

Perhaps so, but in this case the process that calls mmap() is a kernel 
loadable module which is always root.  
I'm not sure whether it was exceeding the RAM at the time the driver loaded, 
since these test systems usually have either 512MB or 1GB of RAM.

Andy
Comment 20 Larry Woodman 2005-08-15 14:24:59 EDT
Can someone grab AltSysrq-M, P, T and W outputs when this happens so I can see
what the system is doing?

Thanks, Larry Woodman
Comment 21 Andrew Cress 2005-08-15 14:34:38 EDT
Larry,
See the ofu-bsd.tar.gz "Test Case" attachment under comment #15 for the 
loadable module we used (BSD License) with Linux 2.4 (EL3).  Just insmod the 
module to get the error.  It will go a lot faster, both for debug and for 
testing fixes, if the problem is reproduced at RedHat.
  
Andy
Comment 24 Larry Woodman 2005-09-30 16:05:46 EDT
I do not know what to do about this. The module simply calls mmap() with
VM_LOCKED on a system that doesn't have enough RAM.

Larry
Comment 25 Andrew Cress 2005-09-30 16:20:42 EDT
Doesn't have enough RAM?  That seems very unlikely since most of our test 
systems run >=1GB RAM and barely use a fraction of that.  Note that the mmap()
works if VM_LOCKED is not set.
The real question is whether any mmap() call with VM_LOCKED can succeed 
without hanging if called from a kernel module (driver).  

Andy
Comment 26 Ernie Petrides 2005-10-05 15:02:01 EDT
Larry Troan, you created this bugzilla as private to Intel.  Does
it really need to remain that way?  If not, please uncheck the
"Intel Confidential Group" box below, and then I'll make the bug
public.  Thanks in advance.
Comment 29 Ernie Petrides 2005-10-07 17:39:42 EDT
Hello, Keve.  Could you please do this in bugzilla directly?  I don't
have the perms to uncheck the "Intel Confidential Group" box below.

Thanks in advance.  -ernie
Comment 31 Larry Woodman 2005-10-21 11:40:22 EDT
The problem is that RHEL3 has imported the VM_LOCKED functionality from the
upstream 2.6 kernel yet the RHEL3 mmap man page says that MAP_LOCKED is not
supported until Linux kernel 2.5.37 and later.

Removing the following lines from do_mmap_pgoff() fixes this problem but breaks
the overall RHEL3 memory locking scheme:

        if (flags & MAP_LOCKED)
                vm_flags |= VM_LOCKED;

Since I can't solve both problems in the near term, I'm NAKing this for RHEL3-U7.

Larry Woodman
