| Summary: | prelink of 32-bit libc on 64-bit Fedora appears to cause call to mmap to crash | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Michael Wells <michaelawells> | ||||
| Component: | prelink | Assignee: | Jakub Jelinek <jakub> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 16 | CC: | jakub | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2012-01-31 20:40:15 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
That program is of course invalid. You can't blindly use mmap with MAP_FIXED without bothering to check that something isn't mmapped there already.

(In reply to comment #1)
> That program is of course invalid. You can't blindly use mmap with MAP_FIXED
> without bothering to check that something isn't mmapped there already.

The man page for "mmap" distributed with Fedora 16 says this about MAP_FIXED:

"MAP_FIXED Don't interpret addr as a hint: place the mapping at exactly that address. addr must be a multiple of the page size. If the memory region specified by addr and len overlaps pages of any existing mapping(s), then the overlapped part of the existing mapping(s) will be discarded. If the specified address cannot be used, mmap() will fail. Because requiring a fixed address for a mapping is less portable, the use of this option is discouraged."

I assumed that the phrase "mmap() will fail" meant that mmap would return a failure code, not segfault. Am I wrong to assume that? One of the published error codes (from the man page) is:

"EINVAL We don't like addr, length, or offset (e.g., they are too large, or not aligned on a page boundary)."

But the address isn't too large or misaligned, and there is nothing wrong with the length or offset either. All that is wrong is that you haven't checked whether something else is already mmapped there, and what consequences discarding that previous mapping will have. Of course, discarding the mapping of the C library, or of other shared libraries or binaries in use, usually has fatal consequences. The safe way to use MAP_FIXED is to first do a non-MAP_FIXED mmap (e.g. with PROT_NONE) that passes the desired address as a hint; if that succeeds and returns the expected address, you can then mmap with MAP_FIXED over it (or over portions of it). If you strace some program, you'll see that this is what, e.g., the dynamic linker does.

Thanks for the information. I think I now understand what was causing our problem.
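The reserve-then-overlay pattern described in the reply above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the dynamic linker's actual code; the helper names `reserve_exact` and `commit_fixed` are hypothetical. The first mmap passes the address only as a hint, so the kernel never discards an existing mapping; MAP_FIXED is used only inside a range the process has already reserved for itself.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve exactly [want, want + len) without
 * discarding anything already mapped there. Returns want on success,
 * MAP_FAILED if the range is misaligned or unavailable. */
void *reserve_exact(void *want, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    if ((uintptr_t)want % page != 0)
        return MAP_FAILED;

    /* Hint only (no MAP_FIXED) and PROT_NONE: if the range is occupied,
     * the kernel places the mapping elsewhere instead of clobbering it. */
    void *got = mmap(want, len, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (got == MAP_FAILED)
        return MAP_FAILED;
    if (got != want) {
        /* Something else lives at `want`: give back the stray mapping. */
        munmap(got, len);
        return MAP_FAILED;
    }
    return got;
}

/* Hypothetical helper: make part of an already reserved range usable.
 * MAP_FIXED is safe here only because, provided offset + len stays
 * inside the reservation, it replaces our own PROT_NONE placeholder. */
void *commit_fixed(void *reserved, size_t offset, size_t len)
{
    return mmap((char *)reserved + offset, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
}
```

A caller would check that `reserve_exact` did not return MAP_FAILED before calling `commit_fixed`. On Linux 4.17 and later, MAP_FIXED_NOREPLACE gives a similar no-clobber guarantee in a single call, but older kernels treat that flag as a plain hint, so the returned address still has to be compared with the requested one.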
In case you are interested in how this came up, here is some background information. To maximize the available dynamic space, the SBCL implementation needs to obtain as large a block of memory in the first 2GB of address space as possible. SBCL isn't using MAP_FIXED, but it does insist that the address returned by mmap be the same as the one requested; otherwise it fails gracefully. We need a large image. When libc is not prelinked we can obtain an image size of nearly 2GB; when libc is prelinked we can only obtain around 1.2GB.

What was particularly vexing is that we could get SBCL to allocate the larger amount immediately after installing glibc, and SBCL would continue to work as expected for as long as a day after reinstalling glibc. This was very mysterious: why would it work at first, and not later? I noticed that the size of glibc was changing, but not its timestamp. It took me quite some time to figure out what was changing glibc. Using auditctl, I discovered a daily anacron job that ran prelink. (We are migrating to Fedora 16 from a Linux setup that wasn't prelinking libc, and I _now_ realize that Fedora 16 is prelinking it. It is important to us that our 32-bit Lisp programs continue to work...)

You can link the program as PIE (compile with -fpie and link with -pie); then the prelinked load addresses of the libraries will be ignored (though, of course, nothing guarantees they won't be mmapped there anyway). Alternatively, you can set the LD_USE_LOAD_BIAS=0 environment variable. Still, this doesn't affect where the kernel will mmap the dynamic linker. By the way, if you need that much memory in a single process, you should really be using 64-bit programs.

In many cases, 32-bit Lisp programs run faster and take up less memory than 64-bit Lisp programs.
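The SBCL behaviour described above (no MAP_FIXED, but insisting that mmap return the requested address and backing off otherwise) can be sketched as follows. This is not SBCL's actual allocator: `reserve_largest_at` is a hypothetical helper, and simply halving the request on each failure is an assumption chosen for brevity. A caller targeting the first 2GB would pick `base` and `max_len` so that `base + max_len` stays at or below 0x80000000.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve the largest block that starts exactly at
 * `base`, trying max_len first and halving down to min_len. The size
 * obtained is stored in *out_len. Returns base, or MAP_FAILED. */
void *reserve_largest_at(void *base, size_t max_len, size_t min_len,
                         size_t *out_len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (min_len < page)
        min_len = page;  /* mmap works in whole pages anyway */

    for (size_t len = max_len; len >= min_len; len /= 2) {
        /* No MAP_FIXED: if the range is occupied (e.g. by a library),
         * the kernel returns some other address rather than
         * discarding what is already mapped there. */
        void *got = mmap(base, len, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (got == base) {
            *out_len = len;
            return got;
        }
        if (got != MAP_FAILED)
            munmap(got, len);  /* landed elsewhere: release, try smaller */
    }
    *out_len = 0;
    return MAP_FAILED;
}
```

If a library has been placed inside the requested range, every request that would overlap it lands elsewhere and is released, so the helper settles on a block that ends below that library. This is one way a prelinked library sitting in the low 2GB can cap the usable size without any crash.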
Created attachment 558703 [details]
small test program

Description of problem:
I had trouble getting a 32-bit version of SBCL working correctly on the 64-bit version of Fedora 16. I isolated the problem to a call to mmap, and later determined that certain calls to mmap do not work correctly after /lib/libc-2.14.90.so is prelinked. I'm providing a small program that can be used to reproduce the crash when calling mmap.

Version-Release number of selected component (if applicable):
Provided with Fedora 16.

How reproducible:
Start from a Fedora 16 64-bit installation with gcc installed.

Add the packages needed to build a 32-bit binary:

    yum install glibc.i686
    yum install glibc-devel.i686
    yum install libgcc.i686

Compile the provided program as a 32-bit binary:

    gcc -m32 mmap-test.c -o mmap-test

The program crashes when /lib/libc-2.14.90.so is prelinked.

To see the program work correctly when /lib/libc-2.14.90.so is not prelinked, reinstall a "clean" version of /lib/libc-2.14.90.so and run the program:

    yum reinstall glibc.i686
    ./mmap-test

Expected output:

    before mmap call
    after mmap call
    actual: 0x9000000 addr: 0x9000000

To see the crash, note the size of /lib/libc-2.14.90.so before prelinking, then force prelink (as root):

    /etc/cron.daily/prelink

Note that the size of /lib/libc-2.14.90.so has changed. Run the program again:

    ./mmap-test

The program crashes:

    before mmap call
    Segmentation fault (core dumped)
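Before issuing a MAP_FIXED request, a program can list what it would clobber by reading /proc/self/maps. The sketch below assumes Linux procfs; `count_overlapping_mappings` is a hypothetical helper. The report shows the target address 0x9000000 but not the length the test program requested, so any length passed alongside that address is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count the mappings in /proc/self/maps (Linux-only)
 * that overlap the half-open range [start, end). If `report` is not NULL,
 * each overlapping line is written to it. Returns -1 if the file cannot
 * be opened. */
int count_overlapping_mappings(uintptr_t start, uintptr_t end, FILE *report)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char *line = NULL;
    size_t cap = 0;
    int count = 0;
    /* Each line starts with "lo-hi" in hex; hi is exclusive. */
    while (getline(&line, &cap, maps) != -1) {
        uintptr_t lo, hi;
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &lo, &hi) != 2)
            continue;
        if (lo < end && start < hi) {  /* half-open ranges overlap */
            count++;
            if (report != NULL)
                fputs(line, report);
        }
    }
    free(line);
    fclose(maps);
    return count;
}
```

Called with `stderr` as `report` just before the mmap call, for example as `count_overlapping_mappings(0x9000000, 0x9000000 + len, stderr)` where `len` is whatever the program actually requests, it would print any existing mapping, such as a library placed there by prelink, that a MAP_FIXED call would discard.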