Description of problem:
calloc() sometimes returns buffers that are not zero-filled. This happens only when the process address space is locked by a call to mlockall().

When the glibc calloc implementation knows that part of a buffer it is about to return has just been allocated by growing the heap (acquiring new, zero-filled pages from the kernel), it does not clear that part of the buffer, on the assumption that it is already zeroed. When the C heap is shrunk, memory that is no longer needed is marked with a call to madvise() with the MADV_DONTNEED flag. If that space is reused later, the allocator assumes it is zero-filled and does not clear it on calloc(). However, when the process memory is locked (by a call to mlockall()), the kernel considers madvise(MADV_DONTNEED) unsupported and returns an error, which glibc ignores.

Roughly, the chain of events that leads to the crash is:
1. The application allocates a number of buffers with malloc() etc. and uses them.
2. When it is done, it calls free() on those buffers. The C heap is shrunk, and the space those buffers occupied is released with madvise(MADV_DONTNEED).
3. The application calls calloc(). The C heap needs to grow again and reclaims some of the space released in step 2. Because the space comes from an extension of the heap, calloc() does not fill the buffer with zeros.

When the madvise() call in step 2 succeeds, all of this works fine, because pages marked with MADV_DONTNEED are zero-filled when they are accessed again. When the madvise() call in step 2 fails because the application's memory is locked, calloc() in step 3 returns a buffer filled with garbage.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Do you get this same behavior when running the standard RHEL5 kernel? Just trying to initially narrow things down to kernel vs glibc.
There is not enough information contained in this bug report for us to understand the problem and attempt to resolve. Please provide the test case and accompanying descriptive information.
I've looked at the problem. Setting aside the misleading subject, I found a problem that is likely the cause of the observed issue: we have to be less optimistic about the state of memory in arenas other than the main arena when madvise is used. The upstream glibc CVS contains a fix. Testing with the next rawhide build, once it's done, would be appreciated.
Just checked to see where specifically the package can be found; the answer is that it has not yet been built and made available externally. This bugzilla will be updated when the new package is available. In the meantime, here are some instructions for getting the latest available packages from Fedora rawhide. The easiest route is to download from koji: go to http://koji.fedoraproject.org/koji/, search for glibc (in the search field), select the most recent version, and then download the individual RPMs.
I'll try the fix either with rawhide or the CVS libc and let you know if the issue is gone.
Backported in glibc-2.5-20.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
qe_ack+ for rhel5.2. We'll need a testcase (or at least some testing hints), or verification from Sun (comment 6).
I've tried the 2.7 glibc from http://koji.fedoraproject.org/koji/buildinfo?buildID=31131. No more crashes. The bug is fixed as far as I can tell.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0083.html