Bug 405781

Summary: calloc() broken when process address space is locked
Product: Red Hat Enterprise Linux 5
Component: glibc
Version: 5.2
Hardware: i386
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: low
Reporter: Roland Westrelin <roland.westrelin>
Assignee: Jakub Jelinek <jakub>
CC: drepper
Fixed In Version: RHBA-2008-0083
Doc Type: Bug Fix
Last Closed: 2008-05-21 16:52:52 UTC

Description Roland Westrelin 2007-11-30 11:20:07 UTC
Description of problem:

calloc() sometimes returns buffers that are not zero-filled. This happens only
when the process address space is locked by a call to mlockall.

When the glibc calloc implementation knows that part of a buffer it is about to
return has just been allocated by growing the heap (acquiring new pages from the
kernel that are zero filled), it does not clear that part of the buffer (because
presumably it is already filled with zeros).

When the C-heap is shrunk, memory that is no longer needed is released with a
call to madvise with the MADV_DONTNEED flag.

The allocator assumes that this space is zero-filled the next time it is used,
so calloc does not clear it. However, when memory is locked (with a call to
mlockall), the kernel considers madvise with the MADV_DONTNEED flag to be
unsupported and returns an error, which libc ignores.
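
The kernel behaviour described above can be observed outside the allocator.
The following is a minimal sketch, not code from glibc: the names
probe_result and probe_dontneed_on_locked_page are hypothetical, and it locks
a single page with mlock() rather than the whole address space with
mlockall(), on the assumption that a one-page lock is more likely to fit
within the locked-memory limit of an unprivileged process. On Linux, the
madvise(2) manual page documents EINVAL for MADV_DONTNEED on locked pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical result record for this sketch. */
struct probe_result {
    int madvise_rc;           /* return value of madvise() */
    int madvise_errno;        /* errno if madvise() failed, else 0 */
    unsigned char first_byte; /* first byte of the page after madvise() */
};

/* Fill one anonymous page with 0xAB, lock it, then ask the kernel to
 * discard it. Returns 0 if the probe ran, -1 if setup (mmap/mlock) failed. */
int probe_dontneed_on_locked_page(struct probe_result *out)
{
    assert(out != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);
    if (mlock(p, len) != 0) {   /* may fail if the lock limit is too low */
        munmap(p, len);
        return -1;
    }

    errno = 0;
    out->madvise_rc = madvise(p, len, MADV_DONTNEED);
    out->madvise_errno = (out->madvise_rc != 0) ? errno : 0;
    out->first_byte = p[0];     /* 0xAB if the page was kept, 0 if discarded */

    munmap(p, len);             /* unmapping also drops the lock */
    return 0;
}
```

If the probe runs and madvise() fails, first_byte should still hold 0xAB,
which is the stale data that a caller ignoring the error would wrongly treat
as zero-filled.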

Roughly, the chain of events that leads to the crash is:

1- the application allocates a bunch of buffers with malloc etc. and uses them

2- when it's done, it calls free on those buffers. The C-heap is shrunk. The
space that composed those buffers is released by calling madvise(MADV_DONTNEED).

3- the application calls calloc(). The C-heap needs to be grown again. It
reclaims some of the space that was released by the call to madvise. Because it
is the result of an extension of the heap, calloc does not fill the buffer with
zeros.

When the madvise call in 2- succeeds, all this works fine because space
released with madvise is filled with zeros when it is accessed again. When the
madvise call in 2- fails because the memory of the application is locked,
calloc() in 3- returns a buffer filled with garbage.
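
Steps 1- to 3- can be turned into a simple check. The sketch below is not the
reporter's test case (none is attached to this report); the function name
churn_then_check_calloc and its parameters are hypothetical. It does not call
mlockall() itself, and it runs in the calling thread, so it may well not
trigger the bug on an affected glibc. It only shows the shape of the check:
dirty the heap, free it, then verify that calloc() really returns zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns the number of non-zero bytes found in a freshly calloc'ed buffer
 * after a malloc/fill/free cycle, or (size_t)-1 if an allocation failed. */
size_t churn_then_check_calloc(size_t nbufs, size_t bufsize)
{
    assert(nbufs > 0 && bufsize > 0);

    void **bufs = malloc(nbufs * sizeof *bufs);
    if (bufs == NULL)
        return (size_t)-1;

    /* Step 1: allocate buffers and fill them with a non-zero pattern. */
    size_t allocated = 0;
    for (size_t i = 0; i < nbufs; i++) {
        bufs[i] = malloc(bufsize);
        if (bufs[i] == NULL)
            break;
        memset(bufs[i], 0xAB, bufsize);
        allocated++;
    }

    /* Step 2: free everything so the allocator may shrink the heap. */
    for (size_t i = 0; i < allocated; i++)
        free(bufs[i]);
    free(bufs);

    /* Step 3: calloc again and scan the result. The volatile qualifier
     * keeps the compiler from assuming calloc'ed memory is zero and
     * optimizing the scan away. */
    volatile unsigned char *z = calloc(1, bufsize);
    if (z == NULL)
        return (size_t)-1;

    size_t dirty = 0;
    for (size_t i = 0; i < bufsize; i++)
        if (z[i] != 0)
            dirty++;

    free((void *)z);
    return dirty;
}
```

To get closer to the scenario in this report, a caller would presumably call
mlockall(MCL_CURRENT | MCL_FUTURE) first and run the check from a thread other
than the main one. A correct calloc() makes the function return 0.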


Comment 1 Tim Burke 2007-12-17 01:01:19 UTC
Do you get this same behavior when running the standard RHEL5 kernel?  Just
trying to initially narrow things down to kernel vs glibc.


Comment 2 Tim Burke 2007-12-17 13:19:23 UTC
There is not enough information contained in this bug report for us to
understand the problem and attempt to resolve.  Please provide the test case and
accompanying descriptive information.


Comment 3 Ulrich Drepper 2007-12-17 18:47:02 UTC
I've looked at the problem.  After ignoring the misleading subject, I found a
problem which is likely the cause of the observed issue.  We have to be less
optimistic about the state of memory in arenas other than the main arena when
madvise is used.  The upstream glibc CVS contains a fix.  Testing with the
next rawhide build, once it is done, would be appreciated.
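
One way to picture "less optimistic" is the sketch below. It is not the
upstream glibc patch, whose details are not given in this report. It only
illustrates the general idea of not trusting a range to be zero unless
madvise() actually succeeded. The function name release_range_zeroed is
hypothetical, and the sketch assumes a private anonymous mapping, where a
successful MADV_DONTNEED makes the range read as zeros on the next access.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Release a page-aligned range so that it reads as zeros afterwards.
 * Returns 0 if the kernel discarded the pages, 1 if madvise() failed
 * (for example because the pages are locked) and the range was cleared
 * by hand instead. */
int release_range_zeroed(void *start, size_t len)
{
    assert(start != NULL);

    if (madvise(start, len, MADV_DONTNEED) == 0)
        return 0;           /* kernel will supply zero pages on next touch */

    memset(start, 0, len);  /* do not ignore the error: clear explicitly */
    return 1;
}
```

Either way, a later consumer of the range can rely on it being zero-filled,
which is the property calloc() depended on in the description above.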

Comment 4 Tim Burke 2007-12-20 14:13:32 UTC
Just checked to see where specifically the package can be found.  The answer was
that this package has not yet been built and provided externally.  This bugzilla
will be updated when the new package is available.

Here are some instructions on accessing Fedora rawhide to get the latest
available packages.  The easiest way is to download from koji.  Go to
http://koji.fedoraproject.org/koji/, search for glibc (in the search field),
select the most recent version, and then download the individual RPMs.

Comment 6 Roland Westrelin 2007-12-21 14:38:51 UTC
I'll try the fix with either rawhide or the cvs libc and let you know if the
issue is gone.

Comment 7 Jakub Jelinek 2008-01-11 12:43:55 UTC
Backported in glibc-2.5-20.

Comment 8 RHEL Program Management 2008-01-11 12:45:11 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Petr Muller 2008-01-11 13:42:28 UTC
qe_ack+ for rhel5.2
we'll need some testcase (or at least some testing hints), or verification from 
Sun (comment 6)

Comment 11 Roland Westrelin 2008-01-17 09:19:37 UTC
I've tried the 2.7 glibc from:
http://koji.fedoraproject.org/koji/buildinfo?buildID=31131

No more crashes. The bug is fixed as far as I can tell.

Comment 14 errata-xmlrpc 2008-05-21 16:52:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0083.html