Bug 457508 - realloc doesn't fully copy the old data
Summary: realloc doesn't fully copy the old data
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 9
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Denys Vlasenko
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 478499
 
Reported: 2008-08-01 08:23 UTC by Need Real Name
Modified: 2009-09-12 07:48 UTC (History)
7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-14 17:54:38 UTC
Type: ---
Embargoed:


Attachments
Testcase that *should* have reproduced this if the alleged failure mode is real (1.58 KB, text/x-csrc)
2008-08-14 14:14 UTC, Denys Vlasenko
no flags Details
Patch which is expected to fix this bug. (449 bytes, patch)
2008-08-14 16:44 UTC, Denys Vlasenko
no flags Details | Diff
Instrumentation patch (prints sizes of malloc/free) (1.88 KB, patch)
2008-09-22 11:33 UTC, Peter Markovski
no flags Details | Diff
Additional patch fixing two more instances of the same problem (735 bytes, patch)
2009-09-11 16:41 UTC, Stefan Ring
no flags Details | Diff

Description Need Real Name 2008-08-01 08:23:56 UTC
Description of problem:
In rare cases glibc's realloc doesn't copy all of the old data (it misses 8
bytes). Before the call to realloc the allocated memory has the 'other arena'
bit set. After the call this bit is no longer set, i.e. realloc returns a
memory area from the main arena.

Version-Release number of selected component (if applicable):
2.7 and 2.8 (i.e. F8 and F9), possibly all other versions of glibc.

How reproducible:
It is hard to reproduce. It requires an allocation to come from one of the
other arenas; then, when the realloc happens, enough must have been freed that
the new memory area comes from the main arena again.


Actual results:
Last 'size_t' isn't equal to the last size_t from before the realloc call.

Expected results:
realloc should also copy this last size_t.

Additional info:
Looking at the code of public_rEALLOc: in the case where the initial
allocation using the arena of the old pointer fails, it falls back to
'malloc/copy'. In this copy the number of bytes copied is 'chunk size' -
2*SIZE_SZ. I expect that only one size_t should be subtracted; I think
2*SIZE_SZ should be subtracted only in the case of mmapped chunks.

Comment 1 Denys Vlasenko 2008-08-13 15:50:38 UTC
Trying to reproduce...

Comment 2 Denys Vlasenko 2008-08-14 14:14:48 UTC
Created attachment 314319 [details]
Testcase that *should* have reproduced this if the alleged failure mode is real

This testcase allocates lots of 128-$smallnum byte blocks until they reach an mmapped hole. Thus we now have an out-of-arena block.

Allocate one more above it. Free all allocated blocks except these two.
Do realloc. Print addresses and contents before and after.

It doesn't show the corruption.

If you want to instrument malloc, the easiest way is to download the standalone ptmalloc2 (instead of the one in libc):

http://www.malloc.de/malloc/ptmalloc2-current.tar.gz

build it with "make nothread", and link testcase against it:

gcc -O2 testcase.c libmalloc.a


You will need a tiny tweak to get ptmalloc2 to compile:

diff -urp ptmalloc2z/malloc.c ptmalloc2/malloc.c
--- ptmalloc2z/malloc.c 2004-11-04 18:31:04.000000000 +0100
+++ ptmalloc2/malloc.c  2008-08-14 15:44:02.000000000 +0200
@@ -1624,6 +1624,8 @@ do {

 #include <fcntl.h>
 #ifndef LACKS_SYS_MMAN_H
+  #define _GNU_SOURCE
+#include <linux/mman.h>
 #include <sys/mman.h>
 #endif





However, the testcase does not reveal any corruption with glibc or with a separately built ptmalloc2.

> Looking at the code of public_rEALLOc, the case where the initial alloc using
> the arena of the old pointer fails it falls back to 'malloc/copy'. In this copy
> the number of bytes copied is 'chunk size' - 2* SIZE_T. I expect that should be
> only one size_t

I see no such code in public_rEALLOc (it does such malloc+copy+free only if chunk_is_mmapped()).

There is something similar in _int_realloc(), but as I see it, if malloc fails to allocate new chunk in the same arena, it just returns NULL:

        newmem = _int_malloc(av, nb - MALLOC_ALIGN_MASK);
        if (newmem == 0)
          return 0; /* propagate failure */

Additionally, I grepped for MALLOC_COPY, and all three instances seem to be copying the correct amount of data (at first glance).

Comment 3 Denys Vlasenko 2008-08-14 14:24:06 UTC
After a closer look, glibc's public_rEALLOc is a bit different.

While latest ptmalloc2 handles non-mmaped reallocations like this:

  newp = _int_realloc(ar_ptr, oldmem, bytes);

  (void)mutex_unlock(&ar_ptr->mutex);
  assert(!newp || chunk_is_mmapped(mem2chunk(newp)) ||
         ar_ptr == arena_for_chunk(mem2chunk(newp)));
  return newp;
}



glibc does this:


  newp = _int_realloc(ar_ptr, oldmem, bytes);

  (void)mutex_unlock(&ar_ptr->mutex);
  assert(!newp || chunk_is_mmapped(mem2chunk(newp)) ||
         ar_ptr == arena_for_chunk(mem2chunk(newp)));

  if (newp == NULL)
    {
      /* Try harder to allocate memory in other arenas.  */
      newp = public_mALLOc(bytes);
      if (newp != NULL)
        {
          MALLOC_COPY (newp, oldmem, oldsize - 2 * SIZE_SZ);
#if THREAD_STATS
          if(!mutex_trylock(&ar_ptr->mutex))
            ++(ar_ptr->stat_lock_direct);
          else {
            (void)mutex_lock(&ar_ptr->mutex);
            ++(ar_ptr->stat_lock_wait);
          }
#else
          (void)mutex_lock(&ar_ptr->mutex);
#endif
          _int_free(ar_ptr, oldmem);
          (void)mutex_unlock(&ar_ptr->mutex);
        }
    }

  return newp;
}


And indeed " - 2 * SIZE_SZ" looks wrong. I need to find a way to reach this code path in order to reproduce, and am looking into ways to force _int_realloc to return NULL.

Comment 4 Denys Vlasenko 2008-08-14 16:44:53 UTC
Created attachment 314330 [details]
Patch which is expected to fix this bug.

Not verified to work, currently I see no way to reach this code path.

Comment 5 Denys Vlasenko 2008-08-14 16:51:48 UTC
Niels, can you help to produce a program which exhibits the bug? As far as I see, 


newp = _int_realloc(ar_ptr, oldmem, bytes);

can be NULL only if the allocation genuinely fails due to OOM. I looked into the internals: _int_realloc falls back to _int_malloc, which in turn falls back to sYSMALLOc, which fails only if there is no memory (neither brk() nor mmap() can provide enough).

On the other hand, I have no reason to think that you just dreamed it up. You obviously saw it for real.

If you can't make an isolated testcase but you do have a way to reproduce it locally, can you verify locally that attached patch works?

Comment 6 Need Real Name 2008-08-14 17:13:37 UTC
I had the same problem reproducing the bug in an isolated (small test case) environment. I'll try the patch using our software.

Comment 7 Need Real Name 2008-08-15 15:09:40 UTC
The patch indeed helps. I am not sure what the special conditions of our multi-threaded application are, besides using lots of malloc'ed memory.

Comment 8 Denys Vlasenko 2008-08-21 14:18:44 UTC
Submitted the patch to the libc-alpha mailing list:

"[PATCH] fix a case of realloc not copying entire block (RH bug 457508)"

Comment 9 Ulrich Drepper 2008-09-17 14:43:42 UTC
No, the patch is wrong.  You would copy the first word of the next block as well.

Apply this patch to glibc:

--- malloc/malloc.c.~1.189.~    2008-09-16 12:23:55.000000000 -0700
+++ malloc/malloc.c     2008-09-17 07:21:18.000000000 -0700
@@ -3705,7 +3705,8 @@ public_rEALLOc(Void_t* oldmem, size_t by
   tsd_setspecific(arena_key, (Void_t *)ar_ptr);
 #endif
 
-  newp = _int_realloc(ar_ptr, oldmem, bytes);
+  //  newp = _int_realloc(ar_ptr, oldmem, bytes);
+  newp = NULL;
 
   (void)mutex_unlock(&ar_ptr->mutex);
   assert(!newp || chunk_is_mmapped(mem2chunk(newp)) ||
@@ -3717,6 +3718,7 @@ public_rEALLOc(Void_t* oldmem, size_t by
       newp = public_mALLOc(bytes);
       if (newp != NULL)
 	{
+ printf("realloc: copy [%p,%p)\n",oldmem,(char*)oldmem+oldsize-2* SIZE_SZ);
 	  MALLOC_COPY (newp, oldmem, oldsize - 2 * SIZE_SZ);
 #if THREAD_STATS
 	  if(!mutex_trylock(&ar_ptr->mutex))


Then run this test program:

#include <stdlib.h>
#include <stdio.h>

#define INTERNAL_SIZE_T size_t
#define SIZE_SZ sizeof(size_t)

typedef struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
} *mchunkptr;

#define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2*SIZE_SZ))
#define PREV_INUSE 0x1
#define IS_MMAPPED 0x2
#define NON_MAIN_ARENA 0x4
#define SIZE_BITS (PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA)
#define chunksize(p)         ((p)->size & ~(SIZE_BITS))


int
main(void)
{
  int s;
  for (s = 16; s < 512; s += 8)
    {
      void *p1 = malloc (s);
      void *p2 = malloc (s);
      void *p3 = malloc (s);
      printf ("%3d: p1=%p/[%p,%p) p2=%p/[%p,%p) p3=%p/[%p,%p)\n",
	      s,
	      p1, mem2chunk(p1), (char*)mem2chunk(p1)+chunksize(mem2chunk(p1)),
	      p2, mem2chunk(p2), (char*)mem2chunk(p2)+chunksize(mem2chunk(p2)),
	      p3, mem2chunk(p3), (char*)mem2chunk(p3)+chunksize(mem2chunk(p3)));
      if (p1 == NULL || p2 == NULL || p3 == NULL)
	abort ();
      void *p4 = realloc (p2, s + 16);
      printf ("%3d: p4=%p/[%p,%p)\n", s,
	      p4, mem2chunk(p4), (char*)mem2chunk(p4)+chunksize(mem2chunk(p4)));
      free (p1);
      free (p4);
      free (p3);
    }
  return 0;
}


On an x86-64 machine the output will look something like this:

 16: p1=0x2030010/[0x2030000,0x2030020) p2=0x2030030/[0x2030020,0x2030040) p3=0x2030050/[0x2030040,0x2030060)
realloc: copy [0x2030030,0x2030040)
 16: p4=0x2030070/[0x2030060,0x2030090)
 24: p1=0x2030050/[0x2030040,0x2030060) p2=0x2030010/[0x2030000,0x2030020) p3=0x2030030/[0x2030020,0x2030040)
realloc: copy [0x2030010,0x2030020)
 24: p4=0x2030070/[0x2030060,0x2030090)
 32: p1=0x2030070/[0x2030060,0x2030090) p2=0x20300a0/[0x2030090,0x20300c0) p3=0x20300d0/[0x20300c0,0x20300f0)
realloc: copy [0x20300a0,0x20300c0)
 32: p4=0x2030100/[0x20300f0,0x2030130)
 40: p1=0x20300d0/[0x20300c0,0x20300f0) p2=0x2030070/[0x2030060,0x2030090) p3=0x20300a0/[0x2030090,0x20300c0)
realloc: copy [0x2030070,0x2030090)
 40: p4=0x2030100/[0x20300f0,0x2030130)
 48: p1=0x2030100/[0x20300f0,0x2030130) p2=0x2030140/[0x2030130,0x2030170) p3=0x2030180/[0x2030170,0x20301b0)
realloc: copy [0x2030140,0x2030170)
 48: p4=0x20301c0/[0x20301b0,0x2030200)
 56: p1=0x2030180/[0x2030170,0x20301b0) p2=0x2030100/[0x20300f0,0x2030130) p3=0x2030140/[0x2030130,0x2030170)
realloc: copy [0x2030100,0x2030130)
 56: p4=0x20301c0/[0x20301b0,0x2030200)
 64: p1=0x20301c0/[0x20301b0,0x2030200) p2=0x2030210/[0x2030200,0x2030250) p3=0x2030260/[0x2030250,0x20302a0)
realloc: copy [0x2030210,0x2030250)
 64: p4=0x20302b0/[0x20302a0,0x2030300)
 72: p1=0x2030260/[0x2030250,0x20302a0) p2=0x2030010/[0x2030000,0x2030050) p3=0x2030060/[0x2030050,0x20300a0)
realloc: copy [0x2030010,0x2030050)
 72: p4=0x20300b0/[0x20300a0,0x2030100)
 80: p1=0x20300b0/[0x20300a0,0x2030100) p2=0x2030110/[0x2030100,0x2030160) p3=0x2030170/[0x2030160,0x20301c0)
realloc: copy [0x2030110,0x2030160)
 80: p4=0x20301d0/[0x20301c0,0x2030230)
 88: p1=0x20300b0/[0x20300a0,0x2030100) p2=0x2030110/[0x2030100,0x2030160) p3=0x2030170/[0x2030160,0x20301c0)
realloc: copy [0x2030110,0x2030160)
 88: p4=0x20301d0/[0x20301c0,0x2030230)
 96: p1=0x20300b0/[0x20300a0,0x2030110) p2=0x2030120/[0x2030110,0x2030180) p3=0x2030190/[0x2030180,0x20301f0)
realloc: copy [0x2030120,0x2030180)
 96: p4=0x20302b0/[0x20302a0,0x2030320)
104: p1=0x2030010/[0x2030000,0x2030070) p2=0x2030080/[0x2030070,0x20300e0) p3=0x20300f0/[0x20300e0,0x2030150)
realloc: copy [0x2030080,0x20300e0)
104: p4=0x2030160/[0x2030150,0x20301d0)
112: p1=0x2030010/[0x2030000,0x2030080) p2=0x2030090/[0x2030080,0x2030100) p3=0x2030110/[0x2030100,0x2030180)
realloc: copy [0x2030090,0x2030100)
112: p4=0x2030190/[0x2030180,0x2030210)
120: p1=0x2030010/[0x2030000,0x2030080) p2=0x2030090/[0x2030080,0x2030100) p3=0x2030110/[0x2030100,0x2030180)
realloc: copy [0x2030090,0x2030100)
120: p4=0x2030190/[0x2030180,0x2030210)
128: p1=0x2030010/[0x2030000,0x2030090) p2=0x20300a0/[0x2030090,0x2030120) p3=0x2030130/[0x2030120,0x20301b0)
realloc: copy [0x20300a0,0x2030120)
128: p4=0x20301c0/[0x20301b0,0x2030250)
136: p1=0x2030010/[0x2030000,0x2030090) p2=0x20300a0/[0x2030090,0x2030120) p3=0x2030130/[0x2030120,0x20301b0)
realloc: copy [0x20300a0,0x2030120)
136: p4=0x20301c0/[0x20301b0,0x2030250)
144: p1=0x2030010/[0x2030000,0x20300a0) p2=0x20300b0/[0x20300a0,0x2030140) p3=0x2030150/[0x2030140,0x20301e0)
realloc: copy [0x20300b0,0x2030140)
144: p4=0x20301f0/[0x20301e0,0x2030290)
152: p1=0x2030010/[0x2030000,0x20300a0) p2=0x20300b0/[0x20300a0,0x2030140) p3=0x2030150/[0x2030140,0x20301e0)
realloc: copy [0x20300b0,0x2030140)
152: p4=0x20301f0/[0x20301e0,0x2030290)
160: p1=0x2030010/[0x2030000,0x20300b0) p2=0x20300c0/[0x20300b0,0x2030160) p3=0x2030170/[0x2030160,0x2030210)
realloc: copy [0x20300c0,0x2030160)
160: p4=0x2030220/[0x2030210,0x20302d0)
168: p1=0x2030010/[0x2030000,0x20300b0) p2=0x20300c0/[0x20300b0,0x2030160) p3=0x2030170/[0x2030160,0x2030210)
realloc: copy [0x20300c0,0x2030160)
168: p4=0x2030220/[0x2030210,0x20302d0)
176: p1=0x2030010/[0x2030000,0x20300c0) p2=0x20300d0/[0x20300c0,0x2030180) p3=0x2030190/[0x2030180,0x2030240)
realloc: copy [0x20300d0,0x2030180)
176: p4=0x2030250/[0x2030240,0x2030310)
184: p1=0x2030010/[0x2030000,0x20300c0) p2=0x20300d0/[0x20300c0,0x2030180) p3=0x2030190/[0x2030180,0x2030240)
realloc: copy [0x20300d0,0x2030180)
184: p4=0x2030250/[0x2030240,0x2030310)
192: p1=0x2030010/[0x2030000,0x20300d0) p2=0x20300e0/[0x20300d0,0x20301a0) p3=0x20301b0/[0x20301a0,0x2030270)
realloc: copy [0x20300e0,0x20301a0)
192: p4=0x2030280/[0x2030270,0x2030350)
200: p1=0x2030010/[0x2030000,0x20300d0) p2=0x20300e0/[0x20300d0,0x20301a0) p3=0x20301b0/[0x20301a0,0x2030270)
realloc: copy [0x20300e0,0x20301a0)
200: p4=0x2030280/[0x2030270,0x2030350)
208: p1=0x2030010/[0x2030000,0x20300e0) p2=0x20300f0/[0x20300e0,0x20301c0) p3=0x20301d0/[0x20301c0,0x20302a0)
realloc: copy [0x20300f0,0x20301c0)
208: p4=0x20302b0/[0x20302a0,0x2030390)
216: p1=0x2030010/[0x2030000,0x20300e0) p2=0x20300f0/[0x20300e0,0x20301c0) p3=0x20301d0/[0x20301c0,0x20302a0)
realloc: copy [0x20300f0,0x20301c0)
216: p4=0x20302b0/[0x20302a0,0x2030390)
224: p1=0x2030010/[0x2030000,0x20300f0) p2=0x2030100/[0x20300f0,0x20301e0) p3=0x20301f0/[0x20301e0,0x20302d0)
realloc: copy [0x2030100,0x20301e0)
224: p4=0x20302e0/[0x20302d0,0x20303d0)
232: p1=0x2030010/[0x2030000,0x20300f0) p2=0x2030100/[0x20300f0,0x20301e0) p3=0x20301f0/[0x20301e0,0x20302d0)
realloc: copy [0x2030100,0x20301e0)
232: p4=0x20302e0/[0x20302d0,0x20303d0)
240: p1=0x2030010/[0x2030000,0x2030100) p2=0x2030110/[0x2030100,0x2030200) p3=0x2030210/[0x2030200,0x2030300)
realloc: copy [0x2030110,0x2030200)
240: p4=0x2030310/[0x2030300,0x2030410)
248: p1=0x2030010/[0x2030000,0x2030100) p2=0x2030110/[0x2030100,0x2030200) p3=0x2030210/[0x2030200,0x2030300)
realloc: copy [0x2030110,0x2030200)
248: p4=0x2030310/[0x2030300,0x2030410)
256: p1=0x2030010/[0x2030000,0x2030110) p2=0x2030120/[0x2030110,0x2030220) p3=0x2030230/[0x2030220,0x2030330)
realloc: copy [0x2030120,0x2030220)
256: p4=0x2030340/[0x2030330,0x2030450)
264: p1=0x2030010/[0x2030000,0x2030110) p2=0x2030120/[0x2030110,0x2030220) p3=0x2030230/[0x2030220,0x2030330)
realloc: copy [0x2030120,0x2030220)
264: p4=0x2030340/[0x2030330,0x2030450)
272: p1=0x2030010/[0x2030000,0x2030120) p2=0x2030130/[0x2030120,0x2030240) p3=0x2030250/[0x2030240,0x2030360)
realloc: copy [0x2030130,0x2030240)
272: p4=0x2030370/[0x2030360,0x2030490)
280: p1=0x2030010/[0x2030000,0x2030120) p2=0x2030130/[0x2030120,0x2030240) p3=0x2030250/[0x2030240,0x2030360)
realloc: copy [0x2030130,0x2030240)
280: p4=0x2030370/[0x2030360,0x2030490)
288: p1=0x2030010/[0x2030000,0x2030130) p2=0x2030140/[0x2030130,0x2030260) p3=0x2030270/[0x2030260,0x2030390)
realloc: copy [0x2030140,0x2030260)
288: p4=0x20303a0/[0x2030390,0x20304d0)
296: p1=0x2030010/[0x2030000,0x2030130) p2=0x2030140/[0x2030130,0x2030260) p3=0x2030270/[0x2030260,0x2030390)
realloc: copy [0x2030140,0x2030260)
296: p4=0x20303a0/[0x2030390,0x20304d0)
304: p1=0x2030010/[0x2030000,0x2030140) p2=0x2030150/[0x2030140,0x2030280) p3=0x2030290/[0x2030280,0x20303c0)
realloc: copy [0x2030150,0x2030280)
304: p4=0x20303d0/[0x20303c0,0x2030510)
312: p1=0x2030010/[0x2030000,0x2030140) p2=0x2030150/[0x2030140,0x2030280) p3=0x2030290/[0x2030280,0x20303c0)
realloc: copy [0x2030150,0x2030280)
312: p4=0x20303d0/[0x20303c0,0x2030510)
320: p1=0x2030010/[0x2030000,0x2030150) p2=0x2030160/[0x2030150,0x20302a0) p3=0x20302b0/[0x20302a0,0x20303f0)
realloc: copy [0x2030160,0x20302a0)
320: p4=0x2030400/[0x20303f0,0x2030550)
328: p1=0x2030010/[0x2030000,0x2030150) p2=0x2030160/[0x2030150,0x20302a0) p3=0x20302b0/[0x20302a0,0x20303f0)
realloc: copy [0x2030160,0x20302a0)
328: p4=0x2030400/[0x20303f0,0x2030550)
336: p1=0x2030010/[0x2030000,0x2030160) p2=0x2030170/[0x2030160,0x20302c0) p3=0x20302d0/[0x20302c0,0x2030420)
realloc: copy [0x2030170,0x20302c0)
336: p4=0x2030430/[0x2030420,0x2030590)
344: p1=0x2030010/[0x2030000,0x2030160) p2=0x2030170/[0x2030160,0x20302c0) p3=0x20302d0/[0x20302c0,0x2030420)
realloc: copy [0x2030170,0x20302c0)
344: p4=0x2030430/[0x2030420,0x2030590)
352: p1=0x2030010/[0x2030000,0x2030170) p2=0x2030180/[0x2030170,0x20302e0) p3=0x20302f0/[0x20302e0,0x2030450)
realloc: copy [0x2030180,0x20302e0)
352: p4=0x2030460/[0x2030450,0x20305d0)
360: p1=0x2030010/[0x2030000,0x2030170) p2=0x2030180/[0x2030170,0x20302e0) p3=0x20302f0/[0x20302e0,0x2030450)
realloc: copy [0x2030180,0x20302e0)
360: p4=0x2030460/[0x2030450,0x20305d0)
368: p1=0x2030010/[0x2030000,0x2030180) p2=0x2030190/[0x2030180,0x2030300) p3=0x2030310/[0x2030300,0x2030480)
realloc: copy [0x2030190,0x2030300)
368: p4=0x2030490/[0x2030480,0x2030610)
376: p1=0x2030010/[0x2030000,0x2030180) p2=0x2030190/[0x2030180,0x2030300) p3=0x2030310/[0x2030300,0x2030480)
realloc: copy [0x2030190,0x2030300)
376: p4=0x2030490/[0x2030480,0x2030610)
384: p1=0x2030010/[0x2030000,0x2030190) p2=0x20301a0/[0x2030190,0x2030320) p3=0x2030330/[0x2030320,0x20304b0)
realloc: copy [0x20301a0,0x2030320)
384: p4=0x20304c0/[0x20304b0,0x2030650)
392: p1=0x2030010/[0x2030000,0x2030190) p2=0x20301a0/[0x2030190,0x2030320) p3=0x2030330/[0x2030320,0x20304b0)
realloc: copy [0x20301a0,0x2030320)
392: p4=0x20304c0/[0x20304b0,0x2030650)
400: p1=0x2030010/[0x2030000,0x20301a0) p2=0x20301b0/[0x20301a0,0x2030340) p3=0x2030350/[0x2030340,0x20304e0)
realloc: copy [0x20301b0,0x2030340)
400: p4=0x20304f0/[0x20304e0,0x2030690)
408: p1=0x2030010/[0x2030000,0x20301a0) p2=0x20301b0/[0x20301a0,0x2030340) p3=0x2030350/[0x2030340,0x20304e0)
realloc: copy [0x20301b0,0x2030340)
408: p4=0x20304f0/[0x20304e0,0x2030690)
416: p1=0x2030010/[0x2030000,0x20301b0) p2=0x20301c0/[0x20301b0,0x2030360) p3=0x2030370/[0x2030360,0x2030510)
realloc: copy [0x20301c0,0x2030360)
416: p4=0x2030520/[0x2030510,0x20306d0)
424: p1=0x2030010/[0x2030000,0x20301b0) p2=0x20301c0/[0x20301b0,0x2030360) p3=0x2030370/[0x2030360,0x2030510)
realloc: copy [0x20301c0,0x2030360)
424: p4=0x2030520/[0x2030510,0x20306d0)
432: p1=0x2030010/[0x2030000,0x20301c0) p2=0x20301d0/[0x20301c0,0x2030380) p3=0x2030390/[0x2030380,0x2030540)
realloc: copy [0x20301d0,0x2030380)
432: p4=0x2030550/[0x2030540,0x2030710)
440: p1=0x2030010/[0x2030000,0x20301c0) p2=0x20301d0/[0x20301c0,0x2030380) p3=0x2030390/[0x2030380,0x2030540)
realloc: copy [0x20301d0,0x2030380)
440: p4=0x2030550/[0x2030540,0x2030710)
448: p1=0x2030010/[0x2030000,0x20301d0) p2=0x20301e0/[0x20301d0,0x20303a0) p3=0x20303b0/[0x20303a0,0x2030570)
realloc: copy [0x20301e0,0x20303a0)
448: p4=0x2030580/[0x2030570,0x2030750)
456: p1=0x2030010/[0x2030000,0x20301d0) p2=0x20301e0/[0x20301d0,0x20303a0) p3=0x20303b0/[0x20303a0,0x2030570)
realloc: copy [0x20301e0,0x20303a0)
456: p4=0x2030580/[0x2030570,0x2030750)
464: p1=0x2030010/[0x2030000,0x20301e0) p2=0x20301f0/[0x20301e0,0x20303c0) p3=0x20303d0/[0x20303c0,0x20305a0)
realloc: copy [0x20301f0,0x20303c0)
464: p4=0x20305b0/[0x20305a0,0x2030790)
472: p1=0x2030010/[0x2030000,0x20301e0) p2=0x20301f0/[0x20301e0,0x20303c0) p3=0x20303d0/[0x20303c0,0x20305a0)
realloc: copy [0x20301f0,0x20303c0)
472: p4=0x20305b0/[0x20305a0,0x2030790)
480: p1=0x2030010/[0x2030000,0x20301f0) p2=0x2030200/[0x20301f0,0x20303e0) p3=0x20303f0/[0x20303e0,0x20305d0)
realloc: copy [0x2030200,0x20303e0)
480: p4=0x20305e0/[0x20305d0,0x20307d0)
488: p1=0x2030010/[0x2030000,0x20301f0) p2=0x2030200/[0x20301f0,0x20303e0) p3=0x20303f0/[0x20303e0,0x20305d0)
realloc: copy [0x2030200,0x20303e0)
488: p4=0x20305e0/[0x20305d0,0x20307d0)
496: p1=0x2030010/[0x2030000,0x2030200) p2=0x2030210/[0x2030200,0x2030400) p3=0x2030410/[0x2030400,0x2030600)
realloc: copy [0x2030210,0x2030400)
496: p4=0x2030610/[0x2030600,0x2030810)
504: p1=0x2030010/[0x2030000,0x2030200) p2=0x2030210/[0x2030200,0x2030400) p3=0x2030410/[0x2030400,0x2030600)
realloc: copy [0x2030210,0x2030400)
504: p4=0x2030610/[0x2030600,0x2030810)


Now imagine MALLOC_COPY copied 8 more bytes.  This would always copy the first word (prev_size) of the next malloc_chunk header.  The "realloc" lines would show the second number to be 8 bytes higher, hence overlapping.


If you run into problems, it means you overran the buffer and then expect realloc to compensate for this bug by copying the overflowed bytes as well.

Comment 10 Wolfram Gloger 2008-09-19 14:23:19 UTC
>      void *p1 = malloc (s);
...
>      printf ("%3d: p1=%p/[%p,%p) p2=%p/[%p,%p) p3=%p/[%p,%p)\n",
>       s,
>       p1, mem2chunk(p1), (char*)mem2chunk(p1)+chunksize(mem2chunk(p1)),

That correctly prints the extent of the chunks as malloc sees them internally.

However, these chunks are all handed out to the user, so they can never be "free". The "prev_size" field of the next chunk _is_ therefore available to the user and must be copied in case of a realloc().  See also the comment in:

>typedef struct malloc_chunk {
>
>  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */


It would be more appropriate for your test program to print

p1, p1+chunksize(mem2chunk(p1))-(is_mmaped(mem2chunk(p1)) ? 2*SIZE_SZ : SIZE_SZ)

as this is the range that is actually available to the user.

Yes this is confusing, but keeping the prev_size field out of "struct malloc_chunk" would be even more ugly.

Please re-open this bug and apply the patch.
It is quite obviously correct also from source inspection.

Comment 11 Ulrich Drepper 2008-09-20 03:01:47 UTC
(In reply to comment #10)
> It would be more appropriate for your test program to print
> 
> p1, p1+chunksize(mem2chunk(p1))-(is_mmaped(mem2chunk(p1)) ? 2*SIZE_SZ :
> SIZE_SZ)
> 
> as this is the range that is actually available to the user.
> 
> Yes this is confusing, but keeping the prev_size field out of "struct
> malloc_chunk" would be even more ugly.
> 
> Please re-open this bug and apply the patch.
> It is quite obviously correct also from source inspection.

No, it's not.

Look at the program; it tests all sizes.  We never allocate blocks so that they use the prev_size field.  Yes, it might have been designed like this, but we don't do it.

Hence, any code which depends on the prev_size field to be copied overruns its buffer.

Now, if we would get a patch which changes the implementation to take advantage of the extra field this would change the situation.  It obviously would require that the prev-size field is copied.

Comment 12 Wolfram Gloger 2008-09-20 08:59:53 UTC
>Look at the program, it tests all sizes.  We never allocate blocks so that they
>use the prev_size field.  es, it might have been designed like this, but we
>don't do it.

I'm perplexed by this comment.

It is true that the glibc malloc has changed significantly from ptmalloc2, which is _all fine_ and I may have neglected to properly track these changes, for which I apologize.

But not using the prev_size field in allocated chunks is (or rather would be) a _very_ significant change (for better error-detection, I suppose).  For one thing, doesn't this change the overhead per chunk to 2*SIZE_SZ rather than just SIZE_SZ?
I have just looked again at current cvs libc/malloc.c, and I still cannot see this reflected in request2size() or in malloc_usable_size().  They still compute with an overhead of one SIZE_SZ per normal chunk (corresponding to the _always_ present "size" field).

So: sorry, I do not understand your objection.

Comment 13 Ulrich Drepper 2008-09-20 09:18:47 UTC
(In reply to comment #12)
> So: sorry, I do not understand your objection.

The objection is that we do not seem to use the block this way and copying the prev_size field would mask buffer overruns.

The change not to use the field certainly wasn't a deliberate decision.  I haven't researched the issue at all.  But it is clear that to _require_ the realloc copy change additional changes are needed.  I'm not sure when I'll have time to look at it.  Changing this behavior certainly would have advantages.  When this is done it won't have anything to do with this BZ.

Comment 14 Peter Markovski 2008-09-22 11:32:37 UTC
Try running this program:

#include <stdlib.h>
int main() {
        int i;
        for (i = 10; i < 100; i++)
                free(malloc(i));
        return 0;
}

with the attached instrumentation patch to glibc 2.8.0.
Among other lines, I see:

_int_malloc: bytes:24 nb:32 SIZE_SZ:8
_int_free: chunksize:32
...
_int_malloc: bytes:40 nb:48 SIZE_SZ:8
_int_free: chunksize:48
...
_int_malloc: bytes:56 nb:64 SIZE_SZ:8
_int_free: chunksize:64

which clearly shows that glibc does use the prev_size field at the end of an allocated block as part of the user-accessible storage.

Had it not been doing that, the overhead would be two SIZE_SZs (16 bytes on my 64-bit machine), not one as seen here.

Comment 15 Peter Markovski 2008-09-22 11:33:31 UTC
Created attachment 317350 [details]
Instrumentation patch (prints sizes of malloc/free)

Comment 16 Ulrich Drepper 2008-11-03 08:47:05 UTC
There are situations when the problem can appear.  I've changed the code upstream.  Needs to be moved into all RHEL* releases as well.  Someone needs to clone the bug, if necessary.

Comment 17 Bug Zapper 2009-06-10 02:20:37 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 18 Bug Zapper 2009-07-14 17:54:38 UTC
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 19 Stefan Ring 2009-09-11 16:40:36 UTC
I believe we are seeing the effect of this bug on our customer's production servers. We run a C++ application server which embeds Python and actually does most of the processing in Python. It usually consumes ~4-8 CPU hours per day, and heap usage is ~10-15 GB most of the time. Every once in a while (about every 2 weeks), this thing crashes out of the blue with exactly the same symptom every time: a single bogus pointer somewhere inside a very long (>1 million entries) Python list (a list in Python is really an array of pointers). I've been going mad trying to reproduce this (unsuccessfully), and I pray to god that this fix will make it go away.

After a considerable amount of time staring at malloc.c, I have convinced myself that the patch is actually correct. What really pushed me over the edge of belief is this line:

          copysize = oldsize - SIZE_SZ;

Note how it's -SIZE_SZ, not -2*SIZE_SZ.

Your fix does not go far enough, though. From what I've seen, there are two more problematic places (everywhere MALLOC_COPY is used).

Comment 20 Stefan Ring 2009-09-11 16:41:12 UTC
Created attachment 360703 [details]
Additional patch fixing two more instances of the same problem

Comment 21 Stefan Ring 2009-09-11 21:13:27 UTC
Interestingly, the code in question has been like this since the beginning of time (git hash f65fd747b440ae2d8a7481ecc50e668c5e4d0cc9 from 8 Dec 1996). I also went to see what Doug Lea's malloc looks like. Apparently, the value it uses depends on the type of the memory block: mmapped vs. non-mmapped (MMAP_CHUNK_OVERHEAD / CHUNK_OVERHEAD). I suppose, this distinction is not there for ptmalloc2.

Comment 22 Wolfram Gloger 2009-09-11 22:40:03 UTC
The code starting with:

       /* Try harder to allocate memory in other arenas.  */
...

which is changed by Denys' patch was added _long_ after 1996.
The actual cvs commit was malloc.c:1.166, AFAICS.

The distinction between mmapped and non-mmapped chunks certainly
_still applies_, please see

http://sourceware.org/ml/libc-alpha/2008-08/msg00026.html

where I already pointed out that the change is not adequate for
the other source lines.

In summary, Denys' patch is correct and sufficient to fix this bug.

Comment 23 Stefan Ring 2009-09-12 07:48:30 UTC
The suspicion that it might indeed not be needed in the other cases had also come to me, in my sleep ;).

