Bug 230220

Summary: glibc error in lmbench
Product: [Retired] Red Hat Hardware Certification Program
Reporter: George Beshers <gbeshers>
Component: Test Suite (tests)
Assignee: Greg Nichols <gnichols>
Status: CLOSED ERRATA
Severity: medium
Priority: high
Version: 5
CC: gnichols, jh, martinez, niwa.hideyuki, quan.gan, wei, wwlinuxengineering
Target Milestone: ---
Target Release: ---
Hardware: ia64
OS: Linux
Fixed In Version: RHBA-2007-0733
Doc Type: Bug Fix
Last Closed: 2007-08-01 18:46:29 UTC
Bug Blocks: 223165    
Attachments:
  patch to avoid recursive free (flags: none)

Description George Beshers 2007-02-27 16:01:34 UTC
Description of problem:
  When running the HW certification test on a machine with 2 Tbytes (yes,
  2 terabytes) of memory, I got "double free or corruption" errors from
  glibc.

  NOTE: the large allocations are 397740m (about 388 Gbytes of memory)
  and 795481m (about 777 Gbytes).

  This was also seen on the remote-access 128p, 256 Gbyte machine, but
  I don't know the actual size that bw_mem was run with.

  Also, 16384m == 16 Gbytes worked fine.

====================================================================

bw_mem 397740m rd
*** glibc detected *** bw_mem: double free or corruption (out): 0x40000000000124a0 ***
======= Backtrace: =========
/lib/libc.so.6.1[0x20000000001f6190]
/lib/libc.so.6.1(cfree+0x1ad780)[0x20000000001f6ba0]
bw_mem[0x40000000000056a0]
bw_mem[0x400000000000de80]
[0xa0000000000107e0]
[0xa000000000010621]
/lib/libc.so.6.1(munmap+0x27d010)[0x20000000002c6460]
/lib/libc.so.6.1(cfree+0x1ad6e0)[0x20000000001f6b00]
bw_mem[0x40000000000056a0]
bw_mem[0x400000000000ebf0]
bw_mem[0x400000000000f100]
bw_mem[0x4000000000010440]
bw_mem[0x4000000000005f10]
/lib/libc.so.6.1(__libc_start_main+0xfe250)[0x20000000001476b0]
bw_mem[0x4000000000001740]
======= Memory map: ========
00000000-00004000 r--p 00000000 00:00 0
2000000000000000-2000000000038000 r-xp 00000000 08:0a 4606758            /lib/ld-2.5.so
2000000000044000-2000000000050000 rw-p 00034000 08:0a 4606758            /lib/ld-2.5.so
2000000000050000-2000000000114000 r-xp 00000000 08:0a 4606773            /lib/libm-2.5.so
2000000000114000-2000000000120000 ---p 000c4000 08:0a 4606773            /lib/libm-2.5.so
2000000000120000-2000000000124000 rw-p 000c0000 08:0a 4606773            /lib/libm-2.5.so
2000000000124000-2000000000388000 r-xp 00000000 08:0a 4606765            /lib/libc-2.5.so
2000000000388000-2000000000394000 ---p 00264000 08:0a 4606765            /lib/libc-2.5.so
2000000000394000-20000000003a0000 rw-p 00260000 08:0a 4606765            /lib/libc-2.5.so
20000000003a0000-20000000003b8000 rw-p 20000000003a0000 00:00 0
20000000003c0000-20000000003dc000 r-xp 00000000 08:0a 4606754            /lib/libgcc_s-4.1.1-20070105.so.1
20000000003dc000-20000000003e8000 ---p 0001c000 08:0a 4606754            /lib/libgcc_s-4.1.1-20070105.so.1
20000000003e8000-20000000003ec000 rw-p 00018000 08:0a 4606754            /lib/libgcc_s-4.1.1-20070105.so.1
20000000003ec000-20000000003fc000 rw-p 20000000003ec000 00:00 0
2000000004000000-2000000004024000 rw-p 2000000004000000 00:00 0
2000000004024000-2000000008000000 ---p 2000000004024000 00:00 0
4000000000000000-4000000000014000 r-xp 00000000 08:0a 88723              /usr/bin/bw_mem
6000000000000000-6000000000004000 rw-p 00010000 08:0a 88723              /usr/bin/bw_mem
6000000000004000-600000000002c000 rw-p 6000000000004000 00:00 0          [heap]
60000fff7fffc000-60000fff80000000 rw-p 60000fff7fffc000 00:00 0
60000ffffe9a8000-60000ffffe9fc000 rw-p 60000ffffe9a8000 00:00 0          [stack]
a000000000000000-a000000000020000 ---p 00000000 00:00 0                  [vdso]
417060.62 139.18
7130.81user 1306.86system 2:20:43elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+25455584minor)pagefaults 0swaps
        done 06:32:07 372762244
bw_mem 795481m rd
*** glibc detected *** bw_mem: double free or corruption (out): 0x40000000000124a0 ***
======= Backtrace: =========
/lib/libc.so.6.1[0x20000000001f6190]
/lib/libc.so.6.1(cfree+0x1ad780)[0x20000000001f6ba0]
bw_mem[0x40000000000056a0]
bw_mem[0x400000000000de80]
[0xa0000000000107e0]
[0xa000000000010621]
/lib/libc.so.6.1(munmap+0x27d010)[0x20000000002c6460]
/lib/libc.so.6.1(cfree+0x1ad6e0)[0x20000000001f6b00]
bw_mem[0x40000000000056a0]
bw_mem[0x400000000000ebf0]
bw_mem[0x400000000000f100]
bw_mem[0x4000000000010440]
bw_mem[0x4000000000005f10]
/lib/libc.so.6.1(__libc_start_main+0xfe250)[0x20000000001476b0]
bw_mem[0x4000000000001740]
======= Memory map: ========
00000000-00004000 r--p 00000000 00:00 0
2000000000000000-2000000000038000 r-xp 00000000 08:0a 4606758            /lib/ld-2.5.so
2000000000044000-2000000000050000 rw-p 00034000 08:0a 4606758            /lib/ld-2.5.so
2000000000050000-2000000000114000 r-xp 00000000 08:0a 4606773            /lib/libm-2.5.so
2000000000114000-2000000000120000 ---p 000c4000 08:0a 4606773            /lib/libm-2.5.so
2000000000120000-2000000000124000 rw-p 000c0000 08:0a 4606773            /lib/libm-2.5.so
2000000000124000-2000000000388000 r-xp 00000000 08:0a 4606765            /lib/libc-2.5.so
2000000000388000-2000000000394000 ---p 00264000 08:0a 4606765            /lib/libc-2.5.so
2000000000394000-20000000003a0000 rw-p 00260000 08:0a 4606765            /lib/libc-2.5.so
20000000003a0000-20000000003b8000 rw-p 20000000003a0000 00:00 0
20000000003c0000-20000000003dc000 r-xp 00000000 08:0a 4606754            /lib/libgcc_s-4.1.1-20070105.so.1
20000000003dc000-20000000003e8000 ---p 0001c000 08:0a 4606754            /lib/libgcc_s-4.1.1-20070105.so.1
20000000003e8000-20000000003ec000 rw-p 00018000 08:0a 4606754            /lib/libgcc_s-4.1.1-20070105.so.1
20000000003ec000-20000000003fc000 rw-p 20000000003ec000 00:00 0
2000000004000000-2000000004024000 rw-p 2000000004000000 00:00 0
2000000004024000-2000000008000000 ---p 2000000004024000 00:00 0
4000000000000000-4000000000014000 r-xp 00000000 08:0a 88723              /usr/bin/bw_mem
6000000000000000-6000000000004000 rw-p 00010000 08:0a 88723              /usr/bin/bw_mem
6000000000004000-600000000002c000 rw-p 6000000000004000 00:00 0          [heap]
60000fff7fffc000-60000fff80000000 rw-p 60000fff7fffc000 00:00 0
60000fffff3d4000-60000fffff428000 rw-p 60000fffff3d4000 00:00 0          [stack]
a000000000000000-a000000000020000 ---p 00000000 00:00 0                  [vdso]
834122.29 120.19
16021.20user 3011.74system 5:17:24elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+50911008minor)pagefaults 0swaps
        done 11:49:32 042153161



Version-Release number of selected component (if applicable):
  1,25


How reproducible:
  routinely

Steps to Reproduce:
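1. bw_mem 397740m rd

Actual results:
  glibc reports "double free or corruption (out)", followed by a backtrace
  and memory map.

Expected results:
  bw_mem completes without glibc memory-corruption errors.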

Additional info:
  I will try to identify the "breaking point" on SkyNet this evening.

  This does "block" our ability to certify this system.

Comment 1 George Beshers 2007-02-28 13:27:12 UTC
For what it's worth, this does not appear to happen at a power-of-2 boundary.

[root@skynet1 ~]# bw_mem -N1 131071m wr
*** glibc detected *** bw_mem: double free or corruption (out): 0x40000000000124a0 ***
======= Backtrace: =========
/lib/libc.so.6.1[0x20000000001f6190]
/lib/libc.so.6.1(cfree+0x1ad780)[0x20000000001f6ba0]
bw_mem[0x40000000000056a0]
bw_mem[0x400000000000de80]
[0xa0000000000107e0]
[0xa000000000010621]
/lib/libc.so.6.1(munmap+0x27d010)[0x20000000002c6460]
/lib/libc.so.6.1(cfree+0x1ad6e0)[0x20000000001f6b00]
bw_mem[0x40000000000056a0]
bw_mem[0x400000000000ebf0]
bw_mem[0x400000000000f100]
bw_mem[0x4000000000010440]
bw_mem[0x4000000000006000]
/lib/libc.so.6.1(__libc_start_main+0xfe250)[0x20000000001476b0]
bw_mem[0x4000000000001740]
======= Memory map: ========
00000000-00004000 r--p 00000000 00:00 0
2000000000000000-2000000000038000 r-xp 00000000 08:0a 4606758            /lib/ld-2.5.so
2000000000044000-2000000000050000 rw-p 00034000 08:0a 4606758            /lib/ld-2.5.so
2000000000050000-2000000000114000 r-xp 00000000 08:0a 4606773            /lib/libm-2.5.so
2000000000114000-2000000000120000 ---p 000c4000 08:0a 4606773            /lib/libm-2.5.so
2000000000120000-2000000000124000 rw-p 000c0000 08:0a 4606773            /lib/libm-2.5.so
2000000000124000-2000000000388000 r-xp 00000000 08:0a 4606765            /lib/libc-2.5.so
2000000000388000-2000000000394000 ---p 00264000 08:0a 4606765            /lib/libc-2.5.so
2000000000394000-20000000003a0000 rw-p 00260000 08:0a 4606765            /lib/libc-2.5.so
20000000003a0000-20000000003b8000 rw-p 20000000003a0000 00:00 0
20000000003c0000-20000000003dc000 r-xp 00000000 08:0a 4606754            /lib/libgcc_s-4.1.1-20070105.so.1
20000000003dc000-20000000003e8000 ---p 0001c000 08:0a 4606754            /lib/libgcc_s-4.1.1-20070105.so.1
20000000003e8000-20000000003ec000 rw-p 00018000 08:0a 4606754            /lib/libgcc_s-4.1.1-20070105.so.1
20000000003ec000-20000000003fc000 rw-p 20000000003ec000 00:00 0
2000000004000000-2000000004024000 rw-p 2000000004000000 00:00 0
2000000004024000-2000000008000000 ---p 2000000004024000 00:00 0
4000000000000000-4000000000014000 r-xp 00000000 08:0a 88723              /usr/bin/bw_mem
6000000000000000-6000000000004000 rw-p 00010000 08:0a 88723              /usr/bin/bw_mem
6000000000004000-600000000002c000 rw-p 6000000000004000 00:00 0          [heap]
60000fff7fffc000-60000fff80000000 rw-p 60000fff7fffc000 00:00 0
60000fffff3e8000-60000fffff43c000 rw-p 60000fffff3e8000 00:00 0          [stack]
a000000000000000-a000000000020000 ---p 00000000 00:00 0                  [vdso]
137437.90 696.71
[root@skynet1 ~]#
[root@skynet1 ~]# bw_mem -N1 65537m wr
68720.53 786.88
[root@skynet1 ~]#                                                          

Comment 2 Greg Nichols 2007-03-07 14:58:29 UTC
*** Bug 227327 has been marked as a duplicate of this bug. ***

Comment 3 Jakub Jelinek 2007-03-14 14:19:49 UTC
Why are you assigning this to glibc?  Most probably it is a bug in bw_mem.

Comment 4 George Beshers 2007-03-14 15:40:11 UTC
No, it is in glibc: I can reproduce it with any
malloc larger than 700 Gbytes (the G is not a typo).

I am working on a patch.


Comment 5 George Beshers 2007-03-15 15:49:48 UTC
Well, it appears that I spoke too soon, although the behavior changed
slightly when I compiled with checks turned on.

I think something is slightly corrupting glibc's private memory, and
glibc then corrupts things further. I have not yet traced the
original corruption.

Does anyone have code for the debugging hooks that check malloc's
data structures for consistency?

Comment 6 Jakub Jelinek 2007-03-15 15:55:47 UTC
You can try MALLOC_CHECK_=3, mtrace, ElectricFence or valgrind.

Comment 7 George Beshers 2007-04-10 17:48:33 UTC
Created attachment 152163 [details]
patch to avoid recursive free

Comment 8 YangKun 2007-04-18 07:45:50 UTC
A new lmbench package has been built (with the above patch added). You can get it from:
    http://porkchop.devel.redhat.com/brewroot/packages/lmbench/3.0a7/6.EL5/

Please verify. Thanks

Comment 9 Marizol Martinez 2007-06-11 15:49:48 UTC
SGI (George) will verify once he gets access to a 2 TB machine.

Comment 11 Red Hat Bugzilla 2007-08-01 18:46:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0733.html


Comment 12 quangan 2009-06-09 07:50:22 UTC
closed