Bug 797692 - papi 4.2.1: core dumps
Summary: papi 4.2.1: core dumps
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: papi
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: William Cohen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-02-27 02:43 UTC by Jose Pedro Oliveira
Modified: 2012-03-12 13:24 UTC
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-03-12 13:24:19 UTC
Type: ---
Embargoed:



Description Jose Pedro Oliveira 2012-02-27 02:43:21 UTC
Description of problem:
papi 4.2.1 core dumps on Fedora 16.

Version-Release number of selected component (if applicable):
papi-4.2.1-1

How reproducible:
Always

Steps to Reproduce:
1. build papi-4.2.1-1.fc18 for f16
2. install it
3. run papi_avail or papi_native_avail
  
Actual results:
----------
$ papi_avail 
*** glibc detected *** papi_avail: free(): invalid next size (normal): 0x0000000006045490 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3680a7dda6]
/lib64/libc.so.6[0x3680a7f08e]
/lib64/libc.so.6(fclose+0x155)[0x3680a6db45]
/usr/lib64/libpapi.so(_mx_init_substrate+0xa8)[0x368162a5a8]
/usr/lib64/libpapi.so(_papi_hwi_init_global+0x38)[0x368161a6b8]
/usr/lib64/libpapi.so(PAPI_library_init+0xad)[0x368161768d]
papi_avail[0x40156f]
/lib64/libc.so.6(__libc_start_main+0xed)[0x3680a2169d]
papi_avail[0x40213d]
======= Memory map: ========
...
----------


Additional info:
It doesn't core dump on RHEL6.2.

Comment 1 Jose Pedro Oliveira 2012-02-27 16:35:55 UTC
Not building the lm_sensors component appears to fix the problem:

---with-components="coretemp example lmsensors lustre mx net"
+--with-components="coretemp example lustre mx net"

Comment 2 William Cohen 2012-02-27 17:28:25 UTC
I just tried to replicate this on an AMD family 10 machine running Fedora 15 and an Intel Pentium 4 machine running F16. Neither triggered the problem. This problem could be very machine-specific.

What processor and motherboard are you getting this failure on?

Could you supply the papi_avail output before the failure? 

Could you install the papi-debuginfo and glibc-debuginfo RPMs (debuginfo-install glibc) and see if you can get a backtrace that provides line numbers?


Alternatively, run papi_avail in gdb, set a breakpoint on exit, run it, and print the backtrace:

gdb /usr/bin/papi_avail
break exit
run
where

Comment 3 Jose Pedro Oliveira 2012-02-27 19:27:57 UTC
papi_avail output from a papi-4.2.1-1.x86_64 RPM built in mock without the lmsensors component (the only specfile change is the one shown in comment 1):
----------

# papi_avail 
Available events and hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 4.2.1.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (30)
CPU Revision             : 5.000000
CPUID Info               : Family: 6  Model: 30  Stepping: 5
CPU Megahertz            : 1200.000000
CPU Clock Megahertz      : 1200
Hdw Threads per core     : 2
Cores per Socket         : 4
NUMA Nodes               : 1
CPUs per Node            : 8
Total CPUs               : 8
Number Hardware Counters : 7
Max Multiplex Counters   : 64
--------------------------------------------------------------------------------

    Name        Code    Avail Deriv Description (Note)
PAPI_L1_DCM  0x80000000  Yes   No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  Yes   No   Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  Yes   Yes  Level 2 data cache misses
PAPI_L2_ICM  0x80000003  Yes   No   Level 2 instruction cache misses
PAPI_L3_DCM  0x80000004  No    No   Level 3 data cache misses
PAPI_L3_ICM  0x80000005  No    No   Level 3 instruction cache misses
PAPI_L1_TCM  0x80000006  Yes   Yes  Level 1 cache misses
PAPI_L2_TCM  0x80000007  Yes   No   Level 2 cache misses
PAPI_L3_TCM  0x80000008  Yes   No   Level 3 cache misses
PAPI_CA_SNP  0x80000009  No    No   Requests for a snoop
PAPI_CA_SHR  0x8000000a  No    No   Requests for exclusive access to shared cache line
PAPI_CA_CLN  0x8000000b  No    No   Requests for exclusive access to clean cache line
PAPI_CA_INV  0x8000000c  No    No   Requests for cache line invalidation
PAPI_CA_ITV  0x8000000d  No    No   Requests for cache line intervention
PAPI_L3_LDM  0x8000000e  Yes   No   Level 3 load misses
PAPI_L3_STM  0x8000000f  No    No   Level 3 store misses
PAPI_BRU_IDL 0x80000010  No    No   Cycles branch units are idle
PAPI_FXU_IDL 0x80000011  No    No   Cycles integer units are idle
PAPI_FPU_IDL 0x80000012  No    No   Cycles floating point units are idle
PAPI_LSU_IDL 0x80000013  No    No   Cycles load/store units are idle
PAPI_TLB_DM  0x80000014  Yes   No   Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  Yes   No   Instruction translation lookaside buffer misses
PAPI_TLB_TL  0x80000016  Yes   Yes  Total translation lookaside buffer misses
PAPI_L1_LDM  0x80000017  Yes   No   Level 1 load misses
PAPI_L1_STM  0x80000018  Yes   No   Level 1 store misses
PAPI_L2_LDM  0x80000019  Yes   No   Level 2 load misses
PAPI_L2_STM  0x8000001a  Yes   No   Level 2 store misses
PAPI_BTAC_M  0x8000001b  No    No   Branch target address cache misses
PAPI_PRF_DM  0x8000001c  No    No   Data prefetch cache misses
PAPI_L3_DCH  0x8000001d  No    No   Level 3 data cache hits
PAPI_TLB_SD  0x8000001e  No    No   Translation lookaside buffer shootdowns
PAPI_CSR_FAL 0x8000001f  No    No   Failed store conditional instructions
PAPI_CSR_SUC 0x80000020  No    No   Successful store conditional instructions
PAPI_CSR_TOT 0x80000021  No    No   Total store conditional instructions
PAPI_MEM_SCY 0x80000022  No    No   Cycles Stalled Waiting for memory accesses
PAPI_MEM_RCY 0x80000023  No    No   Cycles Stalled Waiting for memory Reads
PAPI_MEM_WCY 0x80000024  No    No   Cycles Stalled Waiting for memory writes
PAPI_STL_ICY 0x80000025  No    No   Cycles with no instruction issue
PAPI_FUL_ICY 0x80000026  No    No   Cycles with maximum instruction issue
PAPI_STL_CCY 0x80000027  No    No   Cycles with no instructions completed
PAPI_FUL_CCY 0x80000028  No    No   Cycles with maximum instructions completed
PAPI_HW_INT  0x80000029  No    No   Hardware interrupts
PAPI_BR_UCN  0x8000002a  Yes   No   Unconditional branch instructions
PAPI_BR_CN   0x8000002b  Yes   No   Conditional branch instructions
PAPI_BR_TKN  0x8000002c  Yes   No   Conditional branch instructions taken
PAPI_BR_NTK  0x8000002d  Yes   Yes  Conditional branch instructions not taken
PAPI_BR_MSP  0x8000002e  Yes   No   Conditional branch instructions mispredicted
PAPI_BR_PRC  0x8000002f  Yes   Yes  Conditional branch instructions correctly predicted
PAPI_FMA_INS 0x80000030  No    No   FMA instructions completed
PAPI_TOT_IIS 0x80000031  Yes   No   Instructions issued
PAPI_TOT_INS 0x80000032  Yes   No   Instructions completed
PAPI_INT_INS 0x80000033  No    No   Integer instructions
PAPI_FP_INS  0x80000034  Yes   No   Floating point instructions
PAPI_LD_INS  0x80000035  Yes   No   Load instructions
PAPI_SR_INS  0x80000036  Yes   No   Store instructions
PAPI_BR_INS  0x80000037  Yes   No   Branch instructions
PAPI_VEC_INS 0x80000038  No    No   Vector/SIMD instructions (could include integer)
PAPI_RES_STL 0x80000039  Yes   No   Cycles stalled on any resource
PAPI_FP_STAL 0x8000003a  No    No   Cycles the FP unit(s) are stalled
PAPI_TOT_CYC 0x8000003b  Yes   No   Total cycles
PAPI_LST_INS 0x8000003c  Yes   Yes  Load/store instructions completed
PAPI_SYC_INS 0x8000003d  No    No   Synchronization instructions completed
PAPI_L1_DCH  0x8000003e  Yes   Yes  Level 1 data cache hits
PAPI_L2_DCH  0x8000003f  Yes   Yes  Level 2 data cache hits
PAPI_L1_DCA  0x80000040  Yes   No   Level 1 data cache accesses
PAPI_L2_DCA  0x80000041  Yes   No   Level 2 data cache accesses
PAPI_L3_DCA  0x80000042  Yes   Yes  Level 3 data cache accesses
PAPI_L1_DCR  0x80000043  Yes   No   Level 1 data cache reads
PAPI_L2_DCR  0x80000044  Yes   No   Level 2 data cache reads
PAPI_L3_DCR  0x80000045  Yes   No   Level 3 data cache reads
PAPI_L1_DCW  0x80000046  Yes   No   Level 1 data cache writes
PAPI_L2_DCW  0x80000047  Yes   No   Level 2 data cache writes
PAPI_L3_DCW  0x80000048  Yes   No   Level 3 data cache writes
PAPI_L1_ICH  0x80000049  Yes   No   Level 1 instruction cache hits
PAPI_L2_ICH  0x8000004a  Yes   No   Level 2 instruction cache hits
PAPI_L3_ICH  0x8000004b  No    No   Level 3 instruction cache hits
PAPI_L1_ICA  0x8000004c  Yes   No   Level 1 instruction cache accesses
PAPI_L2_ICA  0x8000004d  Yes   No   Level 2 instruction cache accesses
PAPI_L3_ICA  0x8000004e  Yes   No   Level 3 instruction cache accesses
PAPI_L1_ICR  0x8000004f  Yes   No   Level 1 instruction cache reads
PAPI_L2_ICR  0x80000050  Yes   No   Level 2 instruction cache reads
PAPI_L3_ICR  0x80000051  Yes   No   Level 3 instruction cache reads
PAPI_L1_ICW  0x80000052  No    No   Level 1 instruction cache writes
PAPI_L2_ICW  0x80000053  No    No   Level 2 instruction cache writes
PAPI_L3_ICW  0x80000054  No    No   Level 3 instruction cache writes
PAPI_L1_TCH  0x80000055  No    No   Level 1 total cache hits
PAPI_L2_TCH  0x80000056  Yes   Yes  Level 2 total cache hits
PAPI_L3_TCH  0x80000057  No    No   Level 3 total cache hits
PAPI_L1_TCA  0x80000058  Yes   Yes  Level 1 total cache accesses
PAPI_L2_TCA  0x80000059  Yes   No   Level 2 total cache accesses
PAPI_L3_TCA  0x8000005a  Yes   No   Level 3 total cache accesses
PAPI_L1_TCR  0x8000005b  Yes   Yes  Level 1 total cache reads
PAPI_L2_TCR  0x8000005c  Yes   Yes  Level 2 total cache reads
PAPI_L3_TCR  0x8000005d  Yes   Yes  Level 3 total cache reads
PAPI_L1_TCW  0x8000005e  No    No   Level 1 total cache writes
PAPI_L2_TCW  0x8000005f  Yes   No   Level 2 total cache writes
PAPI_L3_TCW  0x80000060  Yes   No   Level 3 total cache writes
PAPI_FML_INS 0x80000061  No    No   Floating point multiply instructions
PAPI_FAD_INS 0x80000062  No    No   Floating point add instructions
PAPI_FDV_INS 0x80000063  No    No   Floating point divide instructions
PAPI_FSQ_INS 0x80000064  No    No   Floating point square root instructions
PAPI_FNV_INS 0x80000065  No    No   Floating point inverse instructions
PAPI_FP_OPS  0x80000066  Yes   Yes  Floating point operations
PAPI_SP_OPS  0x80000067  Yes   Yes  Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS  0x80000068  Yes   Yes  Floating point operations; optimized to count scaled double precision vector operations
PAPI_VEC_SP  0x80000069  Yes   No   Single precision vector/SIMD instructions
PAPI_VEC_DP  0x8000006a  Yes   No   Double precision vector/SIMD instructions
-------------------------------------------------------------------------
Of 107 possible events, 63 are available, of which 17 are derived.

avail.c                                     PASSED

Comment 4 Jose Pedro Oliveira 2012-02-27 19:35:45 UTC
papi-4.2.1-1 built in mock (no specfile changes). 

Additional RPMs installed:

  * debuginfo-install lm_sensors-libs-3.3.1-1.fc16.x86_64

----------
# gdb /usr/bin/papi_avail 
GNU gdb (GDB) Fedora (7.3.50.20110722-10.fc16)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/papi_avail...Reading symbols from /usr/lib/debug/usr/bin/papi_avail.debug...done.
done.


(gdb) break exit
Breakpoint 1 at 0x4013b0


(gdb) run
Starting program: /usr/bin/papi_avail 
Detaching after fork from child process 26950.
*** glibc detected *** /usr/bin/papi_avail: free(): invalid next size (normal): 0x000000000460d490 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3680a7dda6]
/lib64/libc.so.6[0x3680a7f08e]
/lib64/libc.so.6(fclose+0x155)[0x3680a6db45]
/usr/lib64/libpapi.so(_mx_init_substrate+0xa8)[0x7ffff7daf5a8]
/usr/lib64/libpapi.so(_papi_hwi_init_global+0x38)[0x7ffff7d9f6b8]
/usr/lib64/libpapi.so(PAPI_library_init+0xad)[0x7ffff7d9c68d]
/usr/bin/papi_avail[0x40156f]
/lib64/libc.so.6(__libc_start_main+0xed)[0x3680a2169d]
/usr/bin/papi_avail[0x40213d]
======= Memory map: ========
00400000-00407000 r-xp 00000000 08:03 2624639                            /usr/bin/papi_avail
00606000-00607000 r--p 00006000 08:03 2624639                            /usr/bin/papi_avail
00607000-00608000 rw-p 00007000 08:03 2624639                            /usr/bin/papi_avail
00608000-04678000 rw-p 00000000 00:00 0                                  [heap]
3680600000-3680622000 r-xp 00000000 08:03 2097266                        /lib64/ld-2.14.90.so
3680821000-3680822000 r--p 00021000 08:03 2097266                        /lib64/ld-2.14.90.so
3680822000-3680823000 rw-p 00022000 08:03 2097266                        /lib64/ld-2.14.90.so
3680823000-3680824000 rw-p 00000000 00:00 0 
3680a00000-3680bad000 r-xp 00000000 08:03 2097277                        /lib64/libc-2.14.90.so
3680bad000-3680dad000 ---p 001ad000 08:03 2097277                        /lib64/libc-2.14.90.so
3680dad000-3680db1000 r--p 001ad000 08:03 2097277                        /lib64/libc-2.14.90.so
3680db1000-3680db3000 rw-p 001b1000 08:03 2097277                        /lib64/libc-2.14.90.so
3680db3000-3680db8000 rw-p 00000000 00:00 0 
3680e00000-3680e83000 r-xp 00000000 08:03 2097462                        /lib64/libm-2.14.90.so
3680e83000-3681082000 ---p 00083000 08:03 2097462                        /lib64/libm-2.14.90.so
3681082000-3681083000 r--p 00082000 08:03 2097462                        /lib64/libm-2.14.90.so
3681083000-3681084000 rw-p 00083000 08:03 2097462                        /lib64/libm-2.14.90.so
3681a00000-3681a15000 r-xp 00000000 08:03 2097469                        /lib64/libgcc_s-4.6.2-20111027.so.1
3681a15000-3681c14000 ---p 00015000 08:03 2097469                        /lib64/libgcc_s-4.6.2-20111027.so.1
3681c14000-3681c15000 rw-p 00014000 08:03 2097469                        /lib64/libgcc_s-4.6.2-20111027.so.1
3683a00000-3683a0e000 r-xp 00000000 08:03 2626130                        /usr/lib64/libsensors.so.4.3.1
3683a0e000-3683c0d000 ---p 0000e000 08:03 2626130                        /usr/lib64/libsensors.so.4.3.1
3683c0d000-3683c0e000 rw-p 0000d000 08:03 2626130                        /usr/lib64/libsensors.so.4.3.1
7ffff7ab5000-7ffff7ae1000 rw-p 00000000 00:00 0 
7ffff7ae1000-7ffff7b49000 r-xp 00000000 08:03 2624924                    /usr/lib64/libpfm.so.4.2.0
7ffff7b49000-7ffff7d49000 ---p 00068000 08:03 2624924                    /usr/lib64/libpfm.so.4.2.0
7ffff7d49000-7ffff7d83000 rw-p 00068000 08:03 2624924                    /usr/lib64/libpfm.so.4.2.0
7ffff7d83000-7ffff7d85000 rw-p 00000000 00:00 0 
7ffff7d85000-7ffff7dcd000 r-xp 00000000 08:03 2624905                    /usr/lib64/libpapi.so.4.2.1.0
7ffff7dcd000-7ffff7fcc000 ---p 00048000 08:03 2624905                    /usr/lib64/libpapi.so.4.2.1.0
7ffff7fcc000-7ffff7fce000 r--p 00047000 08:03 2624905                    /usr/lib64/libpapi.so.4.2.1.0
7ffff7fce000-7ffff7fd3000 rw-p 00049000 08:03 2624905                    /usr/lib64/libpapi.so.4.2.1.0
7ffff7fd3000-7ffff7fd8000 rw-p 00000000 00:00 0 
7ffff7ffc000-7ffff7ffe000 rw-p 00000000 00:00 0 
7ffff7ffe000-7ffff7fff000 r-xp 00000000 00:00 0                          [vdso]
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Program received signal SIGABRT, Aborted.
0x0000003680a36285 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);


(gdb) where
#0  0x0000003680a36285 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003680a37b9b in __GI_abort () at abort.c:91
#2  0x0000003680a77a7e in __libc_message (do_abort=2, fmt=0x3680b76678 "*** glibc detected *** %s: %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x0000003680a7dda6 in malloc_printerr (action=3, str=0x3680b76840 "free(): invalid next size (normal)", ptr=<optimized out>)
    at malloc.c:5021
#4  0x0000003680a7f08e in _int_free (av=0x3680db1700, p=0x460d480, have_lock=0) at malloc.c:3942
#5  0x0000003680a6db45 in _IO_new_fclose (fp=0x460d490) at iofclose.c:88
#6  0x00007ffff7daf5a8 in _mx_init_substrate () at components/mx/linux-mx.c:234
#7  0x00007ffff7d9f6b8 in _papi_hwi_init_global () at papi_internal.c:1420
#8  0x00007ffff7d9c68d in PAPI_library_init (version=<optimized out>) at papi.c:601
#9  0x000000000040156f in main (argc=<optimized out>, argv=0x7fffffffe3e8) at avail.c:163
----------

Comment 5 Jose Pedro Oliveira 2012-02-27 19:41:54 UTC
(In reply to comment #2)

> 
> What processor and mother board are you getting this failure on?
> 

From the dmidecode output:

Base Board Information
    Manufacturer: Foxconn
    Product Name: H55MX-S Series
    Version: 1.1

BIOS Information
	Vendor: American Megatrends Inc.
	Version: 080015 
	Release Date: 08/09/2010

Processor Information
        Socket Designation: CPU 1
        Type: Central Processor
        Family: Core i7
        Manufacturer: Intel            
        Signature: Type 0, Family 6, Model 30, Stepping 5

Comment 6 William Cohen 2012-03-06 15:05:11 UTC
I ran it through valgrind. It looks like there might be an access past the end of the allocated array at linux-lmsensors.c:116.

$ valgrind papi_avail
==22559== Memcheck, a memory error detector
==22559== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==22559== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==22559== Command: papi_avail
==22559== 
==22559== Invalid write of size 4
==22559==    at 0x5263AD9: createNativeEvents (linux-lmsensors.c:116)
==22559==    by 0x5263C0D: LM_SENSORS_init_substrate (linux-lmsensors.c:191)
==22559==    by 0x5253AA7: _papi_hwi_init_global (papi_internal.c:1420)
==22559==    by 0x525177C: PAPI_library_init (papi.c:601)
==22559==    by 0x40180A: main (avail.c:163)
==22559==  Address 0x5dd1ad8 is 216 bytes inside a block of size 568 free'd
==22559==    at 0x500EC4A: free (vg_replace_malloc.c:427)
==22559==    by 0x305E8662DF: __fopen_internal (iofopen.c:98)
==22559==    by 0x305F004FEE: sensors_get_label (in /usr/lib64/libsensors.so.4.2.0)
==22559==    by 0x52639EF: createNativeEvents (linux-lmsensors.c:80)
==22559==    by 0x5263C0D: LM_SENSORS_init_substrate (linux-lmsensors.c:191)
==22559==    by 0x5253AA7: _papi_hwi_init_global (papi_internal.c:1420)
==22559==    by 0x525177C: PAPI_library_init (papi.c:601)
==22559==    by 0x40180A: main (avail.c:163)
==22559== 
Available events and hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 4.2.1.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz (37)
CPU Revision             : 2.000000
CPUID Info               : Family: 6  Model: 37  Stepping: 2
CPU Megahertz            : 1199.000000
CPU Clock Megahertz      : 1199
Hdw Threads per core     : 2
Cores per Socket         : 2
NUMA Nodes               : 1
CPUs per Node            : 4
Total CPUs               : 4
Number Hardware Counters : 7
Max Multiplex Counters   : 64
--------------------------------------------------------------------------------

    Name        Code    Avail Deriv Description (Note)
PAPI_L1_DCM  0x80000000  Yes   No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  Yes   No   Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  Yes   Yes  Level 2 data cache misses
PAPI_L2_ICM  0x80000003  Yes   No   Level 2 instruction cache misses
PAPI_L3_DCM  0x80000004  No    No   Level 3 data cache misses
PAPI_L3_ICM  0x80000005  No    No   Level 3 instruction cache misses
PAPI_L1_TCM  0x80000006  Yes   Yes  Level 1 cache misses
PAPI_L2_TCM  0x80000007  Yes   No   Level 2 cache misses
PAPI_L3_TCM  0x80000008  Yes   No   Level 3 cache misses
PAPI_CA_SNP  0x80000009  No    No   Requests for a snoop
PAPI_CA_SHR  0x8000000a  No    No   Requests for exclusive access to shared cache line
PAPI_CA_CLN  0x8000000b  No    No   Requests for exclusive access to clean cache line
PAPI_CA_INV  0x8000000c  No    No   Requests for cache line invalidation
PAPI_CA_ITV  0x8000000d  No    No   Requests for cache line intervention
PAPI_L3_LDM  0x8000000e  Yes   No   Level 3 load misses
PAPI_L3_STM  0x8000000f  No    No   Level 3 store misses
PAPI_BRU_IDL 0x80000010  No    No   Cycles branch units are idle
PAPI_FXU_IDL 0x80000011  No    No   Cycles integer units are idle
PAPI_FPU_IDL 0x80000012  No    No   Cycles floating point units are idle
PAPI_LSU_IDL 0x80000013  No    No   Cycles load/store units are idle
PAPI_TLB_DM  0x80000014  Yes   No   Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  Yes   No   Instruction translation lookaside buffer misses
PAPI_TLB_TL  0x80000016  Yes   Yes  Total translation lookaside buffer misses
PAPI_L1_LDM  0x80000017  Yes   No   Level 1 load misses
PAPI_L1_STM  0x80000018  Yes   No   Level 1 store misses
PAPI_L2_LDM  0x80000019  Yes   No   Level 2 load misses
PAPI_L2_STM  0x8000001a  Yes   No   Level 2 store misses
PAPI_BTAC_M  0x8000001b  No    No   Branch target address cache misses
PAPI_PRF_DM  0x8000001c  No    No   Data prefetch cache misses
PAPI_L3_DCH  0x8000001d  No    No   Level 3 data cache hits
PAPI_TLB_SD  0x8000001e  No    No   Translation lookaside buffer shootdowns
PAPI_CSR_FAL 0x8000001f  No    No   Failed store conditional instructions
PAPI_CSR_SUC 0x80000020  No    No   Successful store conditional instructions
PAPI_CSR_TOT 0x80000021  No    No   Total store conditional instructions
PAPI_MEM_SCY 0x80000022  No    No   Cycles Stalled Waiting for memory accesses
PAPI_MEM_RCY 0x80000023  No    No   Cycles Stalled Waiting for memory Reads
PAPI_MEM_WCY 0x80000024  No    No   Cycles Stalled Waiting for memory writes
PAPI_STL_ICY 0x80000025  No    No   Cycles with no instruction issue
PAPI_FUL_ICY 0x80000026  No    No   Cycles with maximum instruction issue
PAPI_STL_CCY 0x80000027  No    No   Cycles with no instructions completed
PAPI_FUL_CCY 0x80000028  No    No   Cycles with maximum instructions completed
PAPI_HW_INT  0x80000029  No    No   Hardware interrupts
PAPI_BR_UCN  0x8000002a  Yes   No   Unconditional branch instructions
PAPI_BR_CN   0x8000002b  Yes   No   Conditional branch instructions
PAPI_BR_TKN  0x8000002c  Yes   No   Conditional branch instructions taken
PAPI_BR_NTK  0x8000002d  Yes   Yes  Conditional branch instructions not taken
PAPI_BR_MSP  0x8000002e  Yes   No   Conditional branch instructions mispredicted
PAPI_BR_PRC  0x8000002f  Yes   Yes  Conditional branch instructions correctly predicted
PAPI_FMA_INS 0x80000030  No    No   FMA instructions completed
PAPI_TOT_IIS 0x80000031  Yes   No   Instructions issued
PAPI_TOT_INS 0x80000032  Yes   No   Instructions completed
PAPI_INT_INS 0x80000033  No    No   Integer instructions
PAPI_FP_INS  0x80000034  Yes   No   Floating point instructions
PAPI_LD_INS  0x80000035  Yes   No   Load instructions
PAPI_SR_INS  0x80000036  Yes   No   Store instructions
PAPI_BR_INS  0x80000037  Yes   No   Branch instructions
PAPI_VEC_INS 0x80000038  No    No   Vector/SIMD instructions (could include integer)
PAPI_RES_STL 0x80000039  Yes   No   Cycles stalled on any resource
PAPI_FP_STAL 0x8000003a  No    No   Cycles the FP unit(s) are stalled
PAPI_TOT_CYC 0x8000003b  Yes   No   Total cycles
PAPI_LST_INS 0x8000003c  Yes   Yes  Load/store instructions completed
PAPI_SYC_INS 0x8000003d  No    No   Synchronization instructions completed
PAPI_L1_DCH  0x8000003e  No    No   Level 1 data cache hits
PAPI_L2_DCH  0x8000003f  Yes   Yes  Level 2 data cache hits
PAPI_L1_DCA  0x80000040  No    No   Level 1 data cache accesses
PAPI_L2_DCA  0x80000041  Yes   No   Level 2 data cache accesses
PAPI_L3_DCA  0x80000042  Yes   Yes  Level 3 data cache accesses
PAPI_L1_DCR  0x80000043  No    No   Level 1 data cache reads
PAPI_L2_DCR  0x80000044  Yes   No   Level 2 data cache reads
PAPI_L3_DCR  0x80000045  Yes   No   Level 3 data cache reads
PAPI_L1_DCW  0x80000046  No    No   Level 1 data cache writes
PAPI_L2_DCW  0x80000047  Yes   No   Level 2 data cache writes
PAPI_L3_DCW  0x80000048  Yes   No   Level 3 data cache writes
PAPI_L1_ICH  0x80000049  Yes   No   Level 1 instruction cache hits
PAPI_L2_ICH  0x8000004a  Yes   No   Level 2 instruction cache hits
PAPI_L3_ICH  0x8000004b  No    No   Level 3 instruction cache hits
PAPI_L1_ICA  0x8000004c  Yes   No   Level 1 instruction cache accesses
PAPI_L2_ICA  0x8000004d  Yes   No   Level 2 instruction cache accesses
PAPI_L3_ICA  0x8000004e  Yes   No   Level 3 instruction cache accesses
PAPI_L1_ICR  0x8000004f  Yes   No   Level 1 instruction cache reads
PAPI_L2_ICR  0x80000050  Yes   No   Level 2 instruction cache reads
PAPI_L3_ICR  0x80000051  Yes   No   Level 3 instruction cache reads
PAPI_L1_ICW  0x80000052  No    No   Level 1 instruction cache writes
PAPI_L2_ICW  0x80000053  No    No   Level 2 instruction cache writes
PAPI_L3_ICW  0x80000054  No    No   Level 3 instruction cache writes
PAPI_L1_TCH  0x80000055  No    No   Level 1 total cache hits
PAPI_L2_TCH  0x80000056  Yes   Yes  Level 2 total cache hits
PAPI_L3_TCH  0x80000057  No    No   Level 3 total cache hits
PAPI_L1_TCA  0x80000058  No    No   Level 1 total cache accesses
PAPI_L2_TCA  0x80000059  Yes   No   Level 2 total cache accesses
PAPI_L3_TCA  0x8000005a  Yes   No   Level 3 total cache accesses
PAPI_L1_TCR  0x8000005b  No    No   Level 1 total cache reads
PAPI_L2_TCR  0x8000005c  Yes   Yes  Level 2 total cache reads
PAPI_L3_TCR  0x8000005d  Yes   Yes  Level 3 total cache reads
PAPI_L1_TCW  0x8000005e  No    No   Level 1 total cache writes
PAPI_L2_TCW  0x8000005f  Yes   No   Level 2 total cache writes
PAPI_L3_TCW  0x80000060  Yes   No   Level 3 total cache writes
PAPI_FML_INS 0x80000061  No    No   Floating point multiply instructions
PAPI_FAD_INS 0x80000062  No    No   Floating point add instructions
PAPI_FDV_INS 0x80000063  No    No   Floating point divide instructions
PAPI_FSQ_INS 0x80000064  No    No   Floating point square root instructions
PAPI_FNV_INS 0x80000065  No    No   Floating point inverse instructions
PAPI_FP_OPS  0x80000066  Yes   Yes  Floating point operations
PAPI_SP_OPS  0x80000067  Yes   Yes  Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS  0x80000068  Yes   Yes  Floating point operations; optimized to count scaled double precision vector operations
PAPI_VEC_SP  0x80000069  Yes   No   Single precision vector/SIMD instructions
PAPI_VEC_DP  0x8000006a  Yes   No   Double precision vector/SIMD instructions
-------------------------------------------------------------------------
Of 107 possible events, 57 are available, of which 14 are derived.

avail.c                                     PASSED
==22559== 
==22559== HEAP SUMMARY:
==22559==     in use at exit: 253,706 bytes in 893 blocks
==22559==   total heap usage: 7,217 allocs, 6,324 frees, 6,530,071 bytes allocated
==22559== 
==22559== LEAK SUMMARY:
==22559==    definitely lost: 38,720 bytes in 39 blocks
==22559==    indirectly lost: 0 bytes in 0 blocks
==22559==      possibly lost: 0 bytes in 0 blocks
==22559==    still reachable: 214,986 bytes in 854 blocks
==22559==         suppressed: 0 bytes in 0 blocks
==22559== Rerun with --leak-check=full to see details of leaked memory
==22559== 
==22559== For counts of detected and suppressed errors, rerun with: -v
==22559== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)

Comment 7 William Cohen 2012-03-07 01:51:40 UTC
To better understand what is going on, I have made a scratch build of valgrind 3.7.0 (with cpuid support, if needed) for Fedora 16. Running papi_avail under it should show whether there are memory allocation problems in the lmsensors code.

The fedora-16 scratch build of valgrind is at:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3858127

After installing valgrind-3.7.0-1.fc16.cpuid.x86_64.rpm from 
http://koji.fedoraproject.org/koji/taskinfo?taskID=3858128

Or valgrind-3.7.0-1.fc16.cpuid.i686.rpm  from
http://koji.fedoraproject.org/koji/taskinfo?taskID=3858129

You should be able to get some information about memory allocation problems with:

valgrind papi_avail

Comment 8 Jose Pedro Oliveira 2012-03-07 02:27:06 UTC
William,

Here goes the output:

$ rpm -q valgrind
valgrind-3.7.0-1.fc16.cpuid.x86_64

$ valgrind /usr/bin/papi_avail 
==31273== Memcheck, a memory error detector
==31273== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==31273== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==31273== Command: /usr/bin/papi_avail
==31273== 
==31273== Invalid write of size 4
==31273==    at 0x367DC29345: createNativeEvents (linux-lmsensors.c:116)
==31273==    by 0x367DC2945F: LM_SENSORS_init_substrate (linux-lmsensors.c:191)
==31273==    by 0x367DC1A6B7: _papi_hwi_init_global (papi_internal.c:1420)
==31273==    by 0x367DC1768C: PAPI_library_init (papi.c:601)
==31273==    by 0x40156E: main (avail.c:163)
==31273==  Address 0x5b01728 is 216 bytes inside a block of size 568 free'd
==31273==    at 0x520D8AE: free (vg_replace_malloc.c:427)
==31273==    by 0x3680A6E5CC: __fopen_internal (iofopen.c:98)
==31273==    by 0x3683A0388F: sensors_get_label (access.c:188)
==31273==    by 0x367DC29271: createNativeEvents (linux-lmsensors.c:80)
==31273==    by 0x367DC2945F: LM_SENSORS_init_substrate (linux-lmsensors.c:191)
==31273==    by 0x367DC1A6B7: _papi_hwi_init_global (papi_internal.c:1420)
==31273==    by 0x367DC1768C: PAPI_library_init (papi.c:601)
==31273==    by 0x40156E: main (avail.c:163)
==31273== 
Available events and hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 4.2.1.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (30)
CPU Revision             : 5.000000
CPUID Info               : Family: 6  Model: 30  Stepping: 5
CPU Megahertz            : 1200.000000
CPU Clock Megahertz      : 1200
Hdw Threads per core     : 2
Cores per Socket         : 4
NUMA Nodes               : 1
CPUs per Node            : 8
Total CPUs               : 8
Number Hardware Counters : 7
Max Multiplex Counters   : 64
--------------------------------------------------------------------------------

    Name        Code    Avail Deriv Description (Note)
PAPI_L1_DCM  0x80000000  Yes   No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  Yes   No   Level 1 instruction cache misses
  .
  .
  .
PAPI_VEC_SP  0x80000069  Yes   No   Single precision vector/SIMD instructions
PAPI_VEC_DP  0x8000006a  Yes   No   Double precision vector/SIMD instructions
-------------------------------------------------------------------------
Of 107 possible events, 57 are available, of which 14 are derived.

avail.c                                     PASSED
==31273== 
==31273== HEAP SUMMARY:
==31273==     in use at exit: 252,620 bytes in 980 blocks
==31273==   total heap usage: 7,856 allocs, 6,876 frees, 6,814,488 bytes allocated
==31273== 
==31273== LEAK SUMMARY:
==31273==    definitely lost: 38,720 bytes in 39 blocks
==31273==    indirectly lost: 0 bytes in 0 blocks
==31273==      possibly lost: 0 bytes in 0 blocks
==31273==    still reachable: 213,900 bytes in 941 blocks
==31273==         suppressed: 0 bytes in 0 blocks
==31273== Rerun with --leak-check=full to see details of leaked memory
==31273== 
==31273== For counts of detected and suppressed errors, rerun with: -v
==31273== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

Comment 9 Jose Pedro Oliveira 2012-03-07 03:30:49 UTC
$ rpm -q lm_sensors
lm_sensors-3.3.1-1.fc16.x86_6


After adding a couple of printf() calls, I get a Radeon video driver event and then a buffer overrun:

$ valgrind papi_avail
==4079== Memcheck, a memory error detector
==4079== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==4079== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==4079== Command: papi_avail
==4079== 

LM_SENSORS_init_substrate: Number of LM_SENSORS events = 1

createNativeEvents: 000: radeon-pci-0100.temp1.temp1_input

==4079== Invalid write of size 4
==4079==    at 0x5265388: ??? (in /usr/lib64/libpapi.so.4.2.1.0)
==4079==    by 0x52654DE: LM_SENSORS_init_substrate (in /usr/lib64/libpapi.so.4.2.1.0)
==4079==    by 0x52566B7: _papi_hwi_init_global (in /usr/lib64/libpapi.so.4.2.1.0)
==4079==    by 0x525368C: PAPI_library_init (in /usr/lib64/libpapi.so.4.2.1.0)
==4079==    by 0x40156E: main (avail.c:163)
...



The problem appears to be in this piece of code in createNativeEvents():
...
            /* increment the table index counter */
            id++;                                          /* <-- PROBLEM */
         }

         lm_sensors_native_table[id].count = count + 1;    /* Crashes here */
...

Comment 10 Jose Pedro Oliveira 2012-03-07 03:52:19 UTC
The buffer overrun is also detected by valgrind on RHEL6.2/SL6.2 (but the papi apps don't crash there):

 * debuginfo-install papi
 * modprobe coretemp        # to add events detected by lm_sensors
 * valgrind papi_avail

--------
# valgrind papi_avail 
==1758== Memcheck, a memory error detector
==1758== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==1758== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
==1758== Command: papi_avail
==1758== 
==1758== Invalid write of size 4
==1758==    at 0x3F2E628AD9: createNativeEvents (linux-lmsensors.c:116)
==1758==    by 0x3F2E628C0D: LM_SENSORS_init_substrate (linux-lmsensors.c:191)
==1758==    by 0x3F2E618AA7: _papi_hwi_init_global (papi_internal.c:1420)
==1758==    by 0x3F2E61677C: PAPI_library_init (papi.c:601)
==1758==    by 0x40180A: main (avail.c:163)
==1758==  Address 0x5746328 is 216 bytes inside a block of size 568 free'd
==1758==    at 0x520C95D: free (vg_replace_malloc.c:366)
==1758==    by 0x3D8746584C: fclose@@GLIBC_2.2.5 (iofclose.c:88)
==1758==    by 0x3F2EA05022: sensors_get_label (access.c:190)
==1758==    by 0x3F2E6289EF: createNativeEvents (linux-lmsensors.c:80)
==1758==    by 0x3F2E628C0D: LM_SENSORS_init_substrate (linux-lmsensors.c:191)
==1758==    by 0x3F2E618AA7: _papi_hwi_init_global (papi_internal.c:1420)
==1758==    by 0x3F2E61677C: PAPI_library_init (papi.c:601)
==1758==    by 0x40180A: main (avail.c:163)
==1758== 
....
-----

Comment 11 Jose Pedro Oliveira 2012-03-09 19:42:39 UTC
commit 0526b12537d187bee8dac734026578d5f2b9035e
Author: Vince Weaver <vweaver1.edu>
Date:   Fri Mar 9 14:41:14 2012 -0500

    Fix buffer overrun in lmsensors component

Comment 12 William Cohen 2012-03-09 21:44:10 UTC
The patch has been applied and a new RPM built, papi-4.2.1-2. Could you verify that this fixes the problem?

http://koji.fedoraproject.org/koji/taskinfo?taskID=3875511

Comment 13 Jose Pedro Oliveira 2012-03-09 23:33:09 UTC
(In reply to comment #12)
> The patch has been applied and a new RPM built, papi-4.2.1-2. Could you verify
> this fixes the problem.
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3875511

It no longer crashes on F16 x86_64, and valgrind no longer detects invalid writes on F16 x86_64 or SL6.2 x86_64.

/jpo

Comment 14 William Cohen 2012-03-12 13:24:19 UTC
According to comment 13, this is fixed by the patch from upstream.

