Bug 1262015 - papi_avail segfaults (invalid free) when lustre component runs
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: papi
Version: 6.7
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: William Cohen
QA Contact: qe-baseos-tools
Depends On:
Blocks: 1271375
 
Reported: 2015-09-10 11:25 EDT by Dave Love
Modified: 2015-12-09 13:06 EST

Doc Type: Bug Fix
Last Closed: 2015-12-09 13:06:55 EST
Type: Bug


Attachments
output (354 bytes, text/plain)
2015-09-11 05:11 EDT, Dave Love
revised output (16.06 KB, text/plain)
2015-09-14 07:17 EDT, Dave Love
Upstream patch to address segmentation faults with lustre component (6.07 KB, text/plain)
2015-09-14 09:39 EDT, William Cohen

Description Dave Love 2015-09-10 11:25:32 EDT
Description of problem:

papi_avail (with no arguments) segfaults consistently, after printing what I think is the complete output.  This is on sandybridge, in case that makes a difference.  Setting MALLOC_CHECK_=1 shows a load of invalid frees.

The one from a self-built papi-5.4 doesn't do so.

Version-Release number of selected component (if applicable):
papi-5.1.1-11.el6.x86_64
Comment 2 William Cohen 2015-09-10 11:49:23 EDT
Could you include the output from the run that failed?  papi_avail ran without segfaults on a westmere machine:

$ uname -a
Linux cannondale 2.6.32-573.3.1.el6.x86_64 #1 SMP Mon Aug 10 09:44:54 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

$ MALLOC_CHECK_=1 papi_avail
Available events and hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 5.1.1.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz (37)
CPU Revision             : 2.000000
CPUID Info               : Family: 6  Model: 37  Stepping: 2
CPU Max Megahertz        : 2667
CPU Min Megahertz        : 1199
Hdw Threads per core     : 2
Cores per Socket         : 2
NUMA Nodes               : 1
CPUs per Node            : 4
Total CPUs               : 4
Running in a VM          : no
Number Hardware Counters : 7
Max Multiplex Counters   : 64
--------------------------------------------------------------------------------

    Name        Code    Avail Deriv Description (Note)
PAPI_L1_DCM  0x80000000  Yes   No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  Yes   No   Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  Yes   Yes  Level 2 data cache misses
PAPI_L2_ICM  0x80000003  Yes   No   Level 2 instruction cache misses
PAPI_L3_DCM  0x80000004  No    No   Level 3 data cache misses
PAPI_L3_ICM  0x80000005  No    No   Level 3 instruction cache misses
PAPI_L1_TCM  0x80000006  Yes   Yes  Level 1 cache misses
PAPI_L2_TCM  0x80000007  Yes   No   Level 2 cache misses
PAPI_L3_TCM  0x80000008  Yes   No   Level 3 cache misses
PAPI_CA_SNP  0x80000009  No    No   Requests for a snoop
PAPI_CA_SHR  0x8000000a  No    No   Requests for exclusive access to shared cache line
PAPI_CA_CLN  0x8000000b  No    No   Requests for exclusive access to clean cache line
PAPI_CA_INV  0x8000000c  No    No   Requests for cache line invalidation
PAPI_CA_ITV  0x8000000d  No    No   Requests for cache line intervention
PAPI_L3_LDM  0x8000000e  Yes   No   Level 3 load misses
PAPI_L3_STM  0x8000000f  No    No   Level 3 store misses
PAPI_BRU_IDL 0x80000010  No    No   Cycles branch units are idle
PAPI_FXU_IDL 0x80000011  No    No   Cycles integer units are idle
PAPI_FPU_IDL 0x80000012  No    No   Cycles floating point units are idle
PAPI_LSU_IDL 0x80000013  No    No   Cycles load/store units are idle
PAPI_TLB_DM  0x80000014  Yes   No   Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  Yes   No   Instruction translation lookaside buffer misses
PAPI_TLB_TL  0x80000016  Yes   Yes  Total translation lookaside buffer misses
PAPI_L1_LDM  0x80000017  Yes   No   Level 1 load misses
PAPI_L1_STM  0x80000018  Yes   No   Level 1 store misses
PAPI_L2_LDM  0x80000019  Yes   No   Level 2 load misses
PAPI_L2_STM  0x8000001a  Yes   No   Level 2 store misses
PAPI_BTAC_M  0x8000001b  No    No   Branch target address cache misses
PAPI_PRF_DM  0x8000001c  No    No   Data prefetch cache misses
PAPI_L3_DCH  0x8000001d  No    No   Level 3 data cache hits
PAPI_TLB_SD  0x8000001e  No    No   Translation lookaside buffer shootdowns
PAPI_CSR_FAL 0x8000001f  No    No   Failed store conditional instructions
PAPI_CSR_SUC 0x80000020  No    No   Successful store conditional instructions
PAPI_CSR_TOT 0x80000021  No    No   Total store conditional instructions
PAPI_MEM_SCY 0x80000022  No    No   Cycles Stalled Waiting for memory accesses
PAPI_MEM_RCY 0x80000023  No    No   Cycles Stalled Waiting for memory Reads
PAPI_MEM_WCY 0x80000024  No    No   Cycles Stalled Waiting for memory writes
PAPI_STL_ICY 0x80000025  No    No   Cycles with no instruction issue
PAPI_FUL_ICY 0x80000026  No    No   Cycles with maximum instruction issue
PAPI_STL_CCY 0x80000027  No    No   Cycles with no instructions completed
PAPI_FUL_CCY 0x80000028  No    No   Cycles with maximum instructions completed
PAPI_HW_INT  0x80000029  No    No   Hardware interrupts
PAPI_BR_UCN  0x8000002a  Yes   No   Unconditional branch instructions
PAPI_BR_CN   0x8000002b  Yes   No   Conditional branch instructions
PAPI_BR_TKN  0x8000002c  Yes   No   Conditional branch instructions taken
PAPI_BR_NTK  0x8000002d  Yes   Yes  Conditional branch instructions not taken
PAPI_BR_MSP  0x8000002e  Yes   No   Conditional branch instructions mispredicted
PAPI_BR_PRC  0x8000002f  Yes   Yes  Conditional branch instructions correctly predicted
PAPI_FMA_INS 0x80000030  No    No   FMA instructions completed
PAPI_TOT_IIS 0x80000031  Yes   No   Instructions issued
PAPI_TOT_INS 0x80000032  Yes   No   Instructions completed
PAPI_INT_INS 0x80000033  No    No   Integer instructions
PAPI_FP_INS  0x80000034  Yes   No   Floating point instructions
PAPI_LD_INS  0x80000035  Yes   No   Load instructions
PAPI_SR_INS  0x80000036  Yes   No   Store instructions
PAPI_BR_INS  0x80000037  Yes   No   Branch instructions
PAPI_VEC_INS 0x80000038  No    No   Vector/SIMD instructions (could include integer)
PAPI_RES_STL 0x80000039  Yes   No   Cycles stalled on any resource
PAPI_FP_STAL 0x8000003a  No    No   Cycles the FP unit(s) are stalled
PAPI_TOT_CYC 0x8000003b  Yes   No   Total cycles
PAPI_LST_INS 0x8000003c  Yes   Yes  Load/store instructions completed
PAPI_SYC_INS 0x8000003d  No    No   Synchronization instructions completed
PAPI_L1_DCH  0x8000003e  No    No   Level 1 data cache hits
PAPI_L2_DCH  0x8000003f  Yes   Yes  Level 2 data cache hits
PAPI_L1_DCA  0x80000040  No    No   Level 1 data cache accesses
PAPI_L2_DCA  0x80000041  Yes   No   Level 2 data cache accesses
PAPI_L3_DCA  0x80000042  Yes   Yes  Level 3 data cache accesses
PAPI_L1_DCR  0x80000043  No    No   Level 1 data cache reads
PAPI_L2_DCR  0x80000044  Yes   No   Level 2 data cache reads
PAPI_L3_DCR  0x80000045  Yes   No   Level 3 data cache reads
PAPI_L1_DCW  0x80000046  No    No   Level 1 data cache writes
PAPI_L2_DCW  0x80000047  Yes   No   Level 2 data cache writes
PAPI_L3_DCW  0x80000048  Yes   No   Level 3 data cache writes
PAPI_L1_ICH  0x80000049  Yes   No   Level 1 instruction cache hits
PAPI_L2_ICH  0x8000004a  Yes   No   Level 2 instruction cache hits
PAPI_L3_ICH  0x8000004b  No    No   Level 3 instruction cache hits
PAPI_L1_ICA  0x8000004c  Yes   No   Level 1 instruction cache accesses
PAPI_L2_ICA  0x8000004d  Yes   No   Level 2 instruction cache accesses
PAPI_L3_ICA  0x8000004e  Yes   No   Level 3 instruction cache accesses
PAPI_L1_ICR  0x8000004f  Yes   No   Level 1 instruction cache reads
PAPI_L2_ICR  0x80000050  Yes   No   Level 2 instruction cache reads
PAPI_L3_ICR  0x80000051  Yes   No   Level 3 instruction cache reads
PAPI_L1_ICW  0x80000052  No    No   Level 1 instruction cache writes
PAPI_L2_ICW  0x80000053  No    No   Level 2 instruction cache writes
PAPI_L3_ICW  0x80000054  No    No   Level 3 instruction cache writes
PAPI_L1_TCH  0x80000055  No    No   Level 1 total cache hits
PAPI_L2_TCH  0x80000056  Yes   Yes  Level 2 total cache hits
PAPI_L3_TCH  0x80000057  No    No   Level 3 total cache hits
PAPI_L1_TCA  0x80000058  No    No   Level 1 total cache accesses
PAPI_L2_TCA  0x80000059  Yes   No   Level 2 total cache accesses
PAPI_L3_TCA  0x8000005a  Yes   No   Level 3 total cache accesses
PAPI_L1_TCR  0x8000005b  No    No   Level 1 total cache reads
PAPI_L2_TCR  0x8000005c  Yes   Yes  Level 2 total cache reads
PAPI_L3_TCR  0x8000005d  Yes   Yes  Level 3 total cache reads
PAPI_L1_TCW  0x8000005e  No    No   Level 1 total cache writes
PAPI_L2_TCW  0x8000005f  Yes   No   Level 2 total cache writes
PAPI_L3_TCW  0x80000060  Yes   No   Level 3 total cache writes
PAPI_FML_INS 0x80000061  No    No   Floating point multiply instructions
PAPI_FAD_INS 0x80000062  No    No   Floating point add instructions
PAPI_FDV_INS 0x80000063  No    No   Floating point divide instructions
PAPI_FSQ_INS 0x80000064  No    No   Floating point square root instructions
PAPI_FNV_INS 0x80000065  No    No   Floating point inverse instructions
PAPI_FP_OPS  0x80000066  Yes   Yes  Floating point operations
PAPI_SP_OPS  0x80000067  Yes   Yes  Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS  0x80000068  Yes   Yes  Floating point operations; optimized to count scaled double precision vector operations
PAPI_VEC_SP  0x80000069  Yes   No   Single precision vector/SIMD instructions
PAPI_VEC_DP  0x8000006a  Yes   No   Double precision vector/SIMD instructions
PAPI_REF_CYC 0x8000006b  Yes   No   Reference clock cycles
-------------------------------------------------------------------------
Of 108 possible events, 58 are available, of which 14 are derived.

avail.c                                     PASSED
Comment 3 Dave Love 2015-09-11 05:11:43 EDT
Created attachment 1072497 [details]
output

Output attached.  I actually see the same thing on Westmere.  That's running
a Scientific Linux rebuild, but the same package version.  The libraries are picked
up correctly, and rpmverify is OK on the package.  I can't think what else
might be awry.  Doubtless not a big issue.
Comment 4 William Cohen 2015-09-11 10:59:01 EDT
The attachment doesn't seem to have the actual output in it. 

Are these errors only observable on Scientific Linux or are the problems seen on Red Hat Enterprise Linux 6.7? Which specific papi rpm is being used? x86_64 or i386?

As stated earlier I was unable to replicate the problem on a westmere machine running RHEL6.7 and papi-5.1.1-11.el6.x86_64.
Comment 5 Dave Love 2015-09-14 07:17:35 EDT
Created attachment 1073183 [details]
revised output

Apologies for the bad attachment -- looks as if it picked up a rogue paste rather than the file.  New one attached.

The attachment is from up-to-date RHEL 6.7 on Sandybridge.  I can reproduce
that on Westmere, but only under SL (with the same package versions), as I
don't have RHEL on Westmere.

Puzzled, I got a backtrace.  Presumably this only occurs when you have a
Lustre filesystem mount:

Program received signal SIGSEGV, Segmentation fault.
__libc_free (mem=0x4c41544f54) at malloc.c:3714
3714	  if (chunk_is_mmapped(p))                       /* release mmapped memory. */
Missing separate debuginfos, use: debuginfo-install lm_sensors-libs-3.1.1-17.el6.x86_64 papi52-papi-5.2.0-5.el6.liv.x86_64
(gdb) bt
#0  __libc_free (mem=0x4c41544f54) at malloc.c:3714
#1  0x00007ffff7ba3757 in host_finalize ()
    at components/lustre/linux-lustre.c:359
#2  _lustre_shutdown_component () at components/lustre/linux-lustre.c:444
#3  0x00007ffff7b8dae9 in PAPI_shutdown () at papi.c:4401
#4  0x0000000000402fe5 in test_pass (file=0x404c90 "avail.c", values=0x0, 
    num_tests=0) at test_utils.c:505
#5  0x00000000004018ba in main (argc=<value optimized out>, 
    argv=0x7fffffffa93c) at avail.c:357
(gdb)
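
A note on the backtrace above: the address passed to free(), 0x4c41544f54, is made up entirely of printable ASCII byte values. The sketch below decodes it as a little-endian byte sequence, as it would be laid out in memory on x86_64. The helper name pointer_as_ascii is hypothetical and is not part of PAPI or gdb. A text-like pointer may suggest that string data ended up where a heap pointer was expected, but that is an inference from the value alone and is not confirmed anywhere in this report.

```python
def pointer_as_ascii(addr: int) -> str:
    """Render the significant bytes of a pointer value as little-endian ASCII.

    Non-printable bytes are shown as '.' so the result is always readable.
    """
    nbytes = (addr.bit_length() + 7) // 8          # drop leading zero bytes
    raw = addr.to_bytes(nbytes, "little")          # x86_64 memory order
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    # Address taken from frame #0 of the backtrace in this comment.
    print(pointer_as_ascii(0x4C41544F54))          # prints "TOTAL"
```

The same check can be applied to any suspicious address reported by gdb or MALLOC_CHECK_ before digging into the allocator itself.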
Comment 6 William Cohen 2015-09-14 09:39:07 EDT
Created attachment 1073252 [details]
Upstream patch to address segmentation faults with lustre component

A search of the upstream papi git repository turned up this patch.  It doesn't apply cleanly to the lustre component in the RHEL 6.7 papi-5.1.1-11.el6.  Some additional patches for the lustre component were committed between 5.1.1 and when this patch was committed.
Comment 7 William Cohen 2015-12-09 12:46:55 EST
RHEL6 kernels don't provide the needed lustre module.  As a result, PAPI will not initialize or run the lustre component, and this particular issue will not be triggered on RHEL6.  There is no way to test the lustre component fixes on RHEL6.
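
For anyone trying to tell whether a given machine could hit this code path, one rough check is whether any Lustre filesystem is mounted. The sketch below scans text in the /proc/mounts format for entries whose filesystem type is "lustre". The function name lustre_mounts and the sample mount line are hypothetical, and this is not how PAPI itself decides whether to enable the lustre component; that logic is not shown in this report.

```python
from typing import List


def lustre_mounts(mounts_text: str) -> List[str]:
    """Return mount points whose filesystem type is 'lustre'.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "lustre":
            points.append(fields[1])
    return points


if __name__ == "__main__":
    # Hypothetical sample; on a Linux host, pass open("/proc/mounts").read().
    sample = (
        "/dev/sda1 / ext4 rw,relatime 0 0\n"
        "10.0.0.1@tcp:/scratch /mnt/scratch lustre rw,flock 0 0\n"
    )
    print(lustre_mounts(sample))
```

An empty result would be consistent with the lustre component never being initialized on that host, though it does not prove it.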
Comment 8 RHEL Product and Program Management 2015-12-09 13:06:55 EST
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
