Bug 746851 - papi: please update to 4.2
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: papi
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: William Cohen
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2011-10-18 00:20 UTC by Jose Pedro Oliveira
Modified: 2012-02-27 02:20 UTC (History)

Doc Type: Bug Fix
Last Closed: 2012-02-27 02:20:22 UTC
Type: ---


Attachments
coretemp_basic fails to print the first event name (1.50 KB, patch)
2011-10-26 17:27 UTC, Jose Pedro Oliveira
Dump fields of PAPI_component_info_t (837 bytes, application/octet-stream)
2011-10-26 18:09 UTC, Jose Pedro Oliveira

Description Jose Pedro Oliveira 2011-10-18 00:20:58 UTC
Description of problem:
Please update papi to the latest upstream version (4.1.4).

Expected results:
papi 4.1.4 in rawhide (and in F16).

Additional info:
----------
PAPI 4.1.4 Release (2011-08-30)

The PAPI 4.1.4 release is available for download. This is an internal release of PAPI-C, intended to support the Cray toolchain. It contains fixes for several bugs and some newly supported platforms.

This version:

* adds support for Intel SandyBridge processors;
* adds support for ARM Cortex A8 and A9 processors;
* enhances support for NVIDIA CUDA component;
* updates support for libpfm4;
* replaces out of date man pages with doxygen generated man pages.

Unless you're curious or in need of one of the above features, we recommend that you wait to upgrade until the release of PAPI 4.2.
----------
Source: http://icl.cs.utk.edu/papi/software/index.html

Comment 1 William Cohen 2011-10-18 12:51:03 UTC
The plan for Fedora is to focus on getting as many fixes and corrections as possible into upstream PAPI 4.2, which should be released in the near future, and then to pull PAPI 4.2 into Fedora. PAPI 4.1.4 is going to have a short life.

Comment 2 William Cohen 2011-10-18 18:17:21 UTC
Changing to PAPI-4.2 since that will be the one used.

Comment 3 Jose Pedro Oliveira 2011-10-20 16:50:17 UTC
(In reply to comment #1)
> The plan for Fedora is to focus on getting as many fixes and corrections as
> possible into upstream PAPI 4.2, which should be released in the near future,
> and then to pull PAPI 4.2 into Fedora. PAPI 4.1.4 is going to have a short
> life.

Thanks for the feedback. Meanwhile, I would like to know if there are plans to enable the PAPI-C components available in src/components (acpi, net, lmsensors, ...). I'm particularly interested in testing the net component.

Regards,
jpo

PS - Still a PAPI newbie.

Comment 4 William Cohen 2011-10-20 20:05:23 UTC
Yes, I plan to turn on components when possible. Some of the components, such as CUDA, can't be enabled in Fedora because the needed files are not available in the build system. However, I have turned on the net component.

To get an early start on PAPI-4.2 there is a scratch build of the papi-4.2 (the version hasn't been bumped so it is still listed as 4.1.4) on koji:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3447600

Please give this RPM a try. It would be good to test this out on Fedora and make sure that things are working. We would like to make sure that the upstream PAPI-4.2 is a good release.

Comment 5 Jose Pedro Oliveira 2011-10-20 22:51:13 UTC
(In reply to comment #4)
> Yes, I plan to turn on components when possible. Some of the components such as
> CUDA can't be enabled in Fedora because the needed files are not available in
> the build system. However, I have turned on the net component.
> 
> To get an early start on PAPI-4.2 there is a scratch build of the papi-4.2 (the
> version hasn't been bumped so it is still listed as 4.1.4) on koji:
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3447600
> 
> Please give this RPM a try. It would be good to test this out on Fedora and
> make sure that things are working. We would like to make sure that the upstream
> PAPI-4.2 is a good release.

Thanks for the package update!

I've just built and installed it on Fedora 15 and already found a problem:

$ rpm -q papi
papi-4.1.4-0.20111020.fc15.x86_64

$ papi_component_avail 
*** buffer overflow detected ***: papi_component_avail terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x34230f7e27]
/lib64/libc.so.6[0x34230f5e50]
/lib64/libc.so.6(__strncpy_chk+0x17e)[0x34230f512e]
/usr/lib64/libpapi.so(coretemp_init_substrate+0xa4)[0x7f7d6584e2d4]
/usr/lib64/libpapi.so(_papi_hwi_init_global+0x30)[0x7f7d658405a0]
/usr/lib64/libpapi.so(PAPI_library_init+0xad)[0x7f7d6583d57d]
papi_component_avail[0x401377]
/lib64/libc.so.6(__libc_start_main+0xed)[0x342302139d]
papi_component_avail[0x401691]
======= Memory map: ========
00400000-00406000 r-xp 00000000 08:03 2637455                            /usr/bin/papi_component_avail
00605000-00606000 rw-p 00005000 08:03 2637455                            /usr/bin/papi_component_avail
00606000-04606000 rw-p 00000000 00:00 0 
05694000-056e8000 rw-p 00000000 00:00 0                                  [heap]
3422c00000-3422c1f000 r-xp 00000000 08:03 2097154                        /lib64/ld-2.14.so
3422e1e000-3422e1f000 r--p 0001e000 08:03 2097154                        /lib64/ld-2.14.so
3422e1f000-3422e20000 rw-p 0001f000 08:03 2097154                        /lib64/ld-2.14.so
3422e20000-3422e21000 rw-p 00000000 00:00 0 
3423000000-342318f000 r-xp 00000000 08:03 2097155                        /lib64/libc-2.14.so
342318f000-342338f000 ---p 0018f000 08:03 2097155                        /lib64/libc-2.14.so
342338f000-3423393000 r--p 0018f000 08:03 2097155                        /lib64/libc-2.14.so
3423393000-3423394000 rw-p 00193000 08:03 2097155                        /lib64/libc-2.14.so
3423394000-342339a000 rw-p 00000000 00:00 0 
3ab7000000-3ab7015000 r-xp 00000000 08:03 2097260                        /lib64/libgcc_s-4.6.1-20110908.so.1
3ab7015000-3ab7214000 ---p 00015000 08:03 2097260                        /lib64/libgcc_s-4.6.1-20110908.so.1
3ab7214000-3ab7215000 rw-p 00014000 08:03 2097260                        /lib64/libgcc_s-4.6.1-20110908.so.1
7f7d65566000-7f7d65591000 rw-p 00000000 00:00 0 
7f7d65591000-7f7d655f1000 r-xp 00000000 08:03 2642949                    /usr/lib64/libpfm.so.4.2.0
7f7d655f1000-7f7d657f0000 ---p 00060000 08:03 2642949                    /usr/lib64/libpfm.so.4.2.0
7f7d657f0000-7f7d65824000 rw-p 0005f000 08:03 2642949                    /usr/lib64/libpfm.so.4.2.0
7f7d65824000-7f7d65826000 rw-p 00000000 00:00 0 
7f7d65826000-7f7d6586c000 r-xp 00000000 08:03 2642948                    /usr/lib64/libpapi.so.4.1.4.0
7f7d6586c000-7f7d65a6c000 ---p 00046000 08:03 2642948                    /usr/lib64/libpapi.so.4.1.4.0
7f7d65a6c000-7f7d65a86000 rw-p 00046000 08:03 2642948                    /usr/lib64/libpapi.so.4.1.4.0
7f7d65a86000-7f7d65a89000 rw-p 00000000 00:00 0 
7f7d65aab000-7f7d65aad000 rw-p 00000000 00:00 0 
7fff8d048000-7fff8d069000 rw-p 00000000 00:00 0                          [stack]
7fff8d1f9000-7fff8d1fa000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Aborted (core dumped)

Comment 6 Jose Pedro Oliveira 2011-10-20 23:12:56 UTC
The core dump is caused by the coretemp component.

Removing the coretemp component ( --with-components="acpi example lustre net" ) solves the problem:

$ papi_component_avail 
Available components and hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 4.1.4.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (30)
CPU Revision             : 5.000000
CPUID Info               : Family: 6  Model: 30  Stepping: 5
CPU Megahertz            : 1200.000000
CPU Clock Megahertz      : 1200
Hdw Threads per core     : 2
Cores per Socket         : 4
NUMA Nodes               : 1
CPU's per Node           : 8
Total CPU's              : 8
Number Hardware Counters : 7
Max Multiplex Counters   : 64
--------------------------------------------------------------------------------

Name:				$Id: perf_events.c,v 1.90 2011/10/05 21:29:39 vweaver1 Exp $
Name:				$Id: linux-acpi.c,v 1.5 2011/09/15 00:46:22 jagode Exp $
Name:				$Id: example.c,v 1.8 2011/09/15 00:46:22 jagode Exp $
Name:				$Id: linux-lustre.c,v 1.9 2011/09/15 00:46:22 jagode Exp $
Name:				$Id: linux-net.c,v 1.4 2011/09/15 00:46:22 jagode Exp $

--------------------------------------------------------------------------------
component.c                             PASSED

Comment 7 Jose Pedro Oliveira 2011-10-21 12:24:12 UTC
Gdb backtrace:
---------- 
...
(gdb) r
Starting program: /home/fedora/rpms/BUILD/papi-4.1.4/src/utils/papi_component_avail 

Program received signal SIGSEGV, Segmentation fault.
__strncpy_ssse3 () at ../sysdeps/x86_64/multiarch/strcpy.S:94
94		pcmpeqb	(%rsi), %xmm0			/* compare 16 bytes in (%rsi) and %xmm0 for equality, try to find null char*/

(gdb) backtrace
#0  __strncpy_ssse3 () at ../sysdeps/x86_64/multiarch/strcpy.S:94
#1  0x00000000004137e3 in coretemp_init_substrate () at components/coretemp/linux-coretemp.c:131
#2  coretemp_init_substrate () at components/coretemp/linux-coretemp.c:110
#3  0x000000000040ab2b in _papi_hwi_init_global () at papi_internal.c:1348
#4  0x0000000000407d70 in PAPI_library_init (version=<optimized out>) at papi.c:595
#5  0x000000000040228c in main (argc=1, argv=0x7fffffffe468) at component.c:85

(gdb) 
----------


The problem is in the coretemp_init_substrate() function:

  1) generateEventList("/sys/class/hwmon") returns 0 and
  2) the code that follows assumes that at least one event is available
     (NUM_EVENTS > 0; papi_malloc + do-while block)
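
A minimal sketch of the guard that would avoid this, written for illustration only and not taken from the PAPI source: event_t, build_event_table() and NAME_LEN are hypothetical names. The idea is to check the count returned by the event scan before allocating and copying, and to use a loop that runs zero times for an empty list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 64            /* hypothetical fixed name size */

typedef struct {
    char name[NAME_LEN];
} event_t;

/* Copy `count` names into a newly allocated table.
 * Returns NULL (and allocates nothing) when count is zero, so callers
 * never touch a first entry that does not exist. */
event_t *build_event_table(const char *const *names, size_t count)
{
    if (count == 0 || names == NULL)
        return NULL;

    event_t *table = calloc(count, sizeof *table);
    if (table == NULL)
        return NULL;

    /* A for loop runs zero times for an empty list; a do-while would
     * always execute its body once. */
    for (size_t i = 0; i < count; i++) {
        assert(names[i] != NULL);
        strncpy(table[i].name, names[i], NAME_LEN - 1);
        table[i].name[NAME_LEN - 1] = '\0';   /* always terminated */
    }
    return table;
}
```

A caller would treat a NULL result with a zero count as "no events available" rather than as a fatal error.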

/jpo

Comment 8 William Cohen 2011-10-21 15:07:58 UTC
Patched coretemp papi component to fix problems observed in it:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3449724

However, running "make fulltest" still shows problems, so some more work is required.

Comment 9 Jose Pedro Oliveira 2011-10-21 16:16:34 UTC
William,


A couple more notes regarding the coretemp problem reported above (and a correction to your mail
http://lists.eecs.utk.edu/pipermail/perfapi-devel/2011-October/004865.html):

 * the crash occurred in the two systems where I tested:

    i) Intel i7 system running Fedora 15 x86_64
       (kernel 2.6.40.6-0.fc15.x86_64)
   ii) Dual-Xeon Dell server running Scientific Linux 6.1 x86_64
       (kernel 2.6.32-131.12.1.el6.x86_64)

 * both systems have the /sys/class/hwmon directory

     i) in the Fedora 15 system it contains a symbolic link

        lrwxrwxrwx. 1 root root 0 Oct 21 06:12 hwmon0 -> ../../devices/pci0000:00/0000:00:03.0/0000:01:00.0/hwmon/hwmon0

         The entry appears to be related to the system's Radeon graphics card:

         # cat /sys/class/hwmon/hwmon0/name 
         radeon

    ii) in the SL6.1 system it is empty
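
The two cases above (a directory holding a non-coretemp entry, and an empty directory) can be told apart with a small probe. This is only a sketch under assumptions: read_hwmon_name() and count_hwmon_entries() are hypothetical helpers, not PAPI functions, and the only layout assumed is the <base>/hwmonN/name file shown above.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Read the driver name of one hwmon entry, e.g. <base>/hwmon0/name.
 * Returns 0 on success, -1 if the file is missing or unreadable. */
int read_hwmon_name(const char *base, const char *entry, char *out, size_t len)
{
    char path[512];

    assert(base != NULL && entry != NULL && out != NULL);
    int n = snprintf(path, sizeof path, "%s/%s/name", base, entry);
    if (n < 0 || (size_t)n >= sizeof path || len == 0)
        return -1;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char *ok = fgets(out, (int)len, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}

/* Count hwmon* entries under base. Returns -1 if base cannot be opened
 * and 0 if it exists but holds no hwmon entries. */
int count_hwmon_entries(const char *base)
{
    DIR *dir = opendir(base);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strncmp(de->d_name, "hwmon", 5) == 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

On the Fedora 15 system described above, calling read_hwmon_name("/sys/class/hwmon", "hwmon0", ...) would be expected to yield "radeon", which is not a coretemp sensor.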

/jpo

Comment 10 Jose Pedro Oliveira 2011-10-21 16:23:16 UTC
(In reply to comment #8)
> Patched coretemp papi component to fix problems observed in it:
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3449724
> 
> However, running "make fulltest" still shows problems, so some more work is
> required.

The patch fixed the crash:

$ rpm -q papi
papi-4.1.4-0.20111020.fc15_1.x86_64

$ papi_component_avail 
Available components and hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 4.1.4.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (30)
CPU Revision             : 5.000000
CPUID Info               : Family: 6  Model: 30  Stepping: 5
CPU Megahertz            : 1200.000000
CPU Clock Megahertz      : 1200
Hdw Threads per core     : 2
Cores per Socket         : 4
NUMA Nodes               : 1
CPU's per Node           : 8
Total CPU's              : 8
Number Hardware Counters : 7
Max Multiplex Counters   : 64
--------------------------------------------------------------------------------

Name:				$Id: perf_events.c,v 1.90 2011/10/05 21:29:39 vweaver1 Exp $
Name:				$Id: linux-acpi.c,v 1.5 2011/09/15 00:46:22 jagode Exp $
Name:				$Id: linux-coretemp.c,v 1.5 2011/09/15 00:46:22 jagode Exp $
Name:				$Id: example.c,v 1.8 2011/09/15 00:46:22 jagode Exp $
Name:				$Id: linux-lustre.c,v 1.9 2011/09/15 00:46:22 jagode Exp $
Name:				$Id: linux-net.c,v 1.4 2011/09/15 00:46:22 jagode Exp $

--------------------------------------------------------------------------------
component.c                             PASSED

Comment 11 Jose Pedro Oliveira 2011-10-21 17:21:14 UTC
(In reply to comment #8)
...
> 
> However, running "make fulltest" still shows problems, so some more work is
> required.

I'm seeing lots of problems here too (partial output in the Fedora 15 system),
and one of the tests core dumps:

...
Running ctests/all_native_events:Unable to open ACPI temperature file.Error stopping ACPI_STAT: PAPI_EINVAL
Error starting ACPI_TEMP : PAPI_EISRUN
all_native_events.c             WARNING
Line # 171
Warning: 0 Uncore and 1 Offcore events were ignored

all_native_events.c             PASSED with WARNING
...
Running ctests/overflow_pthreads:can not find host counter: `< on hyperion....
we only have: 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' please subscribe only once to each counter
...
Running ctests/profile:profile.c                                 WARNING
Line # 143
Warning: PAPI_profil PAPI_PROFIL_RANDOM not supported

profile.c                                 PASSED with WARNING
...
Running ctests/profile_pthreads:can not find host counter: p= on hyperion...
we only have: 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' can not find host counter: ?@ on hyperion...
we only have: 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' can not find host counter: ?> on hyperion...
we only have: 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 
...
Running ctests/sdsc4-mpx:sdsc4.c                                     FAILED
Line # 307
Error: Values differ from reference

Running ctests/sdsc-mpx:sdsc.c                                       FAILED
Line # 55
Error: Error on 0, 1766903735.000000>0.200000 and 1766903735>100000

...
Running ctests/thrspecific:*** glibc detected *** ./ctests/thrspecific: double free or corruption (out): 0x00007fc828003920 ***
...
./ctests/thrspecific[0x402365]
'tcp_segments_sent' /lib64/libpthread.so.0[0x3423c07b31]
'tcp_segments_received' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_retransmitted' 'lo_recv' 00400000-0049f000 r-xp 00000000 08:06 1846384                            /home/fedora/rpms/BUILD/papi-4.1.4/src/ctests/thrspecific
0069e000-006bf000 rw-p 0009e000 08:06 1846384                            /home/fedora/rpms/BUILD/papi-4.1.4/src/ctests/thrspecific
006bf000-046c4000 rw-p 00000000 00:00 0 
04b70000-04bc4000 rw-p 00000000 00:00 0                                  [heap]
3422c00000-3422c1f000 r-xp 00000000 08:03 2097154                        /lib64/ld-2.14.so
3422e1e000-3422e1f000 r--p 0001e000 08:03 2097154                        /lib64/ld-2.14.so
3422e1f000-3422e20000 rw-p 0001f000 08:03 2097154                        /lib64/ld-2.14.so
3422e20000-3422e21000 rw-p 00000000 00:00 0 
3423000000-342318f000 r-xp 00000000 08:03 2097155                        /lib64/libc-2.14.so
342318f000-342338f000 ---p 0018f000 08:03 2097155                        /lib64/libc-2.14.so
342338f000-3423393000 r--p 0018f000 08:03 2097155                        /lib64/libc-2.14.so
3423393000-3423394000 rw-p 00193000 08:03 20run_tests.sh: line 114:   713 Aborted                 (core dumped) $VALGRIND ./$i $TESTS_QUIET
...

Comment 12 Jose Pedro Oliveira 2011-10-21 17:31:51 UTC
Some of the net component problems appear to be caused by the new network interface names in Fedora >= 15.

The F15 system has the following network interfaces up: lo, p132p1, p3p1 and p3p2.

Although the net component detects these interfaces, it doesn't appear to support them natively (it appears to have LO and ETH0..ETH4 hardcoded). It only lists bytes received and written for the new p* interfaces. There is room for improvement, as it appears to use the ifconfig output as the source of the network stats.

Partial output of papi_native_avail:
----------
...
--------------------------------------------------------------------------------
0x50000000   tcp_segments_sent  | # of TCP segments sent                       |
--------------------------------------------------------------------------------
0x50000001   tcp_segments_received  | # of TCP segments received               |
--------------------------------------------------------------------------------
0x50000002   tcp_segments_retransmitted  | # of TCP segments retransmitted     |
--------------------------------------------------------------------------------
0x50000003   lo_recv  | bytes received on this interface                       |
--------------------------------------------------------------------------------
0x50000004   lo_send  | bytes written on this interface                        |
--------------------------------------------------------------------------------
0x50000005   p132p1_recv  | bytes received on this interface                   |
--------------------------------------------------------------------------------
0x50000006   p132p1_send  | bytes written on this interface                    |
--------------------------------------------------------------------------------
0x50000007   p3p1_recv  | bytes received on this interface                     |
--------------------------------------------------------------------------------
0x50000008   p3p1_send  | bytes written on this interface                      |
--------------------------------------------------------------------------------
0x50000009   p3p2_recv  | bytes received on this interface                     |
--------------------------------------------------------------------------------
0x5000000a   p3p2_send  | bytes written on this interface                      |
--------------------------------------------------------------------------------
0x54000000   LO_RX_PACKETS  | LO_RX_PACKETS                                    |
--------------------------------------------------------------------------------
0x54000001   LO_RX_ERRORS  | LO_RX_ERRORS                                      |
--------------------------------------------------------------------------------
0x54000002   LO_RX_DROPPED  | LO_RX_DROPPED                                    |
--------------------------------------------------------------------------------
0x54000003   LO_RX_OVERRUNS  | LO_RX_OVERRUNS                                  |
--------------------------------------------------------------------------------
0x54000004   LO_RX_FRAME  | LO_RX_FRAME                                        |
--------------------------------------------------------------------------------
0x54000005   LO_RX_BYTES  | LO_RX_BYTES                                        |
--------------------------------------------------------------------------------
0x54000006   LO_TX_PACKETS  | LO_TX_PACKETS                                    |
--------------------------------------------------------------------------------
0x54000007   LO_TX_ERRORS  | LO_TX_ERRORS                                      |
--------------------------------------------------------------------------------
0x54000008   LO_TX_DROPPED  | LO_TX_DROPPED                                    |
--------------------------------------------------------------------------------
0x54000009   LO_TX_OVERRUNS  | LO_TX_OVERRUNS                                  |
--------------------------------------------------------------------------------
0x5400000a   LO_TX_CARRIER  | LO_TX_CARRIER                                    |
--------------------------------------------------------------------------------
0x5400000b   LO_TX_BYTES  | LO_TX_BYTES                                        |
--------------------------------------------------------------------------------
0x5400000c   LO_COLLISIONS  | LO_COLLISIONS                                    |
--------------------------------------------------------------------------------
0x5400000d   ETH0_RX_PACKETS  | ETH0_RX_PACKETS                                |
--------------------------------------------------------------------------------
0x5400000e   ETH0_RX_ERRORS  | ETH0_RX_ERRORS                                  |
--------------------------------------------------------------------------------
0x5400000f   ETH0_RX_DROPPED  | ETH0_RX_DROPPED                                |
--------------------------------------------------------------------------------
0x54000010   ETH0_RX_OVERRUNS  | ETH0_RX_OVERRUNS                              |
--------------------------------------------------------------------------------
0x54000011   ETH0_RX_FRAME  | ETH0_RX_FRAME                                    |
--------------------------------------------------------------------------------
0x54000012   ETH0_RX_BYTES  | ETH0_RX_BYTES                                    |
--------------------------------------------------------------------------------
0x54000013   ETH0_TX_PACKETS  | ETH0_TX_PACKETS                                |
--------------------------------------------------------------------------------
0x54000014   ETH0_TX_ERRORS  | ETH0_TX_ERRORS                                  |
--------------------------------------------------------------------------------
0x54000015   ETH0_TX_DROPPED  | ETH0_TX_DROPPED                                |
--------------------------------------------------------------------------------
0x54000016   ETH0_TX_OVERRUNS  | ETH0_TX_OVERRUNS                              |
--------------------------------------------------------------------------------
0x54000017   ETH0_TX_CARRIER  | ETH0_TX_CARRIER                                |
--------------------------------------------------------------------------------
0x54000018   ETH0_TX_BYTES  | ETH0_TX_BYTES                                    |
--------------------------------------------------------------------------------
0x54000019   ETH0_COLLISIONS  | ETH0_COLLISIONS                                |
--------------------------------------------------------------------------------
...
----------

Comment 13 Jose Pedro Oliveira 2011-10-21 18:54:37 UTC
(In reply to comment #9)
> William,
> 
> 
> A couple more notes regarding the coretemp problem reported above (and a
> correction to your mail
> http://lists.eecs.utk.edu/pipermail/perfapi-devel/2011-October/004865.html):
> 
>  * the crash occurred in the two systems where I tested:
> 
>     i) Intel i7 system running Fedora 15 x86_64
>        (kernel 2.6.40.6-0.fc15.x86_64)
>    ii) Dual-Xeon Dell server running Scientific Linux 6.1 x86_64
>        (kernel 2.6.32-131.12.1.el6.x86_64)
> 
>  * both systems have the /sys/class/hwmon directory
> 
>      i) in the Fedora 15 system it contains a symbolic link
> 
>         lrwxrwxrwx. 1 root root 0 Oct 21 06:12 hwmon0 ->
> ../../devices/pci0000:00/0000:00:03.0/0000:01:00.0/hwmon/hwmon0
> 
>          The entry appears to be related to the system's Radeon graphics card:
> 
>          # cat /sys/class/hwmon/hwmon0/name 
>          radeon
> 
>     ii) in the SL6.1 system it is empty
> 
> /jpo


 * PAPI Component Repository 
   http://icl.cs.utk.edu/projects/papi/repository/index.php/Main_Page

   Problem: doesn't list the net component

 * CoreTemp
   http://icl.cs.utk.edu/projects/papi/repository/index.php/CoreTemp

   Note: the kernel module coretemp needs to be loaded
         (default: not loaded)
         (modprobe coretemp   -->   /sys/class/hwmon/hwmon[0-9]+ entries)
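
   One way to check whether the module is loaded is to look for it in
   /proc/modules. The sketch below is an illustration only: module_listed()
   is a hypothetical helper, and it operates on text that the caller has
   already read from /proc/modules. A driver built into the kernel rather
   than loaded as a module would not be listed there.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `name` is the first field of any line in `text`, where
 * `text` has the layout of /proc/modules: one module per line, with the
 * module name first and fields separated by spaces. */
bool module_listed(const char *text, const char *name)
{
    assert(text != NULL && name != NULL);

    size_t name_len = strlen(name);
    if (name_len == 0)
        return false;

    const char *line = text;
    while (line != NULL && *line != '\0') {
        /* Match the whole first field, not just a prefix of it. */
        if (strncmp(line, name, name_len) == 0 &&
            (line[name_len] == ' ' || line[name_len] == '\n' ||
             line[name_len] == '\0'))
            return true;

        const char *nl = strchr(line, '\n');
        line = (nl != NULL) ? nl + 1 : NULL;
    }
    return false;
}
```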

Comment 14 Jose Pedro Oliveira 2011-10-21 20:42:32 UTC
William Cohen,

Regarding the coretemp patch: I think it would be better to make the function generateEventList() return a PAPI error if it isn't able to locate hwmon*/{temp,fan}* files. Something like:

--- papi-4.1.4/src/components/coretemp/linux-coretemp.c	2011-09-15 01:46:22.000000000 +0100
+++ papi-4.1.4-modified/src/components/coretemp/linux-coretemp.c	2011-10-21 21:17:53.057865554 +0100
@@ -91,6 +91,12 @@
   }
 
   closedir(dir);
+
+  if ( count == 0 ) {
+	PAPIERROR("Oops: No temperature/fan performance counters found! Are you sure the coretemp module is loaded?\n");
+	return( PAPI_ENOCNTR );
+  }
+
   return (count);
 }
 
----------

With this patch the coretemp_init_substrate() "if (NUMEVENTS==0)..." test could be dropped.

Caveat: this patch will cause all PAPI applications to fail if the coretemp component isn't able to find any temperature/fan performance counters.


Failure example in the Fedora 15 system without the coretemp module loaded:
----------
$ papi_component_avail 
PAPI Error: Oops: No temperature/fan performance counters found! Are you sure the coretemp module is loaded?
.
component.c                             FAILED
Line # 87
Error in PAPI_library_init: PAPI_ENOCNTR
----------

Too harsh? Note that coretemp_init_substrate() returns a papi error if it doesn't find the /sys/class/hwmon directory.

Any thoughts?

jpo

Comment 15 Jose Pedro Oliveira 2011-10-24 20:21:35 UTC
William,

Would it be possible to update the papi SRPM with the latest CVS source code?
It includes code committed today [1] and on Friday.

tia,
jpo

[1] - the version has already been bumped to 4.2.0

Comment 16 William Cohen 2011-10-25 02:06:36 UTC
Regarding comment 14: the software shouldn't assume that the hardware support for coretemp is present. If initialization fails in that case, PAPI will not work on any machine that lacks that hardware support. The code needs to handle this gracefully.
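
A minimal sketch of that kind of graceful handling, assuming a simplified model rather than PAPI's real component interface: component_t, init_components() and the two init functions are hypothetical. A component that finds no hardware is marked disabled and the remaining components stay usable.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    int (*init)(void);   /* returns number of events found, or < 0 on error */
    int enabled;
    int num_events;
} component_t;

/* Hypothetical stand-ins for real component init routines. */
int cpu_init(void)            { return 7; }  /* hardware present */
int coretemp_init_no_hw(void) { return 0; }  /* no sensors found */

/* Initialize every component. A component that reports an error or
 * zero events is disabled instead of aborting the whole library.
 * Returns the number of usable components. */
int init_components(component_t *comps, size_t n)
{
    assert(comps != NULL || n == 0);

    int usable = 0;
    for (size_t i = 0; i < n; i++) {
        int found = comps[i].init ? comps[i].init() : -1;
        comps[i].num_events = found > 0 ? found : 0;
        comps[i].enabled = found > 0;
        if (comps[i].enabled)
            usable++;
    }
    return usable;
}
```

With the two stand-in components above, library initialization would succeed with one usable component and coretemp simply reported as unavailable.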

There is a scratch build for F15 of the current papi CVS code at

http://koji.fedoraproject.org/koji/taskinfo?taskID=3457926

I would prefer to use an official release for rawhide. Once there is an official release of papi 4.2, a build will be pushed through.

Comment 17 Jose Pedro Oliveira 2011-10-25 08:40:53 UTC
(In reply to comment #16)
> Regarding comment 14: the software shouldn't assume that the hardware support
> for coretemp is present. If initialization fails in that case, PAPI will not
> work on any machine that lacks that hardware support. The code needs to handle
> this gracefully.

I agree.

> There is a scratch build for F15 of the current papi CVS code at
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3457926

Thanks for the new scratch build (note that it only builds papi with the lustre component active).
 
> I would prefer to use an official release for rawhide. Once there is an
> official release of papi 4.2, a build will be pushed through.

Ack.

jpo

Comment 18 Jose Pedro Oliveira 2011-10-25 09:26:51 UTC
Still regarding the coretemp component:

 * the last commit to the linux-coretemp.c file no longer makes papi terminate
   if the /sys/class/hwmon directory doesn't exist:

   ----------
   ...
   dir = opendir(base_dir);
   if ( dir == NULL ) {
      SUBDBG("Can't find %s, are you sure the coretemp module is loaded?\n",
          base_dir);
      return 0;
   }
   ...
   ------------

   The problem is that the debug message is not entirely correct as the
   /sys/class/hwmon directory is not created by the coretemp module.

   Quote from http://www.mjmwired.net/kernel/Documentation/thermal/sysfs-api.txt
   ----------
   ...
   2. sysfs attributes structure

   RO	read only value
   RW	read/write value

   Thermal sysfs attributes will be represented under /sys/class/thermal.
   Hwmon sysfs I/F extension is also available under /sys/class/hwmon
   if hwmon is compiled in or built as a module.
   ...
   -----------

 * I still would like to see generateEventList() emit a warning if it
   doesn't find any counters:

      closedir(dir);
   +
   +  if ( count == 0 ) {
   +     SUBDBG("No thermal counters found! Are you sure the coretemp module is loaded?\n");
   +  }
   +
      return (count);
 }

Comment 19 Jose Pedro Oliveira 2011-10-25 14:53:01 UTC
SUT: Fedora 15 x86_64 with papi compiled with 5 components

Some tests are still experiencing memory corruption problems. For example, running profile_pthreads produces different output on each run (and sometimes it crashes):


$  ctests/profile_pthreads
please subscribe only once to each counter

$  ctests/profile_pthreads
can not find host counter: = on <myhostname>
we only have: 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' please subscribe only once to each counter
'p3p2_send' 'tcp_segments_sent'

$  ctests/profile_pthreads
*** glibc detected *** ctests/profile_pthreadsplease subscribe only once to each counter

Comment 20 William Cohen 2011-10-25 15:43:29 UTC
It looks like the lustre component is causing these failures. Turned off the lustre component in:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3459523

Comment 21 Jose Pedro Oliveira 2011-10-26 17:24:15 UTC
(In reply to comment #20)
> It looks like the lustre component is causing these failures. Turned off the
> lustre component in:
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3459523

Thanks for the new build.

Comment 22 Jose Pedro Oliveira 2011-10-26 17:27:42 UTC
Created attachment 530341 [details]
coretemp_basic fails to print the first event name

Changes:
 * bug: prints the first event name
 * enhancement: prints the component name

Comment 23 Jose Pedro Oliveira 2011-10-26 17:35:00 UTC
William,

Could you send upstream the patch attached in comment 22? My PAPI mailing list subscription requests are still pending approval.

tia,
jpo

Comment 24 Jose Pedro Oliveira 2011-10-26 17:55:31 UTC
There is still another problem with the coretemp component registration:

 * In my F15 system I have 20 temp* entries under /sys/class/hwmon/hwmon1/device
   and the coretemp test program lists them all (with the patch from comment 22
   applied).

   The problem is that the above number of events is not registered in the 
   component info struct. Dumping some of the values I get:

----------
Component: 2
	: Name              = $Id: linux-coretemp.c,v 1.13 2011/10/25 15:20:41 vweaver1 Exp $
	: Version           = $Revision: 1.13 $
	: Hardware counters = 512
	: Mux Hw counters   = 32
	: Preset events     = 0
	: Native events     = 0
...
----------

I would expect the number of hardware counters and/or the number of native events to be 20.

/jpo


Source code:
    ...
    numcmp = PAPI_num_components();
    for ( cid=0; cid<numcmp; cid++ ) {
        printf("Component: %d\n",cid);

        if ( (cmpinfo = PAPI_get_component_info(cid)) == NULL) 
            exit(EXIT_FAILURE);

        printf("\t: Name              = %s\n", cmpinfo->name);
        printf("\t: Version           = %s\n", cmpinfo->version);
        printf("\t: Hardware counters = %d\n", cmpinfo->num_cntrs);
        printf("\t: Mux Hw counters   = %d\n", cmpinfo->num_mpx_cntrs);
        printf("\t: Preset events     = %d\n", cmpinfo->num_preset_events);
        printf("\t: Native events     = %d\n", cmpinfo->num_preset_events);

    }
    ...

Comment 25 Jose Pedro Oliveira 2011-10-26 18:09:02 UTC
Created attachment 530347 [details]
Dump fields of PAPI_component_info_t

Program to dump some fields of PAPI_component_info_t structs.

Note: The code at the end of comment 24 didn't print the number of native events
      (it was printing the number of preset events instead).

Comment 26 Jose Pedro Oliveira 2011-10-27 02:35:12 UTC
(In reply to comment #23)
> William,
> 
> Could you send upstream the patch attached in comment 22? My PAPI mailing list
> subscription requests are still pending approval.

I've already sent the patch to the perfapi-devel mailing list.

/jpo

Comment 27 William Cohen 2011-10-27 14:10:49 UTC
The official papi-4.2.0.tar.gz has been downloaded and an rpm built for rawhide. The koji build of it is below:

http://koji.fedoraproject.org/koji/buildinfo?buildID=271035

Comment 28 Jose Pedro Oliveira 2011-10-28 20:02:13 UTC
The problem in comment 24 has been fixed upstream:
http://icl.cs.utk.edu/viewcvs/viewcvs.cgi/PAPI/papi/src/components/coretemp/linux-coretemp.c?r1=1.15&r2=1.16

Comment 29 William Cohen 2011-10-31 15:50:22 UTC
Built papi with coretemp patches. Fixed in papi-4.2.0-2.fc17:

http://koji.fedoraproject.org/koji/rpminfo?rpmID=2771535

Comment 30 Jose Pedro Oliveira 2011-11-23 16:52:41 UTC
William,

FYI: Yesterday Vince committed the rewritten net component I sent him two weeks ago. It now reads the network events directly from /proc/net/dev instead of parsing the output of the external ifconfig command.

Changelog:
http://icl.cs.utk.edu/viewcvs/viewcvs.cgi/PAPI/papi/src/components/net/CHANGES?revision=1.1&view=markup
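
For readers unfamiliar with /proc/net/dev, the sketch below shows how one data line can be parsed. It is an illustration only and not the code of the rewritten component: net_stats_t and parse_net_dev_line() are hypothetical names, and only the received and transmitted byte counters are extracted. Because the interface name is taken from the line itself, names such as p3p1 need no special handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char ifname[32];
    unsigned long long rx_bytes;
    unsigned long long tx_bytes;
} net_stats_t;

/* Parse one data line of /proc/net/dev, which has the form
 * "<ifname>: <8 receive fields> <8 transmit fields>".
 * Returns 0 on success, -1 if the line is not an interface line
 * (for example one of the two header lines). */
int parse_net_dev_line(const char *line, net_stats_t *out)
{
    assert(line != NULL && out != NULL);

    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;

    /* Interface name: text before the colon, leading blanks skipped. */
    while (*line == ' ' || *line == '\t')
        line++;
    size_t len = (size_t)(colon - line);
    if (len == 0 || len >= sizeof out->ifname)
        return -1;
    memcpy(out->ifname, line, len);
    out->ifname[len] = '\0';

    /* Field 1 is received bytes, fields 2-8 are skipped, and field 9
     * is transmitted bytes. */
    int n = sscanf(colon + 1, "%llu %*s %*s %*s %*s %*s %*s %*s %llu",
                   &out->rx_bytes, &out->tx_bytes);
    return (n == 2) ? 0 : -1;
}
```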

/jpo

Comment 31 Jose Pedro Oliveira 2012-02-27 02:20:22 UTC
Closing ticket.

Current Fedora 17/18 PAPI builds: papi-4.2.0-4.fc17, papi-4.2.1-1.fc18

