Description of problem:
Please update papi to the latest upstream version (4.1.4).

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
papi 4.1.4 in rawhide (and in F16).

Additional info:
----------
PAPI 4.1.4 Release (2011-08-30)

The PAPI 4.1.4 release is available for download. This is an internal release of PAPI-C, intended to support the Cray toolchain. It contains fixes for several bugs and some newly supported platforms. This version:
* adds support for Intel SandyBridge processors;
* adds support for ARM Cortex A8 and A9 processors;
* enhances support for NVIDIA CUDA component;
* updates support for libpfm4
* replaces out of date man pages with doxygen generated man pages.

Unless you're curious or in need of one of the above features, we recommend that you wait to upgrade until the release of PAPI 4.2.
----------
Source: http://icl.cs.utk.edu/papi/software/index.html
The plan for Fedora is to focus on getting as many fixes and corrections as possible into the upstream PAPI 4.2, which should be released in the near future, and then to pull PAPI 4.2 into Fedora. PAPI 4.1.4 is going to have a short life.
Changing to PAPI-4.2 since that will be the one used.
(In reply to comment #1)
> The plan for fedora is to focus on getting as many fixes and corrections into
> the upstream PAPI 4.2 as possible. PAPI-4.2 should be released in the near
> future. Then pulling PAPI-4.2 into Fedora. The papi 4.1.4 is going to have a
> short life.

Thanks for the feedback.

Meanwhile I would like to know if there are plans to enable the PAPI-C components available in src/components (acpi, net, lmsensors, ...). I'm particularly interested in testing the net component.

Regards,
jpo

PS - Still a PAPI newbie.
Yes, I plan to turn on components when possible. Some of the components, such as CUDA, can't be enabled in Fedora because the needed files are not available in the build system. However, I have turned on the net component.

To get an early start on PAPI-4.2 there is a scratch build of papi-4.2 (the version hasn't been bumped, so it is still listed as 4.1.4) on koji:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3447600

Please give this RPM a try. It would be good to test this out on Fedora and make sure that things are working. We would like to make sure that the upstream PAPI-4.2 is a good release.
(In reply to comment #4)
> Yes, I plan to turn on components when possible. Some of the components such as
> CUDA can't be enabled in Fedora because the needed files are not available in
> the build system. However, I have turned on the net component.
>
> To get an early start on PAPI-4.2 there is a scratch build of the papi-4.2 (the
> version hasn't been bumped so it is still listed as 4.1.4) on koji:
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3447600
>
> Please give this RPM a try. It would be good to test this out on Fedora and
> make sure that things are working. We would like to make sure that the upstream
> PAPI-4.2 is a good release.

Thanks for the package update! I've just built and installed it in Fedora 15 and already found a problem:

$ rpm -q papi
papi-4.1.4-0.20111020.fc15.x86_64

$ papi_component_avail
*** buffer overflow detected ***: papi_component_avail terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x34230f7e27]
/lib64/libc.so.6[0x34230f5e50]
/lib64/libc.so.6(__strncpy_chk+0x17e)[0x34230f512e]
/usr/lib64/libpapi.so(coretemp_init_substrate+0xa4)[0x7f7d6584e2d4]
/usr/lib64/libpapi.so(_papi_hwi_init_global+0x30)[0x7f7d658405a0]
/usr/lib64/libpapi.so(PAPI_library_init+0xad)[0x7f7d6583d57d]
papi_component_avail[0x401377]
/lib64/libc.so.6(__libc_start_main+0xed)[0x342302139d]
papi_component_avail[0x401691]
======= Memory map: ========
00400000-00406000 r-xp 00000000 08:03 2637455 /usr/bin/papi_component_avail
00605000-00606000 rw-p 00005000 08:03 2637455 /usr/bin/papi_component_avail
00606000-04606000 rw-p 00000000 00:00 0
05694000-056e8000 rw-p 00000000 00:00 0 [heap]
3422c00000-3422c1f000 r-xp 00000000 08:03 2097154 /lib64/ld-2.14.so
3422e1e000-3422e1f000 r--p 0001e000 08:03 2097154 /lib64/ld-2.14.so
3422e1f000-3422e20000 rw-p 0001f000 08:03 2097154 /lib64/ld-2.14.so
3422e20000-3422e21000 rw-p 00000000 00:00 0
3423000000-342318f000 r-xp 00000000 08:03 2097155 /lib64/libc-2.14.so
342318f000-342338f000 ---p 0018f000 08:03 2097155 /lib64/libc-2.14.so
342338f000-3423393000 r--p 0018f000 08:03 2097155 /lib64/libc-2.14.so
3423393000-3423394000 rw-p 00193000 08:03 2097155 /lib64/libc-2.14.so
3423394000-342339a000 rw-p 00000000 00:00 0
3ab7000000-3ab7015000 r-xp 00000000 08:03 2097260 /lib64/libgcc_s-4.6.1-20110908.so.1
3ab7015000-3ab7214000 ---p 00015000 08:03 2097260 /lib64/libgcc_s-4.6.1-20110908.so.1
3ab7214000-3ab7215000 rw-p 00014000 08:03 2097260 /lib64/libgcc_s-4.6.1-20110908.so.1
7f7d65566000-7f7d65591000 rw-p 00000000 00:00 0
7f7d65591000-7f7d655f1000 r-xp 00000000 08:03 2642949 /usr/lib64/libpfm.so.4.2.0
7f7d655f1000-7f7d657f0000 ---p 00060000 08:03 2642949 /usr/lib64/libpfm.so.4.2.0
7f7d657f0000-7f7d65824000 rw-p 0005f000 08:03 2642949 /usr/lib64/libpfm.so.4.2.0
7f7d65824000-7f7d65826000 rw-p 00000000 00:00 0
7f7d65826000-7f7d6586c000 r-xp 00000000 08:03 2642948 /usr/lib64/libpapi.so.4.1.4.0
7f7d6586c000-7f7d65a6c000 ---p 00046000 08:03 2642948 /usr/lib64/libpapi.so.4.1.4.0
7f7d65a6c000-7f7d65a86000 rw-p 00046000 08:03 2642948 /usr/lib64/libpapi.so.4.1.4.0
7f7d65a86000-7f7d65a89000 rw-p 00000000 00:00 0
7f7d65aab000-7f7d65aad000 rw-p 00000000 00:00 0
7fff8d048000-7fff8d069000 rw-p 00000000 00:00 0 [stack]
7fff8d1f9000-7fff8d1fa000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)
The core dump is caused by the coretemp component. Removing the coretemp component ( --with-components="acpi example lustre net" ) solves the problem:

$ papi_component_avail
Available components and hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 4.1.4.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz (30)
CPU Revision             : 5.000000
CPUID Info               : Family: 6 Model: 30 Stepping: 5
CPU Megahertz            : 1200.000000
CPU Clock Megahertz      : 1200
Hdw Threads per core     : 2
Cores per Socket         : 4
NUMA Nodes               : 1
CPU's per Node           : 8
Total CPU's              : 8
Number Hardware Counters : 7
Max Multiplex Counters   : 64
--------------------------------------------------------------------------------
Name: $Id: perf_events.c,v 1.90 2011/10/05 21:29:39 vweaver1 Exp $
Name: $Id: linux-acpi.c,v 1.5 2011/09/15 00:46:22 jagode Exp $
Name: $Id: example.c,v 1.8 2011/09/15 00:46:22 jagode Exp $
Name: $Id: linux-lustre.c,v 1.9 2011/09/15 00:46:22 jagode Exp $
Name: $Id: linux-net.c,v 1.4 2011/09/15 00:46:22 jagode Exp $
--------------------------------------------------------------------------------
component.c PASSED
Gdb backtrace:
----------
...
(gdb) r
Starting program: /home/fedora/rpms/BUILD/papi-4.1.4/src/utils/papi_component_avail

Program received signal SIGSEGV, Segmentation fault.
__strncpy_ssse3 () at ../sysdeps/x86_64/multiarch/strcpy.S:94
94 pcmpeqb (%rsi), %xmm0 /* compare 16 bytes in (%rsi) and %xmm0 for equality, try to find null char*/
(gdb) backtrace
#0 __strncpy_ssse3 () at ../sysdeps/x86_64/multiarch/strcpy.S:94
#1 0x00000000004137e3 in coretemp_init_substrate () at components/coretemp/linux-coretemp.c:131
#2 coretemp_init_substrate () at components/coretemp/linux-coretemp.c:110
#3 0x000000000040ab2b in _papi_hwi_init_global () at papi_internal.c:1348
#4 0x0000000000407d70 in PAPI_library_init (version=<optimized out>) at papi.c:595
#5 0x000000000040228c in main (argc=1, argv=0x7fffffffe468) at component.c:85
(gdb)
----------

The problem is in the coretemp_init_substrate() function:
1) generateEventList("/sys/class/hwmon") returns 0 and
2) the code that follows assumes that at least one event is available (NUM_EVENTS > 0; papi_malloc + do-while block)

/jpo
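The failure mode described above can be sketched in isolation. What follows is a minimal illustration under stated assumptions, not the PAPI code: `init_event_table`, `struct event_entry` and `NAME_LEN` are hypothetical names chosen for the example. The only point it makes is that the initialization path returns early when the scan found no events, instead of allocating and filling a table that has no entries.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 64   /* assumed buffer size, for illustration only */

/* Hypothetical stand-in for one entry of a component's native-event table. */
struct event_entry {
    char name[NAME_LEN];
};

/*
 * Build an event table from `count` discovered names.
 * Returns the number of events registered (0 is a valid, non-fatal result),
 * or -1 on allocation failure. When the return value is > 0, *table_out
 * owns the allocation and the caller must free() it.
 */
int init_event_table(const char *const *names, int count,
                     struct event_entry **table_out)
{
    *table_out = NULL;

    /* Guard: nothing was discovered, so there is nothing to copy. */
    if (count <= 0)
        return 0;

    struct event_entry *table = calloc((size_t)count, sizeof(*table));
    if (table == NULL)
        return -1;

    for (int i = 0; i < count; i++) {
        /* Bounded copy; calloc zeroed the buffer, so the last byte stays '\0'. */
        strncpy(table[i].name, names[i], NAME_LEN - 1);
    }

    *table_out = table;
    return count;
}
```

A caller would treat a return value of 0 as "component present but without events" and carry on with library initialization, which matches the graceful behaviour asked for later in this thread.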
Patched the coretemp papi component to fix the problems observed in it:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3449724

However, when running "make fulltest" I still see problems, so there is some more work required.
William,

A couple more notes regarding the coretemp problem reported above (and a correction to your mail http://lists.eecs.utk.edu/pipermail/perfapi-devel/2011-October/004865.html):

* the crash occurred in the two systems where I tested:

  i) Intel i7 system running Fedora 15 x86_64 (kernel 2.6.40.6-0.fc15.x86_64)
  ii) Dual-Xeon Dell server running Scientific Linux 6.1 x86_64 (kernel 2.6.32-131.12.1.el6.x86_64)

* both systems have the /sys/class/hwmon directory

  i) in the Fedora 15 system it contains a symbolic link

     lrwxrwxrwx. 1 root root 0 Oct 21 06:12 hwmon0 -> ../../devices/pci0000:00/0000:00:03.0/0000:01:00.0/hwmon/hwmon0

     The entry appears to be related to the system Radeon graphics card:

     # cat /sys/class/hwmon/hwmon0/name
     radeon

  ii) in the SL6.1 system it is empty

/jpo
(In reply to comment #8)
> Patched coretemp papi component to fix problems observed in it:
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3449724
>
> However, when running "make fulltest" see problems, so there is some more work
> required.

The patch fixed the crash:

$ rpm -q papi
papi-4.1.4-0.20111020.fc15_1.x86_64

$ papi_component_avail
Available components and hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 4.1.4.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz (30)
CPU Revision             : 5.000000
CPUID Info               : Family: 6 Model: 30 Stepping: 5
CPU Megahertz            : 1200.000000
CPU Clock Megahertz      : 1200
Hdw Threads per core     : 2
Cores per Socket         : 4
NUMA Nodes               : 1
CPU's per Node           : 8
Total CPU's              : 8
Number Hardware Counters : 7
Max Multiplex Counters   : 64
--------------------------------------------------------------------------------
Name: $Id: perf_events.c,v 1.90 2011/10/05 21:29:39 vweaver1 Exp $
Name: $Id: linux-acpi.c,v 1.5 2011/09/15 00:46:22 jagode Exp $
Name: $Id: linux-coretemp.c,v 1.5 2011/09/15 00:46:22 jagode Exp $
Name: $Id: example.c,v 1.8 2011/09/15 00:46:22 jagode Exp $
Name: $Id: linux-lustre.c,v 1.9 2011/09/15 00:46:22 jagode Exp $
Name: $Id: linux-net.c,v 1.4 2011/09/15 00:46:22 jagode Exp $
--------------------------------------------------------------------------------
component.c PASSED
(In reply to comment #8)
...
>
> However, when running "make fulltest" see problems, so there is some more work
> required.

I'm seeing lots of problems here too (partial output in the Fedora 15 system), and one of the tests core dumps:

...
Running ctests/all_native_events:Unable to open ACPI temperature file.Error stopping ACPI_STAT: PAPI_EINVAL
Error starting ACPI_TEMP : PAPI_EISRUN
all_native_events.c WARNING Line # 171
Warning: 0 Uncore and 1 Offcore events were ignored
all_native_events.c PASSED with WARNING
...
Running ctests/overflow_pthreads:can not find host counter: `< on hyperion....
we only have: 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send'
please subscribe only once to each counter
...
Running ctests/profile:profile.c WARNING Line # 143
Warning: PAPI_profil PAPI_PROFIL_RANDOM not supported
profile.c PASSED with WARNING
...
Running ctests/profile_pthreads:can not find host counter: p= on hyperion...
we only have: 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent'
can not find host counter: ?@ on hyperion...
we only have: 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv'
can not find host counter: ?> on hyperion...
we only have: 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send'
...
Running ctests/sdsc4-mpx:sdsc4.c FAILED Line # 307
Error: Values differ from reference
Running ctests/sdsc-mpx:sdsc.c FAILED Line # 55
Error: Error on 0, 1766903735.000000>0.200000 and 1766903735>100000
...
Running ctests/thrspecific:*** glibc detected *** ./ctests/thrspecific: double free or corruption (out): 0x00007fc828003920 ***
...
./ctests/thrspecific[0x402365]
'tcp_segments_sent'
/lib64/libpthread.so.0[0x3423c07b31]
'tcp_segments_received' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_retransmitted' 'lo_recv'
00400000-0049f000 r-xp 00000000 08:06 1846384 /home/fedora/rpms/BUILD/papi-4.1.4/src/ctests/thrspecific
0069e000-006bf000 rw-p 0009e000 08:06 1846384 /home/fedora/rpms/BUILD/papi-4.1.4/src/ctests/thrspecific
006bf000-046c4000 rw-p 00000000 00:00 0
04b70000-04bc4000 rw-p 00000000 00:00 0 [heap]
3422c00000-3422c1f000 r-xp 00000000 08:03 2097154 /lib64/ld-2.14.so
3422e1e000-3422e1f000 r--p 0001e000 08:03 2097154 /lib64/ld-2.14.so
3422e1f000-3422e20000 rw-p 0001f000 08:03 2097154 /lib64/ld-2.14.so
3422e20000-3422e21000 rw-p 00000000 00:00 0
3423000000-342318f000 r-xp 00000000 08:03 2097155 /lib64/libc-2.14.so
342318f000-342338f000 ---p 0018f000 08:03 2097155 /lib64/libc-2.14.so
342338f000-3423393000 r--p 0018f000 08:03 2097155 /lib64/libc-2.14.so
3423393000-3423394000 rw-p 00193000 08:03 20run_tests.sh: line 114: 713 Aborted (core dumped) $VALGRIND ./$i $TESTS_QUIET
...
Some of the net component problems appear to be caused by the new network interface names in Fedora >= 15.

The F15 system has the following network interfaces up: lo, p132p1, p3p1 and p3p2. Although the net component detects these interfaces, it doesn't appear to support them natively (it appears to have LO and ETH0..ETH4 hardcoded). It only lists bytes received and written for the new p* interfaces.

There is room for improvement, as it appears to be using the ifconfig output as the source of the network stats.

Partial output of papi_native_avail:
----------
...
--------------------------------------------------------------------------------
0x50000000 tcp_segments_sent | # of TCP segments sent |
--------------------------------------------------------------------------------
0x50000001 tcp_segments_received | # of TCP segments received |
--------------------------------------------------------------------------------
0x50000002 tcp_segments_retransmitted | # of TCP segments retransmitted |
--------------------------------------------------------------------------------
0x50000003 lo_recv | bytes received on this interface |
--------------------------------------------------------------------------------
0x50000004 lo_send | bytes written on this interface |
--------------------------------------------------------------------------------
0x50000005 p132p1_recv | bytes received on this interface |
--------------------------------------------------------------------------------
0x50000006 p132p1_send | bytes written on this interface |
--------------------------------------------------------------------------------
0x50000007 p3p1_recv | bytes received on this interface |
--------------------------------------------------------------------------------
0x50000008 p3p1_send | bytes written on this interface |
--------------------------------------------------------------------------------
0x50000009 p3p2_recv | bytes received on this interface |
--------------------------------------------------------------------------------
0x5000000a p3p2_send | bytes written on this interface |
--------------------------------------------------------------------------------
0x54000000 LO_RX_PACKETS | LO_RX_PACKETS |
--------------------------------------------------------------------------------
0x54000001 LO_RX_ERRORS | LO_RX_ERRORS |
--------------------------------------------------------------------------------
0x54000002 LO_RX_DROPPED | LO_RX_DROPPED |
--------------------------------------------------------------------------------
0x54000003 LO_RX_OVERRUNS | LO_RX_OVERRUNS |
--------------------------------------------------------------------------------
0x54000004 LO_RX_FRAME | LO_RX_FRAME |
--------------------------------------------------------------------------------
0x54000005 LO_RX_BYTES | LO_RX_BYTES |
--------------------------------------------------------------------------------
0x54000006 LO_TX_PACKETS | LO_TX_PACKETS |
--------------------------------------------------------------------------------
0x54000007 LO_TX_ERRORS | LO_TX_ERRORS |
--------------------------------------------------------------------------------
0x54000008 LO_TX_DROPPED | LO_TX_DROPPED |
--------------------------------------------------------------------------------
0x54000009 LO_TX_OVERRUNS | LO_TX_OVERRUNS |
--------------------------------------------------------------------------------
0x5400000a LO_TX_CARRIER | LO_TX_CARRIER |
--------------------------------------------------------------------------------
0x5400000b LO_TX_BYTES | LO_TX_BYTES |
--------------------------------------------------------------------------------
0x5400000c LO_COLLISIONS | LO_COLLISIONS |
--------------------------------------------------------------------------------
0x5400000d ETH0_RX_PACKETS | ETH0_RX_PACKETS |
--------------------------------------------------------------------------------
0x5400000e ETH0_RX_ERRORS | ETH0_RX_ERRORS |
--------------------------------------------------------------------------------
0x5400000f ETH0_RX_DROPPED | ETH0_RX_DROPPED |
--------------------------------------------------------------------------------
0x54000010 ETH0_RX_OVERRUNS | ETH0_RX_OVERRUNS |
--------------------------------------------------------------------------------
0x54000011 ETH0_RX_FRAME | ETH0_RX_FRAME |
--------------------------------------------------------------------------------
0x54000012 ETH0_RX_BYTES | ETH0_RX_BYTES |
--------------------------------------------------------------------------------
0x54000013 ETH0_TX_PACKETS | ETH0_TX_PACKETS |
--------------------------------------------------------------------------------
0x54000014 ETH0_TX_ERRORS | ETH0_TX_ERRORS |
--------------------------------------------------------------------------------
0x54000015 ETH0_TX_DROPPED | ETH0_TX_DROPPED |
--------------------------------------------------------------------------------
0x54000016 ETH0_TX_OVERRUNS | ETH0_TX_OVERRUNS |
--------------------------------------------------------------------------------
0x54000017 ETH0_TX_CARRIER | ETH0_TX_CARRIER |
--------------------------------------------------------------------------------
0x54000018 ETH0_TX_BYTES | ETH0_TX_BYTES |
--------------------------------------------------------------------------------
0x54000019 ETH0_COLLISIONS | ETH0_COLLISIONS |
--------------------------------------------------------------------------------
...
----------
(In reply to comment #9)
> William,
>
> A couple more notes regarding the coretemp problem reported above (and a
> correction to your mail
> http://lists.eecs.utk.edu/pipermail/perfapi-devel/2011-October/004865.html):
>
> * the crash occurred in the two systems where I tested:
>
> i) Intel i7 system running Fedora 15 x86_64
> (kernel 2.6.40.6-0.fc15.x86_64)
> ii) Dual-Xeon Dell server running Scientific Linux 6.1 x86_64
> (kernel 2.6.32-131.12.1.el6.x86_64)
>
> * both systems have the /sys/class/hwmon directory
>
> i) in the Fedora 15 system it contains a symbolic link
>
> lrwxrwxrwx. 1 root root 0 Oct 21 06:12 hwmon0 ->
> ../../devices/pci0000:00/0000:00:03.0/0000:01:00.0/hwmon/hwmon0
>
> The entry appears to be related to the system Radeon graphic card:
>
> # cat /sys/class/hwmon/hwmon0/name
> radeon
>
> ii) in the SL6.1 system it is empty
>
> /jpo

* PAPI Component Repository
  http://icl.cs.utk.edu/projects/papi/repository/index.php/Main_Page
  Problem: doesn't list the net component

* CoreTemp
  http://icl.cs.utk.edu/projects/papi/repository/index.php/CoreTemp
  Note: the kernel module coretemp needs to be loaded (default: not loaded)
  (modprobe coretemp --> /sys/class/hwmon/hwmon[0-9]+ entries)
William Cohen,

Regarding the coretemp patch: I think it would be better to make the function generateEventList() return a PAPI error if it isn't able to locate hwmon*/{temp,fan}* files. Something like:

--- papi-4.1.4/src/components/coretemp/linux-coretemp.c 2011-09-15 01:46:22.000000000 +0100
+++ papi-4.1.4-modified/src/components/coretemp/linux-coretemp.c 2011-10-21 21:17:53.057865554 +0100
@@ -91,6 +91,12 @@
 }
 closedir(dir);
+
+ if ( count == 0 ) {
+ PAPIERROR("Oops: No temperature/fan performance counters found! Are you sure the coretemp module is loaded?\n");
+ return( PAPI_ENOCNTR );
+ }
+
 return (count);
 }
----------

With this patch the coretemp_init_substrate() "if (NUMEVENTS==0)..." test could be dropped.

Caveat: this patch will cause all papi applications to fail if the coretemp component isn't able to find any temperature/fan performance counters.

Failure example in the Fedora 15 system without the coretemp module loaded:
----------
$ papi_component_avail
PAPI Error: Oops: No temperature/fan performance counters found! Are you sure the coretemp module is loaded?
.
component.c FAILED
Line # 87
Error in PAPI_library_init: PAPI_ENOCNTR
----------

Too harsh? Note that coretemp_init_substrate() returns a papi error if it doesn't find the /sys/class/hwmon directory.

Any thoughts?
jpo
William,

Would it be possible to update the papi SRPM with the latest CVS source code? It includes code committed today [1] and on Friday.

tia,
jpo

[1] - the version has already been bumped to 4.2.0
Regarding comment 14: the software shouldn't assume that the hardware support for coretemp is there. If initialization fails in that case, papi will not work on any machine that lacks the hardware support for coretemp. The code needs to be graceful about it.

There is a scratch build for F15 of the current papi CVS version at

http://koji.fedoraproject.org/koji/taskinfo?taskID=3457926

I would prefer to use an official release for rawhide. Once there is an official release of papi 4.2, a build will be pushed through.
(In reply to comment #16)
> For comment 14 the software shouldn't assume that the hardware support is there
> for that. If it fails, then papi will not work on any machine that doesn't have
> the hardware support for coretemp. The code needs to be graceful about.

I agree.

> There is a scratch build for F15 of the current version papi cvs in
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3457926

Thanks for the new scratch build (note that it only builds papi with the lustre component active).

> Would prefer to use official release for rawhide. Once there is an official
> release of papi 4.2, a build will be pushed through.

Ack.

jpo
Still regarding the coretemp component:

* the last commit to the linux-coretemp.c file no longer makes papi terminate if the /sys/class/hwmon directory doesn't exist:

----------
...
dir = opendir(base_dir);
if ( dir == NULL ) {
    SUBDBG("Can't find %s, are you sure the coretemp module is loaded?\n", base_dir);
    return 0;
}
...
------------

The problem is that the debug message is not entirely correct, as the /sys/class/hwmon directory is not created by the coretemp module.

Quote from http://www.mjmwired.net/kernel/Documentation/thermal/sysfs-api.txt
----------
...
2. sysfs attributes structure

RO read only value
RW read/write value

Thermal sysfs attributes will be represented under /sys/class/thermal.
Hwmon sysfs I/F extension is also available under /sys/class/hwmon
if hwmon is compiled in or built as a module.
...
-----------

* I still would like to see generateEventList() emit a warning if it doesn't find any counters:

 closedir(dir);
+
+ if ( count == 0 ) {
+ SUBDBG("No thermal counters found! Are you sure the coretemp module is loaded?\n");
+ }
+
 return (count);
 }
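The behaviour argued for here (return 0 when the directory is missing, warn when nothing is found, never abort) can be sketched as a standalone function. This is an illustration only, not the PAPI implementation: `count_sensor_entries` is a hypothetical name, it scans a single directory rather than walking the hwmon*/ tree, and it writes to stderr where the component would use its own debug macro.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Count directory entries whose names start with "temp" or "fan".
 * Returns 0 (not an error) when the directory cannot be opened or holds
 * no matching entries; a diagnostic goes to stderr in both cases so the
 * caller can keep initializing the rest of the library.
 */
int count_sensor_entries(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL) {
        fprintf(stderr, "warning: cannot open %s; is hwmon support available?\n",
                dir_path);
        return 0;
    }

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strncmp(ent->d_name, "temp", 4) == 0 ||
            strncmp(ent->d_name, "fan", 3) == 0)
            count++;
    }
    closedir(dir);

    if (count == 0)
        fprintf(stderr, "warning: no temp/fan entries under %s; "
                        "is the coretemp module loaded?\n", dir_path);

    return count;
}
```

The two warnings are deliberately different: a missing directory points at hwmon support in the kernel, while an existing directory without matching entries points at the coretemp module, which is the distinction made in this comment.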
SUT: Fedora 15 x86_64 with papi compiled with 5 components

Some tests are still experiencing memory corruption problems. For example, running profile_pthreads produces different output on each run (and sometimes it crashes):

$ ctests/profile_pthreads
please subscribe only once to each counter

$ ctests/profile_pthreads
can not find host counter: = on <myhostname>
we only have: 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv' 'p3p2_send' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'tcp_segments_sent' 'tcp_segments_received' 'tcp_segments_retransmitted' 'lo_recv' 'lo_send' 'p132p1_recv' 'p132p1_send' 'p3p1_recv' 'p3p1_send' 'p3p2_recv'
please subscribe only once to each counter
'p3p2_send' 'tcp_segments_sent'

$ ctests/profile_pthreads
*** glibc detected *** ctests/profile_pthreadsplease subscribe only once to each counter
It looks like the lustre component is causing these failures. I turned off the lustre component in:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3459523
(In reply to comment #20)
> It looks like the luster component is causing these failures. Turned off the
> luster component in:
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3459523

Thanks for the new build.
Created attachment 530341 [details]
coretemp_basic fails to print the first event name

Changes:
* bug: prints the first event name
* enhancement: prints the component name
William,

Could you send the patch attached in comment 22 upstream? My subscription requests to the PAPI mailing lists are still pending approval.

tia,
jpo
There is still another problem with the coretemp component registration:

* In my F15 system I have 20 temp* entries under /sys/class/hwmon/hwmon1/device, and the coretemp test program lists them all (with the patch from comment 22 applied).

The problem is that this number of events is not registered in the component info struct. Dumping some of the values I get:

----------
Component: 2
	: Name = $Id: linux-coretemp.c,v 1.13 2011/10/25 15:20:41 vweaver1 Exp $
	: Version = $Revision: 1.13 $
	: Hardware counters = 512
	: Mux Hw counters = 32
	: Preset events = 0
	: Native events = 0
...
----------

I would expect the number of hardware counters and/or the number of native events to be 20.

/jpo

Source code:
...
numcmp = PAPI_num_components();
for ( cid=0; cid<numcmp; cid++ ) {
    printf("Component: %d\n",cid);
    if ( (cmpinfo = PAPI_get_component_info(cid)) == NULL)
        exit(EXIT_FAILURE);
    printf("\t: Name = %s\n", cmpinfo->name);
    printf("\t: Version = %s\n", cmpinfo->version);
    printf("\t: Hardware counters = %d\n", cmpinfo->num_cntrs);
    printf("\t: Mux Hw counters = %d\n", cmpinfo->num_mpx_cntrs);
    printf("\t: Preset events = %d\n", cmpinfo->num_preset_events);
    printf("\t: Native events = %d\n", cmpinfo->num_preset_events);
}
...
Created attachment 530347 [details]
Dump fields of PAPI_component_info_t

Program to dump some fields of PAPI_component_info_t structs.

Note: The code at the end of comment 24 didn't print the number of native events (it was printing the number of preset events instead).
(In reply to comment #23) > William, > > Could you send upstream the patch attached in comment 22? My PAPI's MLs > subscription requests are still pending approval. I've already sent the patch to the perfapi-devel mailing list. /jpo
The official papi-4.2.0.tar.gz has been downloaded and an rpm built for rawhide. The koji build of it is below: http://koji.fedoraproject.org/koji/buildinfo?buildID=271035
The problem in comment 24 has been fixed upstream:

http://icl.cs.utk.edu/viewcvs/viewcvs.cgi/PAPI/papi/src/components/coretemp/linux-coretemp.c?r1=1.15&r2=1.16
Built papi with coretemp patches. Fixed in papi-4.2.0-2.fc17: http://koji.fedoraproject.org/koji/rpminfo?rpmID=2771535
William,

FYI: Yesterday Vince committed the rewritten net component that I sent him two weeks ago. It now reads the network events directly from /proc/net/dev instead of parsing the output of the external ifconfig command.

Changelog:
http://icl.cs.utk.edu/viewcvs/viewcvs.cgi/PAPI/papi/src/components/net/CHANGES?revision=1.1&view=markup

/jpo
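For readers unfamiliar with /proc/net/dev, the approach can be sketched as follows. This is not the committed component code: `parse_net_dev_line`, `struct net_counters` and `IFNAME_LEN` are hypothetical names for the example, and only four of the sixteen per-interface columns are kept. Because the interface name is taken from whatever precedes the colon, names such as p3p1 or p132p1 need no special handling, unlike a hardcoded LO/ETH0..ETH4 list.

```c
#include <stdio.h>
#include <string.h>

#define IFNAME_LEN 32   /* assumed buffer size, for illustration only */

/* Hypothetical subset of the per-interface counters in /proc/net/dev. */
struct net_counters {
    char ifname[IFNAME_LEN];
    unsigned long long rx_bytes, rx_packets;
    unsigned long long tx_bytes, tx_packets;
};

/*
 * Parse one line of /proc/net/dev.
 * Returns 0 on success, -1 if the line is not an interface line
 * (the two header lines contain no ':') or is malformed.
 */
int parse_net_dev_line(const char *line, struct net_counters *out)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;

    /* The interface name is right-aligned, so skip leading blanks. */
    const char *start = line;
    while (*start == ' ' || *start == '\t')
        start++;

    size_t len = (size_t)(colon - start);
    if (len == 0 || len >= IFNAME_LEN)
        return -1;
    memcpy(out->ifname, start, len);
    out->ifname[len] = '\0';

    /* Columns: 8 receive fields, then 8 transmit fields.
     * Keep bytes and packets from each group, skip the rest. */
    int n = sscanf(colon + 1,
                   "%llu %llu %*llu %*llu %*llu %*llu %*llu %*llu %llu %llu",
                   &out->rx_bytes, &out->rx_packets,
                   &out->tx_bytes, &out->tx_packets);
    return (n == 4) ? 0 : -1;
}
```

A full reader would open /proc/net/dev with fopen(), feed each line from fgets() to this function, and ignore the lines for which it returns -1.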
Closing ticket. Current Fedora 17/18 PAPI builds: papi-4.2.0-4.fc17, papi-4.2.1-1.fc18