Escalated to Bugzilla from IssueTracker
Event posted on 11-04-2009 08:35pm EST by woodard

[ben@quince common-block]$ gdb ./tpcommon-init_gfortran44
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) b 30
Breakpoint 1 at 0x400be4: file tpcommon-init.f, line 30.
(gdb) b 26
Breakpoint 2 at 0x4009c1: file tpcommon-init.f, line 26.
(gdb) r
Starting program: /home/ben/Work/TV-bugs/common-block/tpcommon-init_gfortran44
[Thread debugging using libthread_db enabled]
[New Thread 0x2b1cd8042950 (LWP 19345)]

Breakpoint 2, correct () at tpcommon-init.f:26
26        istart = 1
(gdb) p istart
$1 = 0
(gdb) n
27  !$omp parallel private(nthreads, iam, chunk)
(gdb) p istart
$2 = 1
(gdb) p &istart
$3 = (PTR TO -> ( integer(kind=4) )) 0x2b1cd8042940
(gdb) c
Continuing.
[New Thread 0x427f7940 (LWP 19348)]
[Switching to Thread 0x427f7940 (LWP 19348)]

Breakpoint 1, MAIN__.omp_fn.0 (.omp_data_i=0x7fff169861f0) at tpcommon-init.f:30
30        nthreads = omp_get_num_threads()
(gdb) p istart
No symbol "istart" in current context.
(gdb) list 30
25        N = 100
26        istart = 1
27  !$omp parallel private(nthreads, iam, chunk)
28
29  ! Compute the subset of iterations executed by each thread
30        nthreads = omp_get_num_threads()
31        iam = omp_get_thread_num()
32        chunk = (N + nthreads - 1)/nthreads
33        istart = iam * chunk + 1
34        iend = min((iam + 1) * chunk, N)

The common block variable should still be in scope inside of the parallel section of code even though the implementation makes this an outlined function. Furthermore since the variable is thread private it should point to different addresses in each of the different threads in the common block.

If you do not initialize the variable outside of the parallel region, it exposes a different error where the address of the variable can't be found by gdb.

This event sent from IssueTracker by kbaxley [LLNL (HPC)] issue 362178
Event posted on 11-04-2009 08:45pm EST by woodard

(gdb) b 51
Breakpoint 3 at 0x400b89: file tpcommon-init.f, line 51.
(gdb) l 51
46  ! Subroutine to operate on a thread's portion of the array "iarray"
47        integer iarray(100), istart, iend, i
48        common /bounds/ istart, iend
49  !$omp threadprivate(/bounds/)
50
51        do i = istart, iend
52           iarray(i) = i * i
53        enddo
54        return
55        end
(gdb) c
Continuing.
[Switching to Thread 0x2ad32baa9950 (LWP 19398)]

Breakpoint 2, MAIN__.omp_fn.0 (.omp_data_i=0x7fff4c40e2f0) at tpcommon-init.f:30
30        nthreads = omp_get_num_threads()
Current language: auto; currently fortran
(gdb) c
Continuing.
2 1 50 51 100
2 0 50 1 50
[Switching to Thread 0x41296940 (LWP 19401)]

Breakpoint 3, work (
    iarray=(1, 0, 1279320288, 32767, -1195323781, 56, -1601978046, 39363801,
    1074791432, 33820679, 2244752, 34609696, 642122777, -1432354560,
    1279322608, 32767, 732592504, 10963, 110142336, 6144, -2147409904,
    1141254144, 1279321344, 32767, 1279321456, 32767, 6295752, 0, 0, 0, 0, 0,
    0, 0, -1195331155, 56, 0, 0, 6295736, 0, 0, 0, 3, 0, 1279321456, 32767,
    -1195329149, 56, 727381208, 10963, 732592528, 10963, 0, 0, -1193161960, 56,
    1279321344, 32767, -1195340777, 56, 1279321407, 32767, 732595456, 10963,
    6, 0, 9, 0, 2090266759, 0, -1195339756, 56, 0, 0, 1279320736, 32767,
    2090266758, 0, 1279321136, 32767, 1279321160, 32767, -1191167352, 56, 0, 0,
    732597592, 10963, 732592528, 10963, -1195374822, 56, -1191117792, 56,
    -1195375680, 56, 0, 0, 2052, 1)) at tpcommon-init.f:51
51        do i = istart, iend
(gdb) p istart
$2 = 51
(gdb) p &istart
$3 = (PTR TO -> ( integer(kind=4) )) 0x41296930
(gdb) c
Continuing.
[Switching to Thread 0x2ad32baa9950 (LWP 19398)]

Breakpoint 3, work (
    iarray=(1, 0, 1279320288, 32767, -1195323781, 56, -1601978046, 39363801,
    1074791432, 33820679, 2244752, 34609696, 642122777, -1432354560,
    1279322608, 32767, 732592504, 10963, 110142336, 6144, -2147409904,
    1141254144, 1279321344, 32767, 1279321456, 32767, 6295752, 0, 0, 0, 0, 0,
    0, 0, -1195331155, 56, 0, 0, 6295736, 0, 0, 0, 3, 0, 1279321456, 32767,
    -1195329149, 56, 727381208, 10963, 2601, 2704, 2809, 2916, 3025, 3136,
    3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624,
    4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400,
    6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464,
    8649, 8836, 9025, 9216, 9409, 9604, 9801, 10000)) at tpcommon-init.f:51
51        do i = istart, iend
(gdb) p istart
$4 = 1
(gdb) p &istart
$5 = (PTR TO -> ( integer(kind=4) )) 0x2ad32baa9940

Interestingly when you break in the subroutine called by the parallel section, you can actually see the thread private data as expected. In one case it is:

(gdb) p istart
$4 = 1
(gdb) p &istart
$5 = (PTR TO -> ( integer(kind=4) )) 0x2ad32baa9940

and in the other it is:

(gdb) p istart
$2 = 51
(gdb) p &istart
$3 = (PTR TO -> ( integer(kind=4) )) 0x41296930

kbaxley assigned to issue for LLNL (HPC).
Status set to: Waiting on Tech

This event sent from IssueTracker by kbaxley [LLNL (HPC)] issue 362178
Created attachment 367623 [details] reproducer
Event posted on 2009-11-05 09:08 PST by woodard

I can sort of see why this happens. Though the common block is supposed to be accessible globally, you have to declare it to bring it into the scope of a particular function. In this case, we declare the common block in both the main program and the work subroutine. However, the compiler itself creates a function for the OpenMP parallel region. That function does not declare the common block, so the variable is not in scope inside this compiler-generated function.

I also included the observation that the variables come back into scope in the function called from the parallel region. I think that is a useful observation, but mostly because it shows that the real problem is that the common block variable is out of scope inside the static portion of the parallel region. The variables are in scope in the called subroutine, as opposed to the static portion of the parallel region, because the source code there explicitly declares the common block bounds, which brings the variables into scope.

This event sent from IssueTracker by woodard issue 362178
According to fche, the problem still persists in 4.4.2-7 from rawhide.
Created attachment 367857 [details]
gcc44-rh533181.patch

Patch against latest rawhide gcc which makes sure DW_AT_location for bounds/istart/iend is emitted even in the main program. With this you should be able to print istart/iend if you go up to the function containing the parallel region (in the initial thread), so IMHO this is very similar to, say:

    int
    foo (void)
    {
      static int var;
      int bar () { return var++; }
      bar ();
      bar ();
      bar ();
      return bar ();
    }

    int
    main (void)
    {
      return foo () - 3;
    }

where gcc also emits the DIE for var just in foo, not in the nested bar, and gdb handles it gracefully by searching the parent function's scope after the nested function's own scopes.

While GCC could in theory also emit the DIE in the .omp_fn* function, that still won't handle cases like:

    void
    foo ()
    {
      int i = 5;
      static int j = 4;
      int k;
      i++; j++;
    #pragma omp parallel private (k)
      {
        k = 6;
      }
    }

where one should be able in the debugger to print not just k, but also i and j, which aren't ever referenced there. The OpenMP parallel/task region case is of course harder than just nested functions, and is something that should be discussed between the compiler and debugger folks.

For backtraces there is always the option of writing hand-crafted CFI info in some routine inside libgomp that will magically show the caller on the stack of the initial thread, rather than showing a backtrace back into libgomp/libpthread internals. Or teach the debugger somehow to figure it out on its own, either by teaching it about libgomp internals, or by shipping a library similar to libthread_db.so, which gdb loads and asks about glibc libpthread internals.
(In reply to comment #6)
> where one should be able in the debugger to print not just k, but also i and j
> which aren't ever referenced there. The OpenMP parallel/task region case is of
> course harder than just nested functions,

Could you please point out an OpenMP region case that is not solvable by nested DW_TAG_subprogram DIEs, as used for GCC nested functions?
Any #pragma omp parallel with more than one thread. So say:

    int
    main ()
    {
      static int i;
      int j = 1;
      int k;
    #pragma omp parallel num_threads(2)
      {
        k = 2;
        __builtin_printf ("%d\n", k);
      }
      return 0;
    }

to keep it short. Then, for both threads:

    b 10
    bt
    p i
    p j
    p k

For the non-initial thread the backtrace is currently reported as:

    #0  main.omp_fn.0 (.omp_data_i=0x7fffffffe480) at t3.c:10
    #1  0x0000003510a07fd2 in ?? () from /usr/lib64/libgomp.so.1
    #2  0x000000350d20686a in start_thread () from /lib64/libpthread.so.0
    #3  0x000000350c6de3bd in clone () from /lib64/libc.so.6
    #4  0x0000000000000000 in ?? ()

and obviously you can't go up to main that way to see the vars. In the initial thread the backtrace is:

    #0  main.omp_fn.0 (.omp_data_i=0x7fffffffe480) at t3.c:10
    #1  0x000000000040061b in main () at t3.c:7

and you can go up to see i and j, though of course it would be better if gdb did that even without up, like it does in other cases.

BTW, with F12 gcc/gdb even:

    int
    foo (void)
    {
      static __thread int tlsvar;
      int bar () { return tlsvar++; }
      bar ();
      bar ();
      bar ();
      return bar ();
    }

    int
    main (void)
    {
      return foo () - 3;
    }

doesn't work right, not even in foo, but what gcc emitted looks correct to me.
Event posted on 2010-04-29 09:27 PDT by woodard

Problem still seems to exist in RHEL6's latest with:

gcc-gfortran-4.4.3-19.el6.x86_64
gdb-7.1-17.el6.x86_64

as can be seen:

Current directory is /tmp/
GNU gdb (GDB) Red Hat Enterprise Linux (7.1-17.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/a.out...done.
(gdb) b 30
Breakpoint 1 at 0x400bfc: file tpcommon-init.f, line 30.
(gdb) b 26
Breakpoint 2 at 0x4009d9: file tpcommon-init.f, line 26.
(gdb) r
Starting program: /tmp/a.out
[Thread debugging using libthread_db enabled]

Breakpoint 2, correct () at tpcommon-init.f:26
(gdb) p istart
$1 = 0
(gdb) n
(gdb) p istart
$2 = 1
(gdb) c
Continuing.
[New Thread 0x7ffff6c93710 (LWP 20131)]
[Switching to Thread 0x7ffff6c93710 (LWP 20131)]

Breakpoint 1, MAIN__.omp_fn.0 (.omp_data_i=0x7fffffffdfa0) at tpcommon-init.f:30
(gdb) p istart
No symbol "istart" in current context.
(gdb) up
#1  0x00007ffff765f022 in gomp_thread_start (xdata=<value optimized out>) at ../../../libgomp/team.c:115
(gdb) p istart
No symbol "istart" in current context.
(gdb)

This event sent from IssueTracker by woodard issue 362178
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0102.html