Bug 509764
Summary: | "mpirun -gdb" produces a time out | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Laurent Aguerreche <laurent.aguerreche+redhat> |
Component: | mpich2 | Assignee: | Deji Akingunola <dakingun> |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Priority: | low |
Version: | 11 | CC: | buntinas, dakingun |
Hardware: | All | OS: | Linux |
Last Closed: | 2010-06-28 13:29:47 UTC | Doc Type: | Bug Fix |
Description
Laurent Aguerreche
2009-07-05 23:34:04 UTC
What version of mpich2 has this problem?

I have that on my system :

$ rpm -qa | grep mpich2
mpich2-libs-1.1-1.fc11.x86_64
mpich2-1.1-1.fc11.x86_64
mpich2-devel-1.1-1.fc11.x86_64

Laurent,

Are you sure your app is an mpi application and has been compiled with the mpicc from mpich2-devel-1.1-1.fc11.x86_64? Try this:

cd /tmp
mpicc /usr/share/mpich2/examples_graphics/cpi.c -o cpi
mpiexec -gdb ./cpi

You should get a gdb prompt (without the timeout message). If this doesn't work, can you post the output of

which mpicc
rpm -qf `which mpicc`

Thanks!

My program was not built with mpicc but directly with g++ and flags from `pkg-config mpich2-ch3` so I rebuilt it.

Unfortunately I do not see any difference. I am sure that my program uses MPI since it is able to run 3 processes on my machine and let them communicate together. My program is quite big and loads many plugins when it starts so would it be possible that the timeout value is too low?

By the way, I built cpi.c and I have been able to run it under gdb...

(In reply to comment #4)
> My program was not built with mpicc but directly with g++ and flags from
> `pkg-config mpich2-ch3` so I rebuilt it.
>

Can you please rebuild your program with mpicxx, and try run 'mpirun -gdb' again? The Cflags from `pkg-config mpich2-ch3` are not exactly the same flags you'll get from building with mpicxx (check out 'mpicxx -show'). Maybe we need to rework the pkgconfig flags.

> Unfortunately I do not see any difference. I am sure that my program uses MPI
> since it is able to run 3 processes on my machine and let them communicate
> together. My program is quite big and loads many plugins when it starts so
> would it be possible that the timeout value is too low?
>
> By the way, I built cpi.c and I have been able to run it under gdb...

(In reply to comment #4)
> My program was not built with mpicc but directly with g++ and flags from
> `pkg-config mpich2-ch3` so I rebuilt it.
>
> Unfortunately I do not see any difference.
> I am sure that my program uses MPI
> since it is able to run 3 processes on my machine and let them communicate
> together. My program is quite big and loads many plugins when it starts so
> would it be possible that the timeout value is too low?

That is possible. Can you try moving MPI_Init() to the very beginning of your program (before loading any plugins)? That may help if it's a timeout problem. Alternatively, you can edit src/pm/mpd/mpdgdbdrv.py line 100, and change the "3" at the end of the line to something larger (10 should be plenty), then do a "make install" from the top of the build directory again.

> By the way, I built cpi.c and I have been able to run it under gdb...

OK, that's a good sign.

-d

Sorry for the (very) long delay but I think I found the problem!

I tried to raise the timeout value and to move the MPI::Init() function at the beginning of my program without any success. Then I tried to print some messages in mpdgdbdrv.py but they were lost somewhere (messages are probably redirected). So I ran directly this script after I added the following lines :

stdout.write("b \"" + gdb_line + "\"")
stdout.flush()

I added them around the line 204, after the line:

while not gdb_line.startswith('Breakpoint'):

but before the "try". This is the output I got:

$ /usr/bin/mpdgdbdrv.py my_program
b ""b "
"b " Breakpoint 1 at 0x413a96: file /home/me/my_program/Source/main.cpp, line 50.
"b "(gdb) "mpdgdbdrv (<module> 115): timed out waiting for initial Breakpoint response
$

Look at the spaces before the word "Breakpoint" so this line does not start with "Breakpoint"! Consequently I added the line:

gdb_line = gdb_line.strip()

after:

gdb_line = gdb_sout_serr.readline() # drain breakpoint response

Now, "mpirun -gdb" seems to work... But one thing: text completion does not work for file names for instance, is it possible to fix that?

Rgds,
Laurent.

Nice catch! I'll add that fix to the repo.

Unfortunately, text completion would not be an easy thing to add.
The text is being read by the front-end mpiexec program and just forwarded to the back-end gdb instances. The front-end knows nothing about the symbols in your program, it just forwards commands. So, it would be a major undertaking to add that feature. We'd probably be better off spending time getting mpich2 to work with the debugger in eclipse/ptp. Sorry.

-d

This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
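
For anyone landing on this report later, the whitespace problem found in the thread can be reproduced without mpd or gdb. The following is only a minimal sketch, not the code from mpdgdbdrv.py: the function name `saw_breakpoint` and the sample lines are hypothetical, modelled on the output posted in the thread, and the amount of leading whitespace is assumed. It shows why a `startswith('Breakpoint')` test misses a line that begins with spaces, and how calling `strip()` first changes the result.

```python
# Hypothetical stand-in for the drain loop in mpdgdbdrv.py. The real script
# reads lines from a gdb pipe; here they come from a plain list.
def saw_breakpoint(lines, strip_first):
    """Return True if any line is recognised as gdb's Breakpoint response."""
    for gdb_line in lines:
        if strip_first:
            gdb_line = gdb_line.strip()  # the one-line fix proposed in the thread
        if gdb_line.startswith('Breakpoint'):
            return True
    return False  # the real script would report a timeout at this point


# Sample lines modelled on the posted output (leading spaces are an assumption).
sample = [
    "",
    "\n",
    "  Breakpoint 1 at 0x413a96: file main.cpp, line 50.\n",
    "(gdb) ",
]

print(saw_breakpoint(sample, strip_first=False))
print(saw_breakpoint(sample, strip_first=True))
```

The first call prints False because the third line starts with spaces, and the second prints True once the line is stripped before the comparison.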