We have been trying to track this down for a while.
Problem:
1. We start an OpenMP run of a Fortran program, built with the Intel compiler, on a quad 64-bit system. The run is started with the "nohup run&" command, where run is a small script that does:
   ulimit -s 1200000
   time ./xtdhf
2. If we leave open the konsole window that the nohup process was started from and log out, the job is killed.
3. If we press Ctrl-D to close the konsole window and then log out, the job keeps running.
4. This happens every time.
Details: Fedora 9 + all updates, including test. Using KDE-4.2.2. x86_64 system.
I have also been having a similar problem, which I think is related: when I ssh in from home, start the same job as above and do not log out, my provider's connection times out and kills the ssh connection. The job is then killed. If I Ctrl-D out of the ssh connection, the job keeps running no matter how many times I log in and out, or even if the connection times out. Thanks
maybe related to Bug 467622
It is related, since I reported that one too. I did some more testing: if I repeat the above experiment with different binaries, the job does not crash. For example, I tried building kdelibs from my account by putting "make -j4" into the run script, and it ran on 4 CPUs. I kept the nohup window open, logged in and out, and it did not crash! This reminds me of something I heard but only vaguely remember: the Intel compiler keeps all its dynamic libraries in a different location than the system ones. I specify those in the /etc/ld.so.conf file and everything works fine. I am using the nvidia binary drivers. Apparently, libGL.so.1 "takes over" dynamic library loading, see here: http://www.nvnews.net/vbulletin/showthread.php?t=131234 That may well be the case here, since if we disable some KDE logout "effects" that link to GL, the crash does not happen. Is there any way I can get around this?
I think you can get around this by running "nohup run&" not from konsole but with "ssh <remote machine> nohup run&" from your computer, or by running it directly in the terminal where you are logged in via ssh. Then it should survive. If I understand correctly, you are connected to the remote machine via ssh, you run konsole on the remote machine, and "nohup run&" is a child process of konsole. When ssh times out, konsole is killed by SIGHUP and bash kills its child processes as well (not with SIGHUP). IMHO this is not a bug in nohup but something that could be reassigned to bash, ssh or x11, though I doubt they have any chance to fix it... Could you please confirm this solution? If so, I'll close it NOTABUG, otherwise CANTFIX ...
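The distinction drawn above, that nohup only protects against SIGHUP and not against other signals, can be checked with a small experiment. This is only a sketch: "sleep 30" is a stand-in for the real job (an assumption, not the reporter's ./xtdhf), and the signals are sent by hand rather than by konsole, bash or ssh.

```shell
#!/bin/bash
# Start a stand-in job under nohup ("sleep 30" is a placeholder, not the real program).
nohup sleep 30 >/dev/null 2>&1 &
pid=$!
sleep 1   # give nohup time to set up signal handling and exec the job

# nohup makes the job ignore SIGHUP, so it should survive this.
kill -HUP "$pid"
sleep 1
if kill -0 "$pid" 2>/dev/null; then after_hup=alive; else after_hup=dead; fi

# nohup does nothing about SIGTERM, so this should end the job.
kill -TERM "$pid"
wait "$pid" 2>/dev/null
if kill -0 "$pid" 2>/dev/null; then after_term=alive; else after_term=dead; fi

echo "after SIGHUP: $after_hup, after SIGTERM: $after_term"
```

If the real job dies at logout even though it survives a manual SIGHUP like this, whatever kills it is presumably using a different signal.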
Not exactly. I connect to the remote machine via ssh, I do not run konsole on the remote machine, I just do "nohup run&" in the login shell. If I then Ctrl-D and close the connection, the job keeps running fine, but if I let the ssh connection time out (a timeout I did not set, since if I ssh locally between computers in my office the connection never times out) the job is killed as well. I think these two things are related somehow. However, what I describe in the original bug report is more important, and it only happens to applications that have their libraries in an unconventional location (like the Intel compiler). This is happening to all the members of my research group. They start a nohup job and would like to log out without closing their windows, and the job is gone! Something weird is going on!
I'll try to reassign this to openssh, as I guess it's not the fault of nohup. We'll see whether that's the correct component to act on or prevent such a thing. To me it looks like nohup is killed by SIGKILL, but an strace might say more about this...
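Alongside strace, one cheap check on Linux is whether the running job really has SIGHUP set to "ignore", by reading the SigIgn mask from /proc/<pid>/status. The sketch below uses "sleep 30" as a hypothetical stand-in for the job. It only shows the job's signal disposition, not which signal actually arrives at logout.

```shell
#!/bin/bash
# Stand-in job; in the real case this would be the nohup'ed run script (assumption).
nohup sleep 30 >/dev/null 2>&1 &
pid=$!
sleep 1   # let nohup exec the job before inspecting it

# SigIgn is a hex bitmask of ignored signals; the lowest bit is signal 1 (SIGHUP).
sigign=$(awk '/^SigIgn:/ {print $2}' "/proc/$pid/status")
if (( 0x$sigign & 1 )); then hup_ignored=yes; else hup_ignored=no; fi
echo "SigIgn=$sigign SIGHUP ignored: $hup_ignored"

# Clean up the stand-in job.
kill "$pid"
wait "$pid" 2>/dev/null
```

If SIGHUP shows as ignored and the job still disappears at logout, that would be consistent with the guess that something other than SIGHUP is killing it.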
I don't think it is openssh. I have done more testing:
1. Open konsole and run a true OpenMP job (not make -j4). If we now log out, the job is aborted or hangs on a single CPU. More detailed examination shows:
   a) If one compiles the same program to run on a single processor, the job keeps running after logout/login.
   b) Compiling the program to link statically with all the libraries does not help.
   c) If I run the program via strace (still using nohup), I get thousands of lines of "sched_yield() = 0" in the strace output (in the case of a crash or hang).
2. Everything works perfectly if one closes the konsole window that started the nohup job before logout.
The question is what logout does to open shells (konsole). KDE is obviously saving some information about that window so that it can reopen it exactly as it was at the next login. Does it try to save the processes the window is running as well? I think understanding this is the key. Thanks
This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.