Bug 495074 - nohup jobs killed on logout if konsole is open!
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: openssh
Version: 9
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Jan F. Chadima
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-04-09 16:10 UTC by Sammy
Modified: 2009-07-14 15:27 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-14 15:27:50 UTC
Type: ---
Embargoed:



Description Sammy 2009-04-09 16:10:58 UTC
This is something we have been trying to track down for a while.


Problem:

1. We start an OpenMP run of a Fortran program, built with the Intel compiler, on a
   quad 64-bit system. The run is started with the "nohup run&" command, where
   run is a small script that does:

ulimit -s 1200000
time ./xtdhf

2. If we leave open the konsole window that the nohup process was started from and
   log out, the job is killed.

3. If we press Ctrl-D, close the konsole window, and log out, the job keeps running.

4. Happens every time.
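
One way to narrow down a report like this is to confirm that the job really is ignoring SIGHUP after being started with nohup. The following is a minimal sketch, assuming a Linux system with /proc mounted; "sleep 300" is a hypothetical stand-in for the real job and is not part of the original report.

```shell
#!/bin/sh
# Sketch: check whether a nohup-started process ignores SIGHUP (Linux /proc only).
# "sleep 300" is a hypothetical stand-in for the real job (./xtdhf in the report).
nohup sleep 300 >/dev/null 2>&1 &
pid=$!
sleep 1   # give nohup time to set SIGHUP to "ignore" and exec the job

# SigIgn is a hex bitmask of ignored signals; bit 0 corresponds to SIGHUP (signal 1).
sigign=$(awk '/^SigIgn:/ {print $2}' "/proc/$pid/status")
last_digit=${sigign#"${sigign%?}"}      # last hex digit holds signals 1-4
hup_ignored=$(( 0x$last_digit & 1 ))

if [ "$hup_ignored" -eq 1 ]; then
    echo "PID $pid ignores SIGHUP (SigIgn=$sigign)"
else
    echo "PID $pid does NOT ignore SIGHUP (SigIgn=$sigign)"
fi

kill "$pid"   # clean up the stand-in job
```

If the job ignores SIGHUP and still dies on logout, a signal other than SIGHUP (or something other than a signal) is probably involved.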

Details: Fedora 9+all updates including test. Using KDE-4.2.2. X86_64 system.

I have been having a similar problem, which I think is related:
When I ssh in from home and start the same job as above without logging out, my
provider's connection times out and kills the ssh connection. The job is then
killed. If I Ctrl-D out of the ssh connection, the job keeps running no matter
how many times I log in and out, or even if it times out.

Thanks

Comment 1 Kamil Dudka 2009-04-09 16:37:18 UTC
maybe related to Bug 467622

Comment 2 Sammy 2009-04-09 18:43:31 UTC
It is related, since I reported that too.

Did some more testing... if I repeat the above experiment with
different binaries, it does not crash. For example, I tried building
kdelibs from my account by putting "make -j4" into the run script,
and it was running on 4 CPUs. I kept the nohup window open and
logged in and out, and it did not crash!

Which makes me recall something I heard but only vaguely remember:

The Intel compiler has all its dynamic libraries in a different location
than the system ones. I specify those in the /etc/ld.so.conf file and
everything works fine.

I am using nvidia binary drivers.

Apparently, libGL.so.1 "takes over" dynamic library loading; see here:

 http://www.nvnews.net/vbulletin/showthread.php?t=131234

That may well be the case, since if we disable some KDE logout "effects"
that link to GL, the crash does not happen.

Is there any way I can get around this?

Comment 3 Ondrej Vasik 2009-04-10 08:09:10 UTC
I think you can get around this by running "nohup run&" not from konsole, but with
"ssh <remote machine> nohup run&" from your computer, or by running it directly in the terminal where you are logged in via ssh... then it should survive.

If I understand correctly, you are connected to the remote machine via ssh, you run konsole on the remote machine, and "nohup run&" is a child process of konsole. ssh times out, konsole is killed by SIGHUP, and bash kills its child processes as well (not with SIGHUP). IMHO this is not a bug in nohup, and it could be reassigned to bash, ssh or x11, but I doubt they have any chance of fixing it...

Could you please confirm this solution? If so, I'll close it NOTABUG, otherwise CANTFIX ...
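
Another workaround in the same spirit, not proposed in this thread, is to start the job in its own session so that it has no controlling terminal at all. A minimal sketch, assuming util-linux setsid is installed and the commands are run as a script (in an interactive shell setsid may fork, so $! would not be the job's PID). "sleep 300" and the log path are hypothetical stand-ins for the "run" script and its output. Whether this avoids the failure described here is untested.

```shell
#!/bin/sh
# Sketch: start the job in its own session, detached from the terminal.
# "sleep 300" is a hypothetical stand-in for the "run" script in the report;
# the log file name is also hypothetical.
log="${TMPDIR:-/tmp}/run.log"
setsid nohup sleep 300 >"$log" 2>&1 </dev/null &
pid=$!
sleep 1   # let setsid create the new session before inspecting it

# Field 6 of /proc/<pid>/stat is the session ID (Linux only).
job_sid=$(awk '{print $6}' "/proc/$pid/stat")
shell_sid=$(awk '{print $6}' "/proc/$$/stat")
echo "shell session: $shell_sid, job session: $job_sid"

kill "$pid"   # clean up the stand-in job
```

A job session ID that differs from the shell's indicates the job is no longer tied to the terminal session it was started from.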

Comment 4 Sammy 2009-04-10 13:20:58 UTC
Not exactly. I connect to the remote machine via ssh; I do not run konsole
on the remote machine, I just do "nohup run&" in the login shell. If
I then Ctrl-D and close the connection, the job keeps running fine, but
if I let the ssh connection time out (a timeout I did not set, since when I ssh locally
between computers in my office the connection never times out), the job
is killed as well.

I think these two things are related somehow.

However, what I describe in the original bug report is more important,
and it only happens to applications that have their libraries in an
unconventional location (like the Intel compiler). This is happening
to all the members of my research group. They start a nohup job and
would like to log out without closing their windows, and the job is gone!
Something weird is going on!

Comment 5 Ondrej Vasik 2009-04-22 10:54:26 UTC
I'll try to reassign this to openssh, as I guess it's not the fault of nohup. We'll see if that's the correct component, one that could act on or prevent such a thing. To me it looks like nohup is killed by SIGKILL, but an strace might say more about this...
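
Short of a full strace, a small wrapper can at least record which signal, if any, ended the job. The sketch below relies on the shell convention that an exit status above 128 usually means "killed by signal (status - 128)". The function name describe_status is hypothetical, and the self-killing child is only a stand-in for the real job. For this to be useful at logout, the wrapper itself would also have to survive the logout.

```shell
#!/bin/sh
# Sketch: report how a background job ended, based on its exit status.
# describe_status is a hypothetical helper, not part of any existing tool.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # By convention, status - 128 is the signal number; "kill -l" names it.
        echo "killed by SIG$(kill -l "$status")"
    else
        echo "exited with status $status"
    fi
}

# Stand-in job that kills itself with SIGKILL; replace with the real job.
sh -c 'kill -KILL $$' &
wait $!
describe_status $?
```

Note that a program which itself exits with a status above 128 would be misreported by this heuristic.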

Comment 6 Sammy 2009-04-23 23:16:36 UTC
I don't think it is openssh. I have done more testing:

1. Open konsole and run a true OpenMP job (not make -j4). If we now log out,
   the job is aborted or hangs on a single CPU.
   More detailed examination shows:
  
   a) If one compiles the same program to run on a single processor the job
      keeps running after logout/login.
   b) Compiling the program to link statically with all the libraries does not
      help.
   c) If I run the program via strace (still using nohup) I get thousands of
      lines of "sched_yield() = 0" in the strace output (in the case of crash
      or hang).

2. Everything works perfectly if one closes the konsole window that started the
   nohup job before logout.

   The question is what logout does to open shells (konsole). KDE is obviously
   saving some information about that window to open it exactly as it was in the
   next login. Does it try to save the processes it is running as well? I think
   the key is understanding this.

    Thanks

Comment 7 Bug Zapper 2009-06-10 03:39:44 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 Bug Zapper 2009-07-14 15:27:50 UTC
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

