Bug 67414

Summary: atd doesn't clean up old jobs
Product: [Retired] Red Hat Linux
Reporter: Need Real Name <nkk>
Component: at
Assignee: Jens Petersen <petersen>
Status: CLOSED RAWHIDE
QA Contact: Aaron Brown <abrown>
Severity: medium
Priority: medium
Version: 7.2
CC: tao
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2002-07-19 09:25:19 UTC
Attachments: patch to correct some of atd's bad behavior (flags: none)

Description Need Real Name 2002-06-24 18:51:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

Description of problem:
I went through most of the newsgroups and still did not find an answer to my
problem with the at command.  I seek your help in fixing what looks like a bug.
Any help that you might provide is greatly appreciated.

Thanks
Karthik Nandyal
Instinet Corp,
New York, NY.

Issue 1
----------


Problem : The at job shows up in the queue even after it has run successfully.

How does this problem affect us?  Any new job that I submit after this behaves
randomly (meaning the new job only runs sometimes).

My observations - If I submit a subsequent job, then the queue gets cleared.
But this happens ONLY if the subsequent job runs successfully.  It is almost
like the new job "kicks" the stale job out of the queue.

Example:

[root@nh0029 root]# date
Mon Jun 24 12:03:13 EDT 2002       <===  Current Time

[root@nh0029 root]# atq
19      2002-06-24 11:09 = rts     <===  This job completed normally. But is 
not  removed from the queue.

[root@nh0029 at]# ls -la /var/spool/at
drwx------    3 daemon   daemon       1024 Jun 24 11:09 .
drwxr-xr-x   11 root     root         1024 Apr  5 12:02 ..
-rwx------    1 rts      66           3878 Jun 24 11:09 =000130104a74d  <===  
This is the job and still is there.
-rw-------    1 daemon   daemon          6 Jun 24 11:09 .SEQ
drwx------    2 daemon   daemon       1024 Jun 24 11:09 spool

[root@nh0029 at]# uname -a
Linux nh0029 2.4.18-3smp #1 SMP Thu Apr 18 07:27:31 EDT 2002 i686 unknown



[root@nh0001 root]# atq -V
at version 3.1.8
Bug reports to: ig25.de (Thomas Koenig)


Issue 2
-------



Problem : Multiple atd processes start up when I invoke the atd daemon using
the rc scripts.

How does this problem affect us?  Since there is more than one atd, my job
gets confused as to which pid it should pick up and use.

My observations - I manually kill the second atd.


Example:


[root@nh0029 at]# service atd stop
stopping atd:                                              [  OK  ]    <====  
atd  is now stopped
[root@nh0029 at]# service atd start
Starting atd:                                              [  OK  ]    <====  I 
start atd

[root@nh0029 at]# ps -ef | grep -i atd
daemon   23116     1  0 11:09 ?        00:00:00 /usr/sbin/atd          <==== 2 
instances of atd startup
daemon   23589     1  0 12:06 ?        00:00:00 /usr/sbin/atd

[root@nh0029 at]# cat /var/run/atd.pid
23589                                                              <==== only 
one pid is trapped in at.pid


Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Submit a job using the at command and then run atq.
2. Let the job run as it should.
3. atq still shows the job (see the sketch below).
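
A minimal reproduction sketch (the job number, timestamps, and one-minute delay
are only illustrative; actual output will differ):

[root@nh0029 root]# echo /bin/true | at now + 1 minute
job 20 at 2002-06-24 12:05
[root@nh0029 root]# atq
20      2002-06-24 12:05 a root
        ... wait for the job to run ...
[root@nh0029 root]# atq
20      2002-06-24 12:05 = root          <===  job has finished, but atq still lists it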
	

Actual Results:  atq still shows old jobs

Expected Results:  atq should not have shown anything in the queue, and the job
should also have been removed from /var/spool/at/.

But it doesn't.

Additional info:

Comment 1 Mike Gahagan 2002-07-01 20:07:50 UTC
This system was actually running 7.3, not 7.2


Comment 2 David Sainty 2002-07-09 17:52:15 UTC
1. "=" does not mean that the job has completed successfully.  "=" means that
the job is currently running (big difference).  If your job (whatever it was)
consistented of a waiting / looping / runaway process, you would see the at job
remaining in the "=" (running) queue.  It would appear that this is what was
really happening on your machine in question.  You can verify through the ps and
top commands just what processes are currently doing what (e.g. "ps -xaf"). 
When you say that the job had completed normally, are you _sure_ that the job
can completely terminated successfully (did you check the running processes)?
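
For illustration only (the job number, PIDs, and script name below are
hypothetical), a check along these lines shows whether the at job's process
tree is in fact still alive:

[root@nh0029 root]# atq
19      2002-06-24 11:09 = rts                       <===  still in the "=" (running) queue
[root@nh0029 root]# ps -xaf | grep -A2 '[a]td'
23116 ?        S      0:00 /usr/sbin/atd
23200 ?        S      0:00  \_ sh
23201 ?        S      0:12      \_ long_running_script       <===  the spawned job has not exited yet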

2. The problem w/ old jobs remaining and new jobs not running is a curious one.
 Even an old job in the "=" queue (i.e. currently running) will not block new
jobs from running.  Two cases which I suppose could cause problems like this:
    a) file system runs out of space
    b) if more than one at daemon (atd) is running (see below).
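
A quick way to rule out case (a), for example (the path is the spool directory
from the report above):

[root@nh0029 root]# df -k /var/spool/at          <===  the job files live here; make sure this file system is not full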

3. The issue w/ the atd process not terminating when "service atd stop" is
issued, and a second atd process starting w/ "service atd start", can be
explained as follows:  If a job which has been spawned by the atd process is
still running when you do a "service atd stop", atd continues to run until the
spawned job terminates.  If the old atd process is still running when you start
the at daemon again (start a new atd process), you will then have two atd
processes running concurrently.  I wonder what kind of unpredictable behaviour
this could generate, and whether you have seen some of this behaviour yourself.
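
As an illustrative sequence only (the PID shown is taken from the report above
and is otherwise hypothetical), one way to make sure the old daemon is really
gone before restarting:

[root@nh0029 root]# service atd stop
[root@nh0029 root]# ps -ef | grep '[a]td'        <===  check whether an old atd is still waiting on a spawned job
daemon   23116     1  0 11:09 ?        00:00:00 /usr/sbin/atd
[root@nh0029 root]# kill 23116                   <===  or simply wait for the spawned job to finish
[root@nh0029 root]# service atd start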

4. The /var/spool/at job files being zero length...  I believe this must be the
result of at becoming confused?  I have not seen this before, and am still not
clear how this condition occurs.


Comment 3 Mike Gahagan 2002-07-18 21:07:54 UTC
Created attachment 65869 [details]
patch to correct some of atd's bad behavior.

Comment 4 Bill Huang 2002-07-19 09:15:57 UTC
I have applied the patch.  Issue 1 is solved; however, issue 2 still remains in
my case.

Comment 5 Bill Huang 2002-07-19 09:25:14 UTC
Confirmed again.  Both of them are solved.

Comment 6 Bill Huang 2002-07-19 09:26:15 UTC
See 8.1.3-30.