Red Hat Bugzilla – Bug 67414
atd doesn't clean up old jobs
Last modified: 2007-04-18 12:43:36 EDT
Description of problem:
I went through most of the newsgroups and still did not find an answer to my
problem with the at command. I seek your help in fixing what looks like a bug.
Any help that you can provide is greatly appreciated.
New York, NY.
Problem: The at job still shows up in the queue after it has run successfully.
How this problem affects us: Any new job that I submit after this behaves
unpredictably (the new job runs only some of the time).
My observations: If I submit a subsequent job, the queue gets cleared, but
ONLY if the subsequent job runs successfully. It is almost as if the new job
"kicks" the stale job out of the queue.
[root@nh0029 root]# date
Mon Jun 24 12:03:13 EDT 2002 <=== Current Time
[root@nh0029 root]# atq
19 2002-06-24 11:09 = rts <=== This job completed normally but is not removed from the queue.
[root@nh0029 at]# ls -la /var/spool/at
drwx------ 3 daemon daemon 1024 Jun 24 11:09 .
drwxr-xr-x 11 root root 1024 Apr 5 12:02 ..
-rwx------ 1 rts 66 3878 Jun 24 11:09 =000130104a74d <=== This is the job, and it is still there.
-rw------- 1 daemon daemon 6 Jun 24 11:09 .SEQ
drwx------ 2 daemon daemon 1024 Jun 24 11:09 spool
[root@nh0029 at]# uname -a
Linux nh0029 2.4.18-3smp #1 SMP Thu Apr 18 07:27:31 EDT 2002 i686 unknown
[root@nh0001 root]# atq -V
at version 3.1.8
Bug reports to: email@example.com (Thomas Koenig)
Problem: Multiple atd processes start up when I invoke the atd daemon using the rc
How this problem affects us: Since more than one atd is running, my job
gets confused about which one's pid it should pick up and use.
My observations: I manually kill the second atd.
[root@nh0029 at]# service atd stop
stopping atd: [ OK ] <==== atd is now stopped
[root@nh0029 at]# service atd start
Starting atd: [ OK ] <==== I
[root@nh0029 at]# ps -ef | grep -i atd
daemon 23116 1 0 11:09 ? 00:00:00 /usr/sbin/atd <==== 2 instances of atd start up
daemon 23589 1 0 12:06 ? 00:00:00 /usr/sbin/atd
[root@nh0029 at]# cat /var/run/atd.pid
23589 <==== only one pid is recorded in atd.pid
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Submit a job using the at command, then run atq.
2. Let the job run as it should.
3. atq still shows the job.
Actual Results: atq still shows old jobs.
Expected Results: atq should not show anything in the queue, and the job
should have been removed from /var/spool/at/.
But it is not.
This system was actually running 7.3, not 7.2
1. "=" does not mean that the job has completed successfully. "=" means that
the job is currently running (big difference). If your job (whatever it was)
consisted of a waiting / looping / runaway process, you would see the at job
remaining in the "=" (running) queue. It would appear that this is what was
really happening on the machine in question. You can verify through the ps and
top commands just which processes are currently doing what (e.g. "ps -xaf").
When you say that the job had completed normally, are you _sure_ that the job
terminated completely and successfully (did you check the running processes)?
2. The problem w/ old jobs remaining and new jobs not running is a curious one.
Even an old job in the "=" queue (i.e. currently running) will not block new
jobs from running. Two cases which I suppose could cause problems like this:
a) file system runs out of space
b) if more than one at daemon (atd) is running (see below).
3. The issue w/ the atd process not terminating when the "service atd stop" is
issued, and the second atd process starting w/ the "service atd start" can be
explained as follows: If a job which has been spawned by the atd process is
still running when you do a "service atd stop", atd continues to run until the
spawned job terminates. If the old atd process is still running when you start
the at daemon again (start a new atd process) you will then have two atd
processes running concurrently. I wonder what kind of unpredictable behaviour
this could generate, and whether you have seen some of this behaviour yourself.
4. The /var/spool/at job files being zero length... I believe this must be the
result of at becoming confused? I have not seen this before, and am still not
clear how this condition occurs.
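The checks suggested in points 1 and 3 above can be sketched in shell. This is
only a sketch under assumptions: the helper names running_at_jobs and
stray_atd_pids are hypothetical (they are not part of the at package), the atq
column layout is taken from the output pasted in this report, and the pid file
path is the /var/run/atd.pid shown by the reporter.

```shell
#!/bin/sh
# Hypothetical helpers; neither name is part of the at package.

# Print the job numbers that atq lists in the "=" (running) queue.
# Assumes the column layout shown in this report:
#   <job> <date> <time> <queue> <user>
running_at_jobs() {
    awk '$4 == "=" { print $1 }'
}

# Print every PID read from stdin (one per line) that differs from the
# recorded PID passed as $1.
stray_atd_pids() {
    recorded=$1
    while read -r pid; do
        if [ -n "$pid" ] && [ "$pid" != "$recorded" ]; then
            echo "$pid"
        fi
    done
}

# Usage on a live system (assumes procps ps and the pid file path
# from this report):
#   atq | running_at_jobs
#   ps -C atd -o pid= | stray_atd_pids "$(cat /var/run/atd.pid)"
```

Any PID printed by stray_atd_pids would be an atd process that the pid file no
longer tracks, such as 23116 in the ps output above. Any job number printed by
running_at_jobs is one that atd still considers to be running.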
Created attachment 65869
Patch to correct some of atd's bad behavior.
I have applied the patch. Issue 1 is solved; however, issue 2 is still left in
Confirmed again. Both of them are solved.