Bug 224597

Summary: atd dies when it tries to process queue
Product: [Fedora] Fedora Reporter: david.hagood
Component: atAssignee: Marcela Mašláňová <mmaslano>
Status: CLOSED DUPLICATE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 7CC: dougmc+bugzilla, mark, rh-bugzilla, steve
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-06-27 09:53:02 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description david.hagood 2007-01-26 16:16:38 UTC
Description of problem:
atd will die when it tries to process any events.


Version-Release number of selected component (if applicable):
at-3.1.10-6.fc7

How reproducible:
Every time

Steps to Reproduce:
1. service atd restart
2. echo true | at "now+1 min"
3. tail -f /var/log/messages
  
Actual results:
When the at job runs, atd dies with:
*** glibc detected *** atd: free(): invalid pointer: 0xbfc6e2ef ***
======= Backtrace: =========
/lib/libc.so.6[0x8243dd]
/lib/libc.so.6(cfree+0x90)[0x827a30]
atd[0xba77fa]
atd[0xba892f]
atd(main+0x3e1)[0xba8d81]
/lib/libc.so.6(__libc_start_main+0xe0)[0x7d3e90]
atd[0xba74d1]
======= Memory map: ========
007be000-00908000 r-xp 00000000 fd:00 40108039   /lib/libc-2.5.90.so
00908000-0090a000 r-xp 0014a000 fd:00 40108039   /lib/libc-2.5.90.so
0090a000-0090b000 rwxp 0014c000 fd:00 40108039   /lib/libc-2.5.90.so
0090b000-0090e000 rwxp 0090b000 00:00 0
00b70000-00b71000 r-xp 00b70000 00:00 0          [vdso]
00ba6000-00baa000 r-xp 00000000 fd:00 45137924   /usr/sbin/atd
00baa000-00bab000 rwxp 00004000 fd:00 45137924   /usr/sbin/atd
00bb8000-00bbb000 r-xp 00000000 fd:00 40108076   /lib/libdl-2.5.90.so
00bbb000-00bbc000 r-xp 00002000 fd:00 40108076   /lib/libdl-2.5.90.so
00bbc000-00bbd000 rwxp 00003000 fd:00 40108076   /lib/libdl-2.5.90.so
00c24000-00c2e000 r-xp 00000000 fd:00 40108130   /lib/libpam.so.0.81.6
00c2e000-00c2f000 rwxp 00009000 fd:00 40108130   /lib/libpam.so.0.81.6
00cb4000-00cc5000 r-xp 00000000 fd:00 40108129   /lib/libaudit.so.0.0.0
00cc5000-00cc7000 rwxp 00011000 fd:00 40108129   /lib/libaudit.so.0.0.0
00d68000-00d83000 r-xp 00000000 fd:00 40108037   /lib/ld-2.5.90.so
00d83000-00d84000 r-xp 0001a000 fd:00 40108037   /lib/ld-2.5.90.so
00d84000-00d85000 rwxp 0001b000 fd:00 40108037   /lib/ld-2.5.90.so
00f64000-00f6f000 r-xp 00000000 fd:00 40108128   /lib/libgcc_s-4.1.1-20070105.so.1
00f6f000-00f70000 rwxp 0000a000 fd:00 40108128   /lib/libgcc_s-4.1.1-20070105.so.1
0972e000-0974f000 rw-p 0972e000 00:00 0
b7e00000-b7e21000 rw-p b7e00000 00:00 0
b7e21000-b7f00000 ---p b7e21000 00:00 0
b7f54000-b7f89000 r--s 00000000 fd:00 2310146    /var/db/nscd/group
b7f89000-b7fbe000 r--s 00000000 fd:00 2310145    /var/db/nscd/passwd
b7fbe000-b7fc0000 rw-p b7fbe000 00:00 0
b7fd2000-b7fd3000 rw-p b7fd2000 00:00 0
bfc5b000-bfc71000 rw-p bfc5b000 00:00 0          [stack]

Module is unknown
Jan 26 10:13:00 Deathwish atd[3157]: Module is unknown


Expected results:
atd should run the job without error.

Additional info:
Among other things, the /etc/pam.d/atd file is WOEFULLY out of date - it refers
to the PAM modules by explicit directory path, it is using the (now missing)
pam_stack module.

Comment 1 david.hagood 2007-01-26 16:18:13 UTC
Also, the man page indicates that atd will take a "-n" parameter to prevent it
from forking, and a "-d" to send debug info to stderr rather than syslog -
neither of which work. Either the man page should be updated or the program
corrected to support the commands.

Comment 2 Marcela Mašláňová 2007-03-06 15:24:01 UTC
I have at-3.1.10-9.fc7

I can't reproduce your problem. Only problem is glibc detected when I use: atd -d

PAM module was fixed -> thanks for comment

Comment 3 david.hagood 2007-03-06 23:39:32 UTC
Sorry, but the problem persists for me.

I now get the following in syslog:

atd[9053]: Module is unknown

when the job times out, and atd dies.



Comment 4 david.hagood 2007-03-07 00:07:15 UTC
You have to use the -d option to see anything at all, otherwise the daemon just
silently dies.

Try this:

service atd stop
gdb --args /usr/sbin/atd -d
GNU gdb Red Hat Linux (6.6-4.fc7rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run
Starting program: /usr/sbin/atd -d

(at this point, from another console/Xterm, run the 
  echo true | at "now+1 min"
command. Then switch back to the console/Xterm with the debug session.

After one minute....)


Program received signal SIGHUP, Hangup.
0x00e26402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00e26402 in __kernel_vsyscall ()
#1  0x00283530 in __nanosleep_nocancel () from /lib/libc.so.6
#2  0x0028337f in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3  0x80002da2 in main () from /usr/sbin/atd
(gdb) cont
Continuing.
*** glibc detected *** /usr/sbin/atd: free(): invalid pointer: 0xbfef856f ***

Module is unknown
======= Backtrace: =========
/lib/libc.so.6[0x25b5ed]
/lib/libc.so.6(cfree+0x90)[0x25ec40]
/usr/sbin/atd[0x8000181a]
/usr/sbin/atd[0x8000294f]
/usr/sbin/atd(main+0x3ce)[0x80002d8e]
/lib/libc.so.6(__libc_start_main+0xe0)[0x20aec0]
/usr/sbin/atd[0x800014f1]
======= Memory map: ========
001ea000-001f4000 r-xp 00000000 ee:00 40108131   /lib/libpam.so.0.81.6
001f4000-001f5000 rwxp 00009000 ee:00 40108131   /lib/libpam.so.0.81.6
001f5000-00340000 r-xp 00000000 ee:00 40108097   /lib/libc-2.5.90.so
00340000-00342000 r-xp 0014b000 ee:00 40108097   /lib/libc-2.5.90.so
00342000-00343000 rwxp 0014d000 ee:00 40108097   /lib/libc-2.5.90.so
00343000-00346000 rwxp 00343000 00:00 0 
00391000-003a3000 r-xp 00000000 ee:00 3178513    /lib/libaudit.so.0.0.0
003a3000-003a5000 rwxp 00011000 ee:00 3178513    /lib/libaudit.so.0.0.0
0076c000-00777000 r-xp 00000000 ee:00 40108178   /lib/libgcc_s-4.1.2-20070222.so.1
00777000-00778000 rwxp 0000a000 ee:00 40108178   /lib/libgcc_s-4.1.2-20070222.so.1
007cc000-007e7000 r-xp 00000000 ee:00 3178504    /lib/ld-2.5.90.so
007e7000-007e8000 r-xp 0001a000 ee:00 3178504    /lib/ld-2.5.90.so
007e8000-007e9000 rwxp 0001b000 ee:00 3178504    /lib/ld-2.5.90.so
00e26000-00e27000 r-xp 00e26000 00:00 0          [vdso]
00fbb000-00fbe000 r-xp 00000000 ee:00 3178522    /lib/libdl-2.5.90.so
00fbe000-00fbf000 r-xp 00002000 ee:00 3178522    /lib/libdl-2.5.90.so
00fbf000-00fc0000 rwxp 00003000 ee:00 3178522    /lib/libdl-2.5.90.so
80000000-80004000 r-xp 00000000 ee:00 8896612    /usr/sbin/atd
80004000-80005000 rw-p 00004000 ee:00 8896612    /usr/sbin/atd
819f6000-81a17000 rw-p 819f6000 00:00 0 
b7d00000-b7d21000 rw-p b7d00000 00:00 0 
b7d21000-b7e00000 ---p b7d21000 00:00 0 
b7e92000-b7ec7000 r--s 00000000 ee:00 2310146    /var/db/nscd/group
b7ec7000-b7efc000 r--s 00000000 ee:00 2310145    /var/db/nscd/passwd
b7efc000-b7efe000 rw-p b7efc000 00:00 0 
b7f0f000-b7f10000 rw-p b7f0f000 00:00 0 
bfee4000-bfef9000 rw-p bfee4000 00:00 0          [stack]

Program received signal SIGABRT, Aborted.
0x00e26402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00e26402 in __kernel_vsyscall ()
#1  0x0021df60 in *__GI_raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x0021f7b1 in *__GI_abort () at abort.c:88
#3  0x0025380b in __libc_message (do_abort=2, 
    fmt=0x318c04 "*** glibc detected *** %s: %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
#4  0x0025b5ed in _int_free (av=0x343120, mem=0xbfef856f) at malloc.c:5788
#5  0x0025ec40 in *__GI___libc_free (mem=0xbfef856f) at malloc.c:3566
#6  0x8000181a in ?? () from /usr/sbin/atd
#7  0x8000294f in ?? () from /usr/sbin/atd
#8  0x80002d8e in main () from /usr/sbin/atd
(gdb) 

Also, PLEASE check the ATD manpage as it lists several options that ATD does not
accept, such as the "-n" (don't fork) option.

The debuginfo RPM for at does not seem to contain any files at all, so it is not
possible to debug what is going on in atd itself.


Comment 5 david.hagood 2007-03-07 00:08:13 UTC
BTW - I had updated to the same version of AT as you when I generated that last
comment.


Comment 6 Enrico Scholz 2007-06-05 06:56:58 UTC
it's because of

 --- atd.c ---
|    char newname[256];
|
|        free(newname);


The do-not-fork option works, but requires an additional argument erroneously.
Option string should be 

| getopt(argc, argv, "sdl:b:n")

but not

| getopt(argc, argv, "sdl:b:n:")


Comment 7 Doug McLaren 2007-06-18 17:49:46 UTC
I'm seeing the same error in the atd that comes with F7, with all updates
applied.  The system was updated from FC6, and there were at jobs in the queue
at the time -- but new at jobs cause it to die too.

When the time comes from atd to run a program, it does, but then atd
immediately dies.  atd -d fails with the same error as given
above --

# atd -d 
*** glibc detected *** atd: free(): invalid pointer: 0xbfd46bef ***
======= Backtrace: =========
/lib/libc.so.6[0xe39f41]
/lib/libc.so.6(cfree+0x90)[0xe3d580]
atd[0x8000181a]
...

I guess for now my workaround is to replace running `atd' with a script that
just runs at -d in a loop --

   #!/bin/sh
   while : ; do
      at -d
   done

and run that instead.


Comment 8 Doug McLaren 2007-06-18 17:51:42 UTC
(In reply to comment #7)

>       at -d

er, atd -d ...


Comment 9 Matt Cavic 2007-06-19 21:04:17 UTC
Atd crashes for me as well, after running a job.  In /var/log/messages I get:
atd[2621]: Open of jobfile failed: Success


Comment 10 Bill C. Riemers 2007-06-21 14:41:32 UTC
Same problem here using Fedora 7 x86_64

# rpm -q at
at-3.1.10-11.fc7
# service atd restart;(echo "/bin/true"|at now);sleep 60;service atd status
Stopping atd:                                              [FAILED]
Starting atd:                                              [  OK  ]
job 6 at Thu Jun 21 10:32:00 2007
Jun 21 10:32:51 d490f7x64 atd[7572]: Open of jobfile failed: Success
atd dead but pid file exists



Comment 11 Alex 2007-06-22 08:12:50 UTC
Same problem here using Fedora 7 i386

# rpm -q at
at-3.1.10-11.fc7
In var/log/messages: Open of jobfile failed: Success
# service atd status
atd dead but pid file exists


Comment 12 Marcela Mašláňová 2007-06-27 09:53:02 UTC

*** This bug has been marked as a duplicate of 243064 ***