Bug 243064 - broken at
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: at
Version: 7
Hardware: i686 Linux
Priority: high, Severity: high
Assigned To: Marcela Mašláňová
Keywords: Reopened
Duplicates: 224597 240275 241882 244844
 
Reported: 2007-06-07 00:43 EDT by Henrique Martins
Modified: 2007-11-30 17:12 EST (History)
Fixed In Version: 3.1.10-13.fc7
Doc Type: Bug Fix
Last Closed: 2007-07-05 15:22:36 EDT

Attachments
gdb log of atd (3.21 KB, text/plain), 2007-06-11 14:27 EDT, Mamoru TASAKA
Patch to fix at.c (adding jobs, signaling atd, atrm, at -c) (1.75 KB, patch), 2007-07-04 01:12 EDT, Michael Staib

Description Henrique Martins 2007-06-07 00:43:28 EDT
at seems somewhat broken and atd dies on fc7.
Using at-3.1.10-11.fc7 on a system freshly updated from fc6 to f7.
rpm --verify yields nothing (the man page for atd claims that /usr/spool/at and
/usr/spool/at/spool need to be mode 700, owner root, but they are mode 700, owner
daemon, just as in fc6).

as a normal user I do
% at <time>
at> echo hello > /tmp/hello.txt
at> ^d <EOT>
job <xxx> at <date/time>
% at -l
<xxx> at <date/time>
% atrm <xxx>
Cannot unlink a000ee012c673e: Permission denied

That file exists under /var/spool/at with the same owner:group permissions as
the owner that executed the at command.  Can be removed with "sudo atrm <xxx>"

If I don't atrm the job, it starts but never completes.  A message header file
is created under /var/spool/at/spool with "subject: output from your job
<xxx>" and "to: <user>" lines, but it's never sent. atd seems to be dead, yet
"at -l" still shows the job as running. "service atd status" outputs "atd dead
but pid still exists".

This used to work fine under fc6; it was broken by the update process.
Comment 1 Steve Stavropoulos 2007-06-11 05:48:10 EDT
I can confirm this bug. I also see at /var/log/messages the following line:
atd[10935]: Open of jobfile failed: Success
Comment 2 Mamoru TASAKA 2007-06-11 14:27:37 EDT
Created attachment 156744
gdb log of atd

Well, the following is what I tried:

* My system is F-devel i386
* at is at-3.1.10-11.fc7

* Then launch atd by service atd start
* As root, request an at job
-------------------------------------------------
[root@localhost ~]# at 3:21
at> echo "Hello" >> /tmp/at.check
at> <EOT>
job 20 at Tue Jun 12 03:21:00 2007
-------------------------------------------------
* launch gdb, attach to atd process
* and cont
* Wait until the job is due to be executed (03:21)

Results
* atd aborted as attached.
* and in /var/log/messages:
---------------------------------------------------------
Jun 12 03:21:25 localhost atd[25318]: Open of jobfile failed: Success
---------------------------------------------------------
Comment 3 Mamoru TASAKA 2007-06-11 22:26:06 EDT
The point where abort() is called in atd.c was
--------------------------------------------------
   337          free(newname);
--------------------------------------------------
This is strange. newname is an array of size 256
--------------------------------------------------
   264      char newname[256];
--------------------------------------------------
and should never be free'd.
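
To illustrate the point about newname, here is a minimal sketch (not the atd source; make_job_name is a hypothetical helper and the "=" prefix is just an arbitrary example). A fixed-size array has automatic storage and is released when the function returns, so only pointers obtained from malloc() and friends may be passed to free().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a name in a stack buffer, return a heap copy.
 * Returns NULL if the name does not fit or if allocation fails. */
static char *make_job_name(const char *filename)
{
    char newname[256];   /* automatic storage: released on return, never free()d */
    int n = snprintf(newname, sizeof newname, "=%s", filename);
    if (n < 0 || (size_t)n >= sizeof newname)
        return NULL;     /* would not fit in the buffer */

    char *copy = malloc((size_t)n + 1);   /* heap storage: caller must free() it */
    if (copy != NULL)
        memcpy(copy, newname, (size_t)n + 1);

    /* Calling free(newname) here would be undefined behaviour; glibc
     * detects it and aborts with "free(): invalid pointer". */
    return copy;
}
```

A caller would use the result and then release it exactly once with free(); the stack array itself needs no cleanup at all.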

Changing priority, severity
Comment 4 Adam Sulmicki 2007-06-19 08:50:39 EDT
same here, fresh install of f7. atd crashes.

# atd -nd


*** glibc detected *** atd: free(): invalid pointer: 0xbf9a55df ***
======= Backtrace: =========
/lib/libc.so.6[0xe7bf41]
/lib/libc.so.6(cfree+0x90)[0xe7f580]
atd[0x8000181a]
atd[0x8000294f]
atd(main+0x3ce)[0x80002d8e]
/lib/libc.so.6(__libc_start_main+0xe0)[0xe29f70]
atd[0x800014f1]
======= Memory map: ========
00110000-0011a000 r-xp 00000000 08:01 7677961    /lib/libnss_files-2.6.so
0011a000-0011b000 r-xp 00009000 08:01 7677961    /lib/libnss_files-2.6.so
0011b000-0011c000 rwxp 0000a000 08:01 7677961    /lib/libnss_files-2.6.so
0013b000-00156000 r-xp 00000000 08:01 7679064    /lib/ld-2.6.so
00156000-00157000 r-xp 0001a000 08:01 7679064    /lib/ld-2.6.so
00157000-00158000 rwxp 0001b000 08:01 7679064    /lib/ld-2.6.so
001c7000-001ca000 r-xp 00000000 08:01 7679066    /lib/libdl-2.6.so
001ca000-001cb000 r-xp 00002000 08:01 7679066    /lib/libdl-2.6.so
001cb000-001cc000 rwxp 00003000 08:01 7679066    /lib/libdl-2.6.so
00431000-0043b000 r-xp 00000000 08:01 7680562    /lib/libpam.so.0.81.6
0043b000-0043c000 rwxp 00009000 08:01 7680562    /lib/libpam.so.0.81.6
0063d000-00648000 r-xp 00000000 08:01 7679074    /lib/libgcc_s-4.1.2-20070503.so.1
00648000-00649000 rwxp 0000a000 08:01 7679074    /lib/libgcc_s-4.1.2-20070503.so.1
00709000-0070a000 r-xp 00709000 00:00 0          [vdso]
007d0000-007e2000 r-xp 00000000 08:01 7678031    /lib/libaudit.so.0.0.0
007e2000-007e4000 rwxp 00011000 08:01 7678031    /lib/libaudit.so.0.0.0
00e14000-00f62000 r-xp 00000000 08:01 7679065    /lib/libc-2.6.so
00f62000-00f64000 r-xp 0014e000 08:01 7679065    /lib/libc-2.6.so
00f64000-00f65000 rwxp 00150000 08:01 7679065    /lib/libc-2.6.so
00f65000-00f68000 rwxp 00f65000 00:00 0 
80000000-80004000 r-xp 00000000 08:01 276543     /usr/sbin/atd
80004000-80005000 rw-p 00004000 08:01 276543     /usr/sbin/atd
81d84000-81da5000 rw-p 81d84000 00:00 0 
b7e00000-b7e21000 rw-p b7e00000 00:00 0 
b7e21000-b7f00000 ---p b7e21000 00:00 0 
b7f62000-b7f64000 rw-p b7f62000 00:00 0 
b7f81000-b7f82000 rw-p b7f81000 00:00 0 
bf992000-bf9a7000 rw-p bf992000 00:00 0          [stack]
Aborted
Comment 5 Adam Sulmicki 2007-06-19 09:29:27 EDT
better debug info. Seems to happen at the same place as for Mamoru Tasaka.

(gdb) run -nb
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /root/rpmbuild/BUILD/at-3.1.10/atd -nb
[Detaching after fork from child process 31749. (Try `set detach-on-fork off'.)]
*** glibc detected *** /root/rpmbuild/BUILD/at-3.1.10/atd: free(): invalid
pointer: 0xbfdb59af ***
======= Backtrace: =========
/lib/libc.so.6[0x3a1f41]
/lib/libc.so.6(cfree+0x90)[0x3a5580]
/root/rpmbuild/BUILD/at-3.1.10/atd[0x8000183a]
/root/rpmbuild/BUILD/at-3.1.10/atd[0x8000296f]
/root/rpmbuild/BUILD/at-3.1.10/atd(main+0x3ce)[0x80002dae]
/lib/libc.so.6(__libc_start_main+0xe0)[0x34ff70]
/root/rpmbuild/BUILD/at-3.1.10/atd[0x800014f1]
======= Memory map: ========
00110000-0011a000 r-xp 00000000 08:01 7677961    /lib/libnss_files-2.6.so
0011a000-0011b000 r-xp 00009000 08:01 7677961    /lib/libnss_files-2.6.so
0011b000-0011c000 rwxp 0000a000 08:01 7677961    /lib/libnss_files-2.6.so
0033a000-00488000 r-xp 00000000 08:01 7679065    /lib/libc-2.6.so
00488000-0048a000 r-xp 0014e000 08:01 7679065    /lib/libc-2.6.so
0048a000-0048b000 rwxp 00150000 08:01 7679065    /lib/libc-2.6.so
0048b000-0048e000 rwxp 0048b000 00:00 0 
008d6000-008d7000 r-xp 008d6000 00:00 0          [vdso]
00982000-00985000 r-xp 00000000 08:01 7679066    /lib/libdl-2.6.so
00985000-00986000 r-xp 00002000 08:01 7679066    /lib/libdl-2.6.so
00986000-00987000 rwxp 00003000 08:01 7679066    /lib/libdl-2.6.so
00b0b000-00b16000 r-xp 00000000 08:01 7679074    /lib/libgcc_s-4.1.2-20070503.so.1
00b16000-00b17000 rwxp 0000a000 08:01 7679074    /lib/libgcc_s-4.1.2-20070503.so.1
00b1a000-00b24000 r-xp 00000000 08:01 7680562    /lib/libpam.so.0.81.6
00b24000-00b25000 rwxp 00009000 08:01 7680562    /lib/libpam.so.0.81.6
00d7e000-00d90000 r-xp 00000000 08:01 7678031    /lib/libaudit.so.0.0.0
00d90000-00d92000 rwxp 00011000 08:01 7678031    /lib/libaudit.so.0.0.0
00e07000-00e22000 r-xp 00000000 08:01 7679064    /lib/ld-2.6.so
00e22000-00e23000 r-xp 0001a000 08:01 7679064    /lib/ld-2.6.so
00e23000-00e24000 rwxp 0001b000 08:01 7679064    /lib/ld-2.6.so
80000000-80004000 r-xp 00000000 08:01 4551859    /root/rpmbuild/BUILD/at-3.1.10/atd
80004000-80005000 rw-p 00004000 08:01 4551859    /root/rpmbuild/BUILD/at-3.1.10/atd
814a4000-814c5000 rw-p 814a4000 00:00 0 
b7e00000-b7e21000 rw-p b7e00000 00:00 0 
b7e21000-b7f00000 ---p b7e21000 00:00 0 
b7f42000-b7f44000 rw-p b7f42000 00:00 0 
b7f61000-b7f62000 rw-p b7f61000 00:00 0 
bfda2000-bfdb7000 rw-p bfda2000 00:00 0          [stack]

Program received signal SIGABRT, Aborted.
0x008d6402 in __kernel_vsyscall ()
(gdb) bt
#0  0x008d6402 in __kernel_vsyscall ()
#1  0x00362fa0 in raise () from /lib/libc.so.6
#2  0x003648b1 in abort () from /lib/libc.so.6
#3  0x00399ebb in __libc_message () from /lib/libc.so.6
#4  0x003a1f41 in _int_free () from /lib/libc.so.6
#5  0x003a5580 in free () from /lib/libc.so.6
#6  0x8000183a in run_file (filename=0x814a52ff "!00007012ca9ff", uid=501, 
    gid=501) at atd.c:337
#7  0x8000296f in run_loop () at atd.c:926
#8  0x80002dae in main (argc=2, argv=0xbfdb5d64) at atd.c:1080
(gdb) 
Comment 6 Henrique Martins 2007-06-19 12:21:14 EDT
I removed the two free(newname) in atd.c, rebuilt the rpm, and installed.  The
code now doesn't abort (i.e. atd doesn't die) but things are still a bit wrong.

The "Open of jobfile failed: Success" message still shows up in syslog.  When the
code is trying to clean up, it opens a file (filename) and, rather than checking
for an error, it checks that the returned fd is STDIN, i.e. it does not expect
any other fd to be open.

Also atrm as the user still doesn't work.  I suspect there's something wrong
with authentication, either with pam or selinux patches (though I have it disabled).

My fc6 at is at 3.1.8-85.  My fc7 is 3.1.10-11.  There are lots of patches
between 3.1.8 and 3.1.10 and one is probably the culprit.
Comment 7 Mamoru TASAKA 2007-06-19 12:56:08 EDT
Do we now have to force a downgrade of at to 3.1.8-85 with a bumped Epoch?
Frankly, I have to say that atd in Fedora 7 is now completely broken.
Comment 8 Markus Enzenberger 2007-06-21 12:53:35 EDT
I have the same problem. atd crashes at the end of the first job it runs.
atrm and at -c work only as root even for jobs owned by the user. I think
it is a permission problem, because I get the following entry in my daily
logwatch emails:

 --------------------- pam_unix Begin ------------------------ 

 atd:
    Password Failures:
       markus: 1 Time(s)
    Sessions Opened:
       markus by (uid=0): 1 Time(s)
 
 
 ---------------------- pam_unix End -------------------------
Comment 9 Henrique Martins 2007-06-24 13:19:52 EDT
The free(newname) is a bug, but it's not the problem here.  The
at-3.1.10-perm.patch, patch 17 of the current build, seems to have caused it.
Uncommenting a couple of PRIV_START/PRIV_END pairs that the patch commented out
fixes the atrm and at -c problem (though it may reintroduce whatever problem
that patch was trying to solve).
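
For readers unfamiliar with the idea behind PRIV_START and PRIV_END, here is a simplified sketch of bracketing a privileged operation in a set-uid program. It is an assumption-laden illustration, not at's actual macros: priv_init, priv_start and priv_end are hypothetical names, and the real code differs in detail.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, simplified stand-ins for a PRIV_START / PRIV_END pair. */
static uid_t real_uid, priv_uid;
static gid_t real_gid, priv_gid;

/* Remember both identities, then run as the invoking user by default. */
static int priv_init(void)
{
    real_uid = getuid();
    priv_uid = geteuid();
    real_gid = getgid();
    priv_gid = getegid();
    if (setegid(real_gid) != 0)
        return -1;
    return seteuid(real_uid);
}

/* Temporarily switch back to the identity the program was installed with. */
static int priv_start(void)
{
    if (seteuid(priv_uid) != 0)
        return -1;
    return setegid(priv_gid);
}

/* Return to the invoking user's identity. */
static int priv_end(void)
{
    if (setegid(real_gid) != 0)
        return -1;
    return seteuid(real_uid);
}
```

An operation such as an unlink() in the spool directory would sit between priv_start() and priv_end(). If that bracket is commented out, the call runs with the invoking user's identity and can fail with "Permission denied", which matches the atrm symptom in the description.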

I'll attach the patch next
Comment 10 Marcela Mašláňová 2007-06-25 09:22:09 EDT
*** Bug 240275 has been marked as a duplicate of this bug. ***
Comment 11 Marcela Mašláňová 2007-06-25 09:24:27 EDT
*** Bug 244844 has been marked as a duplicate of this bug. ***
Comment 12 Henrique Martins 2007-06-25 13:27:52 EDT
No patch yet: I got the code to run, but the cleanup process is broken, i.e. no
mail is sent back and files are left hanging around in /var/spool/at.  I'll go
hunting for more commented-out PRIV_ directives, but something is definitely
funny with this release, probably a mismatch between Debian running this as
daemon and Fedora running it as root.
Comment 13 Marcela Mašláňová 2007-06-27 05:46:22 EDT
Thanks for the suggestions, I'm working on it. It must be something to do with permissions.
Comment 14 Marcela Mašláňová 2007-06-27 05:53:09 EDT
*** Bug 224597 has been marked as a duplicate of this bug. ***
Comment 15 Tomasz Ostrowski 2007-06-29 06:58:17 EDT
This is the code in atd.c after "rpmbuild -bp at.spec":

#####################################################
    /* We're the parent.  Let's wait.
     */
    close(fd_in); 
    close(fd_out);

    /* We inherited the master's SIGCHLD handler, which does a   
       non-blocking waitpid. So this blocking one will eventually
       return with an ECHILD error. 
     */
    waitpid(pid, (int *) NULL, 0);

#ifdef HAVE_PAM
    PRIV_START
        pam_setcred(pamh, PAM_DELETE_CRED | PAM_SILENT);
        retcode = pam_close_session(pamh, PAM_SILENT);
        pam_end(pamh, retcode);
    PRIV_END
#endif

    /* Send mail.  Unlink the output file after opening it, so it 
     * doesn't hang around after the run (if we are to send mail).
     */
    if( send_mail != -1 ) {  
        stat(filename, &buf);
        if (open(filename, O_RDONLY) != STDIN_FILENO)
            perr("Open of jobfile failed");
        unlink(filename);
    }

#ifdef  WITH_PAM
    pam_setcred(pamh, PAM_DELETE_CRED | PAM_SILENT );
    pam_close_session(pamh, PAM_SILENT);
    pam_end(pamh, PAM_ABORT);
    closelog();
    openlog("atd", LOG_PID, LOG_ATD);
#endif

#####################################################

It looks like these calls
  pam_setcred(pamh, PAM_DELETE_CRED | PAM_SILENT);
  pam_close_session(pamh, PAM_SILENT);
  pam_end(pamh, PAM_ABORT);
are made twice. I think they should not be.

I also think they probably open a file descriptor and leave it open, which
causes this "if (open(filename, O_RDONLY) != STDIN_FILENO)" to trigger and show
this MS-copyrighted message "Open of jobfile failed: Success" - the open was
successful but it did not end up as standard input.
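
The "failed: Success" wording can be reproduced in isolation. The sketch below (open_mismatch is a hypothetical helper, not taken from atd.c) separates the two cases that the original test lumps together: open() really failing, and open() succeeding on a descriptor other than the expected one. In the second case errno carries no error, so a perror()-style message ends in "Success" on glibc.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns -1 when open() really failed, 1 when it succeeded but landed on a
 * descriptor other than `expected`, and 0 when it landed on `expected`.
 * *seen_errno receives errno as an error reporter would see it afterwards.
 * errno is cleared first; glibc normally leaves it untouched when open()
 * succeeds, although POSIX does not strictly promise that. */
static int open_mismatch(const char *path, int expected, int *seen_errno)
{
    errno = 0;
    int fd = open(path, O_RDONLY);
    *seen_errno = errno;
    if (fd < 0)
        return -1;               /* genuine failure: errno is meaningful */

    int mismatch = (fd != expected);
    close(fd);
    return mismatch;             /* 1 means "succeeded, but wrong number" */
}
```

Because open() returns the lowest descriptor number that is free, anything that takes descriptor 0 between the close and the open makes the original comparison fail even though nothing went wrong.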

I think this code should look like this (not tested):

#####################################################
    /* We're the parent.  Let's wait.
     */
    close(fd_in); 
    close(fd_out);

    /* We inherited the master's SIGCHLD handler, which does a   
       non-blocking waitpid. So this blocking one will eventually
       return with an ECHILD error. 
     */
    waitpid(pid, (int *) NULL, 0);

    /* Send mail.  Unlink the output file after opening it, so it 
     * doesn't hang around after the run (if we are to send mail).
     */
    if( send_mail != -1 ) {  
        stat(filename, &buf);
        if (open(filename, O_RDONLY) != STDIN_FILENO)
            perr("Open of jobfile failed");
        unlink(filename);
    }

#ifdef  WITH_PAM
    PRIV_START
        pam_setcred(pamh, PAM_DELETE_CRED | PAM_SILENT);
        retcode = pam_close_session(pamh, PAM_SILENT);
        pam_end(pamh, retcode);
    PRIV_END
    closelog();
    openlog("atd", LOG_PID, LOG_ATD);
#endif

#####################################################

I do not understand PAM or know how to program it - I could be very wrong.
Comment 16 david.hagood 2007-06-29 09:50:56 EDT
I'd ditch the 
  if (open(filename, O_RDONLY) != STDIN_FILENO)

and replace it with
  if (open(filename, O_RDONLY) >-1)

as IMHO it is stupid to test against a specific FD number rather than just
testing for success (open being defined as returning -1 on error).
Comment 17 ptomblin 2007-06-29 09:57:37 EDT
I agree with David about the stupidity of checking for a specific FD number.  I
wonder if the reason it works on Debian but not Fedora is that pam code is
somehow opening a file, so it's grabbing the recently closed STDIN_FILENO?
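
Following the two comments above, one way to avoid depending on which number open() happens to return is to test for -1 and then move the descriptor explicitly with dup2(). This is only a sketch, under the assumption that the job file really is needed on standard input afterwards; open_on_fd is a hypothetical helper, not part of at. Note that the error branch has to fire when open() returns -1, not when it returns something greater than -1.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` read-only and make sure it ends up on
 * descriptor `target`, whatever number open() happened to return.
 * Returns `target` on success, -1 on failure with errno set. */
static int open_on_fd(const char *path, int target)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;                  /* a real failure: errno is meaningful */
    if (fd == target)
        return fd;                  /* already where we want it */

    if (dup2(fd, target) == -1) {
        int saved = errno;          /* keep dup2's error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);                      /* keep only the copy on `target` */
    return target;
}
```

In the atd code quoted earlier, the call would be open_on_fd(filename, STDIN_FILENO), with perr() reserved for a return value of -1.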
Comment 18 Marcela Mašláňová 2007-07-02 05:10:21 EDT
I removed some of the file unlinking from the instinet patch and at is almost
working, but only for root.

If you change it to if (open(filename, O_RDONLY) >-1), what works?

Comment #15: HAVE_PAM and WITH_PAM are not the same thing; I'm now working
on PAM authentication. Thanks for all the suggestions.

Comment 19 Marcela Mašláňová 2007-07-03 10:49:18 EDT
I updated F-7 and rawhide; the atd daemon seems to be OK.
Some problems still occur - only root can create jobs, and so on.
Comment 20 Fedora Update System 2007-07-03 12:23:31 EDT
at-3.1.10-12.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 21 Henrique Martins 2007-07-03 14:07:16 EDT
>> Some problems still occure - only root can create jobs and so on.

How can this be marked as closed, if it only runs for root?? 

I just downloaded and installed at-3.1.10-12.fc7 and when trying to schedule a
job  as a regular user I get "Cannot give away file: Operation not permitted". 
This is even worse than my own patched version ...
Comment 22 ptomblin 2007-07-03 14:10:48 EDT
Henrique, it's closed because the problem described in the bug report is fixed.
 That's how bug reports work - if you have a *different* problem with at, you
file a different bug report.
Comment 23 Mamoru TASAKA 2007-07-03 14:13:26 EDT
The reporter reports on *normal user* at jobs, which is *NOT FIXED YET*.
Reopening.
Comment 24 Henrique Martins 2007-07-03 14:28:04 EDT
>>it's closed because the problem described in the bug report is fixed.

I don't really want to start a war of words, just get this problem solved.  I
originated this bug report with a title of "broken at" and presented some
symptoms and test conditions.  This new rev level still has a "broken at", just
broken in a different way, i.e. the original test still fails.
Comment 25 Michael Staib 2007-07-04 01:10:33 EDT
An unprivileged user creating an at job receives the error "Cannot give away
file: Operation not permitted". at creates the desired job file, but it's chmod
000 and owned by root:(real gid). Running it in gdb shows that this error is
printed when chmodding the file to 700. By this point at has sacrificed its
privileges and fallen to daemon:daemon and so doesn't have chmod rights on the file.

This was easily corrected by chowning the file to (real uid):(real gid) before
at gives up root, as appears to be done on my FC6 installation (running
3.1.8-85)... however, this chown was commented out in the code I have (3.1.10-12).

Anyway, if those lines are uncommented, the at job is created fine (despite a
signaling error), but never runs. This appears to be because at fails to SIGHUP
atd - its permissions are denied because at is acting as daemon:daemon and atd
is being run by root. This I remedied by simply moving the permission drop to
after the signal, though not releasing permissions at all would work as well
(given that execution ends almost immediately after the signal anyway).

atrm fails as well: Before process_jobs is called, at drops root:root for
daemon:daemon. However, PRIV_END is called before the unlink operation, which
means that the privileges are now (real uid):(real gid). Before the unlink, at
sets the gid back down to effective in the hopes of having write access to the
directory. Unfortunately, group daemon does not have write access on
/var/spool/at. In this case, it works to set permissions to those of user
daemon, who does. And, for that matter, is the only one who does.

Finally, at -c fails too: While it doesn't give up its root privileges, it tries
to access things while being in the root group, which doesn't benefit it any.
Replacing the setgid with PRIV_START and PRIV_END works.
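
The ordering issue for job creation described above can be sketched as follows. This is an illustration under assumptions, not the attached patch: hand_over_job_file is a hypothetical helper, and it has to be called while the process still holds the privileges that fchown() needs, i.e. before they are dropped.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: give a freshly created job file to its user and make
 * it usable by that user. Call it BEFORE dropping privileges; once the
 * process is neither root nor the file's owner, both calls are refused with
 * "Operation not permitted". Returns 0 on success, -1 on failure. */
static int hand_over_job_file(int fd, uid_t uid, gid_t gid)
{
    if (fchown(fd, uid, gid) != 0)    /* needs privilege to change the owner */
        return -1;
    return fchmod(fd, S_IRWXU);       /* mode 0700, as the report describes */
}
```

The same reasoning applies to the SIGHUP sent to atd: a signal to a daemon running as root has to be sent before the sender gives up the identity that is allowed to signal it.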
Comment 26 Michael Staib 2007-07-04 01:12:32 EDT
Created attachment 158496
Patch to fix at.c (adding jobs, signaling atd, atrm, at -c)

A patch that makes all the changes described above.
Comment 27 Marcela Mašláňová 2007-07-04 04:39:44 EDT
Hello Michael,
I really appreciate your patch. Thank you.
Comment 28 Marcela Mašláňová 2007-07-04 08:34:55 EDT
*** Bug 241882 has been marked as a duplicate of this bug. ***
Comment 29 Mamoru TASAKA 2007-07-04 09:43:02 EDT
at-3.1.10-13.fc7 seems to be working properly on my system.
Comment 30 Adam Tkac 2007-07-04 09:59:33 EDT
Works as expected

Adam
Comment 31 Dave 2007-07-04 12:37:25 EDT
I've been waiting for the fix to "AT". This morning it showed up in yum and I
installed it. Alas, it's still broken.

sudo  echo "echo test > /home/mine/testat.txt" | at now
Cannot give away file: Operation not permitted

sudo ls -al  /var/spool/cron/atjobs
ls: cannot access /var/spool/cron/atjobs: No such file or directory

sudo rpm -qa |grep at
at-3.1.10-12.fc7

at -V
at version 3.1.9

The version announced doesn't match the rpm installed!

I've tried creating the /var/spool/cron/atjobs directory and setting ownership
to daemon, but that didn't fix it.

Net result is that AT is still broken!
Comment 32 Henrique Martins 2007-07-04 13:49:25 EDT
It doesn't look like at-3.1.10-13.fc7 has been pushed to the f7 updates
repositories yet.  However, if you apply the patch posted today and rebuild,
at/atd seem to work.  I tested at, at -c, at -l, atrm and getting email from
atd.  They all worked.

I've said it before, it seems curious that a lot of the fixes to make this
version  work revert changes made to the 3.1.8 version, that came with fc6,
changes that were left in place but commented out.  There may still be some
conditions where this new at may not work, i.e. selinux enabled (I have it
disabled) or pam enabled.
Comment 33 Mamoru TASAKA 2007-07-04 13:52:48 EDT
Not-yet-released version of at is available from:
http://koji.fedoraproject.org/packages/at/
Comment 34 Michael Staib 2007-07-04 14:01:02 EDT
@31: If subtraction works, you have 3.1.10. I don't think they updated the
version string.

@32: Curious indeed. I was curious myself, so I looked at the debian atd
changelog at
http://packages.debian.org/changelogs/pool/main/a/at/at_3.1.10/changelog this
morning... they have a completely different permission structure than we do as
of 3.1.9, and it looks like all the bugs I encountered last night are fixed by
having those permissions/users/etc. instead. I couldn't test them because I have
SElinux and PAM enabled (still works, by the way, it ran my alarm-launch script
this morning), and PAM complained and prevented me from running atd as daemon.
But just by doing the permissions checks on paper it seems it would all work...
Comment 35 Marcela Mašláňová 2007-07-05 06:01:37 EDT
OK, I'll check the permissions again.

at -V should work, I'll fix it.

Thanks for comments.
Comment 36 Fedora Update System 2007-07-05 15:22:26 EDT
at-3.1.10-13.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.
