Bug 718422

Summary: localhost atd[19184]: File a0000f0149b7f3 is in wrong format - aborting
Product: [Fedora] Fedora Reporter: edo <edosurina>
Component: atAssignee: Marcela Mašláňová <mmaslano>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 19CC: anders.blomdell, edosurina, evi.linux.redhat.bugzilla.01, jmlich83, mmaslano, public.oss
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: at-3.1.13-13.fc19 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 977510 (view as bug list) Environment:
Last Closed: 2013-12-05 11:40:08 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 977510    
Attachments:
Description Flags
messages log file
none
Example of a valid atd task file from google-earth rpm installation
none
Patch to unlink dead jobs less frequently none

Description edo 2011-07-02 18:11:22 UTC
Description of problem:
My message log file is total full of this messages:
localhost atd[19184]: File a0000f0149b7f3 is in wrong format - aborting


Version-Release number of selected component (if applicable):
at-3.1.12-8

How reproducible:
always

Steps to Reproduce:
1.start session
2.
3.
  
Actual results:
total full var/log/messages file

Expected results:
no messages

Additional info:

Comment 1 edo 2011-07-04 08:23:05 UTC
Created attachment 511134 [details]
messages log file

messages log file

Comment 2 Marcela Mašláňová 2011-07-19 12:37:46 UTC
Would you mind create a reproducer? What did you have in this job? I wasn't even able produce message "File a0000f0149b7f3 is in wrong format".

Comment 3 Marcela Mašláňová 2011-07-28 13:09:38 UTC
I can't reproduce it on my machines. If you still have problems and you have reproducer, than please feel free to re-open.

Comment 4 edo 2011-08-04 19:46:23 UTC
(In reply to comment #3)
> I can't reproduce it on my machines. If you still have problems and you have
> reproducer, than please feel free to re-open.

I don't know what I must do to create a reproducer...

Comment 5 Marcela Mašláňová 2011-10-05 16:22:09 UTC
I'm sorry, but I wasn't able to reproduce it. You didn't gave me enough data how it happened. If you have valid reproducer, please feel free to reopen.

Comment 6 Xavier Hourcade 2012-01-29 19:58:41 UTC
Hi, I've found hundred of these messages (per second !) here under F16 KDE in /var/log/messages. They came to my attention while printed on screen during boot, in a tied loop cf. #704658

Jan 29 19:53:12 venus atd[6970]: File a0000601519595 is in wrong format - aborting
Jan 29 19:53:12 venus atd[6971]: File a0000601519595 is in wrong format - aborting
Jan 29 19:53:12 venus atd[6972]: File a0000601519595 is in wrong format - aborting

# ls -la /var/spool/at/
total 16
drwx------.  3 daemon daemon 4096 Jan 29 18:53 .
drwxr-xr-x. 15 root   root   4096 Nov  3 03:30 ..
-rwx------.  2 root   root      0 Jan 24 21:03 =0000601519595
-rwx------.  2 root   root      0 Jan 24 21:03 a0000601519595
-rw-------.  1 daemon daemon    6 Jan 24 22:48 .SEQ
drwx------.  2 daemon daemon 4096 Jan 24 22:50 spool

Then I found this at the exact time these files were created :)

# yum history info 86
Loaded plugins: aliases, changelog, fastestmirror, langpacks, presto, refresh-packagekit, show-
              : leaves
Transaction ID : 86
Begin time     : Tue Jan 24 21:03:28 2012
Begin rpmdb    : 2036:c48592129ea6985b23d7247fddd379009e0f1141
End time       :            21:03:59 2012 (31 seconds)
End rpmdb      : 2036:c48592129ea6985b23d7247fddd379009e0f1141
User           : Xavier Hourcade <xaho>
Return-Code    : Success
Command Line   : reinstall google-earth-stable_current_x86_64.rpm
Transaction performed with:
Loading mirror speeds from cached hostfile
 * fedora: mirror.uv.es
 * updates: mirror.uv.es
    Updated       rpm-4.9.1.2-1.fc16.x86_64                     @?fedora
    Installed     yum-3.4.3-7.fc16.noarch                       @updates
    Installed     yum-plugin-aliases-1.1.31-2.fc16.noarch       @fedora
    Installed     yum-plugin-fastestmirror-1.1.31-2.fc16.noarch @fedora
Packages Altered:
    Reinstall google-earth-stable-6.1.0.5001-0.x86_64 @/google-earth-stable_current_x86_64
Scriptlet output:
   1 job 6 at Tue Jan 24 21:05:00 2012
history info

Finally, removed both strangers
# rm /var/spool/at/\=0000601519595
# rm /var/spool/at/a0000601519595

Still, I wonder what they were meant to achieve.
Is anyone from Google around ?

Comment 7 Xavier Hourcade 2012-01-30 16:42:27 UTC
Created attachment 558406 [details]
Example of a valid atd task file from google-earth rpm installation

Working or not, I find all this pretty intrusive, especially since any published documentation about it (I could find) :

> Still, I wonder what they were meant to achieve.

They were definitely generated during google-earth rpm installation, and are meant to run once the cron.daily task to achieve (I assume) post-install eg. checking/correcting repo setup. Called cron.daily I found, was empty here, thou (?)

I could not reproduce the issue with 6.1, nor 6.2 : valid atd tasks were, this time, created and destroyed very promptly after installation.

So, "something had gone wrong for some reason" on our systems, in respect to these atd task files, resulting in the extensive logging described (note, system is *not* even running 24/7 here).

# grep -c "a0000601519595 is in wrong format" /var/log/messages*
/var/log/messages:20341
/var/log/messages-20120122:0
/var/log/messages-20120129:333560

# grep -c "is in wrong format" /var/log/messages*
/var/log/messages:20341
/var/log/messages-20120122:0
/var/log/messages-20120129:333560

Comment 8 Xavier Hourcade 2012-01-30 16:55:06 UTC
Could atd somehow blacklist such 0-size task files ?
Warning just once when it starts, maybe, and keeping quiet unless they change ?

Comment 9 Anders Blomdell 2013-01-21 15:12:15 UTC
It's simple to trigger this bug, just run the following script:

#!/bin/sh
ATD_PID=$(ps -C atd -o pid=)
QUEUE=a
JOBNO=$(printf '%05x' 123)
BAD_TIME=$(expr $(date +%s) / 60 - 61)
CTM=$(printf '%08x' $BAD_TIME )
FILENAME=/var/spool/at/${QUEUE}${JOBNO}${CTM}
touch $FILENAME
chmod 0700 $FILENAME
kill -HUP ${ATD_PID}
sleep 0.5
ls -l $FILENAME
rm -f $FILENAME

AFAICT there are two bugs:

1. It looks like the at command does not validate that it has succeeded to write all data before it makes the file executable (i.e on a disk full an empty file might get created).

2. After 60 minutes (run_time + CHECK_INTERVAL <= now), atd (approx line 750) starts to unlink the lockfile and retries the execution of the job, leaving us with this unfortunate loop.

IMHO, it looks like all perr and paborts int atd.d that has a cause that is not likely to change at the next try should take some action (like clearing the execution bit or unlinking the file entirely) to stop further tries that will flood the log. Fixing at also wouldn't hurt :-)

Comment 10 Marcela Mašláňová 2013-01-21 15:38:25 UTC
I'm sorry I missed the rest of discussion in ticket. Reopening.

Comment 11 Anders Blomdell 2013-01-21 16:00:04 UTC
Created attachment 684388 [details]
Patch to unlink dead jobs less frequently

A minimal patch where we base the unlink of lockfiles on the modification date of the lockfile instead of it's execution time. Before unlinking we bump the file timestamp.

Comment 12 Anders Blomdell 2013-01-21 16:04:14 UTC
Bug is present in Fedora 18 as well (testing done with F18 SRPM on a F17 system).

Comment 13 Marcela Mašláňová 2013-01-22 15:26:37 UTC
I sent email to upstream. I'd rather see it fixed in upstream first.

Comment 14 Anders Blomdell 2013-01-23 14:55:20 UTC
Should probably be placed in (I can do that if you want me to):

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638256

any links to where you emailed the bug?

Comment 15 Marcela Mašláňová 2013-01-23 15:27:54 UTC
I don't have account there. I usually send email to the maintainer. If you have account there, feel free to open it.

Comment 16 Fedora End Of Life 2013-04-03 19:34:19 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 17 Marcela Mašláňová 2013-12-02 15:41:36 UTC
(In reply to Anders Blomdell from comment #14)
> Should probably be placed in (I can do that if you want me to):
> 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638256
> 
> any links to where you emailed the bug?

I don't see any reaction in the mentioned bug, so I applied your fix. It seems to be working and more people were complaining about the behaviour.

Thanks for your patch.

Comment 18 Fedora Update System 2013-12-05 11:47:33 UTC
at-3.1.13-13.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/at-3.1.13-13.fc19

Comment 19 Fedora Update System 2013-12-05 11:48:46 UTC
at-3.1.13-14.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/at-3.1.13-14.fc20

Comment 20 Fedora Update System 2013-12-27 05:57:16 UTC
at-3.1.13-14.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 21 Fedora Update System 2014-01-01 03:32:05 UTC
at-3.1.13-13.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 22 Jozef Mlich 2017-03-10 16:49:20 UTC
Is this bug fixed in RHEL ? See discussion at https://www.centos.org/forums/viewtopic.php?t=59083 I have same problem with CentOS 7.