Bug 718422
Summary: | localhost atd[19184]: File a0000f0149b7f3 is in wrong format - aborting | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | edo <edosurina> | ||||||||
Component: | at | Assignee: | Marcela Mašláňová <mmaslano> | ||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | unspecified | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 19 | CC: | anders.blomdell, edosurina, evi.linux.redhat.bugzilla.01, jmlich83, mmaslano, public.oss | ||||||||
Target Milestone: | --- | Keywords: | Reopened | ||||||||
Target Release: | --- | ||||||||||
Hardware: | x86_64 | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | at-3.1.13-13.fc19 | Doc Type: | Bug Fix | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | |||||||||||
: | 977510 (view as bug list) | Environment: | |||||||||
Last Closed: | 2013-12-05 11:40:08 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 977510 | ||||||||||
Attachments: |
|
Description
edo
2011-07-02 18:11:22 UTC
Created attachment 511134 [details]
messages log file
messages log file
Would you mind create a reproducer? What did you have in this job? I wasn't even able produce message "File a0000f0149b7f3 is in wrong format". I can't reproduce it on my machines. If you still have problems and you have reproducer, than please feel free to re-open. (In reply to comment #3) > I can't reproduce it on my machines. If you still have problems and you have > reproducer, than please feel free to re-open. I don't know what I must do to create a reproducer... I'm sorry, but I wasn't able to reproduce it. You didn't gave me enough data how it happened. If you have valid reproducer, please feel free to reopen. Hi, I've found hundred of these messages (per second !) here under F16 KDE in /var/log/messages. They came to my attention while printed on screen during boot, in a tied loop cf. #704658 Jan 29 19:53:12 venus atd[6970]: File a0000601519595 is in wrong format - aborting Jan 29 19:53:12 venus atd[6971]: File a0000601519595 is in wrong format - aborting Jan 29 19:53:12 venus atd[6972]: File a0000601519595 is in wrong format - aborting # ls -la /var/spool/at/ total 16 drwx------. 3 daemon daemon 4096 Jan 29 18:53 . drwxr-xr-x. 15 root root 4096 Nov 3 03:30 .. -rwx------. 2 root root 0 Jan 24 21:03 =0000601519595 -rwx------. 2 root root 0 Jan 24 21:03 a0000601519595 -rw-------. 1 daemon daemon 6 Jan 24 22:48 .SEQ drwx------. 2 daemon daemon 4096 Jan 24 22:50 spool Then I found this at the exact time these files were created :) # yum history info 86 Loaded plugins: aliases, changelog, fastestmirror, langpacks, presto, refresh-packagekit, show- : leaves Transaction ID : 86 Begin time : Tue Jan 24 21:03:28 2012 Begin rpmdb : 2036:c48592129ea6985b23d7247fddd379009e0f1141 End time : 21:03:59 2012 (31 seconds) End rpmdb : 2036:c48592129ea6985b23d7247fddd379009e0f1141 User : Xavier Hourcade <xaho> Return-Code : Success Command Line : reinstall google-earth-stable_current_x86_64.rpm Transaction performed with: Loading mirror speeds from cached hostfile * fedora: mirror.uv.es * updates: mirror.uv.es Updated rpm-4.9.1.2-1.fc16.x86_64 @?fedora Installed yum-3.4.3-7.fc16.noarch @updates Installed yum-plugin-aliases-1.1.31-2.fc16.noarch @fedora Installed yum-plugin-fastestmirror-1.1.31-2.fc16.noarch @fedora Packages Altered: Reinstall google-earth-stable-6.1.0.5001-0.x86_64 @/google-earth-stable_current_x86_64 Scriptlet output: 1 job 6 at Tue Jan 24 21:05:00 2012 history info Finally, removed both strangers # rm /var/spool/at/\=0000601519595 # rm /var/spool/at/a0000601519595 Still, I wonder what they were meant to achieve. Is anyone from Google around ? Created attachment 558406 [details] Example of a valid atd task file from google-earth rpm installation Working or not, I find all this pretty intrusive, especially since any published documentation about it (I could find) : > Still, I wonder what they were meant to achieve. They were definitely generated during google-earth rpm installation, and are meant to run once the cron.daily task to achieve (I assume) post-install eg. checking/correcting repo setup. Called cron.daily I found, was empty here, thou (?) I could not reproduce the issue with 6.1, nor 6.2 : valid atd tasks were, this time, created and destroyed very promptly after installation. So, "something had gone wrong for some reason" on our systems, in respect to these atd task files, resulting in the extensive logging described (note, system is *not* even running 24/7 here). # grep -c "a0000601519595 is in wrong format" /var/log/messages* /var/log/messages:20341 /var/log/messages-20120122:0 /var/log/messages-20120129:333560 # grep -c "is in wrong format" /var/log/messages* /var/log/messages:20341 /var/log/messages-20120122:0 /var/log/messages-20120129:333560 Could atd somehow blacklist such 0-size task files ? Warning just once when it starts, maybe, and keeping quiet unless they change ? It's simple to trigger this bug, just run the following script: #!/bin/sh ATD_PID=$(ps -C atd -o pid=) QUEUE=a JOBNO=$(printf '%05x' 123) BAD_TIME=$(expr $(date +%s) / 60 - 61) CTM=$(printf '%08x' $BAD_TIME ) FILENAME=/var/spool/at/${QUEUE}${JOBNO}${CTM} touch $FILENAME chmod 0700 $FILENAME kill -HUP ${ATD_PID} sleep 0.5 ls -l $FILENAME rm -f $FILENAME AFAICT there are two bugs: 1. It looks like the at command does not validate that it has succeeded to write all data before it makes the file executable (i.e on a disk full an empty file might get created). 2. After 60 minutes (run_time + CHECK_INTERVAL <= now), atd (approx line 750) starts to unlink the lockfile and retries the execution of the job, leaving us with this unfortunate loop. IMHO, it looks like all perr and paborts int atd.d that has a cause that is not likely to change at the next try should take some action (like clearing the execution bit or unlinking the file entirely) to stop further tries that will flood the log. Fixing at also wouldn't hurt :-) I'm sorry I missed the rest of discussion in ticket. Reopening. Created attachment 684388 [details]
Patch to unlink dead jobs less frequently
A minimal patch where we base the unlink of lockfiles on the modification date of the lockfile instead of it's execution time. Before unlinking we bump the file timestamp.
Bug is present in Fedora 18 as well (testing done with F18 SRPM on a F17 system). I sent email to upstream. I'd rather see it fixed in upstream first. Should probably be placed in (I can do that if you want me to): http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638256 any links to where you emailed the bug? I don't have account there. I usually send email to the maintainer. If you have account there, feel free to open it. This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19 (In reply to Anders Blomdell from comment #14) > Should probably be placed in (I can do that if you want me to): > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638256 > > any links to where you emailed the bug? I don't see any reaction in the mentioned bug, so I applied your fix. It seems to be working and more people were complaining about the behaviour. Thanks for your patch. at-3.1.13-13.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/at-3.1.13-13.fc19 at-3.1.13-14.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/at-3.1.13-14.fc20 at-3.1.13-14.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report. at-3.1.13-13.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report. Is this bug fixed in RHEL ? See discussion at https://www.centos.org/forums/viewtopic.php?t=59083 I have same problem with CentOS 7. |