Description of problem:
If the rootfs is 100% full, and the "at" command is used to create a job, it'll create /var/spool/at/a<ID> but fails to write anything to it.
Result is ... a zero-byte-sized jobfile. "atd" doesn't particularly like these, it spits out endless sequences of errors.
"at" should clean up after itself when writing the jobspec spoolfile fail
The specific rev doesn't matter much, all very similar around this.
The idea is that "writefile()" creates the jobspec file 'atomically' - i.e. _removes_ it via the "fcreated" test in panic() if things go wrong.
But idea != implementation:
223 static void
224 writefile(time_t runtimer, char queue)
[ ... ]
318 if ((fd = open(atfile, O_CREAT | O_EXCL | O_TRUNC | O_WRONLY, S_IRUSR)) ==
[ ... ]
351 if ((fp = fdopen(fd, "w")) == NULL)
352 panic("Cannot reopen atjob file");
[ ... ]
337 /* We've successfully created the file; let's set the flag so it
338 * gets removed in case of an interrupt or error.
340 fcreated = 1;
[ ... none of the writes to "FILE* fp" are error-checked ... ]
478 fprintf(fp, "\n");
479 if (ferror(fp))
480 panic("Output error");
[ ... ]
The problem here is "FILE" doing internal buffering. So assume the file got created successfully - which is easily possible even on a 100% full filesystem because it's only a directory metadata operation, and the dir contents for /var/spool/at are an already-existing block.
ferror() does _not_ invoke fflush() and that means unless the jobspec file was large enough to have had an actual disk write before, this doesn't pick up any error at the point it's invoked.
fclose() as per manpage does call fflush() and does return an error - which the code here doesn't bother to check.
The result is that for a jobspec small enough (most of them are) so that all ops end up in the in-mem buffer for the FILE, then all fprintf()/fputs()/fputc() etc that are used to populate the contents of the file may actually fail, but the place where the ENOSPC would manifest itself is only the not-error-checked fclose() at #486. Tada ... zero-byte-sized atjob.
The easiest fix I can think of would be to use:
if (fflush(fp) || ferror(fp) || fclose(fp)) panic("Output error"");
instead of the ferror() test at #479, and remove the separate fclose() line.
It's conceivable that this may have worked at some foggy point in the past (on a system with a libc where ferror implied fflush), given the age of "at".
In glibc, though, ferror() only tests a status bit in FILE, no attempt to fflush() before, hence ... not following the idea as mentioned.
Another option to fix it would be to switch the FILE to unbuffered, i.e. add a
setvbuf(fp, NULL, _IONBF, 0);
after creating it. In that case, the ferror() would pick up the error. For corrrectness, I'd think my first suggestion better though because that also tests for the fclose() return value. Functionally / wrt to the desired outcome, it doesn't matter though.
The fflush() is unnecessary as it is implicit in the fclose() call.
Also fclose() returns 0 on success.
@@ -494,7 +498,8 @@ writefile(time_t runtimer, char queue)
+ if (fclose(fp) != 0)
+ panic("Output error");
/* Set the x bit so that we're ready to start executing
Sanity only, patch verified
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
*** Bug 1195314 has been marked as a duplicate of this bug. ***