Bug 1166882

Summary: "at" leaves zero-byte-sized files in /var/spool/at if filesystem is full / if jobspec can't be written
Product: Red Hat Enterprise Linux 6 Reporter: Frank Hirtz <fhirtz>
Component: atAssignee: Tomas Mraz <tmraz>
Status: CLOSED ERRATA QA Contact: Leos Pol <lpol>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.4CC: lpol, misterbonnie, ovasik, psklenar
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: at-3.1.10-47.el6 Doc Type: Bug Fix
Doc Text:
Cause: The at command was not properly checking return value on fclose function call. Consequence: In case the /var/spool/at filesystem was full the at command could leave empty stale files in the spool directory. Fix: The at command now properly checks the return value from the fclose function call. Result: The are no empty files left anymore when the /var/spool/at filesystem is full.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-02-19 15:39:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1075802    

Description Frank Hirtz 2014-11-21 20:56:05 UTC
Description of problem:

If the rootfs is 100% full, and the "at" command is used to create a job, it'll create /var/spool/at/a<ID> but fails to write anything to it.

Result is ... a zero-byte-sized jobfile. "atd" doesn't particularly like these, it spits out endless sequences of errors.

"at" should clean up after itself when writing the jobspec spoolfile fail

The specific rev doesn't matter much, all very similar around this.

The idea is that "writefile()" creates the jobspec file 'atomically' - i.e. _removes_ it via the "fcreated" test in panic() if things go wrong.

But idea != implementation:

    223 static void
    224 writefile(time_t runtimer, char queue)
    225 {
[ ... ]
    318     if ((fd = open(atfile, O_CREAT | O_EXCL | O_TRUNC | O_WRONLY, S_IRUSR)) == 
[ ... ]
    351     if ((fp = fdopen(fd, "w")) == NULL)
    352     panic("Cannot reopen atjob file");
[ ... ]
    337     /* We've successfully created the file; let's set the flag so it
    338      * gets removed in case of an interrupt or error.
    339      */
    340     fcreated = 1;
[ ... none of the writes to "FILE* fp" are error-checked ... ]
    478     fprintf(fp, "\n");
    479     if (ferror(fp))
    480     panic("Output error");
 [ ... ]
    485     fclose(fp);

The problem here is "FILE" doing internal buffering. So assume the file got created successfully - which is easily possible even on a 100% full filesystem because it's only a directory metadata operation, and the dir contents for /var/spool/at are an already-existing block.

ferror() does _not_ invoke fflush() and that means unless the jobspec file was large enough to have had an actual disk write before, this doesn't pick up any error at the point it's invoked.
fclose() as per manpage does call fflush() and does return an error - which the code here doesn't bother to check.

The result is that for a jobspec small enough (most of them are) so that all ops end up in the in-mem buffer for the FILE, then all fprintf()/fputs()/fputc() etc that are used to populate the contents of the file may actually fail, but the place where the ENOSPC would manifest itself is only the not-error-checked fclose() at #486. Tada ... zero-byte-sized atjob.

The easiest fix I can think of would be to use:

if (fflush(fp) || ferror(fp) || fclose(fp)) panic("Output error"");

instead of the ferror() test at #479, and remove the separate fclose() line.

It's conceivable that this may have worked at some foggy point in the past (on a system with a libc where ferror implied fflush), given the age of "at".

In glibc, though, ferror() only tests a status bit in FILE, no attempt to fflush() before, hence ... not following the idea as mentioned.

Another option to fix it would be to switch the FILE to unbuffered, i.e. add a

  setvbuf(fp, NULL, _IONBF, 0);

after creating it. In that case, the ferror() would pick up the error. For corrrectness, I'd think my first suggestion better though because that also tests for the fclose() return value. Functionally / wrt to the desired outcome, it doesn't matter though.

Comment 2 Tomas Mraz 2014-11-24 15:11:30 UTC
The fflush() is unnecessary as it is implicit in the fclose() call.

Also fclose() returns 0 on success.

Proper fix:

@@ -494,7 +498,8 @@ writefile(time_t runtimer, char queue)
     if (ferror(stdin))
        panic("Input error");

-    fclose(fp);
+    if (fclose(fp) != 0)
+       panic("Output error");

     /* Set the x bit so that we're ready to start executing

Comment 4 Leos Pol 2015-02-11 14:20:00 UTC
Sanity only, patch verified

Comment 6 errata-xmlrpc 2015-02-19 15:39:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 7 Tomas Mraz 2015-02-23 15:17:24 UTC
*** Bug 1195314 has been marked as a duplicate of this bug. ***