Bug 21216

Summary: Nasty Problem with fclose/ferror combination
Product: [Retired] Red Hat Linux Reporter: Garance A Drosehn <gad>
Component: glibc Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0   
Hardware: i386   
OS: Linux   
Last Closed: 2000-11-22 05:02:33 UTC Type: ---

Description Garance A Drosehn 2000-11-22 01:23:40 UTC
I was porting some code from FreeBSD to Linux, for use on
Linux machines here at RPI.  (In my case, I am porting our
own RPI version of lpr and lpd to Linux.)

The FreeBSD code has the following:
 	 (void)fclose(pfp);
 	 if (ferror(pfp)) {
 	 	 ...do stuff...
 	 }

First off, I will freely admit that this code is WRONG.  It
happens to have worked on FreeBSD, and on some other operating
systems I have ported the code to, but now that I look at it
I realize that it is wrong, since it calls ferror on an I/O
stream which has already been closed.

The problem is how Linux (at least on Red Hat 7.0) reacts to
that code.  Apparently the call to ferror with that bad I/O
stream can cause Linux to corrupt some internal data structure
somewhere.  I do not have time to track down where.

The effect of this is very bizarre problems, very far from
this specific call to ferror.  Sometimes it causes syslog
messages (elsewhere in my lpd) to be corrupted.  Most often,
it causes problems when fopen-ing a new stream.  The odd
thing is that it seems you have to fopen (and leave open) more
than one stream for problems to show up.  And the problem that
does show up is that the process immediately dies, in the
middle of some call to fopen.  Gone.  Goodbye.  (And only
sometimes, not every time.)

Once I painfully tracked my disappearing-process problems
back to the above code, I simply changed the code so that
it called ferror before calling fclose.  All my mysterious
problems went away.
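
For readers who want the corrected ordering spelled out, here is a
minimal sketch.  It is not the actual lpr/lpd change; the helper name
close_checked is hypothetical.  It queries ferror while the stream is
still valid, and also checks the result of fclose, since the final
flush can fail as well.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the lpr/lpd source): finish with a
 * stream and report whether anything went wrong.
 * Returns 0 on success, -1 if the stream's error indicator was set
 * or if fclose itself failed.
 */
static int close_checked(FILE *fp)
{
    int had_error;
    int close_rc;

    assert(fp != NULL);        /* passing NULL is a caller bug */

    had_error = ferror(fp);    /* ask while fp is still a valid stream */
    close_rc = fclose(fp);     /* flushes buffered output; may fail too */
    /* fp must not be used for anything after this point */

    return (had_error || close_rc == EOF) ? -1 : 0;
}
```

With this, the original two lines would become something like
"if (close_checked(pfp) != 0) { ...do stuff... }".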

So, I realize this is a little short on specifics, but I would
suggest that someone check to see what ferror does when given
a recently-closed I/O stream (or for that matter, a pointer
into random memory).  I don't care too much what it returns
in that case (although I would say it should always return
"an error", since the file-pointer is invalid).  I wouldn't
even mind if the process died RIGHT THERE, to indicate right
where the error is.  Just make sure that such a call will not
corrupt any internal data structures, thus causing headaches
far away from the code which has the actual bug.

Apparently this combination of an fclose immediately followed
by an ferror might appear in other programs, because some
people I've talked to seem to think it's "okay, even though
it's stupid".  This specific bit of code has been in FreeBSD
for at least six years with no problem, and it also works (or
at least, doesn't seem to cause problems) on several versions
of AIX, Solaris, SunOS (pre-solaris), and IRIX to which I have
ported this RPI version of lpr&lpd.

If you're curious, the bug in FreeBSD is in the cgetnext routine,
in the getcap.c module, in libc.  If Linux HAS a cgetnext routine
in libc (I never did check...), you might want to eyeball that
too, just to see if it has the same error.

Comment 1 Jakub Jelinek 2000-11-22 07:37:22 UTC
libc is allowed to start nethack, format your disks, whatever it wants
in this case.
The fact that it magically works on some other systems means nothing;
the results of such an operation are undefined. glibc's ferror
has to acquire the lock of the stream in question first (and thus
writes into memory). Perhaps other systems either don't care about
multiple threads (and do no locking) or slow each operation down (by
checking whether the FILE pointer is valid at the start of every single
routine).
You can turn on some such checks by recompiling glibc with IO_DEBUG,
but as such checks only catch some cases, can pass even on invalid
FILE pointers, and also slow things down, they are not enabled by
default.
So think about ferror on an fclosed FILE as if you had put random
garbage into that memory area yourself.
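
Since the library will not catch this for you, one defensive habit on
the caller's side is to clear the pointer at the moment the stream is
closed, so that a stale use becomes something the program can test
for.  The following is only a sketch; close_and_clear is a
hypothetical name and not a glibc function.  Note that passing NULL
to ferror is just as undefined as passing a closed stream, so callers
still have to check the pointer themselves before using it.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of glibc): close the stream that
 * *fpp points to and set *fpp to NULL, so the caller is not left
 * holding a dangling FILE pointer.
 * Returns 0 on success or if *fpp was already NULL, -1 if the stream
 * had its error indicator set or fclose failed.
 */
static int close_and_clear(FILE **fpp)
{
    int rc = 0;

    assert(fpp != NULL);       /* the pointer-to-pointer must be valid */

    if (*fpp != NULL) {
        if (ferror(*fpp))      /* check before closing, never after */
            rc = -1;
        if (fclose(*fpp) == EOF)
            rc = -1;
        *fpp = NULL;           /* later code can test for NULL */
    }
    return rc;
}
```

A second call on the same variable is then a harmless no-op rather
than a write into freed memory.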