See bug #515521, comments #48, #49, #51 and #52
glibc is now forking a ptchown program under certain circumstances
qemu is leaking file descriptors to that process because we do not set O_CLOEXEC on all file descriptors
In bug #515521, looking at the selinux policy added, we're leaking a tap fds, the pidfile fd and image file fds
We should go through qemu code and make sure O_CLOEXEC gets set
Markus, Juan, Kevin, Justin - any of you fancy taking this?
Hm, I'm not sure I agree with what glibc is doing here. Basically they want to forbid non-O_CLOEXEC files for programs - and what if there are descriptors that we do want to pass to processes that we fork ourselves, just not to glibc's silently forked processes?
I think we can work around it in some places, but for example the Linux userspace emulator looks like something where we simply _can't_ add O_CLOEXEC without breaking things.
(In reply to comment #1)
> Hm, I'm not sure I agree with what glibc is doing here. Basically they want to
> forbid non-O_CLOEXEC files for programs - and what if there are descriptors
> that we do want to pass to processes that we fork ourselves, just not to
> glibc's silently forked processes?
Sounds reasonable to me. Moving back to glibc
Well if we know of one file descriptor that can not be closed, we can dontaudit the leak. from svirt_t to ptchown_t
You can reset the CLOEXEC flag for those cases when it should really matter.
The problem is simple: the default of all file descriptors from the first days of unix should have been to set CLOEXEC. Then is the very, very few places where this flag is needed it can be set. These places are trivially to find because the child won't have the descriptor and won't work.
Therefore, in code like qemu, the default should implicitly be changed to always set CLOEXEC (through a wrapper function to open etc or whatever).
Again, there are unlikely any places where this flag has to be unset and if there are, they are trivially to find.
Regardless of what the default for CLOEXEC is or should be, and without disputing the thesis that many, perhaps most applications would be better off have all their file descriptors CLOEXEC at all times, it is entirely legitimate for an application to have CLOEXEC off on some of its file descriptors. By keeping it off, the application assumes responsibility for controlling its flow across exec() manually.
However, it can do so only for the exec()s it actually sees. If a library exec()s behind the application's back, then the library is responsible for controlling the flow of file descriptors across that exec(). The library cannot know which file descriptors are CLOEXEC, because that's an application decision (regardless of what the default for CLOEXEC is). Thus, the library must assume the application's file descriptors are not CLOEXEC, and close all of them.
If a library doesn't do that, how's the application supposed to keep a file descriptor open safely across one of its own exec()s? The only way I can see is this: somehow lock out the library to prevent it from execing behind the application's back, clear CLOEXEC, exec(). Doesn't look practical to me.
Once again, I'm all for applications CLOEXECing all the file descriptors they don't need to keep open across an exec(). I just want to know how to deal with the ones we do want to keep open, safely, and in presence of libraries that do their own exec()ing.
(In reply to comment #5)
> However, it can do so only for the exec()s it actually sees. If a library
> exec()s behind the application's back, then the library is responsible for
> controlling the flow of file descriptors across that exec().
Do you think it is at all likely that any reasonable code will call exec() somewhere in a library and expect the child to inherit some file descriptors without controlling them? I think this is highly unlikely at best and broken in all cases.
How would such a child process learn about the file descriptor is the first place? We are not talking about 0 to 2, which are special and usually shouldn't be CLOEXEC. Any other descriptor can and often will have a different ID in different runs. The runtime can implicitly open descriptors and shift the values for the descriptors available to the application. Therefore there has to be somehow a reference to the inherited descriptor being passed to the library (parameter, static var, ...) and then the library either dup2()s the descriptor or passes a parameter (textual or so) describing the descriptor value to the child). In that case the library can and must make sure the descriptor is actually inherited. Nothing else is save. And in any case, the application knows that the file descriptor is used this way.
I recognize that there is the _potential_ for such a problem which requires code changes. But a) it is unlikely and b) confining a security critical program like qemu as tightly as possible is crucial for system security.
(In reply to comment #6)
> Do you think it is at all likely that any reasonable code will call exec()
> somewhere in a library and expect the child to inherit some file descriptors
> without controlling them?
No, definitely not. The point is that the library assumes that the application doesn't need any non-O_CLOEXEC file descriptor while doing an exec() that the application didn't explicitly ask for and is probably unaware of it. Which in my opinion is a wrong assumption. I'm not saying that I know the perfect solution, but I do see a problem here.
That said, I see no reason why, say, image files shouldn't be O_CLOEXEC in qemu. We can do better here. But even if we can use O_CLOEXEC here, I think the fundamental problem still remains.
Given that the application may have good reason to have some file descriptors not marked with O_CLOEXEC - and that the application knows nothing of pt_chown or any other glibc helper programs - it is only possible for glibc itself to ensure that no file descriptors are leaked to the pt_chown process.
Seems clear to me that glibc really needs to do the "close every file descriptor I'm not interested in to pass on" dance before execing a helper program like pt_chown.
(In reply to comment #7)
> That said, I see no reason why, say, image files shouldn't be O_CLOEXEC in
> qemu. We can do better here.
Right, let's go ahead and do that.
> But even if we can use O_CLOEXEC here, I think the fundamental problem still
Indeed, I think we should have a separate "close all file descriptors before execing pt_chown" bug filed against glibc.
Perhaps what we actually need is a new clone flag which says 'close all fds', for library helpers to use.
Or if these helpers actually _need_ one or more fds to be cloned, then maybe something more complex -- like a way to specify which fd(s) are to be kept open.
I quite agree that QEMU should make sure not to leak fds across its exec()s, and I also agree that CLOEXEC is a useful tool for that. I don't know whether it actually leaks any right now. Let's find out. We can use this bug to track. Even if it doesn't leak, preventive use of CLOEXEC makes sense. Upstream job.
Independent of that, libraries should also make sure not leak fds across their exec()s, glibc for pt_chown in particular. I think Uli made the case for that quite convincingly in comment#6. Do we have a bug for tracking the glibc/pt_chown issue already?
(In reply to comment #11)
> Independent of that, libraries should also make sure not leak fds across their
> exec()s, glibc for pt_chown in particular. I think Uli made the case for that
> quite convincingly in comment#6. Do we have a bug for tracking the
> glibc/pt_chown issue already?
Filed bug #530558 now
A patch to add FD_CLOEXEC was committed upstream as 40ff6d7e8dceca227e7f8a3e8e0d58b2c66d19b4. Mark, I'll leave it to you if you want to cherry-pick the patch from there or wait for the next rebase.
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
This patch is in 0.12.x which is currently available in rawhide and in updates-testing for Fedora 12.