Al Viro talked to me earlier this week about a sysctl flaw. Below is an edited
extract from his messages. I'm going to assign this "moderate" because it
requires an interface to go away and most likely would lead to DoS. Not sent
We have an exploitable hole in sysctl unregistration. There is no exclusion
whatsoever between procfs or sysctl(2) access to ctl_table instances, methods
*and* data and unregister_sysctl_table().
IOW, there's nothing to stop inetdev_destroy() in the middle of access to
/proc/sys/net/ipv4/conf/<interface>/<whatever> or in the middle of corresponding
sysctl. In the best case that's an oopsable hole. In the worst... You can
open such file, wait for interface to go away, try to grab as much memory as
possible in hope to hit the (kfreed) ctl_table. Then fill it with pointers to
your function. Then do read from file you've opened and if you are lucky, you'll
get it called as ->proc_handler() in kernel mode.
Realistically, I see only two ways to deal with that: have
unregister_sysctl_table() block until accesses in progress are finished (after
making sure there will be no new ones - we can do that, at least) or have it
return -EBUSY and let callers retry in whatever ways they like.
IMO the latter is only going to cause more bugs. Note that
unregister_sysctl_table() _already_ is process-synchronous since it removes
procfs entries. Moreover, on a preempt kernel it _already_ blocks. What I
would like to do is the same kind of extremely unfair rwsem we are using for
net_device removal; in case of sysctl we _do_ have dynamic-only structure to
work with - table headers. So it's nowhere near the horror level of net_device
sweep; we can get away with all changes local to kernel/sysctl.c and the data
structure private to it.
The real question is whether anything with dynamic sysctls wants to do them
under a spinlock. Note that we'll only block in the cases that right now would
lead to access to kfreed objects, so we don't get any regressions. Opened
procfs files will *not* block that sucker - only access itself would (I'm going
to stick pointer to table header into ->private_data, have it pin down the
header and have it check the "going away" flag before access).
Fortunately, very few places really want unregister_sysctl_table() on dynamic
data structures. Most of them are in net/*. We do have one done under a
spinlock in arch/s390/appldata/appldata_base.c; that sucker tries to protect
itself, but still doesn't get it right (moreover, it can't be fixed that way -
the fundamental problem is that driver doesn't know that ctl_table is about to
get used and it's freeing the table itself that gets you). We also have a lot
of sloppy parport code (another source of dynamic sysctls)
third-party code might be affected - trading triggerable oops (if not worse) for
potential deadlock in narrow subset of previously vulnerable scenarios.
Created attachment 119069 [details]
First draft proposed patch
*** Bug 170243 has been marked as a duplicate of this bug. ***
public, removing embargo
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.