Description of problem: Reading another process's /proc/<pid>/loginuid appears to race with updating the /proc/<pid>/loginuid. Following is an IRC analysis by Serge Hallyn: > but it seems an obvious race to me > between both kernel/auditsc.c:audit_syscall-exit and syscall_enter, > and the loginuid_read . . . > i wasn't saying there *may* be a race, I'm saying there *has* to be one. > so long as someone cats /proc/pid/loginuid while pid is between (in my > version) lines 1177 and 1192 of kernel/auditsc.c. Why audit_get_context > sets tsk->audit_context to NULL I don't know - maybe if it just didn't do > that you'd be fine. Earlier in that tread we discussed that the auid is probably better suited for the task struct than the audit context as its lifetime is different than the audit context. Here are further comments from Klaus Weidner: > Thanks to Serge's confirmation, I think the scope of the clone/auid > problem is now sufficiently clear. The race condition affects reading the > /proc/PID/loginuid file for a process other than the reading process. In > a very small time window, reading the loginuid file will give the value > "4294967295" (-1) instead of the correct value. This affects only using > the /proc/ interface to read the value; audit records will always have > the correct auid since they do not use this interface. > . . . > > According to Serge's analysis, the problem is in kernel/auditsc.c. > audit_get_context() sets tsk->context to NULL (for unknown reasons), so > while audit_syscall_exit() is executing there is a window from the call > to audit_get_context() until the t-sk->audit_context=context line that it > is NULL. During this time, audit_get_loginuid() will return -1: > > return ctx ? ctx->loginuid : -1; > > There's no locking, so the proc_loginuid_read() function in > fs/proc/base.c can call the function during the invalid time window. > > By similar reasoning, I think audit_set_loginuid() looks unsafe also, but > since this is a privileged and rare operation it's probably not very > relevant in real life. It may not even be triggerable since it's > forbidden to set the loginuid of other processes. Version-Release number of selected component (if applicable): 2.6.18-8.1.3.lspp.80.el5 How reproducible: This was reproduced in a testcase. Essentially read another process's /proc/<pid>/loginuid repeatedly while changing audit rules. Steps to Reproduce: 1. Read /proc/<pid>/loginuid repeatedly from a process other than the owning process. 2. Change audit rules while in the loginuid read loop. 3. You will see times when the loginuid is -1. Actual results: The /proc/<pid>/loginuid read is sometimes -1. Expected results: No race - /proc/<pid>/loginuid is stable. Additional info:
https://www.redhat.com/archives/linux-audit/2007-October/msg00011.html
maybe for upstream, but we can't change the task_struct in RHEL5. So I think this patch is the way to go. And it should go upstream until someone makes a better fix. at the moment since audit is the only consumer of loginuid i'm not surprised at all it is in the audit context. Maybe if there was another user or planned user it would have gone into the task_struct to start....
I can no longer provoke the race with the referenced patch applied. Looks like a reasonable fix to me.
in 2.6.18-64.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html