Bug 1822593

Summary: kernel: race condition in kernel/audit.c may allow low-privileged users to trigger a kernel panic
Product: [Other] Security Response
Reporter: Guilherme de Almeida Suckevicz <gsuckevi>
Component: vulnerability
Assignee: Nobody <nobody>
Status: NEW ---
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: unspecified
CC: 530415489, acaringi, airlied, bhu, blc, dvlasenk, hdegoede, hkrzesin, itamar, jarodwilson, jeremy, jforbes, jlelli, john.j5live, jonathan, josef, jross, jshortt, jstancek, jwboyer, kernel-maint, kernel-mgr, lgoncalv, linville, masami256, mchehab, mjg59, mlangsdo, mvanderw, nmurray, pmatouse, qzhao, rbriggs, rt-maint, rvrbovsk, sgrubb, steved, williams, wmealing
Target Milestone: ---
Keywords: Security
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
A flaw was found in the implementation of the audit service where it may be possible to exceed the allowed number of pending audit events while the audit service is being restarted (i.e., while it is being upgraded). This could allow a local user to panic the system.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1825110, 1825111, 1825156
Bug Blocks: 1819337

Description Guilherme de Almeida Suckevicz 2020-04-09 12:59:44 UTC
A race condition was found in the Linux kernel audit subsystem. When the system is configured to panic when audit events are dropped, an attacker who can trigger audit events while auditd is starting may be able to exploit a race condition in audit event handling to panic the system, causing a denial of service.

Comment 1 Guilherme de Almeida Suckevicz 2020-04-09 13:03:38 UTC
Acknowledgments:

Name: Weichen.Chen (Alibaba)

Comment 5 Petr Matousek 2020-04-17 08:56:15 UTC
External References:

https://www.openwall.com/lists/oss-security/2020/04/17/1

Comment 6 Petr Matousek 2020-04-17 08:57:03 UTC
Created kernel tracking bugs for this issue:

Affects: fedora-all [bug 1825156]

Comment 19 Petr Matousek 2020-05-26 14:49:14 UTC
Mitigation:

To exploit this flaw, the system must be configured with the audit failure mode set to AUDIT_FAIL_PANIC (2). Setting the failure mode to a non-panic value while the daemon is being restarted lets the server survive lost audit events during heavy audit activity.
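The mitigation above can be sketched as a small shell function. This is a minimal sketch, not a Red Hat-provided script: the function name safe_restart_auditd is hypothetical, and it assumes the system normally runs with the failure mode set to 2 (panic), that the audit rules are not locked with -e 2 (which would make auditctl -f fail), and that "service auditd restart" is available. auditctl -f sets the kernel failure mode (0 = silent, 1 = printk, 2 = panic).

```sh
#!/bin/sh
# Sketch only (hypothetical helper): restart auditd with the kernel audit
# failure mode temporarily relaxed, as suggested in the mitigation above.
# Assumes: rules are not locked (-e 2), the normal failure mode is 2 (panic),
# and "service auditd" is the supported way to restart the daemon.

RELAXED_MODE=1   # auditctl -f 1: report lost events via printk
PANIC_MODE=2     # auditctl -f 2: panic when events are lost

safe_restart_auditd() {
  # Keep lost events from panicking the kernel while the daemon restarts.
  auditctl -f "$RELAXED_MODE" || return 1

  # Use the service command; never signal auditd directly.
  if ! service auditd restart; then
    echo "auditd restart failed; failure mode left at $RELAXED_MODE" >&2
    return 1
  fi

  # The restart succeeded: restore panic-on-failure.
  auditctl -f "$PANIC_MODE"
}
```

Run as root in place of a bare restart. If the restart fails, the sketch deliberately leaves the failure mode relaxed, trading strict panic-on-loss for availability until an administrator intervenes.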

Comment 20 Steve Grubb 2020-05-27 14:21:57 UTC
I think this report is inaccurate in its root cause analysis and should possibly be disputed. The script given does this:

while true; do ps aux | grep "/sbin/auditd" | grep -v "grep" | awk '{print $2}' | xargs kill; service auditd start; systemctl reset-failed auditd.service; done

It is never expected that an admin directly send a signal to the audit daemon. They could, but that is not how it is intended to be used. It is documented that you need to use the service command to start or stop the daemon. When you run the stop code, it winds up here:

https://github.com/linux-audit/audit-userspace/blob/2.8_maintenance/init.d/auditd.stop

Notice that there is a tail command whose specific purpose is to wait until the audit daemon exits. The problem the reporter is seeing is caused by a second instance of the audit daemon being started, due to a scripting mistake. That forces the kernel to work out which process it should send events to while the netlink table has not finished being updated. The result is an -ECONNREFUSED error code. We only allow for 5 retries and then it calls audit_log_lost(). So, if there were 5 events in the backlog, it will cycle through those quickly. Normally, people do not configure the audit system in a way that generates thousands of events per second; that would fill the hard drive and be meaningless. So the reproducer is again using an abnormal configuration, but we have to consider the possibility that unmounting a drive with watches on it could trigger a rapid burst of events. The root of the issue is simply that the front and back ends of netlink haven't synced, and the root cause is a second audit daemon sometimes slipping in.
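The delivery path described above can be modelled in a few lines of shell. This is a simplified, hypothetical model, not kernel code: send_event, deliver and the daemon_ready flag are invented for illustration, the retry limit of 5 is taken from this comment, and the three failure modes mirror the kernel's AUDIT_FAIL_SILENT (0), AUDIT_FAIL_PRINTK (1) and AUDIT_FAIL_PANIC (2). It shows that once sends are refused, whether the system panics depends only on the configured failure mode.

```sh
#!/bin/sh
# Simplified model of the behaviour described in comment 20. NOT kernel code:
# send_event, deliver and daemon_ready are hypothetical names.

AUDIT_FAIL_SILENT=0
AUDIT_FAIL_PRINTK=1
AUDIT_FAIL_PANIC=2
RETRY_LIMIT=5          # "We only allow for 5 retries" (comment 20)

# Stand-in for the netlink send: refused (like -ECONNREFUSED) while
# daemon_ready is 0, i.e. while the netlink side has not caught up.
send_event() {
  [ "$daemon_ready" -eq 1 ]
}

# Stand-in for audit_log_lost(): the outcome depends on the failure mode.
audit_log_lost() {
  case "$audit_failure" in
    "$AUDIT_FAIL_SILENT") ;;                          # drop quietly
    "$AUDIT_FAIL_PRINTK") echo "audit: $1" ;;         # log and continue
    "$AUDIT_FAIL_PANIC")  echo "panic: audit: $1" ;;  # kernel would panic here
  esac
}

# Try to deliver one backlog event, giving up after RETRY_LIMIT failed sends.
deliver() {
  attempt=1
  while [ "$attempt" -le "$RETRY_LIMIT" ]; do
    if send_event; then
      echo "delivered"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  audit_log_lost "$1"
  return 1
}
```

For example, with daemon_ready=0 and audit_failure=2, deliver "backlog event" prints "panic: audit: backlog event", while the same call with audit_failure=1 only logs the loss. This is why the mitigation targets the failure mode rather than the race itself.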

An unprivileged user cannot trigger this. An unprivileged user could trip over an audit rule which creates an event which goes into the backlog. But the issue is the -ECONNREFUSED error code resulting from an admin action.

If an admin uses the audit system by way of the service command, this would not happen.
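What the service command adds over the reporter's loop is that stopping blocks until the old daemon has exited, so a second instance cannot start alongside it. The helper below is a hypothetical sketch of that wait, not the actual auditd.stop script (which, per the comment above, uses tail). It polls with kill -0 and assumes it runs as root, since kill -0 on another user's process fails with a permission error and would be misread as the process having exited.

```sh
#!/bin/sh
# Hypothetical helper illustrating the "wait until the daemon exits" step
# that the stop script performs. Not the actual auditd.stop script.
# Assumes it runs as root so kill -0 can probe the auditd process.

# wait_for_exit PID MAX_SECONDS
# Returns 0 once PID no longer exists, 1 if it is still alive after MAX_SECONDS.
wait_for_exit() {
  pid=$1
  remaining=$2
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$remaining" -le 0 ]; then
      return 1
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
  return 0
}
```

Used this way, the old daemon's PID is captured before stopping it, and a new auditd is started only after wait_for_exit "$old_pid" 30 returns 0.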

Comment 21 Wade Mealing 2020-06-10 01:46:37 UTC
With this in mind, I'm going to request that this issue be marked "DISPUTED" in the CVE database.

Comment 22 msiddiqu 2020-06-16 02:58:57 UTC
In reply to comment #21:
> With this in mind then, I'm going to request this issue be "DISPUTED" from
> the CVE database.

CVE-2020-10708 was rejected. Removed the alias from this flaw bug.