477744 – (CVE-2008-5713) CVE-2008-5713 kernel: soft lockup occurs when network load is very high

Keywords:
Status:	CLOSED ERRATA
Alias:	CVE-2008-5713
Product:	Security Response
Classification:	Other
Component:	vulnerability
Sub Component:
Version:	unspecified
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Red Hat Product Security
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	471398 477745 477746
Blocks:

Reported:	2008-12-23 09:08 UTC by Eugene Teo (Security Response)
Modified:	2021-11-12 19:54 UTC
CC List:	10 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2010-12-21 17:55:22 UTC
Embargoed:

Attachments
reproducer (1.03 KB, text/plain) 2008-12-23 09:09 UTC, Eugene Teo (Security Response)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:0264	0	normal	SHIPPED_LIVE	Important: kernel security update	2009-02-10 15:51:56 UTC

Description Eugene Teo (Security Response) 2008-12-23 09:08:04 UTC

From Flavio Leitner:
On many-core SMP machines (16 cores or more), a soft lockup can occur when heavy network load is generated concurrently.

The lockup happens in __qdisc_run() (net/sched/sch_generic.c, line 84). Because the driver keeps sending packets and returning NETDEV_TX_OK, __qdisc_run() cannot exit from its qdisc_restart() loop.

This behavior may improve throughput, but some applications can be stuck for more than 10 seconds.
This issue has been fixed in the vanilla kernel.
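
To illustrate the mechanism described above, here is a minimal userspace sketch in Python. It is not kernel code and not the upstream patch: the function names mirror the report, but the arguments (`need_resched`, `now`) and the helper `make_load` are hypothetical, and the "one tick per 100 sends" clock is an arbitrary assumption. It only shows why a drain loop with no exit condition other than "queue empty" can run for as long as other CPUs keep refilling the queue, and how a preemption point bounds each run.

```python
from collections import deque


def qdisc_run(queue, transmit, need_resched=None, now=None):
    """Drain `queue`, optionally yielding at a preemption point.

    With need_resched/now left as None, the loop stops only when the
    queue is empty (the behavior described in the report). With both
    given, the loop also stops when another task wants the CPU or the
    tick counter has advanced, leaving the rest for a later run.
    Returns (packets_sent, deferred).
    """
    preemptible = need_resched is not None and now is not None
    start = now() if preemptible else None
    sent = 0
    while queue:
        transmit(queue.popleft())
        sent += 1
        if preemptible and (need_resched() or now() != start):
            return sent, True  # defer the remaining packets
    return sent, False


def make_load(total):
    """Hypothetical load: each send is immediately followed by a new
    packet queued by "another CPU", until `total` packets were queued."""
    queue = deque([0])
    state = {"queued": 1, "ticks": 0}

    def transmit(_pkt):
        state["ticks"] += 1  # assume each send costs one time unit
        if state["queued"] < total:
            queue.append(state["queued"])
            state["queued"] += 1

    return queue, transmit, state


if __name__ == "__main__":
    q, tx, st = make_load(1000)
    print("no preemption point:", qdisc_run(q, tx))

    q, tx, st = make_load(1000)
    result = qdisc_run(q, tx, need_resched=lambda: False,
                       now=lambda: st["ticks"] // 100)
    print("with preemption point:", result, "left in queue:", len(q))
```

In the first call the loop does not return until every packet produced by the simulated load has been sent; in the second it returns as soon as the simulated tick advances, so the CPU gets a chance to run something else.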

Version-Release number of selected component:
kernel version: 2.6.18-92.el5 (RHEL5.2GA)

How reproducible:
It can be reproduced within dozens of seconds on a 16-core SMP box. The issue occurs easily when the UDP workload is very high.

Steps to Reproduce:
On a 16-core SMP machine, run more than 16 netperf instances in parallel with the following options; the lockup then occurs on the client side.

# netperf -H <netserver_address> -l 60 -t UDP_STREAM -- -s 262144 -r 262144 -m 16384
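
The parallel run described above could be driven by a small launcher like the following Python sketch. This is not the attached reproducer (its contents are not shown in this report); the function names are hypothetical, and the only thing taken from the report is the netperf command line. It assumes netperf is installed on the client and netserver is running on the target host.

```python
import subprocess
import sys


def netperf_cmd(server):
    """Build the netperf command line quoted in this report."""
    return ["netperf", "-H", server, "-l", "60", "-t", "UDP_STREAM",
            "--", "-s", "262144", "-r", "262144", "-m", "16384"]


def run_parallel(server, tasks):
    """Start `tasks` netperf clients at once and wait for all of them.

    Returns the list of exit codes, one per client.
    """
    procs = [subprocess.Popen(netperf_cmd(server),
                              stdout=subprocess.DEVNULL)
             for _ in range(tasks)]
    return [p.wait() for p in procs]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (assumed): launcher.py <netserver_address> <number_of_tasks>
    codes = run_parallel(sys.argv[1], int(sys.argv[2]))
    print("clients finished with exit codes:", codes)
```

On a 16-core client the task count should be higher than 16, as stated in the steps above.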

Actual results:
Many soft lockup messages are recorded in syslog, and performance problems appear in some applications.

Expected results:
In the kernel, no CPU should be dedicated to one piece of work for a long time without calling schedule().

Hardware info:
Express5800/140Rf-4

Business impact:
It leaves the customer's applications unresponsive for too long and makes it impossible to deploy RHEL 5.2 on performance- or latency-sensitive systems.

Additional info:
git patch: [NET]: Add preemption point in qdisc_run
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2ba2506ca7ca62c56edaa334b0fe61eb5eab6ab0

Comment 1 Eugene Teo (Security Response) 2008-12-23 09:09:55 UTC

Created attachment 327745
reproducer

Comment 2 Eugene Teo (Security Response) 2008-12-23 09:10:23 UTC

(In reply to comment #1)
> Created an attachment (id=327745) [details]
> reproducer

Note that netperf must be installed. The reproducer reliably triggers the lockup for me with 40 tasks (the second command-line parameter) on a 16-CPU machine. The first parameter is the hostname of the computer where "netserver" (part of the netperf package) is running.

Comment 10 Vincent Danen 2010-12-21 17:55:22 UTC

This was addressed via:

Red Hat Enterprise Linux version 5 (RHSA-2009:0264)
