608641 – vegas and veno possible division by zero bug

Bug 608641 - vegas and veno possible division by zero bug

Summary: vegas and veno possible division by zero bug

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.3
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Neil Horman
QA Contact:	Eryu Guan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2010-06-28 11:21 UTC by Veaceslav Falico
Modified:	2018-11-14 19:37 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-01-13 21:39:46 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
proposed patch (1.67 KB, patch) 2010-08-09 15:17 UTC, Veaceslav Falico	no flags	Details \| Diff
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:0017	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update	2011-01-13 10:37:42 UTC

Description Veaceslav Falico 2010-06-28 11:21:09 UTC

When trying to avoid the zero-case in tcp_vegas_rtt_calc (upstream tcp_vegas_pkts_acked ) of an unsigned RTT sample, we just add 1 to the unsigned value, which can lead to 0-case when we have the MAX_U32 RTT sample, caused by a really rare situation or a bug in other code. In the upstream it's fixed with a) signed value b) if (RTT_SAMPLE < 0) return;, so it doesn't occur.

The codepath is:

1) When using the tcp_vegas congestion control, we have the rtt_sample set to tcp_vegas_rtt_calc function, and it's called in tcp_clean_rtx_queue with the socket and RTT difference between now and sockets.

2) In the tcp_vegas_rtt_calc we have:

123 static void tcp_vegas_rtt_calc(struct sock *sk, u32 usrtt)
124 {   
125     struct vegas *vegas = inet_csk_ca(sk);
126     u32 vrtt = usrtt + 1; /* Never allow zero rtt or baseRTT */
127 
128     /* Filter to find propagation delay: */
129     if (vrtt < vegas->baseRTT)
130         vegas->baseRTT = vrtt;
131 
132     /* Find the min RTT during the last RTT to find
133      * the current prop. delay + queuing delay:
134      */
135     vegas->minRTT = min(vegas->minRTT, vrtt);

So if we receive the usrtt == MAX_U32, then we have minRTT == 0.

3) When the cong_avoid (tcp_vegas_cong_avoid) is called, we have:

245             rtt = vegas->minRTT;
<snip comments>
255             target_cwnd = ((old_wnd * vegas->baseRTT)
256                        << V_PARAM_SHIFT) / rtt;

So that we get a division by zero.

Comment 2 Veaceslav Falico 2010-08-02 10:47:47 UTC

Hello,

The customer confirmed that with the 

if (vrtt == 0)
    vrtt = 1;

patch in tcp_vegas_rtt_calc() the problem does not occur.

Is there anything else I can do?

Thank you!

Comment 3 Veaceslav Falico 2010-08-09 15:17:05 UTC

Created attachment 437620 [details]
proposed patch

Hello,

The upstream is quite heavily modified (and the logic of the caller of tcp_vegas_rtt_calc also), so that I've tried to take only the parts that affect us and our bug. In the upstream if we see that the time diff of rtt is <=0, we just return without any warnings, considering it to be just a bogus value.

Please review and say if testing is needed.

Thank you!

Comment 5 Neil Horman 2010-08-11 19:38:03 UTC

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2676539
test build

Comment 6 RHEL Program Management 2010-08-12 18:19:22 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 Jarod Wilson 2010-08-31 01:17:49 UTC

in kernel-2.6.18-214.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 14 Eryu Guan 2010-11-09 06:32:18 UTC

Following steps in comment 9, got panic on -194 kernel 
Code: f7 75 10 8d 14 09 8b 8b d4 04 00 00 29 c2 3b 8b d0 04 00 00 
RIP  [<ffffffff8855419d>] :tcp_vegas:tcp_vegas_cong_avoid+0x82/0x14d
 RSP <ffff810001757b00>
 hit
<0>Kernel panic - not syncing: Fatal exception

Code: 66 83 7f 02 02 77 18 55 89 d8 51 8b 4c 24 08 8b 54 24 0c e8 6b 6f b9 c7 59 5b e9 c1 00 00 00 8d 0c 36 31 d2 0f af 77 08 8d 04 36 <f7> 77 04 29 c1 89 4f 10 8b 93 6c 03 00 00 3b 93 68 03 00 00 77
EIP: [<f8a5e148>] tcp_veno_cong_avoid+0xac/0x16f [tcp_veno] SS:ESP 0068:c074adcc
 <0>Kernel panic - not syncing: Fatal exception in interrupt

Confirmed there was no panic on -230 kernel.

Comment 16 errata-xmlrpc 2011-01-13 21:39:46 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

Note You need to log in before you can comment on or make changes to this bug.