Bug 608641 - vegas and veno possible division by zero bug
Summary: vegas and veno possible division by zero bug
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel   
(Show other bugs)
Version: 5.3
Hardware: All Linux
Target Milestone: rc
: ---
Assignee: Neil Horman
QA Contact: Eryu Guan
Depends On:
TreeView+ depends on / blocked
Reported: 2010-06-28 11:21 UTC by Veaceslav Falico
Modified: 2018-11-14 19:37 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2011-01-13 21:39:46 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
proposed patch (1.67 KB, patch)
2010-08-09 15:17 UTC, Veaceslav Falico
no flags Details | Diff

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0017 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update 2011-01-13 10:37:42 UTC

Description Veaceslav Falico 2010-06-28 11:21:09 UTC
When trying to avoid the zero-case in tcp_vegas_rtt_calc (upstream tcp_vegas_pkts_acked ) of an unsigned RTT sample, we just add 1 to the unsigned value, which can lead to 0-case when we have the MAX_U32 RTT sample, caused by a really rare situation or a bug in other code. In the upstream it's fixed with a) signed value b) if (RTT_SAMPLE < 0) return;, so it doesn't occur.

The codepath is:

1) When using the tcp_vegas congestion control, we have the rtt_sample set to tcp_vegas_rtt_calc function, and it's called in tcp_clean_rtx_queue with the socket and RTT difference between now and sockets.

2) In the tcp_vegas_rtt_calc we have:

123 static void tcp_vegas_rtt_calc(struct sock *sk, u32 usrtt)
124 {   
125     struct vegas *vegas = inet_csk_ca(sk);
126     u32 vrtt = usrtt + 1; /* Never allow zero rtt or baseRTT */
128     /* Filter to find propagation delay: */
129     if (vrtt < vegas->baseRTT)
130         vegas->baseRTT = vrtt;
132     /* Find the min RTT during the last RTT to find
133      * the current prop. delay + queuing delay:
134      */
135     vegas->minRTT = min(vegas->minRTT, vrtt);

So if we receive the usrtt == MAX_U32, then we have minRTT == 0.

3) When the cong_avoid (tcp_vegas_cong_avoid) is called, we have:

245             rtt = vegas->minRTT;
<snip comments>
255             target_cwnd = ((old_wnd * vegas->baseRTT)
256                        << V_PARAM_SHIFT) / rtt;

So that we get a division by zero.

Comment 2 Veaceslav Falico 2010-08-02 10:47:47 UTC

The customer confirmed that with the 

if (vrtt == 0)
    vrtt = 1;

patch in tcp_vegas_rtt_calc() the problem does not occur.

Is there anything else I can do?

Thank you!

Comment 3 Veaceslav Falico 2010-08-09 15:17:05 UTC
Created attachment 437620 [details]
proposed patch


The upstream is quite heavily modified (and the logic of the caller of tcp_vegas_rtt_calc also), so that I've tried to take only the parts that affect us and our bug. In the upstream if we see that the time diff of rtt is <=0, we just return without any warnings, considering it to be just a bogus value.

Please review and say if testing is needed.

Thank you!

Comment 5 Neil Horman 2010-08-11 19:38:03 UTC
test build

Comment 6 RHEL Product and Program Management 2010-08-12 18:19:22 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update

Comment 12 Jarod Wilson 2010-08-31 01:17:49 UTC
in kernel-2.6.18-214.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 14 Eryu Guan 2010-11-09 06:32:18 UTC
Following steps in comment 9, got panic on -194 kernel 
Code: f7 75 10 8d 14 09 8b 8b d4 04 00 00 29 c2 3b 8b d0 04 00 00 
RIP  [<ffffffff8855419d>] :tcp_vegas:tcp_vegas_cong_avoid+0x82/0x14d
 RSP <ffff810001757b00>
<0>Kernel panic - not syncing: Fatal exception

Code: 66 83 7f 02 02 77 18 55 89 d8 51 8b 4c 24 08 8b 54 24 0c e8 6b 6f b9 c7 59 5b e9 c1 00 00 00 8d 0c 36 31 d2 0f af 77 08 8d 04 36 <f7> 77 04 29 c1 89 4f 10 8b 93 6c 03 00 00 3b 93 68 03 00 00 77
EIP: [<f8a5e148>] tcp_veno_cong_avoid+0xac/0x16f [tcp_veno] SS:ESP 0068:c074adcc
 <0>Kernel panic - not syncing: Fatal exception in interrupt

Confirmed there was no panic on -230 kernel.

Comment 16 errata-xmlrpc 2011-01-13 21:39:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.


Note You need to log in before you can comment on or make changes to this bug.