When trying to avoid the zero case for an unsigned RTT sample in tcp_vegas_rtt_calc (upstream: tcp_vegas_pkts_acked), we just add 1 to the unsigned value. That wraps around to 0 when the RTT sample is MAX_U32, which can be caused by a really rare situation or by a bug in other code. Upstream this is fixed by a) using a signed value and b) if (RTT_SAMPLE < 0) return;, so it doesn't occur there.

The codepath is:

1) When using the tcp_vegas congestion control, the rtt_sample hook is set to the tcp_vegas_rtt_calc function, and it is called in tcp_clean_rtx_queue with the socket and the RTT time difference measured for that socket.

2) In tcp_vegas_rtt_calc we have:

    static void tcp_vegas_rtt_calc(struct sock *sk, u32 usrtt)
    {
            struct vegas *vegas = inet_csk_ca(sk);
            u32 vrtt = usrtt + 1; /* Never allow zero rtt or baseRTT */

            /* Filter to find propagation delay: */
            if (vrtt < vegas->baseRTT)
                    vegas->baseRTT = vrtt;

            /* Find the min RTT during the last RTT to find
             * the current prop. delay + queuing delay:
             */
            vegas->minRTT = min(vegas->minRTT, vrtt);

So if we receive usrtt == MAX_U32, vrtt wraps around to 0 and we end up with minRTT == 0.

3) When cong_avoid (tcp_vegas_cong_avoid) is called, we have:

    rtt = vegas->minRTT;
    <snip comments>
    target_cwnd = ((old_wnd * vegas->baseRTT)
                   << V_PARAM_SHIFT) / rtt;

So we get a division by zero.
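The wraparound described above can be reproduced outside the kernel. The following is only a user-space sketch of the arithmetic, not the kernel code itself: struct vegas_model, model_rtt_calc and model_divisor_is_zero are hypothetical names invented for illustration, and UINT32_MAX stands in for MAX_U32.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two struct vegas fields involved. */
struct vegas_model {
    uint32_t baseRTT;
    uint32_t minRTT;
};

/* Mirrors the arithmetic of the quoted tcp_vegas_rtt_calc(). */
static void model_rtt_calc(struct vegas_model *v, uint32_t usrtt)
{
    uint32_t vrtt = usrtt + 1; /* wraps to 0 when usrtt == UINT32_MAX */

    if (vrtt < v->baseRTT)
        v->baseRTT = vrtt;

    v->minRTT = (v->minRTT < vrtt) ? v->minRTT : vrtt;
}

/* Returns 1 if the divisor later used by cong_avoid would be zero. */
static int model_divisor_is_zero(const struct vegas_model *v)
{
    return v->minRTT == 0;
}
```

Feeding a normal sample such as 1000 leaves minRTT at 1001, while a single UINT32_MAX sample drives both baseRTT and minRTT to 0, which is the value later used as the divisor.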
Hello,

The customer confirmed that with the if (vrtt == 0) vrtt = 1; patch in tcp_vegas_rtt_calc() the problem does not occur. Is there anything else I can do?

Thank you!
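For reference, the effect of that one-line clamp can be sketched in isolation. This is a minimal user-space illustration, assuming the clamp is applied right after the increment; clamp_vrtt is a hypothetical helper name, not a kernel function.

```c
#include <stdint.h>

/* Hypothetical helper: the increment from tcp_vegas_rtt_calc()
 * followed by the clamp the customer tested. */
static uint32_t clamp_vrtt(uint32_t usrtt)
{
    uint32_t vrtt = usrtt + 1; /* may wrap to 0 */

    if (vrtt == 0)
        vrtt = 1;              /* never hand 0 to minRTT/baseRTT */

    return vrtt;
}
```

Note that with this approach a bogus MAX_U32 sample is still accepted and turned into the smallest possible RTT, so it avoids the division by zero but can still pull baseRTT and minRTT down to 1.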
Created attachment 437620: proposed patch

Hello,

The upstream code is quite heavily modified (including the logic of the caller of tcp_vegas_rtt_calc), so I have tried to take only the parts that affect us and our bug. Upstream, if the RTT time diff is <= 0, the function just returns without any warning, treating the sample as a bogus value.

Please review and say whether testing is needed.

Thank you!
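The upstream-style approach (signed sample plus early return) can be sketched as follows. This is not the attached patch, only a user-space illustration under assumptions: struct vegas_sketch and sketch_pkts_acked are hypothetical names, and the check uses < 0 as quoted in the initial description.

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct vegas fields involved. */
struct vegas_sketch {
    uint32_t baseRTT;
    uint32_t minRTT;
};

/* Signed RTT sample: negative values are treated as bogus and ignored. */
static void sketch_pkts_acked(struct vegas_sketch *v, int32_t rtt_us)
{
    uint32_t vrtt;

    if (rtt_us < 0)
        return;                 /* drop the sample silently */

    /* rtt_us is in [0, INT32_MAX], so the +1 cannot wrap to 0. */
    vrtt = (uint32_t)rtt_us + 1;

    if (vrtt < v->baseRTT)
        v->baseRTT = vrtt;

    v->minRTT = (v->minRTT < vrtt) ? v->minRTT : vrtt;
}
```

Because the sample is signed, the value that previously appeared as MAX_U32 shows up as -1 and is discarded, so minRTT can never reach 0 through this path.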
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2676539 test build
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-214.el5 You can download this test kernel from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
Following the steps in comment 9, got a panic on the -194 kernel:

    Code: f7 75 10 8d 14 09 8b 8b d4 04 00 00 29 c2 3b 8b d0 04 00 00
    RIP [<ffffffff8855419d>] :tcp_vegas:tcp_vegas_cong_avoid+0x82/0x14d
    RSP <ffff810001757b00>
    hit <0>Kernel panic - not syncing: Fatal exception

    Code: 66 83 7f 02 02 77 18 55 89 d8 51 8b 4c 24 08 8b 54 24 0c e8 6b 6f b9 c7 59 5b e9 c1 00 00 00 8d 0c 36 31 d2 0f af 77 08 8d 04 36 <f7> 77 04 29 c1 89 4f 10 8b 93 6c 03 00 00 3b 93 68 03 00 00 77
    EIP: [<f8a5e148>] tcp_veno_cong_avoid+0xac/0x16f [tcp_veno] SS:ESP 0068:c074adcc
    <0>Kernel panic - not syncing: Fatal exception in interrupt

Confirmed there was no panic on the -230 kernel.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html