Bug 999638 - RHEL6.4: kernel 2.6.32-358.el6 panic, divide error with RIP bictcp_cong_avoid called from tcp_ack in TCP stack
RHEL6.4: kernel 2.6.32-358.el6 panic, divide error with RIP bictcp_cong_avoid...
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel (Show other bugs)
6.4
x86_64 Linux
high Severity high
: rc
: 6.6
Assigned To: Jesper Brouer
Red Hat Kernel QE team
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-08-21 14:30 EDT by Marcelo Ricardo Leitner
Modified: 2014-04-22 09:18 EDT (History)
7 users (show)

See Also:
Fixed In Version: kernel-2.6.32-386.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-11 03:43:22 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Marcelo Ricardo Leitner 2013-08-21 14:30:22 EDT
Customer can see the following backtrace on RHEL 6.0 and RHEL 6.4 systems:

TCP: Peer 10.7.10.245:36955/80 unexpectedly shrunk window 74655346:74658266 (repaired)
TCP: Peer 10.7.10.244:31715/80 unexpectedly shrunk window 2346392415:2346395199 (repaired)
TCP: Peer 10.7.10.243:14787/80 unexpectedly shrunk window 4177302606:4177304338 (repaired)
TCP: Peer 10.7.10.243:14787/80 unexpectedly shrunk window 4177302606:4177304338 (repaired)
TCP: Peer 10.7.10.244:45969/80 unexpectedly shrunk window 1732439211:1732442099 (repaired)
TCP: Peer 10.7.10.244:45969/80 unexpectedly shrunk window 1732439211:1732442099 (repaired)
divide error: 0000 [#1] SMP 
last sysfs file: /sys/module/lockd/initstate
CPU 0 
Modules linked in: nfs fscache tcp_diag inet_diag nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand freq_table pcc_cpufreq ipv6 ext3 jbd power_meter be2net serio_raw iTCO_wdt iTCO_vendor_support hpilo hpwdt sg i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif hpsa dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant BL460c G7
RIP: 0010:[<ffffffff814c5239>]  [<ffffffff814c5239>] bictcp_cong_avoid+0x1d9/0x390
RSP: 0018:ffff88041a4038f0  EFLAGS: 00010246
RAX: 0000000000000020 RBX: ffff88058d141700 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000025 RDI: 0000000000000000
RBP: ffff88041a403910 R08: 00000000000000d7 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88058d141b10
R13: 0000000000000014 R14: 0000000000001010 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041a400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fab9592a0b0 CR3: 0000000404dc7000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020)
Stack:
 ffff88058d141700 0000000000000506 0000000000000031 0000000000000006
<d> ffff88041a4039e0 ffffffff81497263 ffff88041a403930 ffff880700000013
<d> ffff880400000004 0000000000000286 0000000000000000 0000001300000000
Call Trace:
 <IRQ> 
 [<ffffffff81497263>] tcp_ack+0xc53/0x1280
 [<ffffffff8122aff3>] ? selinux_socket_sock_rcv_skb+0x83/0x220
 [<ffffffff81497c98>] tcp_rcv_established+0x408/0x800
 [<ffffffff8149fc13>] tcp_v4_do_rcv+0x2e3/0x430
 [<ffffffff814a149e>] tcp_v4_rcv+0x4fe/0x8d0
 [<ffffffff8147f28d>] ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff8147f518>] ip_local_deliver+0x98/0xa0
 [<ffffffff8147e9dd>] ip_rcv_finish+0x12d/0x440
 [<ffffffff8147ef65>] ip_rcv+0x275/0x350
 [<ffffffff8144819b>] __netif_receive_skb+0x4ab/0x750
 [<ffffffff8144a578>] netif_receive_skb+0x58/0x60
 [<ffffffff8144a5cf>] napi_frags_finish+0x4f/0xb0
 [<ffffffff8144d923>] napi_gro_frags+0x43/0x60
 [<ffffffffa0121068>] be_rx_compl_process_gro+0x288/0x2c0 [be2net]
 [<ffffffffa0121323>] be_poll+0x283/0x9c0 [be2net]
 [<ffffffff81055f76>] ? enqueue_task+0x66/0x80
 [<ffffffff8106317e>] ? try_to_wake_up+0x24e/0x3e0
 [<ffffffff8144cd43>] net_rx_action+0x103/0x2f0
 [<ffffffff81076fb1>] __do_softirq+0xc1/0x1e0
 [<ffffffff810e1640>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff8100c1cc>] call_softirq+0x1c/0x30
 [<ffffffff8100de05>] do_softirq+0x65/0xa0
 [<ffffffff81076d95>] irq_exit+0x85/0x90
 [<ffffffff81516c95>] do_IRQ+0x75/0xf0
 [<ffffffff8100b9d3>] ret_from_intr+0x0/0x11
 <EOI> 
 [<ffffffff812d37ae>] ? intel_idle+0xde/0x170
 [<ffffffff812d3791>] ? intel_idle+0xc1/0x170
 [<ffffffff8109dd1d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff81414ef7>] cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009fc6>] cpu_idle+0xb6/0x110
 [<ffffffff814f2f2a>] rest_init+0x7a/0x80
 [<ffffffff81c27f7b>] start_kernel+0x424/0x430
 [<ffffffff81c2733a>] x86_64_start_reservations+0x125/0x129
 [<ffffffff81c27438>] x86_64_start_kernel+0xfa/0x109
Code: 8b 93 10 04 00 00 39 d0 0f 83 5c 01 00 00 89 83 10 04 00 00 eb 0a 0f 1f 40 00 8b 83 10 04 00 00 41 0f b7 4c 24 2c c1 e0 04 31 d2 <f7> f1 85 c0 89 c6 89 83 10 04 00 00 75 0d c7 83 10 04 00 00 01 
RIP  [<ffffffff814c5239>] bictcp_cong_avoid+0x1d9/0x390
 RSP <ffff88041a4038f0>

As he sees it on RHEL 6.4 too, this should be a new version of https://bugzilla.redhat.com/show_bug.cgi?id=832203 or a regression, can't tell.

We have a vmcore for this 6.4 system, and:

/usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/net/ipv4/tcp_cubic.c: 300
0xffffffff814c522e <bictcp_cong_avoid+462>:     movzwl 0x2c(%r12),%ecx
0xffffffff814c5234 <bictcp_cong_avoid+468>:     shl    $0x4,%eax
0xffffffff814c5237 <bictcp_cong_avoid+471>:     xor    %edx,%edx
0xffffffff814c5239 <bictcp_cong_avoid+473>:     div    %ecx

crash> struct -o bictcp
struct bictcp {
   [0x0] u32 cnt;
   [0x4] u32 last_max_cwnd;
   [0x8] u32 loss_cwnd;
   [0xc] u32 last_cwnd;
  [0x10] u32 last_time;
  [0x14] u32 bic_origin_point;
  [0x18] u32 bic_K;
  [0x1c] u32 delay_min;
  [0x20] u32 epoch_start;
  [0x24] u32 ack_cnt;
  [0x28] u32 tcp_cwnd;
  [0x2c] u16 delayed_ack;

so bictcp is at %r12

crash> bictcp ffff88058d141b10
struct bictcp {
  cnt = 0x2,                 <-- normal
  last_max_cwnd = 0x0, 
  loss_cwnd = 0x3, 
  last_cwnd = 0x14, 
  last_time = 0x104a3eb, 
  bic_origin_point = 0x3, 
  bic_K = 0x0, 
  delay_min = 0x1, 
  epoch_start = 0x10493db, 
  ack_cnt = 0x14, 
  tcp_cwnd = 0x7, 
  delayed_ack = 0x0,         <-- not good
  sample_cnt = 0x0, 
  found = 0x0, 
  round_start = 0x10493b5, 
  end_seq = 0xfafd8801, 
  last_ack = 0x10493b5, 
  curr_rtt = 0x0
}

crash> jiffies
jiffies = $1 = 0x10104a3f1

crash> kmem -S ffff88058d141b10
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
ffff8808052e1440 TCP                     1664        458       552    138     8k
SLAB              MEMORY            TOTAL  ALLOCATED  FREE
ffff88058d140340  ffff88058d140380      4          4     0
FREE / [ALLOCATED]
  [ffff88058d141700]

Should be a valid struct..

This is a 32GB RAM system serving varnish.

The stack above is the only non-idle during the panic.

After the commit in BZ 832203, I can't see how ca->delayed_ack can reach 0.

Please let me know if you need more info.
Comment 3 Marcelo Ricardo Leitner 2013-08-23 09:36:45 EDT
Hi Prashant,

If that's easy to reproduce, please check with them if disabling GRO works around this issue.

Thanks.
Comment 7 Jesper Brouer 2013-09-09 05:19:58 EDT
When looking into the upstream code changes in this area, I found some commits that looked interesting, only to find that these commits had already been backported in bug#920794 to RHEL6.5.

These fixes are avail in kernel-2.6.32-375.el6.

I request that the customer get this kernel version to try on his system
--Jesper
Comment 8 Marcelo Ricardo Leitner 2013-09-09 09:13:16 EDT
(In reply to Jesper Brouer from comment #7)
> When looking into the upstream code changes in this area, I found some
> commits that looked interesting, only to find that these commits had already
> been backported in bug#920794 to RHEL6.5.
> 
> These fixes are avail in kernel-2.6.32-375.el6.
> 
> I request that the customer get this kernel version to try on his system
> --Jesper

Jesper, the test has been already requested. Customer seems fine with testing but we have to ship that package to them (in progress).

But uh, could you share the commit list you thought relevant in there? I'd love to know where you're targeting at :)

Thanks,
Marcelo
Comment 9 Jesper Brouer 2013-09-09 10:21:21 EDT
(In reply to Marcelo Ricardo Leitner from comment #8)

> But uh, could you share the commit list you thought relevant in there? I'd
> love to know where you're targeting at :)

It was mostly a hunch to a fix of a div by zero due, caused by snd_cwnd being 0, which is fixed in:

https://bugzilla.redhat.com/show_bug.cgi?id=920794#c8
 Related patch: http://patchwork.lab.bos.redhat.com/patch/57483
 Related patch: http://patchwork.lab.bos.redhat.com/patch/57490

But it does not correlate with your crash dump analysis... so I might be wrong...
Comment 10 Jesper Brouer 2013-09-09 10:34:05 EDT
(In reply to Marcelo Ricardo Leitner from comment #0)
> 
> After the commit in BZ 832203, I can't see how ca->delayed_ack can reach 0.

Yes, I was also under that assumption, but looking closer at the upstream commit b9f47a3aae (tcp_cubic: limit delayed_ack ratio to prevent divide error), which states it fixes div-by-zero, I think it contains an error...


commit b9f47a3aaeabdce3b42829bbb27765fa340f76ba
Author: stephen hemminger <shemminger@vyatta.com>
Date:   Wed May 4 10:04:56 2011 +0000

    tcp_cubic: limit delayed_ack ratio to prevent divide error
    
    TCP Cubic keeps a metric that estimates the amount of delayed
    acknowledgements to use in adjusting the window. If an abnormally
    large number of packets are acknowledged at once, then the update
    could wrap and reach zero. This kind of ACK could only
    happen when there was a large window and huge number of
    ACK's were lost.
    
    This patch limits the value of delayed ack ratio. The choice of 32
    is just a conservative value since normally it should be range of
    1 to 4 packets.
    
    Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 34340c9..f376b05 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -93,6 +93,7 @@ struct bictcp {
        u32     ack_cnt;        /* number of acks */
        u32     tcp_cwnd;       /* estimated tcp cwnd */
 #define ACK_RATIO_SHIFT        4
+#define ACK_RATIO_LIMIT (32u << ACK_RATIO_SHIFT)
        u16     delayed_ack;    /* estimate the ratio of Packets/ACKs << 4 */
        u8      sample_cnt;     /* number of samples to decide curr_rtt */
        u8      found;          /* the exit point is found? */
@@ -398,8 +399,12 @@ static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us)
        u32 delay;
 
        if (icsk->icsk_ca_state == TCP_CA_Open) {
-               cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT;
-               ca->delayed_ack += cnt;
+               u32 ratio = ca->delayed_ack;
+
+               ratio -= ca->delayed_ack >> ACK_RATIO_SHIFT;
+               ratio += cnt;
+
+               ca->delayed_ack = min(ratio, ACK_RATIO_LIMIT);
        }
 
It seems wrong to use min(), to avoid ca->delayed_ack reaching zero, shouldn't it be max().  Or perhaps I'm misunderstanding the code, as I'm a bit uncertain about what this code path achieves...
Comment 11 Marcelo Ricardo Leitner 2013-09-09 10:42:43 EDT
Thanks for sharing that, Jesper. That's exactly why I asked for it: that code is way not clear for me too. Thanks!
Comment 22 Jesper Brouer 2013-12-11 03:43:22 EST
In Customer Case 00893760 (closed); it was verified/confirmed that kernel-2.6.32-386.el6 solved their crash.

Closing ths BZ
Comment 25 todayyang 2014-04-21 02:30:18 EDT
(In reply to Marcelo Ricardo Leitner from comment #11)
> Thanks for sharing that, Jesper. That's exactly why I asked for it: that
> code is way not clear for me too. Thanks!

this issue happened when apply this patch. 

that code is way not clear for me too that avoid the fixes div-by-zero
Comment 26 Marcelo Ricardo Leitner 2014-04-22 09:18:48 EDT
(In reply to todayyang from comment #25)
> (In reply to Marcelo Ricardo Leitner from comment #11)
> > Thanks for sharing that, Jesper. That's exactly why I asked for it: that
> > code is way not clear for me too. Thanks!
> 
> this issue happened when apply this patch. 
> 
> that code is way not clear for me too that avoid the fixes div-by-zero

Sorry, I don't understand what you mean here. You are having this issue (div-by-zero) after applying this patch?

Note You need to log in before you can comment on or make changes to this bug.