Hide Forgot
Customer can see the following backtrace on RHEL 6.0 and RHEL 6.4 systems: TCP: Peer 10.7.10.245:36955/80 unexpectedly shrunk window 74655346:74658266 (repaired) TCP: Peer 10.7.10.244:31715/80 unexpectedly shrunk window 2346392415:2346395199 (repaired) TCP: Peer 10.7.10.243:14787/80 unexpectedly shrunk window 4177302606:4177304338 (repaired) TCP: Peer 10.7.10.243:14787/80 unexpectedly shrunk window 4177302606:4177304338 (repaired) TCP: Peer 10.7.10.244:45969/80 unexpectedly shrunk window 1732439211:1732442099 (repaired) TCP: Peer 10.7.10.244:45969/80 unexpectedly shrunk window 1732439211:1732442099 (repaired) divide error: 0000 [#1] SMP last sysfs file: /sys/module/lockd/initstate CPU 0 Modules linked in: nfs fscache tcp_diag inet_diag nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand freq_table pcc_cpufreq ipv6 ext3 jbd power_meter be2net serio_raw iTCO_wdt iTCO_vendor_support hpilo hpwdt sg i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif hpsa dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 0, comm: swapper Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant BL460c G7 RIP: 0010:[<ffffffff814c5239>] [<ffffffff814c5239>] bictcp_cong_avoid+0x1d9/0x390 RSP: 0018:ffff88041a4038f0 EFLAGS: 00010246 RAX: 0000000000000020 RBX: ffff88058d141700 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000025 RDI: 0000000000000000 RBP: ffff88041a403910 R08: 00000000000000d7 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88058d141b10 R13: 0000000000000014 R14: 0000000000001010 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88041a400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00007fab9592a0b0 CR3: 0000000404dc7000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020) Stack: ffff88058d141700 0000000000000506 0000000000000031 0000000000000006 <d> ffff88041a4039e0 ffffffff81497263 ffff88041a403930 ffff880700000013 <d> ffff880400000004 0000000000000286 0000000000000000 0000001300000000 Call Trace: <IRQ> [<ffffffff81497263>] tcp_ack+0xc53/0x1280 [<ffffffff8122aff3>] ? selinux_socket_sock_rcv_skb+0x83/0x220 [<ffffffff81497c98>] tcp_rcv_established+0x408/0x800 [<ffffffff8149fc13>] tcp_v4_do_rcv+0x2e3/0x430 [<ffffffff814a149e>] tcp_v4_rcv+0x4fe/0x8d0 [<ffffffff8147f28d>] ip_local_deliver_finish+0xdd/0x2d0 [<ffffffff8147f518>] ip_local_deliver+0x98/0xa0 [<ffffffff8147e9dd>] ip_rcv_finish+0x12d/0x440 [<ffffffff8147ef65>] ip_rcv+0x275/0x350 [<ffffffff8144819b>] __netif_receive_skb+0x4ab/0x750 [<ffffffff8144a578>] netif_receive_skb+0x58/0x60 [<ffffffff8144a5cf>] napi_frags_finish+0x4f/0xb0 [<ffffffff8144d923>] napi_gro_frags+0x43/0x60 [<ffffffffa0121068>] be_rx_compl_process_gro+0x288/0x2c0 [be2net] [<ffffffffa0121323>] be_poll+0x283/0x9c0 [be2net] [<ffffffff81055f76>] ? enqueue_task+0x66/0x80 [<ffffffff8106317e>] ? try_to_wake_up+0x24e/0x3e0 [<ffffffff8144cd43>] net_rx_action+0x103/0x2f0 [<ffffffff81076fb1>] __do_softirq+0xc1/0x1e0 [<ffffffff810e1640>] ? handle_IRQ_event+0x60/0x170 [<ffffffff8100c1cc>] call_softirq+0x1c/0x30 [<ffffffff8100de05>] do_softirq+0x65/0xa0 [<ffffffff81076d95>] irq_exit+0x85/0x90 [<ffffffff81516c95>] do_IRQ+0x75/0xf0 [<ffffffff8100b9d3>] ret_from_intr+0x0/0x11 <EOI> [<ffffffff812d37ae>] ? intel_idle+0xde/0x170 [<ffffffff812d3791>] ? intel_idle+0xc1/0x170 [<ffffffff8109dd1d>] ? sched_clock_cpu+0xcd/0x110 [<ffffffff81414ef7>] cpuidle_idle_call+0xa7/0x140 [<ffffffff81009fc6>] cpu_idle+0xb6/0x110 [<ffffffff814f2f2a>] rest_init+0x7a/0x80 [<ffffffff81c27f7b>] start_kernel+0x424/0x430 [<ffffffff81c2733a>] x86_64_start_reservations+0x125/0x129 [<ffffffff81c27438>] x86_64_start_kernel+0xfa/0x109 Code: 8b 93 10 04 00 00 39 d0 0f 83 5c 01 00 00 89 83 10 04 00 00 eb 0a 0f 1f 40 00 8b 83 10 04 00 00 41 0f b7 4c 24 2c c1 e0 04 31 d2 <f7> f1 85 c0 89 c6 89 83 10 04 00 00 75 0d c7 83 10 04 00 00 01 RIP [<ffffffff814c5239>] bictcp_cong_avoid+0x1d9/0x390 RSP <ffff88041a4038f0> As he sees it on RHEL 6.4 too, this should be a new version of https://bugzilla.redhat.com/show_bug.cgi?id=832203 or a regression, can't tell. We have a vmcore for this 6.4 system, and: /usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/net/ipv4/tcp_cubic.c: 300 0xffffffff814c522e <bictcp_cong_avoid+462>: movzwl 0x2c(%r12),%ecx 0xffffffff814c5234 <bictcp_cong_avoid+468>: shl $0x4,%eax 0xffffffff814c5237 <bictcp_cong_avoid+471>: xor %edx,%edx 0xffffffff814c5239 <bictcp_cong_avoid+473>: div %ecx crash> struct -o bictcp struct bictcp { [0x0] u32 cnt; [0x4] u32 last_max_cwnd; [0x8] u32 loss_cwnd; [0xc] u32 last_cwnd; [0x10] u32 last_time; [0x14] u32 bic_origin_point; [0x18] u32 bic_K; [0x1c] u32 delay_min; [0x20] u32 epoch_start; [0x24] u32 ack_cnt; [0x28] u32 tcp_cwnd; [0x2c] u16 delayed_ack; so bictcp is at %r12 crash> bictcp ffff88058d141b10 struct bictcp { cnt = 0x2, <-- normal last_max_cwnd = 0x0, loss_cwnd = 0x3, last_cwnd = 0x14, last_time = 0x104a3eb, bic_origin_point = 0x3, bic_K = 0x0, delay_min = 0x1, epoch_start = 0x10493db, ack_cnt = 0x14, tcp_cwnd = 0x7, delayed_ack = 0x0, <-- not good sample_cnt = 0x0, found = 0x0, round_start = 0x10493b5, end_seq = 0xfafd8801, last_ack = 0x10493b5, curr_rtt = 0x0 } crash> jiffies jiffies = $1 = 0x10104a3f1 crash> kmem -S ffff88058d141b10 CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff8808052e1440 TCP 1664 458 552 138 8k SLAB MEMORY TOTAL ALLOCATED FREE ffff88058d140340 ffff88058d140380 4 4 0 FREE / [ALLOCATED] [ffff88058d141700] Should be a valid struct.. This is a 32GB RAM system serving varnish. The stack above is the only non-idle during the panic. After the commit in BZ 832203, I can't see how ca->delayed_ack can reach 0. Please let me know if you need more info.
Hi Prashant, If that's easy to reproduce, please check with them if disabling GRO works around this issue. Thanks.
When looking into the upstream code changes in this area, I found some commits that looked interesting, only to find that these commits had already been backported in bug#920794 to RHEL6.5. These fixes are avail in kernel-2.6.32-375.el6. I request that the customer get this kernel version to try on his system --Jesper
(In reply to Jesper Brouer from comment #7) > When looking into the upstream code changes in this area, I found some > commits that looked interesting, only to find that these commits had already > been backported in bug#920794 to RHEL6.5. > > These fixes are avail in kernel-2.6.32-375.el6. > > I request that the customer get this kernel version to try on his system > --Jesper Jesper, the test has been already requested. Customer seems fine with testing but we have to ship that package to them (in progress). But uh, could you share the commit list you thought relevant in there? I'd love to know where you're targeting at :) Thanks, Marcelo
(In reply to Marcelo Ricardo Leitner from comment #8) > But uh, could you share the commit list you thought relevant in there? I'd > love to know where you're targeting at :) It was mostly a hunch to a fix of a div by zero due, caused by snd_cwnd being 0, which is fixed in: https://bugzilla.redhat.com/show_bug.cgi?id=920794#c8 Related patch: http://patchwork.lab.bos.redhat.com/patch/57483 Related patch: http://patchwork.lab.bos.redhat.com/patch/57490 But it does not correlate with your crash dump analysis... so I might be wrong...
(In reply to Marcelo Ricardo Leitner from comment #0) > > After the commit in BZ 832203, I can't see how ca->delayed_ack can reach 0. Yes, I was also under that assumption, but looking closer at the upstream commit b9f47a3aae (tcp_cubic: limit delayed_ack ratio to prevent divide error), which states it fixes div-by-zero, I think it contains an error... commit b9f47a3aaeabdce3b42829bbb27765fa340f76ba Author: stephen hemminger <shemminger> Date: Wed May 4 10:04:56 2011 +0000 tcp_cubic: limit delayed_ack ratio to prevent divide error TCP Cubic keeps a metric that estimates the amount of delayed acknowledgements to use in adjusting the window. If an abnormally large number of packets are acknowledged at once, then the update could wrap and reach zero. This kind of ACK could only happen when there was a large window and huge number of ACK's were lost. This patch limits the value of delayed ack ratio. The choice of 32 is just a conservative value since normally it should be range of 1 to 4 packets. Signed-off-by: Stephen Hemminger <shemminger> Signed-off-by: David S. Miller <davem> diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index 34340c9..f376b05 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -93,6 +93,7 @@ struct bictcp { u32 ack_cnt; /* number of acks */ u32 tcp_cwnd; /* estimated tcp cwnd */ #define ACK_RATIO_SHIFT 4 +#define ACK_RATIO_LIMIT (32u << ACK_RATIO_SHIFT) u16 delayed_ack; /* estimate the ratio of Packets/ACKs << 4 */ u8 sample_cnt; /* number of samples to decide curr_rtt */ u8 found; /* the exit point is found? */ @@ -398,8 +399,12 @@ static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us) u32 delay; if (icsk->icsk_ca_state == TCP_CA_Open) { - cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT; - ca->delayed_ack += cnt; + u32 ratio = ca->delayed_ack; + + ratio -= ca->delayed_ack >> ACK_RATIO_SHIFT; + ratio += cnt; + + ca->delayed_ack = min(ratio, ACK_RATIO_LIMIT); } It seems wrong to use min(), to avoid ca->delayed_ack reaching zero, shouldn't it be max(). Or perhaps I'm misunderstanding the code, as I'm a bit uncertain about what this code path achieves...
Thanks for sharing that, Jesper. That's exactly why I asked for it: that code is way not clear for me too. Thanks!
In Customer Case 00893760 (closed); it was verified/confirmed that kernel-2.6.32-386.el6 solved their crash. Closing ths BZ
(In reply to Marcelo Ricardo Leitner from comment #11) > Thanks for sharing that, Jesper. That's exactly why I asked for it: that > code is way not clear for me too. Thanks! this issue happened when apply this patch. that code is way not clear for me too that avoid the fixes div-by-zero
(In reply to todayyang from comment #25) > (In reply to Marcelo Ricardo Leitner from comment #11) > > Thanks for sharing that, Jesper. That's exactly why I asked for it: that > > code is way not clear for me too. Thanks! > > this issue happened when apply this patch. > > that code is way not clear for me too that avoid the fixes div-by-zero Sorry, I don't understand what you mean here. You are having this issue (div-by-zero) after applying this patch?