Description of problem:

Netdump sends incomplete dumps with newer versions of RHEL4.

Version-Release number of selected component (if applicable):

It was found to be broken with 2.6.9-78.22.EL and 2.6.9-78.30.EL. It works fine with 2.6.9.68-23.

The machine has this Ethernet adapter:

08:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5754 Gigabit Eth)
        Subsystem: Dell: Unknown device 021e
        Flags: bus master, fast devsel, latency 0, IRQ 66
        Memory at f7ef0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at <ignored> [disabled]

These are the results with 2.6.9-78.30, just compiled with redhat/configs/kernel-2.6.9-i686-smp.config (with debug disabled).

Serial terminal output:

SysRq : Crashing the kernel by request
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c0213168
*pde = 36f46001
Oops: 0002 [#1]
SMP
Modules linked in: md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) netconsole(U) netdump(U) autofs4(U) i2c_dev(U) i2c_core(U) sunrpc(U) ipt_REJECT(U) ipt_state(U) ip_conntrack(U) iptable_filter(U) ip_tables(U) cpufreq_powersave(U) joydev(U) dm_mirror(U) dm_mod(U) button(U) battery(U) ac(U) uhci_hcd(U) ehci_hcd(U) snd_hda_intel(U) snd_pcm_oss(U) snd_mixer_oss(U) snd_pcm(U) snd_timer(U) snd_page_alloc(U) snd_hwdep(U) snd(U) soundcore(U) tg3(U) sr_mod(U) ext3(U) jbd(U) ata_piix(U) ahci(U) libata(U) sd_mod(U) scsi_mod(U)
CPU:    4
EIP:    0060:[<c0213168>]    Not tainted VLI
EFLAGS: 00010016   (2.6.9-prep)
EIP is at sysrq_handle_crash+0x0/0x8
eax: 00000063   ebx: c0348854   ecx: f3957000   edx: f7f21f88
esi: 00000063   edi: f3957000   ebp: f7f21f88   esp: c03ffdf8
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c03ff000 task=c3850030)
Stack: c0213300 c02fec45 c0300090 00000202 00000006 00000001 c3a10380 00000001
       0000002e c020dee0 f7f21f88 c0346be0 f677d80c c3a10380 0000002e c026fa75
       00000001 00000001 f66ee0ac 00000001 f6756000 f677d80c c026de64 00000001
Call Trace:
 [<c0213300>] __handle_sysrq+0x62/0xd9
 [<c020dee0>] kbd_event+0x83/0xb0
 [<c026fa75>] input_event+0x331/0x351
 [<c026de64>] hidinput_hid_event+0x1d0/0x208
 [<c026a431>] hid_process_event+0x28/0x52
 [<c026a707>] hid_input_field+0x2ac/0x2f9
 [<c026a7cd>] hid_input_report+0x79/0x98
 [<c026a882>] hid_irq_in+0x96/0xf2
 [<c02618ce>] usb_hcd_giveback_urb+0x14/0x3e
 [<f893f01e>] ehci_urb_done+0x70/0x7b [ehci_hcd]
 [<f893f252>] qh_completions+0x229/0x2ab [ehci_hcd]
 [<f8941ab9>] scan_periodic+0xff/0x246 [ehci_hcd]
 [<f8942423>] ehci_work+0x3b/0x8f [ehci_hcd]
 [<f89425b2>] ehci_irq+0x13b/0x152 [ehci_hcd]
 [<c026191e>] usb_hcd_irq+0x26/0x4b
 [<c01074f2>] handle_IRQ_event+0x25/0x4f
 [<c01074cd>] handle_IRQ_event+0x0/0x4f
 [<c0107a8d>] do_IRQ+0x157/0x1f2
 =======================
 [<c02dddfc>] common_interrupt+0x18/0x20
 [<c01040e8>] mwait_idle+0x33/0x42
 [<c01040a0>] cpu_idle+0x26/0x3b
Code: 11 c0 c7 05 10 e9 45 c0 00 00 00 00 c7 05 38 e9 45 c0 00 00 00 00 c7 05 2c e9 45 c0 6e ad 87 4b 89 15 28 e9 45 c0 e9 a4 f0 f1 ff <c6> 05 00 00 00 00 00 c3 e9 12 b6 f4 ff e9 fb 00 f5 ff 85 d2 89
CPU#0 is frozen.
CPU#1 is frozen.
CPU#2 is frozen.
CPU#3 is frozen.
CPU#4 is executing netdump.
CPU#5 is frozen.
CPU#6 is frozen.
CPU#7 is frozen.
< netdump activated - performing handshake with the server. >
NETDUMP START!
< handshake completed - listening for dump requests. >
0(4294806222)-

The netdump-server machine produces the following logs:

Jan 22 08:15:22 caramujo netdump[2730]: Got too many timeouts waiting for memory page for client 192.168.2.170, ignoring it
Jan 22 08:15:25 caramujo netdump[2730]: Got too many timeouts waiting for SHOW_STATUS for client 192.168.2.170, rebooting it

And the vmcore file at the server is incomplete:

$ du -h 192.168.2.170-2009-01-22-08:13/*
4,0K    192.168.2.170-2009-01-22-08:13/log
224K    192.168.2.170-2009-01-22-08:13/vmcore-incomplete

If I enable debug in netdump.c, it produces a similar report; in that case, however, vmcore-incomplete always has 4096 bytes.
Steps to Reproduce: the tests were done by enabling the SysRq keys and pressing SysRq-C.
Created attachment 329882 [details]
patch to trace tg3 events

Ok, so I've gone over the logs again, and since we manage to perform the handshake successfully, I can assert that we are able to send frames for a time during the netdump mode of operation (I'm making a technical distinction here, in that during the handshake the netdump_mode variable in the kernel is set to one). So there is a period of time during which the netpoll path through the tg3 poll controller and napi poll methods works. That's good news.

I need to look back at the older code and see how this changed, but my first guess as to what might be going on here is that during the napi poll, if the tg3 driver detects certain problems with the hardware, it will attempt to reset the chip, and will not receive any frames until that operation is complete. Unfortunately, the reset operation is scheduled via keventd, which will never run during a dump operation. So I wonder if the system in question here is encountering such a hardware error, with the result that it spins waiting for a recovery that will never happen.

I've prepared this debug patch for you (untested, sorry, but it should build). Once netdump starts, it will print tg3 information to the serial console about its progress through tg3_interrupt and tg3_poll. Please build a kernel with it, and send in the resulting output. It should help us confirm my theory above. Thanks!
Created attachment 329890 [details]
Captured logs for tg3 + netdump modules

The patch didn't compile cleanly, due to two small errors on lines 4000 and 4114. I fixed them and generated the logs. I'm enclosing the full captured log. Basically, these are the last messages of the log before the trouble:

<2>ENTERING TG3_INTERRUPT
<2>CHECKING IF TG3 HAS WORK
<2>TG3 HAS WORK :)!
<2>TG3_INTERRUPT DONE
<2>ENTERING TG3_POLL
<2>CALLING TG3_TX
<2>TG3 CHECKNIG RX QUEUE
<2>TG3 RX WORK DONE = 0
<2>TG3 DONE WITH NAPI POLL
<6>got new req, command 1.
<6>got MEM command.
<6> ... send_netdump_mem
<6> ... send_netdump_mem: sending message
<6> ... send_netdump_mem: sent message
<6> ... send_netdump_mem: sending message
<6> ... send_netdump_mem: sent message
<6> ... send_netdump_mem: sending message
<6> ... send_netdump_mem: sent message
<6> ... send_netdump_mem: sending message
<6> ... send_netdump_mem: sent message
<6> ... send_netdump_mem: returning
<6>main netdump loop: polling controller ...
<2>ENTERING TG3_INTERRUPT
<2>CHECKING IF TG3 HAS WORK
<2>TG3 HAS WORK :)!
<2>TG3_INTERRUPT DONE
<2>ENTERING TG3_POLL
<2>CALLING TG3_TX
<2>TG3 CHECKNIG RX QUEUE
<2>TG3 RX WORK DONE = 0
<2>TG3 DONE WITH NAPI POLL
<6>got new req, command 1.
<6>got MEM command.
<6> ... send_netdump_mem
<6> ... send_netdump_mem: sending message
<6> ... send_netdump_mem: sent message
<6> ... send_netdump_mem: sending message
<6> ... send_netdump_mem: sent message
<6> ... send_netdump_mem: sending message
<6> ... send_netdump_mem: sent message
<6> ... send_netdump_mem: sending message
<6> ... send_netdump_mem: sent message
<6> ... send_netdump_mem: returning
<6>main netdump loop: polling controller ...
<2>ENTERING TG3_INTERRUPT
<2>SD_STATUS IS UPDATED! LEAVING
<2>TG3_INTERRUPT DONE
<2>ENTERING TG3_POLL
<2>TG3 CHECKNIG RX QUEUE
<2>TG3 RX WORK DONE = 0
<2>TG3 DONE WITH NAPI POLL

(and the last set of messages keeps repeating)
Interesting, so it's not that we're getting out-of-order reads here; it's that tg3 is legitimately telling us that it didn't do any work and is exiting. I'll dig into this a bit deeper. Thanks!
Created attachment 329981 [details]
patch to ensure device quota is full in netpoll path

Actually, on inspection of the full log, it looks like we are able to receive 64 packets before we stop getting anything. That's interesting, given that 64 is the fixed per-poll budget that netpoll uses in its poll path.

On closer inspection, I think I see the problem. It appears that tg3 is being too kind for its own good here. In the tg3_poll function (the poll method for the driver), it checks the budget value that was passed in against its stored device quota, and limits it to the latter value (ostensibly to ensure that it doesn't get too big a timeslice on the processor). Unfortunately, the netpoll path assumes that drivers will just do up to the amount of work that gets passed in via the budget value, without question. The way tg3 is written, it will decrement dev->quota until it reaches zero, then limit its work to zero on each iteration. This works in the nominal case because net_rx_action always refills the dev->quota value, but netpoll never does, so if we never run softirqs (which is the case in 4.8 netdump mode), we have a problem.

This patch should fix the issue. Please test it out and confirm. Thanks!

gospo, I'd thought about removing the check from tg3_poll, so that it just does <budget> amount of work, but this felt like the broader, more correct fix. I'd appreciate your opinion on it. Thanks!
nhorman, I wondered the same thing, but saw that the quota should be reset in __netif_rx_schedule, so if this patch works, it seems apparent that we need to understand why netif_rx_schedule is not being called correctly to reset the quota. Maybe this is a side effect of the softirq handler now being disabled, plus just the right code in the tg3 driver?
Good point, I had missed that, although it makes me wonder if perhaps we're not calling netif_rx_complete, making this an infinite loop of sorts (we never remove ourselves from the napi poll list, so we never clear LINK_STATE_RX_SCHED, so we never re-schedule, so we never reset dev->quota). The logs seem to bear that out. If that's the case, this patch should still fix the problem. Once we get results to confirm, we can figure out the best approach to solve this.
The patch is in the right direction, but it seems there are still some issues. From time to time, the Ethernet traffic stalls for a long period, then it returns. I've kept the system up for about 3 hours and have got only 47864 KB of the vmcore-incomplete file. The logs seem normal:

<2>ENTERING TG3_POLL
<2>CALLING TG3_TX
<2>TG3 CHECKNIG RX QUEUE
<2>TG3 RX WORK DONE = 1
<2>TG3 COMPLETING NAPI POLL
<2>TG3 DONE WITH NAPI POLL
<2>ENTERING TG3_INTERRUPT
<2>CHECKING IF TG3 HAS WORK
<2>TG3 HAS WORK :)!
<2>TG3_INTERRUPT DONE
<2>ENTERING TG3_POLL
<2>CALLING TG3_TX
<2>TG3 CHECKNIG RX QUEUE
<2>TG3 COMPLETING NAPI POLL
<2>TG3 DONE WITH NAPI POLL
<2>ENTERING TG3_INTERRUPT
<2>CHECKING IF TG3 HAS WORK
<2>TG3 HAS WORK :)!
<2>TG3_INTERRUPT DONE
<2>ENTERING TG3_POLL
<2>CALLING TG3_TX

The first Ethernet hangup happens when vmcore-incomplete reaches about 4000 KB:

Time      Size of vmcore-incomplete
14:44:54  12K
14:44:55  40K
14:44:56  64K
14:44:57  80K
14:44:58  120K
14:44:59  160K
14:45:00  200K
14:45:01  240K
14:45:02  260K
14:45:03  280K
14:45:04  320K
14:45:05  360K
14:45:06  400K
14:45:07  440K
14:45:08  480K
14:45:09  512K
14:45:10  520K
14:45:11  560K
14:45:12  576K
14:45:13  580K
14:45:14  616K
14:45:15  628K
14:45:16  656K
14:45:17  696K
14:45:18  736K
14:45:19  776K
14:45:20  808K
14:45:21  836K
14:45:22  856K
14:45:23  896K
14:45:24  936K
14:45:25  976K
14:45:26  1016K
14:45:27  1052K
14:45:28  1080K
14:45:29  1100K
14:45:30  1136K
14:45:31  1176K
14:45:32  1216K
14:45:33  1256K
14:45:34  1296K
14:45:35  1316K
14:45:36  1336K
14:45:37  1376K
14:45:38  1416K
14:45:39  1456K
14:45:40  1496K
14:45:41  1516K
14:45:42  1536K
14:45:43  1576K
14:45:44  1616K
14:45:45  1656K
14:45:46  1696K
14:45:47  1724K
14:45:48  1736K
14:45:49  1776K
14:45:50  1816K
14:45:51  1856K
14:45:52  1896K
14:45:53  1924K
14:45:54  1948K
14:45:56  1976K
14:45:57  2016K
14:45:58  2056K
14:45:59  2096K
14:46:00  2128K
14:46:01  2136K
14:46:02  2176K
14:46:03  2216K
14:46:04  2256K
14:46:05  2296K
14:46:06  2320K
14:46:07  2336K
14:46:08  2376K
14:46:09  2416K
14:46:10  2456K
14:46:11  2496K
14:46:12  2508K
14:46:13  2536K
14:46:14  2576K
14:46:15  2616K
14:46:16  2656K
14:46:17  2692K
14:46:18  2700K
14:46:19  2736K
14:46:20  2776K
14:46:21  2812K
14:46:22  2852K
14:46:23  2892K
14:46:24  2932K
14:46:25  2944K
14:46:26  2972K
14:46:27  3012K
14:46:28  3052K
14:46:29  3092K
14:46:30  3132K
14:46:31  3172K
14:46:32  3200K
14:46:33  3228K
14:46:34  3252K
14:46:35  3292K
14:46:36  3332K
14:46:37  3372K
14:46:38  3404K
14:46:39  3412K
14:46:40  3444K
14:46:41  3476K
14:46:42  3516K
14:46:46  3544K
14:46:47  3548K
14:46:49  3568K
14:46:50  3584K
14:46:51  3608K
14:46:52  3624K
14:46:54  3632K
14:46:55  3672K
14:46:56  3696K
14:46:57  3712K
14:46:58  3728K
14:46:59  3732K
14:47:00  3772K
14:47:01  3812K
14:47:03  3832K
14:47:04  3872K
14:47:06  3912K
14:47:07  3948K
14:47:08  3984K
14:47:10  4004K

Then a long wait: almost 10 minutes.

14:52:58  4044K
14:52:59  4088K
14:53:00  4116K
14:53:01  4128K
14:53:02  4168K
14:53:03  4208K
14:53:04  4248K
14:53:05  4288K
14:53:06  4328K
14:53:08  4340K
14:53:09  4368K
14:53:10  4408K
14:53:11  4448K
14:53:12  4488K
14:53:13  4528K
14:53:14  4540K
14:53:15  4568K
14:53:16  4608K
14:53:17  4648K
14:53:18  4688K
14:53:19  4728K
14:53:20  4764K
14:53:21  4768K
14:53:22  4808K
14:53:23  4848K
14:53:24  4888K
14:53:25  4928K
14:53:26  4944K
14:53:27  4968K
14:53:28  5008K
14:53:29  5048K
14:53:30  5088K
14:53:31  5124K
14:53:32  5128K
14:53:33  5168K
14:53:34  5208K
14:53:35  5248K
14:53:36  5284K
14:53:37  5304K
14:53:38  5328K
14:53:39  5368K
14:53:40  5408K
14:53:41  5448K
14:53:42  5464K
14:53:43  5488K
14:53:44  5528K
14:53:45  5568K
14:53:46  5608K
14:53:47  5632K
14:53:48  5648K
14:53:49  5688K
14:53:50  5728K
14:53:51  5768K
14:53:52  5808K
14:53:53  5840K
14:53:54  5856K
14:53:55  5888K
14:53:56  5928K
14:53:57  5968K
14:53:58  6008K
14:53:59  6032K
14:54:00  6048K
14:54:01  6088K
14:54:02  6128K
14:54:03  6168K
14:54:04  6208K
14:54:05  6244K
14:54:06  6260K
14:54:07  6288K
14:54:08  6328K
14:54:09  6368K
14:54:10  6408K
14:54:11  6448K
14:54:12  6480K
14:54:13  6496K
14:54:14  6528K
14:54:15  6568K
14:54:17  6608K
14:54:18  6648K
14:54:19  6680K

I'll enclose the captured data up to now. I'll let it run until the end to see if we can get more info.
Created attachment 330005 [details]
Serial capture log after applying the patch

Ok, this is a partial log after letting it retrieve the dump for a while. We have about 23.8 MB captured by now. I'll keep it running until the task completes, but it should take a looong time.
I don't suppose you're taking a tcpdump of this, are you? A 10 minute pause is certainly over the threshold at which the netdump server would give up and try to reboot the box, so they must be exchanging some sort of message here. A tcpdump would tell us what that is. Logs from the netdump server would be great too, if you have them. Thanks!
Created attachment 330052 [details]
Captured packets with the server/client netdump conversation

Ok, this seems to be the second point where the capture stops:

15:15:43 47856K vmcore-incomplete
16:02:49 (still the same size)

There are no other messages in dmesg on the netdump-server. The communication didn't stop, as shown in the partial traffic captured with wireshark during the time when vmcore wasn't being updated. The data was saved in libpcap format. The dump is compressed with bzip2, since otherwise it would exceed the maximum connection time while sending the file to bugzilla.
During the course of this trace snippet, did one of those long pauses occur? I ask because I've been combing over it, and the communication in it seems both regular and advancing in both directions. I don't see any repeated sequence numbers in the trace, and no error replies.
I only started the wireshark capture after the vmcore-incomplete file reached 47856 KB. So all those packets happened during the communication pause. Could this be due to some sort of error in the RHEL 5 netdump-server?
It's possible; if you have debug logs from the server, it would be good to take a look. The definitive answer, I think, will be in the tcpdump, however. Can you recreate the problem and get a capture that has segments both during normal communication and during a pause? Thanks!
Created attachment 330271 [details]
Progress of vmcore captured file

Ok, I've re-generated the dump. First, this is the file size progress over time. I've generated it with this small shell script:

O=""
while :; do
    R="`du -s vmcore* 2>/dev/null|sed -r s/"\s+"/"K "/`"
    if [ "$R" != "$O" ]; then
        O=$R
        echo "`date +%T` $R"
    fi
    sleep 1
done | tee /tmp/vmcore.log
Created attachment 330278 [details]
Captured packets for the entire conversation with netdump-server

Ok, this is a complete dump of the interface since the beginning. You'll see that the first time I pressed SysRq-C, the OOPS wasn't generated. I had to press it twice.
I finally got around to building a kernel with the patch in comment #4 and netdump works without any issue. I say ship it!
Ok, so I see the problem here. After about 2062 seconds worth of transfer (about 65MB), the netdump client just stops responding to ARP frames. The trace doesn't go beyond that, so I assume that eventually it starts responding again? But loss of your L2 entry for a client would certainly cause what you're seeing. A good test would be to see if adding a static ARP entry for your client makes the pauses go away.

I'd also like you to please apply this patch on top of the others you have. It reverts part of the patch for bz 461014. I _think_ it's safe, because more recently we've started disabling softirqs in netpoll_poll_dev, which makes this call redundant (except for the hard irq part). At any rate, Andy and I will really have to go over that to be sure, but I wanted you to try it, to see if it improves your performance or makes the pause problem go away. Thanks!
Created attachment 330283 [details]
patch to selectively enable softirqs during netdump

This is the patch that reverts a small part of 461014. I don't know if it's universally safe to do yet, but for the purposes of this test I think it's worth getting the data point.
Created attachment 330284 [details]
netpoll-quota-fix.patch

Whether we decide to keep the patch in comment #18 or not, I think this patch should replace the patch in comment #4. There are perfectly valid cases where drivers may actually poll more than the quota of frames, so the quota may legitimately end up negative. In that case, adding the budget to the quota would leave the quota below the budget for a valid case. I would prefer we only modify the quota if it's less than or equal to zero, so I propose this patch.
Created attachment 330507 [details]
New capture logs

Ok, I let the capture run for about 3 hours and 24 minutes, applying the 3 patches (tg3 debug, enable softirqs, and netpoll quota). The results were the same as with just the tg3 debug + the old netpoll patch. So, enabling softirqs didn't cause any changes. Also, in the first libpcap capture, the ARP messages happened only after rebooting the crashed machine.

The enclosed tarball has 3 files:
vmcore.cap - console output;
vmcore.log - output of the script that shows the filesystem progression of the captured data;
captured_pkgs.libpcap - the libpcap capture at the ethernet interface, using a cross-over cable to link the crashed machine and the netdump server (about 2,5 GB of dump).
Hmm... It didn't really create an attachment. Anyway, the logs are in this tarball: http://new-people.redhat.com/~mchehab/.bz481207/captured_data.tar.bz2
Ok, well, that's good to know. Andy, I think that actually jives with your results regarding your bnx2 testing, in that capture wasn't significantly slowed down when we didn't have softirqs on. That makes me feel loads better. So now I'm a bit stumped as to why your transfer is so slow, if those ARP frames are from after you rebooted the system. I'll take a look at the new trace above to see what comes up.
I just found the remaining problem, although I'm not sure of its root cause, or how to solve it. It appears to be overly fast retransmits from the netdump-server. If you look at frames 26315, 26339, 26379, 26427 and 26502, you can see that the server is re-requesting the same frame over and over and over. It appears that this particular card is just getting far enough behind in the outstanding request count that the server is needlessly retransmitting frames, causing undue work and delay. Please apply this patch to the netdump-server code and see if it resolves your situation. It won't be the final patch (we'll probably make the timeout configurable), but this will be a good test. Thanks!
Created attachment 330631 [details]
patch to up netdump server retrans timeout
It took me some time to find the netdump-server source. I was expecting it to be in CVS or in RHEL, but the package was in EPEL instead... Anyway, the change didn't solve it. Since this is a timeout issue, I decided to remove the tg3 debug patch. Without it, I did two tests with just netpoll-quota-fix.patch (id=330284).

On the first test, everything worked fine; the dump was created quickly:

$ ls -la 192.168.2.170-2009-02-03-10:20
total 250628
drwx------  2 netdump netdump       4096 Fev  3 10:31 .
drwxr-xr-x 10 netdump netdump       4096 Fev  3 10:42 ..
-rw-------  1 netdump netdump      75789 Fev  3 10:31 log
-rw-------  1 netdump netdump 5100277760 Fev  3 10:31 vmcore

However, on a second test, where I just changed the netdump-server (removing the timeout patch, to see if it would have any effect), I got an OOPS on the RHEL4 machine. I'll attach the full dmesg log.
Created attachment 330726 [details]
dmesg of the OOPSes

dmesg of the crashed machine. I've merged the dmesg with the console output after pressing SysRq-C, in order to give a better picture of the kernel panic.
Ok, that's good news; I hadn't considered the slowdown from the debug patch. As for the oops, if that was the result of just changing the netdump server, let's table it for now. If the problem is reproducible on a stock build, we can pursue it in another bug. I'll build a kernel in brew with Andy's patch from comment 19, and if you could test/verify it with a stock netdump-server out of EPEL, I think we can call this issue solved.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1671386

Ok, I'm building a kernel here with just Andy's patch. If you can confirm that it by itself solves your problem, I'll go ahead and post it. Thanks!
Created attachment 330776 [details]
Panic log

Neil, I tested crashing RHEL 4.7 several times, with the stock netdump server. From time to time, the OOPS still occurs. I'm enclosing the dump of the latest OOPS. Just to remind you, I'm using the RHEL 4 78.30.EL kernel + Andy's patch from comment 19 here. No other patch is applied on it. The server is running EPEL netdump-server-0.7.16-21.el5.
I don't see what OOPS you are talking about. All I see in that log is the sysrq oops (which is expected) and the resultant show_state that a netdump system provides to the console. What are you seeing that you think shouldn't be there?
I was talking about the sysrq oops. When it appears, netdump produces an incomplete vmcore (with 0 bytes).
Have you checked your netdump server, and verified that it has sufficient space to store the vmcore file? The behavior you are describing is what happens when the netdump-server determines there is insufficient space to store a vmcore.
Could be. I don't have much available space on my notebook, where the netdump server is running:

/dev/sda3              19G   15G  3,9G  79% /

I'll do some additional tests here, making sure that I have enough space before generating the dumps.
Mauro, I bet lack of space on your laptop is the issue. I ran into the same situation last week and was frustrated for about 30 minutes until nhorman told me to check and see if I had enough room to hold the vmcores. I wiped out all my old logs in /var/crash, found that I had enough space, and the dumps were working fine.
Andy,

This is not the case. Take a look:

$ du -s 1*
260220  192.168.2.170-2009-02-04-16:16
84      192.168.2.170-2009-02-04-16:24

I need 260 K for a dump. I have more than that free at my /:

$ df -h
Sist. Arq.            Tam  Usad  Disp Uso% Montado em
/dev/sda3              19G   15G  3,9G  79% /
/dev/sda7              42G  7,4G   32G  19% /d00
/dev/sda5             152G   86G   58G  60% /home
tmpfs                 1,9G     0  1,9G   0% /dev/shm

Well, RHEL4 is running on a machine with 4GB of RAM. Does netdump-server check that I have 4GB before actually starting to receive the vmcore?
You need way more than 260K for a dump; you need enough space to store the entire vmcore (which will be approximately the size of RAM on the client system). And yes, when a dump request is received, netdump-server gets the page size and the page count from the client. It then checks whether there are (page_size * page_count) bytes available on the configured filesystem to store that amount of data. If that's not available, you get an immediate reboot. Based on what you have listed above (I assume you went with the standard netdump-server config, which puts cores under /var/crash), your target filesystem only has 3.9GB of space available. So if it needs 4, you're very close to, if not over, the limit. Please free up some space and try again.
Yes, freeing some space solved the issue of the 0-byte vmcore-incomplete:

254M 192.168.2.170-2009-02-04-19:05
253M 192.168.2.170-2009-02-04-19:14
238M 192.168.2.170-2009-02-04-19:22
254M 192.168.2.170-2009-02-04-19:31
253M 192.168.2.170-2009-02-05-08:32
256M 192.168.2.170-2009-02-05-08:35
255M 192.168.2.170-2009-02-05-08:38
254M 192.168.2.170-2009-02-05-08:41
254M 192.168.2.170-2009-02-05-08:45
253M 192.168.2.170-2009-02-05-08:47
254M 192.168.2.170-2009-02-05-08:51
47M  192.168.2.170-2009-02-05-08:57

On the last core dump, it got a panic in i8042 in the middle of the netdump; not sure why. Notice: sometimes I need to press SysRq-C more than once to generate an OOPS and start netdump. This is the dump:

SysRq : Crashing the kernel by request
SysRq : Crashing the kernel by request
SysRq : Crashing the kernel by request
SysRq : Crashing the kernel by request
SysRq : Crashing the kernel by request
SysRq : Crashing the kernel by request
SysRq : Crashing the kernel by request
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c0213168
*pde = 363c8001
Oops: 0002 [#1]
SMP
Modules linked in: md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) i2c)
CPU:    5
EIP:    0060:[<c0213168>]    Not tainted VLI
EFLAGS: 00010016   (2.6.9-prep)
EIP is at sysrq_handle_crash+0x0/0x8
eax: 00000063   ebx: c0348854   ecx: f5473000   edx: f7f23f88
esi: 00000063   edi: f5473000   ebp: f7f23f88   esp: c0400df8
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0400000 task=c3863670)
Stack: c0213300 c02fec44 c030008f 00000202 00000006 00000001 c3a0e480 00000001
       0000002e c020dee0 f7f23f88 c0346be0 f591a80c c3a0e480 0000002e c026fa75
       00000001 00000001 f5afe0ac 00000001 f5afc000 f591a80c c026de64 00000001
Call Trace:
 [<c0213300>] __handle_sysrq+0x62/0xd9
 [<c020dee0>] kbd_event+0x83/0xb0
 [<c026fa75>] input_event+0x331/0x351
 [<c026de64>] hidinput_hid_event+0x1d0/0x208
 [<c026a431>] hid_process_event+0x28/0x52
 [<c026a707>] hid_input_field+0x2ac/0x2f9
 [<c026a7cd>] hid_input_report+0x79/0x98
 [<c026a882>] hid_irq_in+0x96/0xf2
 [<c02618ce>] usb_hcd_giveback_urb+0x14/0x3e
 [<f893f01e>] ehci_urb_done+0x70/0x7b [ehci_hcd]
 [<f893f252>] qh_completions+0x229/0x2ab [ehci_hcd]
 [<f8941ab9>] scan_periodic+0xff/0x246 [ehci_hcd]
 [<f8942423>] ehci_work+0x3b/0x8f [ehci_hcd]
 [<f89425b2>] ehci_irq+0x13b/0x152 [ehci_hcd]
 [<c026191e>] usb_hcd_irq+0x26/0x4b
 [<c01074f2>] handle_IRQ_event+0x25/0x4f
 [<c01074cd>] handle_IRQ_event+0x0/0x4f
 [<c0107a8d>] do_IRQ+0x157/0x1f2
 =======================
 [<c02dde10>] common_interrupt+0x18/0x20
 [<c01040e8>] mwait_idle+0x33/0x42
 [<c01040a0>] cpu_idle+0x26/0x3b
Code: 11 c0 c7 05 10 e9 45 c0 00 00 00 00 c7 05 38 e9 45 c0 00 00 00 00 c7 05 2
<0>Kernel panic - not syncing: Fatal exception in interrupt
Badness in panic at kernel/panic.c:121
 [<c0122e08>] panic+0x159/0x166
 [<c01060d0>] die+0x164/0x16b
 [<c0123649>] vprintk+0x136/0x14a
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c011bad9>] do_page_fault+0x3f0/0x5c6
 [<c0213168>] sysrq_handle_crash+0x0/0x8
 [<c011dc49>] try_to_wake_up+0x288/0x293
 [<c011dc49>] try_to_wake_up+0x288/0x293
 [<c0120f06>] autoremove_wake_function+0xd/0x2d
 [<c011f1d6>] __wake_up_common+0x36/0x51
 [<c011f21a>] __wake_up+0x29/0x3c
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c02ddf2f>] error_code+0x2f/0x38
 [<c0213168>] sysrq_handle_crash+0x0/0x8
 [<c0213300>] __handle_sysrq+0x62/0xd9
 [<c020dee0>] kbd_event+0x83/0xb0
 [<c026fa75>] input_event+0x331/0x351
 [<c026de64>] hidinput_hid_event+0x1d0/0x208
 [<c026a431>] hid_process_event+0x28/0x52
 [<c026a707>] hid_input_field+0x2ac/0x2f9
 [<c026a7cd>] hid_input_report+0x79/0x98
 [<c026a882>] hid_irq_in+0x96/0xf2
 [<c02618ce>] usb_hcd_giveback_urb+0x14/0x3e
 [<f893f01e>] ehci_urb_done+0x70/0x7b [ehci_hcd]
 [<f893f252>] qh_completions+0x229/0x2ab [ehci_hcd]
 [<f8941ab9>] scan_periodic+0xff/0x246 [ehci_hcd]
 [<f8942423>] ehci_work+0x3b/0x8f [ehci_hcd]
 [<f89425b2>] ehci_irq+0x13b/0x152 [ehci_hcd]
 [<c026191e>] usb_hcd_irq+0x26/0x4b
 [<c01074f2>] handle_IRQ_event+0x25/0x4f
 [<c01074cd>] handle_IRQ_event+0x0/0x4f
 [<c0107a8d>] do_IRQ+0x157/0x1f2
 =======================
 [<c02dde10>] common_interrupt+0x18/0x20
 [<c01040e8>] mwait_idle+0x33/0x42
 [<c01040a0>] cpu_idle+0x26/0x3b
Badness in i8042_panic_blink at drivers/input/serio/i8042.c:940
 [<c021f2c5>] i8042_panic_blink+0xc0/0x1a0
 [<c0122dca>] panic+0x11b/0x166
 [<c01060d0>] die+0x164/0x16b
 [<c0123649>] vprintk+0x136/0x14a
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c011bad9>] do_page_fault+0x3f0/0x5c6
 [<c0213168>] sysrq_handle_crash+0x0/0x8
 [<c011dc49>] try_to_wake_up+0x288/0x293
 [<c011dc49>] try_to_wake_up+0x288/0x293
 [<c0120f06>] autoremove_wake_function+0xd/0x2d
 [<c011f1d6>] __wake_up_common+0x36/0x51
 [<c011f21a>] __wake_up+0x29/0x3c
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c02ddf2f>] error_code+0x2f/0x38
 [<c0213168>] sysrq_handle_crash+0x0/0x8
 [<c0213300>] __handle_sysrq+0x62/0xd9
 [<c020dee0>] kbd_event+0x83/0xb0
 [<c026fa75>] input_event+0x331/0x351
 [<c026de64>] hidinput_hid_event+0x1d0/0x208
 [<c026a431>] hid_process_event+0x28/0x52
 [<c026a707>] hid_input_field+0x2ac/0x2f9
 [<c026a7cd>] hid_input_report+0x79/0x98
 [<c026a882>] hid_irq_in+0x96/0xf2
 [<c02618ce>] usb_hcd_giveback_urb+0x14/0x3e
 [<f893f01e>] ehci_urb_done+0x70/0x7b [ehci_hcd]
 [<f893f252>] qh_completions+0x229/0x2ab [ehci_hcd]
 [<f8941ab9>] scan_periodic+0xff/0x246 [ehci_hcd]
 [<f8942423>] ehci_work+0x3b/0x8f [ehci_hcd]
 [<f89425b2>] ehci_irq+0x13b/0x152 [ehci_hcd]
 [<c026191e>] usb_hcd_irq+0x26/0x4b
 [<c01074f2>] handle_IRQ_event+0x25/0x4f
 [<c01074cd>] handle_IRQ_event+0x0/0x4f
 [<c0107a8d>] do_IRQ+0x157/0x1f2
 =======================
 [<c02dde10>] common_interrupt+0x18/0x20
 [<c01040e8>] mwait_idle+0x33/0x42
 [<c01040a0>] cpu_idle+0x26/0x3b
Badness in i8042_panic_blink at drivers/input/serio/i8042.c:943
 [<c021f33a>] i8042_panic_blink+0x135/0x1a0
 [<c0122dca>] panic+0x11b/0x166
 [<c01060d0>] die+0x164/0x16b
 [<c0123649>] vprintk+0x136/0x14a
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c011bad9>] do_page_fault+0x3f0/0x5c6
 [<c0213168>] sysrq_handle_crash+0x0/0x8
 [<c011dc49>] try_to_wake_up+0x288/0x293
 [<c011dc49>] try_to_wake_up+0x288/0x293
 [<c0120f06>] autoremove_wake_function+0xd/0x2d
 [<c011f1d6>] __wake_up_common+0x36/0x51
 [<c011f21a>] __wake_up+0x29/0x3c
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c02ddf2f>] error_code+0x2f/0x38
 [<c0213168>] sysrq_handle_crash+0x0/0x8
 [<c0213300>] __handle_sysrq+0x62/0xd9
 [<c020dee0>] kbd_event+0x83/0xb0
 [<c026fa75>] input_event+0x331/0x351
 [<c026de64>] hidinput_hid_event+0x1d0/0x208
 [<c026a431>] hid_process_event+0x28/0x52
 [<c026a707>] hid_input_field+0x2ac/0x2f9
 [<c026a7cd>] hid_input_report+0x79/0x98
 [<c026a882>] hid_irq_in+0x96/0xf2
 [<c02618ce>] usb_hcd_giveback_urb+0x14/0x3e
 [<f893f01e>] ehci_urb_done+0x70/0x7b [ehci_hcd]
 [<f893f252>] qh_completions+0x229/0x2ab [ehci_hcd]
 [<f8941ab9>] scan_periodic+0xff/0x246 [ehci_hcd]
 [<f8942423>] ehci_work+0x3b/0x8f [ehci_hcd]
 [<f89425b2>] ehci_irq+0x13b/0x152 [ehci_hcd]
 [<c026191e>] usb_hcd_irq+0x26/0x4b
 [<c01074f2>] handle_IRQ_event+0x25/0x4f
 [<c01074cd>] handle_IRQ_event+0x0/0x4f
 [<c0107a8d>] do_IRQ+0x157/0x1f2
 =======================
 [<c02dde10>] common_interrupt+0x18/0x20
 [<c01040e8>] mwait_idle+0x33/0x42
 [<c01040a0>] cpu_idle+0x26/0x3b
Badness in i8042_panic_blink at drivers/input/serio/i8042.c:945
 [<c021f38d>] i8042_panic_blink+0x188/0x1a0
 [<c0122dca>] panic+0x11b/0x166
 [<c01060d0>] die+0x164/0x16b
 [<c0123649>] vprintk+0x136/0x14a
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c011bad9>] do_page_fault+0x3f0/0x5c6
 [<c0213168>] sysrq_handle_crash+0x0/0x8
 [<c011dc49>] try_to_wake_up+0x288/0x293
 [<c011dc49>] try_to_wake_up+0x288/0x293
 [<c0120f06>] autoremove_wake_function+0xd/0x2d
 [<c011f1d6>] __wake_up_common+0x36/0x51
 [<c011f21a>] __wake_up+0x29/0x3c
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c02ddf2f>] error_code+0x2f/0x38
 [<c0213168>] sysrq_handle_crash+0x0/0x8
 [<c0213300>] __handle_sysrq+0x62/0xd9
 [<c020dee0>] kbd_event+0x83/0xb0
 [<c026fa75>] input_event+0x331/0x351
 [<c026de64>] hidinput_hid_event+0x1d0/0x208
 [<c026a431>] hid_process_event+0x28/0x52
[<c026a707>] hid_input_field+0x2ac/0x2f9 [<c026a7cd>] hid_input_report+0x79/0x98 [<c026a882>] hid_irq_in+0x96/0xf2 [<c02618ce>] usb_hcd_giveback_urb+0x14/0x3e [<f893f01e>] ehci_urb_done+0x70/0x7b [ehci_hcd] [<f893f252>] qh_completions+0x229/0x2ab [ehci_hcd] [<f8941ab9>] scan_periodic+0xff/0x246 [ehci_hcd] [<f8942423>] ehci_work+0x3b/0x8f [ehci_hcd] [<f89425b2>] ehci_irq+0x13b/0x152 [ehci_hcd] [<c026191e>] usb_hcd_irq+0x26/0x4b [<c01074f2>] handle_IRQ_event+0x25/0x4f [<c01074cd>] handle_IRQ_event+0x0/0x4f [<c0107a8d>] do_IRQ+0x157/0x1f2 ======================= [<c02dde10>] common_interrupt+0x18/0x20 [<c01040e8>] mwait_idle+0x33/0x42 [<c01040a0>] cpu_idle+0x26/0x3b IMO, we should consider bz#481207 solved, since Andy patch properly fixed the issue with tg3 driver. The above Panic seems to be something else.
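For anyone else reproducing this, the client side of netdump is configured through /etc/sysconfig/netdump and run as a service. This is a minimal sketch, not the exact configuration used in this report: the NETDUMPADDR value is a placeholder, and the parameter names are from my recollection of the RHEL4 netdump package, so verify them against the commented file the package ships:

```shell
# /etc/sysconfig/netdump -- netdump client configuration (sketch; verify
# parameter names against the file installed by the netdump package)

# Interface the crashed kernel uses to transmit memory pages
# (the tg3 NIC in this report)
NETDEV=eth0

# Address of the host running netdump-server (placeholder value)
NETDUMPADDR=192.168.2.1
```

After editing, run `service netdump propagate` once to install the client's key on the server, then `service netdump start`; from that point, SysRq-C from the keyboard (or `echo c > /proc/sysrq-trigger` from a shell) should trigger the handshake and, when things work, a complete vmcore on the server.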
Committed in 83.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
The results show no problem on the machine with the tg3 driver and kernel 2.6.9-89.EL (largesmp).

With 2.6.9-78.22.EL the dump is still incomplete:

[root@hp-rx8640-02 ~]# uname -a
Linux hp-rx8640-02.rhts.bos.redhat.com 2.6.9-78.22.EL #1 SMP Tue Dec 16 11:56:53 EST 2008 ia64 ia64 ia64 GNU/Linux

the log message shows:

Apr 23 22:06:16 ndnc-1 netdump[1522]: Got too many timeouts waiting for memory page for client 10.16.65.38, ignoring it
Apr 23 22:06:19 ndnc-1 netdump[1522]: Got too many timeouts waiting for SHOW_STATUS for client 10.16.65.38, rebooting it
[root@ndnc-1 10.16.65.38-2009-04-23-22:02]# du *
12      log
584     vmcore-incomplete

With 2.6.9-89.ELlargesmp the dump completes:

[root@hp-rx8640-02 ~]# uname -a
Linux hp-rx8640-02.rhts.bos.redhat.com 2.6.9-89.ELlargesmp #1 SMP Mon Apr 20 10:41:55 EDT 2009 ia64 ia64 ia64 GNU/Linux
[root@ndnc-1 10.16.65.38-2009-04-23-23:35]# ls
log  vmcore
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html