Bug 665110
Summary: System panic in pskb_expand_head When arp_validate option is specified in bonding ARP monitor mode
Product: Red Hat Enterprise Linux 6
Reporter: Neal Kim <nkim>
Component: kernel
Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA
QA Contact: Jan Tluka <jtluka>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.0
CC: anton, dhoward, dtian, dwu, fhrbata, james.brown, jeder, jwest, jwilson, moshiro, myamazak, nhorman, nkim, peterm, plyons, sbest, skito, tao, tpnoonan
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: kernel-2.6.32-112.el6
Doc Type: Bug Fix
Doc Text:
Bonding, when operating in the ARP monitoring mode, made erroneous assumptions regarding the ownership of ARP frames when it received them for processing. Specifically, it was assumed that the bonding driver code was the only execution context which had access to the ARP frame's network buffer data. As a result, an operation was attempted on the said buffer (specifically, to modify the size of the data buffer) which was forbidden by the kernel when a buffer was shared among several execution contexts. The result of such an operation on a shared buffer could lead to data corruption. Consequently, trying to prevent the corruption, the kernel panicked. This shared state in the network buffer could be forced to occur, for example, when running the tcpdump utility to monitor traffic on the bonding interface. Every buffer the bond interface received would be shared between the driver and the tcpdump process, thus resulting in the aforementioned kernel panic. With this update, for the particular affected path in the bonding driver, each inbound frame is checked to determine whether it is in the shared state. If a buffer is shared, a private copy is made for exclusive use by the bonding driver, thus preventing the kernel panic.
Story Points: ---
Clone Of: 607114
Environment:
Last Closed: 2011-05-19 12:49:43 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 607114
Bug Blocks: 671342
Attachments:
Comment 2
Andy Gospodarek
2010-12-22 19:38:00 UTC
I think maybe the best thing to do here is to start instrumenting the kernel's rx path to check skb->users at various points between ixgbe and bond_rcv, to see if we ever get a shared skb whose origin we can trace. Do we have this problem re-created anywhere in RH that we can use for debugging?

Regarding the workaround, I can't see any way the bringup order would have an effect on the state of the rx path in the ixgbe driver. As such, this workaround shouldn't have any effect on the problem. The only way I could clearly see it having an effect is if multiple different network cards were used in the bond and the workaround documented in comment 12 caused a different NIC to be used as the active interface, resulting in the ixgbe rx path not getting used. This workaround can be used if need be, but I wouldn't see it as a guarantee that the problem will not happen again. This problem still needs to be investigated to its root cause and fixed properly, meaning that if the workaround is implemented we need to get this reproduced internally.

Can you attach a sysreport for this system? If not, I would like at *least* the output of 'lspci' and 'lspci -vvv'. Thanks!

Created attachment 473393 [details]
sosreport
Hello Andy,

sosreport attached. I am available to assist in any way if need be.

I stumbled on something while looking at this. When this happens, are you by any chance running tcpdump, or another application that might bind an AF_PACKET protocol socket to the bonded interface or one of its slaves? This line suggests that you might have been:

[<ffffffff8104fff9>] ? __wake_up_common+0x59/0x90
[<ffffffff81407f7a>] __pskb_pull_tail+0x2aa/0x360
[<ffffffffa0244530>] bond_arp_rcv+0x2c0/0x2e0 [bonding]
[<ffffffff814a0857>] ? packet_rcv+0x377/0x440 <==============HERE
[<ffffffff8140f21b>] netif_receive_skb+0x2db/0x670
[<ffffffff8140f788>] napi_skb_finish+0x58/0x70

(In reply to comment #16)
> Created attachment 473393 [details]
> sosreport

Neal, is this from your system (that cannot reproduce the issue) or from the customer (that can reproduce the issue)?

(In reply to comment #19)
> (In reply to comment #16)
> > Created attachment 473393 [details]
> > sosreport
>
> Neal is this from your system (that cannot reproduce the issue) or from the
> customer (that can reproduce the issue)?

Andy,

That sosreport is from the customer (able to reproduce).

(In reply to comment #20)
> (In reply to comment #19)
> > (In reply to comment #16)
> > > Created attachment 473393 [details]
> > > sosreport
> >
> > Neal is this from your system (that cannot reproduce the issue) or from the
> > customer (that can reproduce the issue)?
>
> Andy,
>
> That sosreport is from the customer (able to reproduce).

[agospoda@gospo ~]$ tar -tzvf /tmp/sosreport-jharan.00395193-20101221172913-d56b.tar.xz | grep lspci
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
[agospoda@gospo ~]$ tar -tjvf /tmp/sosreport-jharan.00395193-20101221172913-d56b.tar.xz | grep lspci
bzip2: (stdin) is not a bzip2 file.
tar: Child returned status 2
tar: Error exit delayed from previous errors

Hi Red Hat, do you have the hardware that you need to debug this defect?
If not, what hw do you need? Thanks

Comment 18, I need an answer to that question. It's important.

Created attachment 473464 [details]
RHEL6.0-82599EB-packet-split-disable.patch

Neil's question from comment #18 needs to be addressed, and after that is done I would like this patch to be tried.

Created attachment 473534 [details]
patch to preform skb sharing check in bond_arp_rcv

I've got a theory on what's going on as well. If we assume that tcpdump (or some other app that attached an AF_PACKET socket to the bonded interface) was running, then it would seem that what happens is:

1) The AF_PACKET socket adds a packet reception hook to the ptype_all list via dev_add_pack (most applications default to using AF_PACKET with ETH_P_ALL).

2) The bonding interface, when used with arp monitoring, adds a packet hook to the ptype_base list (it only looks for ETH_P_ARP).

3) Since netif_receive_skb always interrogates the ptype_all list prior to the ptype_base list, the AF_PACKET packet hook (tpacket_rcv if the default mmap packet interface is used) gets called with the skb first.

4) tpacket_rcv, since this is a gro frame, likely falls into this if clause:
if (macoff + snaplen > po->rx_ring.frame_size) {
at which point it does an skb_shared check, which fails, so it keeps the skb exactly as it was, instead opting to perform an skb_get on the skb, which bumps its users reference count to 3 (1 from the alloc, +1 in deliver_skb, +1 from this skb_get).

5) After queueing the frame, tpacket_rcv calls kfree_skb, which reduces its skb->users count back down to 2.

6) Next, on return, netif_receive_skb interrogates the ptype_base list, which causes the bonding packet hook (bond_arp_rcv) to get called.

7) bond_arp_rcv attempts to call pskb_may_pull on the skb, which, because it doesn't have sufficient space to expand, calls pskb_expand_head, which triggers the observed BUG() panic. The BUG() is triggered by skb_shared(), which checks for a skb->users count greater than 1.
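The reference counting in steps 1 to 7 can be followed with a small userspace model. This is only a sketch of the theory as stated in the comment, not kernel code: the `Skb` class and `trace_users` function are hypothetical, the helper names merely mirror the kernel functions they stand in for, and the counts are the ones given in the steps above.

```python
class Skb:
    """Toy stand-in for struct sk_buff that tracks only the users count."""

    def __init__(self):
        self.users = 1  # reference held from allocation in the driver rx path


def skb_get(skb):
    """Take an extra reference, as the kernel helper of the same name does."""
    skb.users += 1
    return skb


def kfree_skb(skb):
    """Drop one reference (freeing is not modelled)."""
    skb.users -= 1


def skb_shared(skb):
    """True when more than one reference is outstanding."""
    return skb.users != 1


def trace_users(packet_socket_open):
    """Return ((stage, users) pairs, shared-at-bond_arp_rcv) for one ARP frame.

    packet_socket_open is an assumption of this sketch: it stands for an
    AF_PACKET listener (tcpdump, arping, ...) registered on ptype_all.
    """
    skb = Skb()
    trace = [("alloc", skb.users)]
    if packet_socket_open:
        skb_get(skb)  # step 3: deliver_skb references the skb for the hook
        trace.append(("deliver_skb", skb.users))
        skb_get(skb)  # step 4: tpacket_rcv keeps the skb and takes a reference
        trace.append(("tpacket_rcv skb_get", skb.users))
        kfree_skb(skb)  # step 5: tpacket_rcv drops one reference on return
        trace.append(("tpacket_rcv kfree_skb", skb.users))
    # steps 6 and 7: the bonding hook runs and wants exclusive access
    trace.append(("bond_arp_rcv", skb.users))
    return trace, skb_shared(skb)


if __name__ == "__main__":
    for socket_open in (False, True):
        stages, shared = trace_users(socket_open)
        verdict = "pskb_expand_head would BUG()" if shared else "ok"
        print(socket_open, stages, verdict)
```

With a packet socket open, the model reaches bond_arp_rcv with a users count of 2, which is the shared state that pskb_expand_head refuses to operate on.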
I think the solution to this problem is to effectively do what tpacket_rcv does: if any operations are to be performed on the skb that require exclusive access to the skb, we need to first be sure that we are the only user of the skb. The above patch should be able to handle that.

Here's a test build with that patch incorporated:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3034571

(In reply to comment #18)
> I stumbled on something while looking at this. When this happens, are you by
> any chance running tcpdump, or another application that might bind an AF_PACKET
> protocol socket to the bonded interface or one of its slaves. This line
> suggests that you might have been:
> [<ffffffff8104fff9>] ? __wake_up_common+0x59/0x90
> [<ffffffff81407f7a>] __pskb_pull_tail+0x2aa/0x360
> [<ffffffffa0244530>] bond_arp_rcv+0x2c0/0x2e0 [bonding]
> [<ffffffff814a0857>] ? packet_rcv+0x377/0x440 <==============HERE
> [<ffffffff8140f21b>] netif_receive_skb+0x2db/0x670
> [<ffffffff8140f788>] napi_skb_finish+0x58/0x70

Customer has answered back with the following answers/questions:

1.) We are not running any tcpdump. The system panics during boot up even before Linux gives us console access.

2.) None of our applications uses AF_PACKET. Is it possible that one of the RedHat applications does that?

3.) It might be relevant, but we had seen when working with Broadcom NICs on RHEL 5.4 that the interfaces are put into promiscuous mode and are left there even though there is no tcpdump being run. This was seen occasionally and we were not sure what might have been causing that. ifconfig output showed that the interface was in promiscuous mode.

Ok, I think you should have them test the build anyway, here's why:

For the problem, as I describe it in the bz, we need to have something listening for ETH_P_ALL packets. That can be an AF_PACKET socket, or an AF_RAW socket. I don't see anything that might hold such a socket open in the sosreport.
But just because I don't see it doesn't mean it's not there. But whether it's there or not is somewhat moot, because if it were to be there, then this problem would be observed and my patch fixes it. If the problem stops, we can then go looking for the cause, not that it matters that much, because you should be able to run tcpdump (or some other app that uses AF_PACKET/RAW) without crashing your system.

I have made the test kernel available to the customer. I will keep you updated once I hear back from them.

Thanks again!

That system is going to have to be set up to work with, as there is no link on any of the system's ethernet interfaces other than eth2. We can certainly do that, but given that the customer has a test kernel in hand at the moment, I'm more curious to know what the result of that kernel is in their environment.

I've requested that dhafeman connect a second link on that box so we can configure bonding properly.

Please update us with the result of the test kernel asap.

Customer is still working on testing out the new kernel. I will let you know the results once I hear back from them.

Customer is hitting a roadblock with a missing kernel-firmware package.

[root@qa-fusion-ch07-bl11 ~]# rpm -ivh /var/tmp/kernel 2.6.32-97.el6.test.x86_64.rpm
error: Failed dependencies:
kernel-firmware >= 2.6.32-97.el6.test is needed by kernel-2.6.32-97.el6.test.x86_64

From what I recall this was not part of the build in brew. Am I missing something?

The kernel-firmware package is built in the noarch build pass. But in most cases, it should be just fine for testing purposes to rpm -ivh --nodeps that kernel and use an older kernel-firmware package, as the actual firmware blobs tend to change very little.

(In reply to comment #37)
> Customer is hitting a roadblock with a missing kernel-firmware package.
>
> [root@qa-fusion-ch07-bl11 ~]# rpm -ivh /var/tmp/kernel
> 2.6.32-97.el6.test.x86_64.rpm
> error: Failed dependencies:
> kernel-firmware >= 2.6.32-97.el6.test is needed by
> kernel-2.6.32-97.el6.test.x86_64
>
> From what I recall this was not part of the build in brew. Am I missing
> something?

Have the customer install without deps. The kernel-firmware contents haven't changed between -97 and -71.

Customer has come back with their test results, and no change. They are still getting a crash on boot.

KERNEL: /usr/lib/debug/lib/modules/2.6.32-97.el6.test.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2011-01-18-13:33:41/vmcore [PARTIAL DUMP]
CPUS: 16
DATE: Tue Jan 18 13:33:24 2011
UPTIME: 00:01:44
LOAD AVERAGE: 0.18, 0.11, 0.04
TASKS: 341
NODENAME: qa-fusion-ch07-bl11
RELEASE: 2.6.32-97.el6.test.x86_64
VERSION: #1 SMP Fri Jan 14 10:32:07 EST 2011
MACHINE: x86_64 (2533 Mhz)
MEMORY: 48 GB
PANIC: "kernel BUG at net/core/skbuff.c:815!"
PID: 0
COMMAND: "swapper"
TASK: ffff880c64f2aab0 (1 of 16) [THREAD_INFO: ffff880664a42000]
CPU: 9
STATE: TASK_RUNNING (PANIC)

crash> where
No stack.
gdb: gdb request failed: where
crash> bt
PID: 0 TASK: ffff880c64f2aab0 CPU: 9 COMMAND: "swapper"
#0 [ffff8800283437f0] machine_kexec at ffffffff8102edbb
#1 [ffff880028343850] crash_kexec at ffffffff810b1078
#2 [ffff880028343920] oops_end at ffffffff814caba0
#3 [ffff880028343950] die at ffffffff8100f33b
#4 [ffff880028343980] do_trap at ffffffff814ca474
#5 [ffff8800283439e0] do_invalid_op at ffffffff8100cee5
#6 [ffff880028343a80] invalid_op at ffffffff8100bf5b
[exception RIP: pskb_expand_head+54]
RIP: ffffffff81403416 RSP: ffff880028343b30 RFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff880661811080 RCX: 0000000000000020
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880661811080
RBP: ffff880028343b80 R8: ffffffff81ba3f80 R9: ffff880661811164
R10: ffff880c61d476c0 R11: 0000000000000400 R12: 0000000000000000
R13: 0000000000000180 R14: ffff880c61d47000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff880028343b88] __pskb_pull_tail at ffffffff8140587a
#8 [ffff880028343bd8] bond_arp_rcv at ffffffffa0237580
#9 [ffff880028343c48] __netif_receive_skb at ffffffff8140d0fb
#10 [ffff880028343cb8] netif_receive_skb at ffffffff8140d5e8
#11 [ffff880028343cf8] napi_skb_finish at ffffffff8140d6e8
#12 [ffff880028343d18] napi_gro_receive at ffffffff8140db99
#13 [ffff880028343d38] ixgbe_clean_rx_irq at ffffffffa0128b5b
#14 [ffff880028343df8] ixgbe_clean_rxtx_many at ffffffffa0129566
#15 [ffff880028343e68] net_rx_action at ffffffff8140dd63
#16 [ffff880028343ec8] __do_softirq at ffffffff8106bcb7
#17 [ffff880028343f38] call_softirq at ffffffff8100c2cc
#18 [ffff880028343f50] do_softirq at ffffffff8100df35
#19 [ffff880028343f70] irq_exit at ffffffff8106bab5
#20 [ffff880028343f80] do_IRQ at ffffffff814ce8e5
--- <IRQ stack> ---
#21 [ffff880664a43dc8] ret_from_intr at ffffffff8100bad3
[exception RIP: intel_idle+218]
RIP: ffffffff812ad3fa RSP: ffff880664a43e78 RFLAGS: 00000206
RAX: 0000000000000000 RBX: ffff880664a43ed8 RCX: 0000000000000000
RDX: 00000000000174f9 RSI: 0000000000000000 RDI: 0000000005b0efca
RBP: ffffffff8100bace R8: 0000000000000000 R9: 00000000000000c8
R10: 0000001856a6fea9 R11: 00000000fffd037f R12: ffffffff814cc775
R13: ffff880664a43e18 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffff52 CS: 0010 SS: 0018
#22 [ffff880664a43ee0] cpuidle_idle_call at ffffffff813dc1f7
#23 [ffff880664a43f00] cpu_idle at ffffffff81009e96

That makes absolutely no sense. This kernel adds a call to skb_share_check immediately prior to the call to pskb_may_pull. Both of those functions use skb_shared, which checks skb->users for equality to 1. So the implication here is that skb->users changed value between the two calls to skb_shared, implying that the skb is being manipulated by 2 cpus in parallel, or that skb_share_check isn't doing what it's supposed to. The trace above shows that you got a kdump out of this; could you upload the vmcore somewhere and point me to it please?

Ok, I have good news and bad news:

The bad news is that I messed this up; it was completely my fault. I have my build tree here, and see the patch in it, but somehow during the build process the patch got dropped out. I'm investigating how that happened, but right now I want to focus first on getting a correct build out. Regardless, this was a personal mistake and I apologize. I'm going to restart the build, and double check that the patch gets included to ensure this doesn't happen again.

The good news is that this explains why the above test kernel failed in exactly the same way. The change that was supposed to prevent the problem wasn't in place. I'll have another build in the works with evidence of the patch's inclusion shortly.

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3048245
New build, I've made certain that the patch is in place this time.

I have delivered the new kernel to the customer for testing. I will let you know the results asap.

I also have an update regarding the reproducer.
I am currently working on an older generation IBM HS21 blade, but with the same Intel Corporation 82599EB 10-Gigabit KX4 dual-port nic. I have configured bonding and arp monitoring mode with success on the same kernel version the customer is running, 2.6.32-71.el6.x86_64. My reproducer system comes up normally without a kernel panic. I will keep you updated as I continue testing. I may try to see if I can get hold of one of the newer HS22 blades.

Neal,

To reproduce the problem, you need to run an application which uses an AF_PACKET socket, like arping, with bonding and arp monitoring configured.

The customer has come back with their initial test results, and I am happy to report that the patched kernel fixes this issue. My customer is still planning to do some load testing, but all seems well so far. Thanks to everyone for your due diligence and hard work on fixing this bug. I believe our next step is to get a supported Hotfix package out to the customer? I will keep everyone posted if any new developments arise.

Thanks again!

Mark,

Yes, that looks like the exact same problem, and thank you: your comment 46 gives us the link to the AF_PACKET socket usage that we were speculating about in comment 30. I think this also needs to go upstream.

Neal, as I understand the hotfix process currently:
https://docspace.corp.redhat.com/docs/DOC-47999
That's something you can just bless and move on with. Andy and I will make any final adjustments to this patch and get it to the right places asap.

Posted upstream and to rhkl.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Patch(es) available on kernel-2.6.32-112.el6

Technical note added.
If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
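For readers who want to see the shape of the fix described in this report, the share check can be sketched in a userspace model. This is a hedged illustration, not the shipped patch: the `Skb` class and the `patched` flag are hypothetical, the helper names only mirror the kernel functions they imitate, and the model assumes that a clone gets its own header with a users count of 1 while the original loses the caller's reference.

```python
class Skb:
    """Toy stand-in for struct sk_buff: a users count plus a data reference."""

    def __init__(self, data):
        self.users = 1
        self.data = data  # assumption: a clone points at the same packet data


def skb_shared(skb):
    """True when more than one reference is outstanding."""
    return skb.users != 1


def skb_share_check(skb):
    """Return an skb the caller owns exclusively.

    If the skb is shared, hand back a private clone and drop the caller's
    reference on the original; otherwise return the skb unchanged.
    """
    if skb_shared(skb):
        private = Skb(skb.data)
        skb.users -= 1
        return private
    return skb


def pskb_expand_head(skb):
    """Model of the guard that produced the panic in this report."""
    if skb_shared(skb):
        raise RuntimeError("BUG: pskb_expand_head called on a shared skb")


def bond_arp_rcv(skb, patched):
    """Hypothetical receive hook; patched selects whether the check runs."""
    if patched:
        skb = skb_share_check(skb)
    pskb_expand_head(skb)  # stands in for the pskb_may_pull path
    return skb


if __name__ == "__main__":
    frame = Skb(b"arp frame")
    frame.users = 2  # shared with a packet socket, as in the theory above
    private = bond_arp_rcv(frame, patched=True)
    print(private is frame, private.users, frame.users)
```

In the model, the unpatched path raises on a frame with a users count of 2, while the patched path works on a private skb and leaves the other holder's reference intact.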