Bug 607114 - System panic in pskb_expand_head When arp_validate option is specified in bonding ARP monitor mode
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All Linux
Priority: low Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Andy Gospodarek
Liang Zheng
:
Depends On:
Blocks: 665110
 
Reported: 2010-06-23 05:37 EDT by Mark Wu
Modified: 2014-06-29 19:02 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 665110
Environment:
Last Closed: 2011-07-21 06:23:28 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
lspci (69.93 KB, application/octet-stream)
2011-01-18 04:23 EST, Mark Wu
dmidecode (41.77 KB, application/octet-stream)
2011-01-18 04:24 EST, Mark Wu
backported upstream fix (1.49 KB, patch)
2011-01-25 11:04 EST, Andy Gospodarek


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 33923 None None None Never
Red Hat Product Errata RHSA-2011:1065 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update 2011-07-21 05:21:37 EDT

Description Mark Wu 2010-06-23 05:37:00 EDT
Description of problem:
The system panics in some cases when the bonding driver is used.
When the arp_validate option is specified in bonding ARP monitor mode, the bonding driver registers a new receive function, bond_arp_rcv, in ptype_base[] with
protocol = 0x0806 (ETH_P_ARP).

Now consider the situation where one more receive function is registered.
For example, packet_rcv is registered by the arping command.

In this case three functions, packet_rcv, bond_arp_rcv and arp_rcv (the original one),
are invoked in order when an ARP packet is received.

During this delivery, the "users" reference count in the skb struct is incremented
for every call except the last one, which in this case is the call to arp_rcv.

See the source of netif_receive_skb and deliver_skb in detail.
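
The reference counting described above can be modelled in user space roughly as follows. This is a simplified sketch, not kernel code: struct mock_skb, mock_deliver and record_users are hypothetical names, and it assumes every handler drops its reference before returning, as a kfree_skb call would.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the single sk_buff field this sketch needs (not the kernel struct). */
struct mock_skb {
    int users;  /* reference count; the driver hands the skb over with 1 */
};

/* Hypothetical handler type: records the reference count it observed. */
typedef void (*mock_handler)(struct mock_skb *skb, int *seen_users);

static void record_users(struct mock_skb *skb, int *seen_users)
{
    *seen_users = skb->users;
}

/*
 * Models the delivery pattern: every handler except the last gets an
 * extra reference before it is called; the last handler consumes the
 * caller's own reference.
 */
static void mock_deliver(struct mock_skb *skb, mock_handler *handlers,
                         size_t n, int *seen)
{
    for (size_t i = 0; i < n; i++) {
        assert(skb->users >= 1);
        if (i + 1 < n)
            skb->users++;           /* extra reference for a non-final handler */
        handlers[i](skb, &seen[i]);
        skb->users--;               /* assumed: the handler drops its reference */
    }
}
```

With three handlers standing in for packet_rcv, bond_arp_rcv and arp_rcv, and users starting at 1, the handlers observe 2, 2 and 1 respectively. The middle handler, which corresponds to bond_arp_rcv, therefore always sees a shared skb.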

In packet_rcv, the received skb is cloned and
the cloned bit on the original skb is set.

Next is the process in "bond_arp_rcv".

In bond_arp_rcv, with some NIC drivers such as ixgbe,
pskb_may_pull tries to collect the ARP packet data into the header area, because ixgbe places only the MAC addresses in the header area
and puts all the other data outside it, referenced from the
skb_shared_info area.

The actual collection is done in __pskb_pull_tail.
There, if the target skb is cloned, it tries to allocate a new header
via pskb_expand_head, as shown below:


        if (eat > 0 || skb_cloned(skb)) {
                if (pskb_expand_head(skb, 0, eat > 0 ? eat + 128 : 0,
                                     GFP_ATOMIC))
                        return NULL;
        }
 
  
However, in pskb_expand_head, if the skb is shared, which means the "users" member is greater than 1,
the system panics:
int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
		     gfp_t gfp_mask)
{
        ...
	if (skb_shared(skb))
		BUG();
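
To make the failing combination explicit, here is a small user-space model of the two checks quoted above. It is only a sketch: mock_skb, mock_pull_tail and the PULL_* values are made-up names, the skb_cloned test is reduced to a single flag, and BUG() is replaced by a return code so the result can be inspected.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock of the two sk_buff properties involved (not the kernel struct). */
struct mock_skb {
    int  users;   /* reference count on the skb itself */
    bool cloned;  /* set once a clone shares the data buffer */
};

static bool mock_skb_shared(const struct mock_skb *skb)
{
    return skb->users != 1;
}

static bool mock_skb_cloned(const struct mock_skb *skb)
{
    return skb->cloned;
}

enum pull_result { PULL_OK, PULL_WOULD_BUG };

/*
 * Models the control flow of the excerpts: the pull path reaches the
 * header expansion when data must be copied in (eat > 0) or the skb is
 * cloned, and the expansion refuses to work on a shared skb.
 */
static enum pull_result mock_pull_tail(const struct mock_skb *skb, int eat)
{
    assert(skb->users >= 1);
    if (eat > 0 || mock_skb_cloned(skb)) {
        if (mock_skb_shared(skb))
            return PULL_WOULD_BUG;  /* the kernel calls BUG() here */
    }
    return PULL_OK;
}
```

An skb with users = 2 and the cloned bit set, which is the state bond_arp_rcv receives after packet_rcv has run, yields PULL_WOULD_BUG even when eat is 0. The same skb with users = 1 yields PULL_OK.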

Version-Release number of selected component (if applicable):
RHEL 5.5

How reproducible:
- Prepare a NIC driver whose receive path meets the conditions described above.
        We know the ixgbe driver meets them.
- Set up the bonding driver.
- Specify the arp_validate option.
- Run the arping command.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
System panic

Expected results:


Additional info:
Comment 3 Andy Gospodarek 2010-07-12 14:44:30 EDT
Mark, I have not looked into this before today as I have been out of the office.

I will take a look at this explanation and see if I agree with the assessment.

There are some systems in beaker with ixgbe cards (and you can search by driver), so you may want to check some out and try to reproduce this if you like.
Comment 5 Andy Gospodarek 2010-12-21 16:34:27 EST
Can someone please post the full back-trace from the BUG halt?
Comment 12 Mark Wu 2011-01-18 04:23:58 EST
Created attachment 474009 [details]
lspci
Comment 13 Mark Wu 2011-01-18 04:24:36 EST
Created attachment 474010 [details]
dmidecode
Comment 16 Andy Gospodarek 2011-01-25 11:04:57 EST
Created attachment 475197 [details]
backported upstream fix
Comment 17 Andy Gospodarek 2011-01-25 11:06:07 EST
Comment on attachment 475197 [details]
backported upstream fix

backport of upstream fix:

commit b30532515f0a62bfe17207ab00883dd262497006
Author: Neil Horman <nhorman@tuxdriver.com>
Date:   Thu Jan 20 09:02:31 2011 +0000

    bonding: Ensure that we unshare skbs prior to calling pskb_may_pull
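
The commit title suggests a simple pattern: obtain a private, unshared skb before pulling data into the header. In the kernel, skb_share_check() is the usual helper for that. Whether the attached patch uses exactly that call is an assumption here, since the diff is not quoted in this report. A user-space model of the idea, with hypothetical names (mock_skb, mock_share_check):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Mock of the two sk_buff properties involved (not the kernel struct). */
struct mock_skb {
    int  users;   /* reference count on the skb itself */
    bool cloned;  /* set once a clone shares the data buffer */
};

/*
 * Models the unshare step: if other holders exist, hand back a private
 * clone and drop our reference on the original; otherwise return the
 * skb unchanged. Returns NULL if the clone cannot be allocated, in
 * which case the caller must stop processing the packet.
 */
static struct mock_skb *mock_share_check(struct mock_skb *skb)
{
    struct mock_skb *nskb;

    assert(skb->users >= 1);
    if (skb->users == 1)
        return skb;                 /* already private, nothing to do */

    nskb = malloc(sizeof(*nskb));
    if (nskb != NULL) {
        nskb->users = 1;            /* the clone has exactly one holder */
        nskb->cloned = true;        /* both now share one data buffer */
        skb->cloned = true;
    }
    skb->users--;                   /* our reference on the original is dropped */
    return nskb;
}
```

A handler that calls this first and then pulls on the returned skb always works with users == 1, so the shared check in the header expansion can no longer trigger. In this model the caller owns the returned clone and must free it.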
Comment 18 Andy Gospodarek 2011-01-28 09:30:15 EST
This patch can probably make its way into RHEL5.7 if we get testing feedback.
Comment 19 RHEL Product and Program Management 2011-02-01 12:01:59 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 20 Andy Gospodarek 2011-02-04 16:23:49 EST
Updated test kernels that contain a patch for this issue are available here:

http://people.redhat.com/agospoda/#rhel5
Comment 31 Jarod Wilson 2011-05-13 18:18:37 EDT
Patch(es) available in kernel-2.6.18-261.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.
Comment 34 errata-xmlrpc 2011-07-21 06:23:28 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html
