Bug 508297 - RTNL: assertion failed due to bonding notify.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: 5.4
Assignee: Stanislaw Gruszka
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-06-26 13:22 UTC by Stanislaw Gruszka
Modified: 2018-12-02 15:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 09:02:01 UTC
Target Upstream Version:
Embargoed:


Attachments
Full dmesg. (33.90 KB, application/octet-stream)
2009-06-26 13:25 UTC, Stanislaw Gruszka
Patch with proposed fix. (1.31 KB, patch)
2009-06-26 13:28 UTC, Stanislaw Gruszka


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2009:1243 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update 2009-09-01 08:53:34 UTC

Description Stanislaw Gruszka 2009-06-26 13:22:29 UTC
Description of problem:
Warnings and call traces appear in dmesg (possible data corruption?).

Version-Release number of selected component (if applicable):
Found in kernel 2.6.18-153.el5.

Steps to Reproduce:
1. /etc/init.d/networking stop
2. Configure bonding in balance-rr mode 
3. /etc/init.d/networking start
  
Actual results:
Warnings in dmesg about failed assertion.

Expected results:
No warnings.

Comment 1 Stanislaw Gruszka 2009-06-26 13:24:15 UTC
When I run bonding (with BONDING_OPTS="mode=balance-rr arp_interval=100
arp_ip_target=10.34.1.154") on my system, I get call traces due to failed
RTNL assertions.

RTNL: assertion failed at net/core/fib_rules.c (388)

Call Trace:
 [<ffffffff802357f4>] fib_rules_event+0x3d/0xff
 [<ffffffff80067eaa>] notifier_call_chain+0x20/0x32
 [<ffffffff88663501>] :bonding:bond_select_active_slave+0xf6/0x10f
 [<ffffffff886659ae>] :bonding:bond_loadbalance_arp_mon+0x1a3/0x1da
 [<ffffffff8866580b>] :bonding:bond_loadbalance_arp_mon+0x0/0x1da 
 [<ffffffff8004dbfc>] run_workqueue+0x94/0xe4
 [<ffffffff8004a460>] worker_thread+0x0/0x122
 [<ffffffff800a02fa>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8004a550>] worker_thread+0xf0/0x122
 [<ffffffff8008ccc7>] default_wake_function+0x0/0xe
 [<ffffffff800a02fa>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800a02fa>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80033062>] kthread+0xfe/0x132
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff800a02fa>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032f64>] kthread+0x0/0x132
 [<ffffffff8005efa7>] child_rip+0x0/0x11

RTNL: assertion failed at net/ipv4/devinet.c (986)

Call Trace:
 [<ffffffff8025c095>] inetdev_event+0x48/0x282
 [<ffffffff80067eaa>] notifier_call_chain+0x20/0x32
 [<ffffffff88663501>] :bonding:bond_select_active_slave+0xf6/0x10f
 [<ffffffff886659ae>] :bonding:bond_loadbalance_arp_mon+0x1a3/0x1da
 [<ffffffff8866580b>] :bonding:bond_loadbalance_arp_mon+0x0/0x1da
 [<ffffffff8004dbfc>] run_workqueue+0x94/0xe4
 [<ffffffff8004a460>] worker_thread+0x0/0x122
 [<ffffffff800a02fa>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8004a550>] worker_thread+0xf0/0x122
 [<ffffffff8008ccc7>] default_wake_function+0x0/0xe
 [<ffffffff800a02fa>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800a02fa>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80033062>] kthread+0xfe/0x132
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff800a02fa>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032f64>] kthread+0x0/0x132
 [<ffffffff8005efa7>] child_rip+0x0/0x11

This happens because netdev_bonding_change() is called without rtnl_lock()
held, via bond_loadbalance_arp_mon() -> bond_select_active_slave(). That call
was added in:

commit 47c4d639ac64ad423235c622306bc0bcba62b2d9
Author: Andy Gospodarek <gospo>
Date:   Thu Apr 23 14:44:45 2009 -0400

    [net] bonding: support for bonding of IPoIB interfaces
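
For readers unfamiliar with this kind of check, here is a minimal user-space
sketch of how an RTNL-style assertion can detect a caller that did not take
the lock. It is not the kernel source: the pthread mutex standing in for the
RTNL lock and the names model_rtnl, model_assert_rtnl, model_notifier,
notify_locked, notify_unlocked and assertion_failures are all hypothetical.
The idea is that the assertion tries to take the lock itself; if that
succeeds, nobody was holding it, so the caller broke the locking rule.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for the RTNL lock; not the kernel's implementation. */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Counts how many times the modelled assertion has fired. */
static int assertion_failures;

/* If trylock succeeds, the lock was free, so the caller did not hold it. */
static void model_assert_rtnl(const char *where)
{
    if (pthread_mutex_trylock(&model_rtnl) == 0) {
        pthread_mutex_unlock(&model_rtnl);
        assertion_failures++;
        fprintf(stderr, "RTNL (model): assertion failed at %s\n", where);
    }
}

/* Models a notifier callback that expects the lock to be held already. */
static void model_notifier(void)
{
    model_assert_rtnl("model_notifier");
}

/* A caller that follows the rule: take the lock, notify, release. */
static void notify_locked(void)
{
    pthread_mutex_lock(&model_rtnl);
    model_notifier();
    pthread_mutex_unlock(&model_rtnl);
}

/* A caller that skips the lock, as the ARP monitor path does in this bug. */
static void notify_unlocked(void)
{
    model_notifier();
}
```

Calling notify_locked() leaves assertion_failures unchanged, while
notify_unlocked() increments it and prints a message, which mirrors the
pattern in the traces above.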

Comment 2 Stanislaw Gruszka 2009-06-26 13:25:06 UTC
Created attachment 349548
Full dmesg.

Comment 3 Stanislaw Gruszka 2009-06-26 13:28:19 UTC
Created attachment 349549
Patch with proposed fix.

The patch moves netdev_bonding_change() to bond_change_active_slave()
and calls it only if the mode is active-backup, which prevents this
function from running from bond_loadbalance_arp_mon(). This is the same
way it is done in mainline.
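
As a rough illustration of the change described above, the following
user-space sketch contrasts the two call sites. It is only a model and not
the attached patch: the enum, the struct, its counters and every model_*
function are hypothetical. It assumes that the old code notified from the
select path regardless of mode, and that the balance-rr ARP monitor runs
without the RTNL lock while the active-backup path runs with it held.

```c
#include <stdbool.h>
#include <stdio.h>

enum model_mode { MODEL_BALANCE_RR, MODEL_ACTIVE_BACKUP };

struct model_bond {
    enum model_mode mode;
    bool rtnl_held;     /* models whether the caller holds the RTNL lock */
    int notifications;  /* how many times the notifier chain was run */
    int failures;       /* how many times the modelled assertion fired */
};

/* Models netdev_bonding_change(): notifier callbacks expect the lock. */
static void model_netdev_bonding_change(struct model_bond *b)
{
    b->notifications++;
    if (!b->rtnl_held) {
        b->failures++;
        fprintf(stderr, "RTNL (model): assertion failed\n");
    }
}

/* Before the fix (assumed): the select path notifies for every mode. */
static void model_select_active_slave_old(struct model_bond *b)
{
    model_netdev_bonding_change(b);
}

/* After the fix: the change path notifies only in active-backup mode. */
static void model_change_active_slave_new(struct model_bond *b)
{
    if (b->mode == MODEL_ACTIVE_BACKUP)
        model_netdev_bonding_change(b);
}

static void model_select_active_slave_new(struct model_bond *b)
{
    model_change_active_slave_new(b);
}

/* Models the ARP monitor work item, which runs without the lock. */
static void model_loadbalance_arp_mon(struct model_bond *b, bool fixed)
{
    b->rtnl_held = false;
    if (fixed)
        model_select_active_slave_new(b);
    else
        model_select_active_slave_old(b);
}
```

With a balance-rr bond, model_loadbalance_arp_mon(&bond, false) records one
failure, while the same call with fixed set to true sends no notification at
all, so the assertion is never reached.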

Comment 4 RHEL Program Management 2009-06-26 13:52:56 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Don Zickus 2009-07-07 15:05:53 UTC
The fix is included in kernel-2.6.18-157.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 9 errata-xmlrpc 2009-09-02 09:02:01 UTC
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

