Bug 1005567 - Can't enslave nic to bonding because of setting promiscuity failed
Summary: Can't enslave nic to bonding because of setting promiscuity failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Neil Horman
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-09-08 13:25 UTC by Mark Wu
Modified: 2013-10-09 15:07 UTC (History)
11 users (show)

Fixed In Version: kernel-3.11.3-301.fc20
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-10-06 01:28:58 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
system messages (7.37 KB, text/plain)
2013-09-08 13:25 UTC, Mark Wu
no flags Details
systemtap script (360 bytes, text/plain)
2013-09-08 13:32 UTC, Mark Wu
no flags Details
systemtap log (24.87 KB, text/x-log)
2013-09-08 13:32 UTC, Mark Wu
no flags Details
debug patch (390 bytes, patch)
2013-09-09 15:31 UTC, Neil Horman
no flags Details | Diff
log messages generated by the debug patch (20.21 KB, text/plain)
2013-09-11 02:14 UTC, Mark Wu
no flags Details
ip link list (9.38 KB, text/x-log)
2013-09-14 07:36 UTC, Mark Wu
no flags Details
experimental patch (1.11 KB, patch)
2013-09-14 07:38 UTC, Mark Wu
no flags Details | Diff
patch to consistently set promiscuity on vlan open (408 bytes, patch)
2013-09-23 18:05 UTC, Neil Horman
no flags Details | Diff

Description Mark Wu 2013-09-08 13:25:46 UTC
Created attachment 795345 [details]
system messages

Description of problem:
If a vlan interface is added over a bonding device, the slaves can't be added back to bonding after running ifdown/ifup bonding.


Version-Release number of selected component (if applicable):
kernel-3.10.10-200.fc19.x86_64
initscripts-9.47-1.fc19

How reproducible:
Always

Steps to Reproduce:
1. create two dummy interfaces:
ip link add dev dummy1 type dummy
ip link add dev dummy2 type dummy
2. configure a bond device as follow: 
# cat ifcfg-bond0 
DEVICE=bond0
TYPE=Ethernet
MTU=1500
ONBOOT="yes"

#cat ifcfg-dummy1 
DEVICE=dummy1
TYPE=Ethernet
SLAVE=yes
MASTER=bond0
MTU=1500
ONBOOT="yes"

#cat ifcfg-dummy2 
DEVICE=dummy2
TYPE=Ethernet
SLAVE=yes
MASTER=bond0
MTU=1500
ONBOOT="yes"

# ifup bond0 

3. create a vlan interface over the bonding:

# ip link add link bond0 name bond0.10 type vlan id 10
# ip link set dev bond0.10 up

4. Run ifdown and ifup
# ifdown bond0
# ifup bond0
It's observed that dummy2 can't be enslaved to bond0 with the following log message:
bonding: bond0: Adding slave dummy2.
dummy2: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken.

5. Run ifdown and ifup again:
# ifdown bond0
# ifup bond0
It's observed that dummy1 can't be enslaved to bond0 either with the following log message:
bonding: bond0: Adding slave dummy1.
dummy1: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken.
bonding: bond0: Adding slave dummy2.
dummy2: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken.

Actual results:


Expected results:
The slaves should be added after ifdown and ifup.

Additional info:

Comment 1 Mark Wu 2013-09-08 13:32:00 UTC
Created attachment 795346 [details]
systemtap script

Comment 2 Mark Wu 2013-09-08 13:32:50 UTC
Created attachment 795347 [details]
systemtap log

Comment 3 Mark Wu 2013-09-08 13:40:02 UTC
From the systemtap log, it looks of dummy2's promiscuity was set to -1 before it's enslaved to bond0 again in the first round of 'ifdown/ifup'. So dummy2's promiscuity reached 0 when it's enslaved, and then kernel assumed it overflowed.
But I can't find why dummy2's promiscuity was unset one more time.

Comment 4 Neil Horman 2013-09-09 15:23:17 UTC
This condition happens because we have an unbalanced promiscuity count somewhere.  I imagine on ifdown we make an extra call do dev_set_promiscuity somewhere that unbalances us, an on the ifup we transition dev->promiscuity from a non-zero to a zero state on a positive increment, which should be illegal.  i'll give you a debug patch to run.

Comment 5 Neil Horman 2013-09-09 15:31:12 UTC
Created attachment 795668 [details]
debug patch

You're systemtap script is good, but Its missing some of the info we need here.  The critical error occurs when we decrement our promiscuity value for a device to negative value (since promiscuity is unsigned however, its really just a zero to a non-zero transition).  When that happens on a negative inc value, we have our error.  If you could run this patch please and provide the output (feel free to convert this into a stap script if you like), then we should be able to track down our culprit I think

Comment 6 Mark Wu 2013-09-11 02:14:34 UTC
Created attachment 796206 [details]
log messages generated by the debug patch

Comment 7 Mark Wu 2013-09-11 02:48:35 UTC
I suspect the problem happened in __bond_release_one.
When the last slave, i.e., dummy2 is released in the flow of 'ifdown bond0', a NETDEV_CHANGEADDR event is emitted to its vlan device.

__bond_release_one:
    if (bond->slave_cnt == 0) {
        call_netdevice_notifiers(NETDEV_CHANGEADDR, bond->dev);
        call_netdevice_notifiers(NETDEV_RELEASE, bond->dev);
    }

vlan_device_event
  -> vlan_sync_address
    -> dev_uc_add (run into the case of 'vlan address was equal to the old address and is different from the new address')
      -> __dev_set_rx_mode
        -> __dev_set_promiscuity (promiscuity 1 on dev bond0. Please note that at this time, all the slaves have been detached already, so it will not change slave's promiscuity)

 __bond_release_one continue to run and reach the dev_set_promiscuity on the slave:
    if (!USES_PRIMARY(bond->params.mode)) {     (a)
        /* unset promiscuity level from slave */
        if (bond_dev->flags & IFF_PROMISC)  (b)
            dev_set_promiscuity(slave_dev, -1);

(a) the bonding mode is default, i.e., load balance, so the condition holds true.
(b) I supsect flag IFF_PROMISC is caused by the callback vlan_device_event, because I don't see it call 'dev_set_promiscuity(slave_dev, -1)' on dummy1. That shows the flag IFF_PROMISC is set during releasing dummy2.

I think we don't expect the condition (b) also include the case caused by vlan_device_event. So that could be the culprit. 


Please see the following calltrace, which is copied from the attched systemtap log.


cmdline /bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-dummy2
Setting promiscuity 1 on dev bond0
 0xffffffff8152fbe0 : __dev_set_promiscuity+0x0/0x180 [kernel]
 0xffffffff816419a9 : kretprobe_trampoline+0x0/0x57 [kernel]
 0xffff88007a79f1ea
 0xffffffff81539563 : dev_uc_add+0x53/0x70 [kernel] (inexact)
 0xffffffffa005e7ca : vlan_device_event+0x60a/0x660 [8021q] (inexact)
 0xffffffff8164360c : notifier_call_chain+0x4c/0x70 [kernel] (inexact)
 0xffffffff810862a6 : raw_notifier_call_chain+0x16/0x20 [kernel] (inexact)
 0xffffffff8152ea4d : call_netdevice_notifiers+0x2d/0x60 [kernel] (inexact)
 0xffffffffa0004ed1 : __bond_release_one+0x3b1/0x550 [bonding] (inexact)
 0xffffffff81635431 : printk+0x67/0x69 [kernel] (inexact)
 0xffffffffa0005080 : bond_release+0x10/0x20 [bonding] (inexact)
 0xffffffffa000e0f7 : bonding_store_slaves+0x127/0x1b0 [bonding] (inexact)
 0xffffffff813d72e8 : dev_attr_store+0x18/0x30 [kernel] (inexact)
 0xffffffff812097f3 : sysfs_write_file+0xd3/0x150 [kernel] (inexact)
 0xffffffff81197d6d : vfs_write+0xbd/0x1e0 [kernel] (inexact)
 0xffffffff81198739 : sys_write+0x49/0xa0 [kernel] (inexact)
 0xffffffff81647c59 : system_call_fastpath+0x16/0x1b [kernel] (inexact)
Setting promiscuity return: 0

cmdline /bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-dummy2
Setting promiscuity -1 on dev dummy2
 0xffffffff8152fbe0 : __dev_set_promiscuity+0x0/0x180 [kernel]
 0xffffffff816419a9 : kretprobe_trampoline+0x0/0x57 [kernel]
 0xffff88007440f000
 0xffffffffa0005010 : __bond_release_one+0x4f0/0x550 [bonding] (inexact)
 0xffffffff81635431 : printk+0x67/0x69 [kernel] (inexact)
 0xffffffffa0005080 : bond_release+0x10/0x20 [bonding] (inexact)
 0xffffffffa000e0f7 : bonding_store_slaves+0x127/0x1b0 [bonding] (inexact)
 0xffffffff813d72e8 : dev_attr_store+0x18/0x30 [kernel] (inexact)
 0xffffffff812097f3 : sysfs_write_file+0xd3/0x150 [kernel] (inexact)
 0xffffffff81197d6d : vfs_write+0xbd/0x1e0 [kernel] (inexact)
 0xffffffff81198739 : sys_write+0x49/0xa0 [kernel] (inexact)
 0xffffffff81647c59 : system_call_fastpath+0x16/0x1b [kernel] (inexact)
Setting promiscuity return: 0

Comment 8 Mark Wu 2013-09-11 02:49:19 UTC
So this problem could be related commit 2af73d4b2afe826d23e83f3747f850eefbd867ff

Comment 9 Neil Horman 2013-09-11 19:59:14 UTC
yeah, I think you're right, we probably need to somehow record and backout any promiscuity changes from the slaves at detach time to prevent any sync issues like this.  I'll work up a patch

Comment 10 Mark Wu 2013-09-12 00:48:17 UTC
Neil,
Thanks a lot for your help!  I am glad to verify it when the patch is ready.

Comment 11 Mark Wu 2013-09-12 00:53:13 UTC
BTW, I don't understand why the promiscuity status is represented as a uint? What about using a bool value for it? Then we just need flip the flag for promiscuity setting.  Is there anything this idea will break in the existing usage of changing promiscuity? Thanks!

Comment 12 Neil Horman 2013-09-12 17:29:00 UTC
Regarding your idea, yes, promiscuity is something that needs to be counted.  If multiple users of an interface request device promiscuity in various orders (i.e. if A and B want to set IFF_PROMISC on interface X, and then B stops using interface X, B will attempt to clear IFF_PROMISC, which will affect A adversely).  the model we want is for IFF_PROMISC to be set when the first user asks for it, and for it to be cleared _only_ when the last user asks to clear it.

Comment 13 Neil Horman 2013-09-12 19:48:28 UTC
Ok, I think I have an idea for the fix - I think, as you note above, whats happening is that we're doing something bad in vlan_device_event.  Specifically I think whats happening is that we're calling vlan_sync_addr when we havent really changed our hardware address.  If we send that event, vlan_sync_address will call dev_uc_del which in turn will call __dev_set_rx_mode causing the bonds promiscuity to get decremented, along with any remaining slaves

Then we finish __bond_release_one to fix up its promiscuity, and all is well.

However, on the next iteration of __bond_release_one, we do the decrement of the promiscuity of the remaining slave that we are releasing, but since it was decremented in the CHANGEADDR call above we underflow and start our problem.

I'm not 100% convinced of this order of events, but I'm going to re-create the problem here to see if I can sort it out.

Comment 14 Neil Horman 2013-09-13 13:58:31 UTC
hmm, so it appears that I'm not able to recreate the problem as you describe it, which suggests that I don't fully understand the problem.  Could you please send me the output of:

ip link show

before and after each time you ifdown or ifup the bond?

Thanks!

Comment 15 Mark Wu 2013-09-14 07:36:32 UTC
Created attachment 797555 [details]
ip link list

Comment 16 Mark Wu 2013-09-14 07:38:09 UTC
Created attachment 797556 [details]
experimental patch

The problem disappeared with the experimental patch.

Comment 17 Neil Horman 2013-09-23 17:46:26 UTC
No, thats just a band-aid patch.  Its uses an potentially uninitalized variable, and it relies of the state of the promisc flag to determine if it should decrement the promisc flag.  I see how it prevents the problem from happening, but you can't just paper over an imbalance in the promiscuity balancing.

That said, looking at your ip link list output, I think I see the problem.  We're changing the bond mac address on the last slave release prior to sending the NETDEV_CHANGEADDR notification, which causes a decrement in the promiscuity for which we have no corresponding increment.  I think what happening is that on vlan_dev_open, we see that the vlan_dev is the same as the real dev mac, so no dev_uc_add action occurs, which means no promiscuity changes occur on device open.  On bond shutdown however, we change the mac of the interface then call CHANGEADDR, which causes the vlan device to do a vlan_sync_address, causing a promiscuity change.

The fix for this really needs to be in the vlan code

Comment 18 Neil Horman 2013-09-23 18:05:46 UTC
Created attachment 801842 [details]
patch to consistently set promiscuity on vlan open

here you go, this isn't a final patch but it should confirm that this is the right area to fix.  it will ensure that we always set promiscuity on open (whcih is really bad, but will keep our refcounting in sync.  Please test it and let me know how it goes.

Also, would you mind testing an upstream kernel on this?  Like the head of the net-next tree?  looking at __dev_set_rx_mode it seems like this case should already be handled, so I'm having trouble seeing whats going wrong.

Comment 19 Mark Wu 2013-09-26 15:22:22 UTC
Neil,
Thanks for the explanation. I understood what you said about how the problem happened. In the patch I actually tested,  I had fixed the uninitalized issue by change initialize it as zero.  Yes, it relies of the state of the promisc flag to determine if it should decrement the promisc flag. I think it makes sense.


For your test patch, it seems dev_set_rx_mode is not exported, I use 'dev_set_promiscuity(real_dev, 1)' instead. With this change, the problem disappear. Do you think it can cover the same test purpose as dev_set_rx_mode?

For the net-next tree, I will test it later.

Comment 20 Neil Horman 2013-09-26 15:48:30 UTC
Its exported just fine, I see it externed in netdevice.h and its got an EXPORT_SYMBOL macro right below the function definition.  Are you getting an error when you're building?

Comment 21 Mark Wu 2013-09-27 01:04:09 UTC
Yes, I got the following build warning and failed to insert the module:
WARNING: "dev_set_rx_mode" [net/8021q/8021q.ko] undefined!

It's declared in netdevice.h, but I didn't see the EXPORT_SYMBOL on it either in fedora kernel or latest upstream master code. Am I missing something? Could you please check it again?  Thanks!

Comment 22 Neil Horman 2013-09-27 13:39:49 UTC
Shoot, I apologize, I was reading it wrong (trying to do too many things at once apparently).  You're right, using dev_set_promiscuity there makes sense now.  I'll update the patch, see if I can't make it a bit more efficient and send it upstream. Thanks!

Comment 23 Neil Horman 2013-09-27 16:08:56 UTC
And I'm actually going to apologize to you again.  I was about to post this, and while taking a last look at the logs I noted that the problem isn't in the bonding driver but one of its (former) slaves. Specifically whats happening is that when we release the last slave, we send the NETDEV_CHANGEADDR notification, which causes the vlan to note that the mac addresses between it and the bond driver were synced, and now are not, causing the vlan device to do a dev_uc_add, and dev_set_rx_mode, which puts us into procmisc mode (in the bond driver).  We weren't in promisc mode prior to the release, but we're checking the current state when calling dev_set_promiscuity at the end of __bond_release_one.  You're origional patch that cached the state of the promisc flags was the right way to go all along, I just wanst thinking through it all.  I'll post a variant of that patch (cleaning up the use without set warning) in just a minute.

Comment 24 Neil Horman 2013-09-27 16:41:21 UTC
http://marc.info/?l=linux-netdev&m=138029908908114&w=2

Posted

Comment 25 Fedora Update System 2013-10-03 13:43:19 UTC
kernel-3.11.3-201.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.11.3-201.fc19

Comment 26 Fedora Update System 2013-10-03 13:45:34 UTC
kernel-3.11.3-301.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.11.3-301.fc20

Comment 27 Fedora Update System 2013-10-04 02:02:26 UTC
Package kernel-3.11.3-301.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.3-301.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-18268/kernel-3.11.3-301.fc20
then log in and leave karma (feedback).

Comment 28 Fedora Update System 2013-10-06 01:28:58 UTC
kernel-3.11.3-201.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 29 Fedora Update System 2013-10-09 14:14:20 UTC
kernel-3.11.3-301.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.