Bug 516289 - bonding: backport code to allow user-controlled output slave detection.
Summary: bonding: backport code to allow user-controlled output slave detection.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
high
medium
Target Milestone: rc
: ---
Assignee: Neil Horman
QA Contact: Network QE
URL:
Whiteboard:
: 518362 (view as bug list)
Depends On:
Blocks: 533192 554476 557291 601849
TreeView+ depends on / blocked
 
Reported: 2009-08-07 20:06 UTC by Neil Horman
Modified: 2018-11-14 19:59 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
: 601849 (view as bug list)
Environment:
Last Closed: 2011-01-13 20:51:24 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
proposed new bonding patch for upstream (10.92 KB, patch)
2010-04-20 18:11 UTC, Neil Horman
no flags Details | Diff
updated patch (10.93 KB, patch)
2010-04-21 19:42 UTC, Neil Horman
no flags Details | Diff
bonding-queue-gospo-2-tested.patch (15.51 KB, patch)
2010-04-26 21:38 UTC, Andy Gospodarek
no flags Details | Diff
bonding-queue-gospo-4.patch (17.62 KB, patch)
2010-04-29 01:11 UTC, Andy Gospodarek
no flags Details | Diff
updated patch (18.75 KB, patch)
2010-05-06 20:14 UTC, Neil Horman
no flags Details | Diff
0001--bonding-add-keep_all-parameter.patch (8.49 KB, patch)
2010-05-10 18:24 UTC, Andy Gospodarek
no flags Details | Diff
0002--bonding-allow-user-controlled-output-slave-select.patch (14.67 KB, patch)
2010-05-10 20:01 UTC, Andy Gospodarek
no flags Details | Diff
0002--bonding-allow-user-controlled-output-slave-select.patch (14.68 KB, application/octet-stream)
2010-05-11 00:01 UTC, Andy Gospodarek
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0017 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update 2011-01-13 10:37:42 UTC

Description Neil Horman 2009-08-07 20:06:26 UTC
Description of problem:
This is to track a customer request to have a bonding mode which allows for both ports to be used in the bond in an active/active configuration with failover support.  Andy Gospodarek has implemented this upstream with his mark mode code:

http://marc.info/?l=linux-netdev&m=124119957702606&w=2


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Neil Horman 2009-08-12 20:19:29 UTC
test build with a backport of andys upstream patch, for use when it gets accepted up there:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1926673

Comment 3 Jeremy West 2009-08-20 04:29:04 UTC
*** Bug 518362 has been marked as a duplicate of this bug. ***

Comment 7 Neil Horman 2010-04-13 20:47:57 UTC
Ok, this appears dead in the water upstream, but I think the concerns that the maintainer had can be worked around without too much of an issue.  Basically we can use the same patch but modify it such that:

1) The tc utility is used to classify traffic and the skbedit action is used to set the mark on the frame

2) The bonding code itslef is modified such that the available modes simply check the mark for output affinity rather than introducing a new mode entirely.

I've got a modification on the patch started, and will have it up soon

Comment 8 Andy Gospodarek 2010-04-16 16:04:19 UTC
I spoke with Neil about this earlier this week and the best approach is probably to overload one of the existing modes in a way that the hashing algorithm used for selecting the output port uses the skb mark.  My suggestion would be to apply this to the tlb and possibly alb modes as a new option rather than creating a new mode.  Much like the original patch for marking mode the basic code change should be easy, but special care will need to be taken to make sure ARP works out correctly (one of the main problems with the original marking mode) and that these modes can operate well when using multiple switches.

Comment 10 Neil Horman 2010-04-20 18:11:04 UTC
Created attachment 407886 [details]
proposed new bonding patch for upstream

I've got this owrking on upstream kernels here.  i'm going to discuss this with gospo, then propose it upstream

Comment 11 Neil Horman 2010-04-21 19:42:05 UTC
Created attachment 408155 [details]
updated patch

found a bug in testing where I trip on a null pointer, this updated patch fixes it

Comment 12 Andy Gospodarek 2010-04-23 20:58:24 UTC
I tested Neil's patch today and it looks like the thought to use queue_maps set via tc is a good one.  It works around the problems I saw with the skb mark implementation.  He and I discussed a few things and I will create, test, and attach to this BZ an updated patch based on our conversation.

Comment 13 Andy Gospodarek 2010-04-26 21:38:44 UTC
Created attachment 409299 [details]
bonding-queue-gospo-2-tested.patch

Here is an updated patch based on the work done by Neil.  I moved some of structures around a bit since the queue ids will become configurable rather than having them hardcoded on init as the first patch did.  This also exports a new file in /sys/class/net/bond0/bonding/ called queue_id.  Output of the list of current queues is working, but I need to work on the string manipulation magic to allow the list of queues added to take correctly.

The other part that needs work in a new sysfs parameter to ignore the check in __skb_bond_should_drop so this can work with active-backup interfaces.

Comment 14 Andy Gospodarek 2010-04-27 02:48:04 UTC
I've got a workaround for the __skb_bond_should_drop issue, so I just need to fix-up the bits to set the queue_ids for each interface and address any hotpath issues Neil has.

Comment 15 Andy Gospodarek 2010-04-29 01:11:26 UTC
Created attachment 409998 [details]
bonding-queue-gospo-4.patch

This should incorporate the needed changes that were discussed in this bug and over email.  Not fully tested, but this should get us close to what we need for full upstream support.  Three new bonding module options are added (the one to modify skb_bond_should_drop should be spilt out since it is an independent feature) and two of those are configurable via sysfs.

Comment 16 Neil Horman 2010-05-06 20:14:20 UTC
Created attachment 412154 [details]
updated patch

updated version of the patch, tested, with the following changes:

1) minor cleanups (fix checkpatch warnings/errors)

2) Updated documentation, detailing use of the sysfs interface

3) removed kfree_skb from bond_override to prevent double free of skbs

4) modified sysfs interface to produce and consume interface:qid format data, only accepting 1 par per write, to simplify the code

5) changed read_lock_bh to write_lock_bh in sysfs store_queue_id to prevent concurrent writes from tramping each other.

Comment 17 Andy Gospodarek 2010-05-06 21:03:04 UTC
The patch in comment #16 is close to what I have been testing as well.  The only issues I see immediately are as follows:

1) The write_lock_bh(&bond->lock) in bonding_store_queue_id isn't needed to prevent simultaneous writing of the slave's queue_id.  The rtnl_lock takes care of serialization for us.  Just having a read_lock should be sufficient as it will prevent slave removal while running.

2) We need to be allowed to set the queue_id to zero as this effectively disables steering based on skb->queue_mapping.  The check in bonding_store_queue_id for that should probably be removed.

3) I need to test the keep_all flag a bit more as I'm not sure it is actually needed for active-backup to work correctly.  Initial testing did not indicate it was.

Isn't it interesting that a vast majority of the effort is being extended to essentially work on the kernel's UI? :)

Other than that it looks like this is almost ready to post after a bit more testing.

Comment 18 Neil Horman 2010-05-07 13:24:36 UTC
1) I think the write_lock_bh is needed, as the sysfs interface doesn't appear to hold the rtnl_lock for us.  See bonding_store_bonds for an example.  If we need to hold both the rtnl_lock and the bond lock, then yes, I agree a read lock is sufficient, but I don't think both are needed.

2) We can set the queue_id to zero with this patch   We check to make sure we don't assign duplicate queue ids but this code:
else if (qid && (qid == slave->queue_id)) {
                      goto err_no_cmd_unlock;
...

Only preforms the check if the requested qid is non-zero, so we can reset all the slave devs to a qid of zero.  The skb->queue_mapping check is in the override function, and still needs to be there to prevent inadvertent overrides when all the queues are set to zero.

3) Concur, I've not done any testing of the keep_all flag yet.

It is kind of interesting where all the work is going in this feature.  The output path changes are extreemly lightweight, which is rather nice.

Comment 19 Andy Gospodarek 2010-05-07 13:41:27 UTC
1.  The sysfs layer doesn't hold rtnl, but we take it at the beginning of bonding_store_queue_id and release it at the end.

I don't understand your reference to bonding_store_bonds.  That function will add and remove bonds and takes rtnl as well.

2.  Sorry I was talking about this line:

	if (!qid || (qid > bond->params.tx_queues))
		goto err_no_cmd;

If qid is 0, we skip setting it.

Comment 20 Neil Horman 2010-05-07 14:03:48 UTC
1) Ah, my bad, your right, I was looking for rtnl_lock, and missed the trylock.  sorry, the bond lock can be a read lock, you're correct.

2) you're misreading that line. its not !qid, its just a test of qid.  If qid is zero, we skip the second conditional in the and, and keep iterating the for loop, which makes multiple qids of zero acceptable.

Comment 21 Andy Gospodarek 2010-05-10 18:24:23 UTC
Created attachment 412921 [details]
0001--bonding-add-keep_all-parameter.patch

patch 1/2

Comment 22 Andy Gospodarek 2010-05-10 20:01:34 UTC
Created attachment 412952 [details]
0002--bonding-allow-user-controlled-output-slave-select.patch

patch 2/2

both of these are against today's net-next-2.6 tree.

Comment 23 Neil Horman 2010-05-10 20:12:08 UTC
ACK to both patches, lets send them upstream.

Comment 24 Andy Gospodarek 2010-05-11 00:01:57 UTC
Created attachment 413007 [details]
0002--bonding-allow-user-controlled-output-slave-select.patch

patch 2/2 with small bugfix found while testing

Comment 27 Andy Gospodarek 2010-06-07 13:06:21 UTC
Patches are upstream:

commit ebd8e4977a87cb81d93c62a9bff0102a9713722f
Author: Andy Gospodarek <andy>
Date:   Wed Jun 2 08:39:21 2010 +0000

    bonding: add all_slaves_active parameter

commit bb1d912323d5dd50e1079e389f4e964be14f0ae3
Author: Andy Gospodarek <andy>
Date:   Wed Jun 2 08:40:18 2010 +0000

    bonding: allow user-controlled output slave selection

We will need to make sure we have a recent enough version of the 'tc' to make this work as well.

Comment 28 Neil Horman 2010-06-07 20:14:48 UTC
We'll also need to modify the RHEL5 patch so that we encoude output slave in the mark field, rather than in the queue_mapping field, as RHEL5 has no tx multiqueue capabilities

Comment 31 Neil Horman 2010-09-03 18:05:17 UTC
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2734080

Test build in brew.  I'll test it out on dell-pe1950-07

Comment 33 Jarod Wilson 2010-09-21 20:59:01 UTC
in kernel-2.6.18-223.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 37 errata-xmlrpc 2011-01-13 20:51:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html


Note You need to log in before you can comment on or make changes to this bug.