Bug 183602
Summary: | Bonding Interface keeps all links in status updelay | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Hansjoerg Maurer <hansjoerg.maurer> |
Component: | kernel | Assignee: | Andy Gospodarek <agospoda> |
Status: | CLOSED NOTABUG | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 4.0 | CC: | jbaron, linville, peterm |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-04-07 14:58:21 UTC | Type: | --- |
Description
Hansjoerg Maurer
2006-03-02 08:25:40 UTC
I want to make sure I understand your problem correctly, so I would like to ask a question to clarify. Does your network connectivity come back 5 minutes after performing

# ifdown bond0
# ifup bond0

or does it never come back at all? Your logs indicate that the bonds come back up, so after 5 minutes I would expect the traffic to continue to flow. I need to know if your complaint is that the traffic never comes back or if the complaint is that it takes 5 minutes for the traffic to come back.

Hi,

the traffic comes back after 5 minutes. After ifup bond0, both interfaces (eth0 and eth2) of the trunk show an active link but stay in updelay for 300 s. According to the docs, updelay should be ignored if there is no active link in the bond.

The problem is not only related to the ifup / ifdown commands. It also occurs if we unplug and replug both interfaces.

In our configuration we have two bonding interfaces,
bond0 with eth0 and eth2
bond1 with eth2 and eth3
both located on two dual-port NICs and connected to two switches.

If you need additional information or testing, let me know.

I looked at the code (I put in some print statements), and the function bond_change_active_slave (L1895) is not called in this case:

if (((!bond->curr_active_slave) || (bond->curr_active_slave->dev->flags & IFF_NOARP)) && (new_slave->link != BOND_LINK_DOWN)) {

Unfortunately, I am not really good at C and I don't understand the networking concepts behind the code, so I am unable to work on it.

Greetings
Hansjörg

For an ifup/ifdown command sequence the updelay function is working as designed. It should take 5 minutes for the links to become active again when testing this way. When both links go down, it seems the value for 'updelay' will only be ignored if both slave links go down (shown as down by the file /proc/net/bonding/bond0). I will need to find some tg3 cards before I can test this, so I will let you know my findings later this week.
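Since the discussion depends on what /proc/net/bonding/bond0 reports for each slave, here is a minimal Python sketch for reading that status. It assumes the usual "Key: value" layout with "Slave Interface", "MII Status" and "Up Delay (ms)" fields. The sample text is made up for illustration and is not output from the reporter's system, and the function names are hypothetical.

```python
# Hypothetical sample in the style of /proc/net/bonding/bond0.
# All values are invented for illustration only.
SAMPLE_STATUS = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 300000
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
"""


def parse_bonding_status(text):
    """Split bonding status text into bond-level and per-slave fields."""
    bond = {}
    slaves = {}
    current = None  # name of the slave section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank lines and anything without a key
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
            slaves[current] = {}
        elif current is None:
            bond[key] = value
        else:
            slaves[current][key] = value
    return bond, slaves


def slaves_with_status(slaves, status):
    """Return the names of slaves whose 'MII Status' equals status."""
    return [name for name, fields in slaves.items()
            if fields.get("MII Status") == status]


if __name__ == "__main__":
    bond, slaves = parse_bonding_status(SAMPLE_STATUS)
    print("updelay (ms):", bond.get("Up Delay (ms)"))
    print("slaves reported up:", slaves_with_status(slaves, "up"))
```

On a real system the text would come from reading /proc/net/bonding/bond0 instead of the sample string.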
Both the latest RHEL and upstream kernels seem to have code that behaves in the manner you are describing. I have tested RHEL kernels and see updelay being honored, and I have examined the upstream code and determined that the behaviour will be the same there.

I'm not sure the information in the documentation is the desired behaviour. I do not think it is wise for the kernel to decide in some cases that updelay should be ignored completely and bring the interface up anyway. I will investigate this upstream, but I don't think this is something we will put into our kernel without it being upstream.

Please let me know why this feature would be important to you and I can then decide how to continue: either a code change to ignore the updelay as the documentation states, or a documentation update to remove the section that says updelay will be ignored.

Thank you for your replies.

The reason we use the updelay parameter is that when we boot up an Ethernet switch, it takes some time (about 3 minutes) from showing a link to actually forwarding packets.

The updelay parameter is problematic when
- unplugging and replugging both network interfaces of a running failover trunk within the updelay phase (e.g. to change cabling)
- doing an ifdown bond0; ifup bond0, which does not bring the interface up until the updelay is over (e.g. after changing the network configuration)

The behavior the docs describe (see below) would solve both issues.

In my opinion, the updelay parameter should prevent the bond from using a link that has recently come up and is not yet ready while another link in the up state is available and working. If the bond gets into a situation where no active links are available (after an ifdown bond0; ifup bond0 or after unplugging both ports), I do not see any benefit of the updelay parameter.
- If all links stay in updelay, no traffic is possible.
- If updelay is omitted when no active link is available any more, the bond would start working again immediately after the switch works. Network downtime is reduced to a minimum, or no downtime occurs at all if I unplug and replug the first slave port and then the second slave port within the updelay phase.

Or do you see a case where it would make sense for updelay to keep BOTH slaves of a bond in state down for 5 minutes? It's the worst case for a bond and, e.g., an HA system, and in this case the network should return as fast as possible, which means when the first switch starts forwarding packets (and I think this is the point the docs describe) and not when updelay is over for all slaves.

I hope I was able to explain the issue. If it is not possible to solve this issue, the following part of the doc should be replaced by a warning that using updelay can keep a bond interface down (all slaves in state down) even if both ports are working properly:

"Note that when a bonding interface has no active links, the driver will immediately reuse the first link that goes up, even if updelay parameter was specified. "

Without taking a position on the design of the bonding driver, I would suggest that you disable spanning tree (or enable portfast or the equivalent) on the ports in question on the switch. That should eliminate your need for the updelay parameter.

We have already done this. The problem is that when booting one switch (both ports are connected to two different Foundry switches), the switch shows a link up very early (e.g. when testing the ports) without forwarding. If we don't use updelay, bonding will try to activate this port and reuse it (with resulting heartbeat and DRBD errors ...).

This bug has seen no activity in the last year, so I can only presume it is no longer a problem. If there is still an issue that needs to be resolved, please re-open this bug and I will be happy to help resolve it. Thank you.
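The difference between the documented and the observed behaviour discussed in this report can be illustrated with a small Python model. This is only a sketch of the argument, not the bonding driver's logic. It assumes the bond starts with no active slave, that links stay up once they come up, and that all times are in seconds. The function name and parameters are hypothetical.

```python
def time_to_first_active(link_up_times, updelay, ignore_updelay_when_no_active):
    """Return the time at which the bond first has a usable slave.

    link_up_times: carrier-up time of each slave (seconds).
    updelay: time a link must stay up before it may be used (seconds).
    ignore_updelay_when_no_active: True models the behaviour quoted from
        the documentation, False models the behaviour seen in this report.
    Returns None if no link ever comes up.
    """
    if not link_up_times:
        return None
    first_up = min(link_up_times)
    if ignore_updelay_when_no_active:
        # Documented: with no active link, the first link up is used at once.
        return first_up
    # Observed: every link waits out the full updelay before being used.
    return first_up + updelay


if __name__ == "__main__":
    # ifdown bond0; ifup bond0 with both links up right away, updelay = 300 s
    links = [0, 0]
    print("documented:", time_to_first_active(links, 300, True))
    print("observed:  ", time_to_first_active(links, 300, False))
```

The model deliberately leaves out the case raised in the last reply, where a booting switch shows link before it forwards packets. In that case using the first link immediately would not restore traffic either.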