Bug 1289962 - Ovs bonding with balance-slb causes network disconnection to instances due to port flapping
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 7.0 (Kilo)
Assigned To: Thadeu Lima de Souza Cascardo
QA Contact: Ofer Blaut
Keywords: ZStream
Reported: 2015-12-09 07:24 EST by Sadique Puthen
Modified: 2017-05-25 03:43 EDT
CC List: 19 users

Doc Type: Bug Fix
Last Closed: 2016-09-29 15:26:55 EDT
Type: Bug




External Trackers
Tracker ID: Red Hat Knowledge Base (Solution) 2805851
Last Updated: 2017-05-25 03:43 EDT

Description Sadique Puthen 2015-12-09 07:24:42 EST
Description of problem:

Cleartrip has a provider external network to which instances are directly attached. Pings to the instances fail intermittently. For example, if we ping an instance from outside (from the same network or a different one), ping works for 10 minutes, then stops for 5 minutes, then starts working again.

The systems are Cisco UCS servers whose external network connection is an OVS bonded interface in balance-slb mode.

We identified that this is caused by the test instance's MAC address flapping between ports on the br-ex bridge.

The instance MAC address is fa:16:3e:7f:29:25. "ovs-ofctl show br-ex" gives the port numbers below, and "ovs-appctl fdb/show br-ex" shows the port/MAC association below while ping works.

4 - phy-br-ex
5 - eth4
6 - eth3

4 133 fa:16:3e:7f:29:25

When ping does not work, it shows the following.

5 133 fa:16:3e:7f:29:25

While watching "ovs-appctl fdb/show br-ex", we saw the entry flap to port 5 when ping stops working and back to port 4 when it resumes working.
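The watch described above can be scripted so that a timestamped line is printed only when the port for the instance MAC changes, which makes the flap times easy to line up with captures. A minimal POSIX shell sketch, assuming the fdb/show column order port, VLAN, MAC, Age; fdb_port_of and watch_mac_port are hypothetical helper names, not part of OVS.

```shell
#!/bin/sh
# Print the port that `ovs-appctl fdb/show <bridge>` output (read on stdin)
# reports for the MAC given as $1. Assumes columns: port VLAN MAC Age.
fdb_port_of() {
  awk -v mac="$1" 'tolower($3) == tolower(mac) { print $1; exit }'
}

# Hypothetical helper: poll the bridge fdb and print a timestamped line each
# time the port for the MAC changes ("none" when the entry is absent).
# Requires ovs-appctl in PATH and the named bridge to exist.
watch_mac_port() {
  bridge=$1 mac=$2 interval=${3:-1} prev=unset
  while :; do
    cur=$(ovs-appctl fdb/show "$bridge" | fdb_port_of "$mac")
    cur=${cur:-none}
    if [ "$cur" != "$prev" ]; then
      printf '%s %s port=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$mac" "$cur"
      prev=$cur
    fi
    sleep "$interval"
  done
}
```

Usage: watch_mac_port br-ex fa:16:3e:7f:29:25. With the numbering above, port=4 is phy-br-ex and port=5 is eth4.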

We then brought down eth4; with that slave taken out of the bond, we did not observe the flapping.

Attaching the sosreport now. The customer is attaching tcpdump captures from eth3 and eth4 that cover both the working and non-working periods; they will be made available.

plotnetcfg output is included in the sosreport.
Comment 4 Assaf Muller 2015-12-09 09:58:04 EST
Reassigning to openvswitch component for triaging.
Comment 5 Sadique Puthen 2015-12-09 12:54:52 EST
Rashid,

This is an urgent issue that now blocks the OpenStack project from moving ahead. If possible, we would like someone to join a Bomgar session to investigate why this flapping happens.

If we don't have another workaround that keeps balance-slb working, we will have to move to active-backup tomorrow to get the tests going (which is not useful here, as the bond exists only to improve throughput).

Please advise, and let us know if more details are required. We are also trying to reproduce this internally using plain OVS with balance-slb bonding and KVM VMs.
Comment 6 Dan Sneddon 2015-12-09 12:59:33 EST
(In reply to Sadique Puthen from comment #0)

> Instance mac address is fa:16:3e:7f:29:25. ovs-ofctl show br-ex shows port
> mac association as below while working.
> 
> 4 - phy-br-ex
> 5 - eth4
> 6 - eth3
> 
> 4 133 fa:16:3e:7f:29:25
> 
> When it does not work, it shows as below.
> 
> 5 133 fa:16:3e:7f:29:25

OVS balance-slb works by tying a VLAN to a particular port at any given moment. The VLANs are rebalanced at regular intervals, in order to keep all ports in the bond active.

This sounds like the configuration is not exactly the same between ports 4 and 5. If port 4 has a VLAN trunked that port 5 does not, then when OVS tries to balance that VLAN on port 5 the packets will be dropped. When the VLAN bounces back to port 4, the switch accepts that VLAN and the packets go through.

Please have the customer double-check the VLAN trunking configuration on all ports inside the bond, and ensure that the configuration is identical.
Comment 7 Thadeu Lima de Souza Cascardo 2015-12-09 13:07:12 EST
So, from what I can see from the pcap files, you are pinging from 172.16.13.51 to 172.19.13.23. I guess you started the two captures about 83 seconds apart, since I see echo request 2600 arriving on eth3 at around 202s, then request 2601 arriving on eth4 at around 285s. Does this sound right?

If that's right, then what I observe is that it takes 5 minutes (300 seconds), until around 584s on eth4, for the target to respond again.

That is consistent with the default idle time of MAC learning. If you are using SLB, OVS drops packets with a destination MAC address coming from a port that is not the port it knows this MAC address goes to. With the expiration taking 300s, that's how long it will take to start accepting packets again.

There is a way to change that time. Can you experiment by changing it to 120s, for example, and see whether the time it takes for ping to respond changes? If it does, you can find an acceptable value that won't flood your network too much.

You may set the parameter like this:

ovs-vsctl set bridge br-ex other_config:mac-aging-time=120
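For the experiment, the value can be set, checked and later reverted with ovs-vsctl. A minimal sequence, assuming the bridge is br-ex as in this report; removing the key returns OVS to its default of 300 seconds, the same duration as the outage seen in the captures. The fdb/flush step is an optional extra for testing, not part of the suggestion above: it clears a stale entry at once, at the cost of a short burst of flooding.

```shell
# Lower the MAC aging time on br-ex from the default 300 s to 120 s.
ovs-vsctl set bridge br-ex other_config:mac-aging-time=120

# Confirm the value stored in the database.
ovs-vsctl get bridge br-ex other_config:mac-aging-time

# Optional while testing: flush all learned MACs on br-ex so a stale entry
# does not have to age out (unknown destinations are flooded until relearned).
ovs-appctl fdb/flush br-ex

# Revert to the default aging time by removing the key.
ovs-vsctl remove bridge br-ex other_config mac-aging-time
```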

Cascardo.
Comment 8 Sadique Puthen 2015-12-09 13:17:10 EST
> OVS balance-slb works by tying a VLAN to a particular port at any given
> moment. The VLANs are rebalanced at regular intervals, in order to keep all
> ports in the bond active.
> 
> This sounds like the configuration is not exactly the same between ports 4
> and 5. If port 4 has a VLAN trunked that port 5 does not, then when OVS
> tries to balance that VLAN on port 5 the packets will be dropped. When the
> VLAN bounces back to port 4, the switch accepts that VLAN and the packets go
> through.

Port 4 is phy-br-ex, a port internal to OVS rather than a bond slave, and it has no associated VLAN configuration on the switch/blade. Note that the flapping here is between port 4 (phy-br-ex) and port 5 (eth4), not between port 5 (eth4) and port 6 (eth3). Only ports 5 and 6 have a VLAN configuration on the switch.

> Please have the customer double-check the VLAN trunking configuration on all
> ports inside the bond, and ensure that the configuration is identical.

During testing, we brought down port 5 using ifdown, leaving port 6 as the only active slave. Communication then worked through port 6 without any issues. We haven't yet brought down port 6 to make port 5 the only active slave. Would doing this (bringing down port 6 and making port 5 the only active slave) confirm the theory of a misconfigured VLAN on the switch or blade?
Comment 9 Sadique Puthen 2015-12-09 13:27:57 EST
(In reply to Thadeu Lima de Souza Cascardo from comment #7)
> So, from what I can see from the pcap files, you are pinging from
> 172.16.13.51 to 172.19.13.23. I guess you started capturing those files with
> a difference of about 83 seconds, since I see echo request 2600 coming to
> eth3 at around 202s, then request 2601 coming to eth4 at around 285s. Does
> this sound right?

That is correct. We started the ping first, then started tcpdump on eth3, and then had to ssh to the node again to run tcpdump on eth4. The ssh step probably accounts for the 83 seconds.

> 
> If that's right, then what I observe is that it takes 5 minutes (300
> seconds), around 584s on eth4 for the target to respond again.
> 
> That is consistent with the default idle time of MAC learning. If you are
> using SLB, OVS drops packets with a destination MAC address coming from a
> port that is not the port it knows this MAC address goes to. With the
> expiration taking 300s, that's how long it will take to start accepting
> packets again.

That is correct, but shouldn't we concentrate on what makes OVS learn the wrong MAC/port association in the first place, which is what causes this network drop? Doesn't the 300s expiration only come into play after that?

> There is a way to change that time. Can you experiment and change it to
> 120s, for example, and see if the time it takes for ping to respond changes?
> If that does, you can find an acceptable value that won't flood your network
> too much.

I can test this. But does it solve the network issue, or does it just reduce the outage from 300 seconds to 120 seconds each time OVS learns the wrong port for the MAC address?

Or did I misunderstand anything here?

> 
> You may set the parameter like this:
> 
> ovs-vsctl set bridge br-ex other_config:mac-aging-time=120
> 
> Cascardo.
Comment 10 Thadeu Lima de Souza Cascardo 2015-12-09 13:57:34 EST
(In reply to Sadique Puthen from comment #9)
> (In reply to Thadeu Lima de Souza Cascardo from comment #7)
> > So, from what I can see from the pcap files, you are pinging from
> > 172.16.13.51 to 172.19.13.23. I guess you started capturing those files with
> > a difference of about 83 seconds, since I see echo request 2600 coming to
> > eth3 at around 202s, then request 2601 coming to eth4 at around 285s. Does
> > this sound right?
> 
> This is correct. We started running ping first. We then did tcpdump on eth3.
> Then we had to ssh to the node again to run tcpdump on eth4. This ssh time
> might have taken 83 seconds.
> 
> > 
> > If that's right, then what I observe is that it takes 5 minutes (300
> > seconds), around 584s on eth4 for the target to respond again.
> > 
> > That is consistent with the default idle time of MAC learning. If you are
> > using SLB, OVS drops packets with a destination MAC address coming from a
> > port that is not the port it knows this MAC address goes to. With the
> > expiration taking 300s, that's how long it will take to start accepting
> > packets again.
> 
> This is correct, but shouldn't we concentrate on what triggers the ovs to
> learn wrong mac - port association in the first place which causes this
> network drop? Doesn't the expiration of 300s comes after this? 
> 

The MAC addresses on the request are 54:7f:ee:a2:e2:c1 > fa:16:3e:7f:29:25. OVS learns, from the source address 54:7f:ee:a2:e2:c1, that this host is reached through eth3, port 6. Then, suddenly, it appears on eth4, port 5. Since both belong to a balance-slb bond, the packet is dropped, because OVS expects it on port 6, eth3.

That's how balance-slb operates: it prevents loops in which, for example, packets sent out one port reappear on the other. So it's not that it has learned the wrong MAC/port association. It learned that one port was in use, a different port is now in use, and the old entry needs to expire.


> > There is a way to change that time. Can you experiment and change it to
> > 120s, for example, and see if the time it takes for ping to respond changes?
> > If that does, you can find an acceptable value that won't flood your network
> > too much.
> 
> I can test this. But does this solve the network issue instead of reducing
> the 300 seconds network outage to 120 seconds on each time it learns the
> wrong port for the mac address?
> 
> Or did I misunderstand anything here?
> 

It will just reduce the outage. That's the limitation of using balance-slb. You can reduce it further, but watch whether that causes any other problems on your network.

And of course, if you are using the bond for throughput rather than availability and one of the ports is flappy, you need to fix that port, since it is the root cause of your problems.

Anyway, if the test works, we can confirm the theory and discuss the problem further.

Thanks.
Cascardo.

> > 
> > You may set the parameter like this:
> > 
> > ovs-vsctl set bridge br-ex other_config:mac-aging-time=120
> > 
> > Cascardo.
Comment 11 Sadique Puthen 2015-12-09 14:37:36 EST
(In reply to Thadeu Lima de Souza Cascardo from comment #10)
> (In reply to Sadique Puthen from comment #9)

> The MAC address at the request comes as this: 54:7f:ee:a2:e2:c1 >
> fa:16:3e:7f:29:25. It's learning by the source address, 54:7f:ee:a2:e2:c1
> that this comes from eth3, port 6. Then, suddenly, it appears at eth4, port
> 5. Since they belong to a balance-slb bond, it's dropped, because it expects
> it from port 6, eth3.
> 
> That's how balance-slb operates, to prevent loops where packets sent to a
> given port may appear at the other, for example. So, it's not that it has
> learned the wrong MAC/port association. It's just that it has learned that
> the other port was used, and then, a different port is now used and the
> entry needs to expire.

My understanding was that, since we do ovs bonding and native bond added to ovs, ovs is expected to identify the same packet came through eth4, port5 and ignore it.

Does all this mean that balance-slb is incompatible with Open vSwitch bridges and they cannot work together?
 
> It will just reduce the outage. That's the limitation with using
> balance-slb. You can reduce it further, but observe if that does not cause
> any other problems on your network.

I will test this during IST hours once the customer comes online, but reducing the outage even to 2 seconds is not an acceptable solution for this project, as it is a mission-critical workload.

> And of course, if you are using the bonding for throughput and not
> availability and one of the ports is flappy, you need to fix that, which is
> the root cause for your problems.

I don't understand what you mean by "one of the ports is flappy". How can I prove this to the customer?
 
> Anyway, if the test works, we can confirm the theory and discuss the problem
> further.
> 
> Thanks.
> Cascard
Comment 12 Sadique Puthen 2015-12-09 14:39:51 EST
Correction:

My understanding was that, since we do ovs bonding and native bond added to ovs, ovs is expected to identify the same packet came through eth4, port5 and ignore it.

Read as:

My understanding was that, since we do ovs bonding and not a native bond added to ovs bridge, ovs is expected to identify the same packet came through eth4, port5 and ignore it.
Comment 13 Dan Sneddon 2015-12-09 14:48:12 EST
(In reply to Sadique Puthen from comment #12)
> Correction:
> My understanding was that, since we do ovs bonding and not a native bond
> added to ovs bridge, ovs is expected to identify the same packet came
> through eth4, port5 and ignore it.

I believe this is correct. OVS balance-slb mode will choose one member of the bond for a VLAN at a time. It will only send frames from that VLAN out the chosen member, and will block frames sent to that VLAN on other members of the bond.

OVS will also rebalance the VLANs, which may cause a VLAN to switch from one port to another. OVS should not do this so often that the switch registers it as a port flap. The settings can be tuned on both the switch and in the OVS bond.

In order to change the timing of the balance-slb rebalancing, use the following options:

"bond-mode=balance-slb other_config:bond-rebalance-interval=10000"

(Where 10000 is the number of milliseconds between rebalancing)

This can be disabled by setting the bond-rebalance-interval to 0, in which case no rebalancing will occur.
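The same options can be applied directly with ovs-vsctl on the bond's Port record. A sketch, assuming a bond port named bond1, which is a hypothetical placeholder for the name shown by ovs-vsctl show; the quoted string above is the form used in the bond's ovs_options. ovs-appctl bond/show then reports the bond mode and per-slave state, which confirms the change took effect.

```shell
# Hypothetical name of the OVS bond port on br-ex; check `ovs-vsctl show`.
BOND=bond1

# balance-slb with rebalancing every 10 s (the interval is in milliseconds).
ovs-vsctl set port "$BOND" bond_mode=balance-slb \
    other_config:bond-rebalance-interval=10000

# Disable rebalancing entirely.
ovs-vsctl set port "$BOND" other_config:bond-rebalance-interval=0

# Inspect the bond: mode, slaves and their current state.
ovs-appctl bond/show "$BOND"
```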
Comment 14 Sadique Puthen 2015-12-09 15:01:24 EST
Dan,

One more question. If we disable rebalancing completely while carrying multiple provider VLANs through the same bridge, are the VLANs still distributed evenly across the interfaces, so that we get load balancing? For example, with four VLANs, would VLANs 1 and 2 always be on eth3 and VLANs 3 and 4 always on eth4, or could they all end up on the same interface? If the former, disabling rebalancing may be an acceptable solution for this customer.
Comment 15 Dan Sneddon 2015-12-09 15:15:03 EST
The behavior of OVS balance-slb with rebalancing disabled hasn't been evaluated. My guess is that it would assign the VLANs round robin, so vlan 1 -> member 1, vlan 2 -> member 2, etc.
Comment 16 Thadeu Lima de Souza Cascardo 2015-12-09 15:36:08 EST
Sadique, what does the topology look like from the other end, that is, from 172.16.13.51 to the machine running OVS, what network setup do you have on that guest/host, what bridges, bonds, interfaces and switches you have until you get to the OVS bond?

We can certainly see that something is causing the traffic to change from coming from one port to coming from the other port. The destination MAC address is the same. So, why is the switch sending it through the other port? I am looking at the pcap files to see if I find a clue. But knowing the topology would help.

Thanks.
Cascardo.
Comment 17 Sadique Puthen 2015-12-10 02:16:40 EST
At this time customer has switched to active-backup to resume testing of their application. This happened before we could suggest disabling rebalancing.

They cannot take downtime before next Wednesday to test our suggestions and continue using balance-slb. They will disable rebalancing next Wednesday and provide feedback.

We are trying to get their network topology to understand this better. I will reduce the severity of this bz for now.
Comment 18 Sadique Puthen 2015-12-18 07:37:17 EST
Dan,

Customer has done enough tests now with 'bond-rebalance-interval=0' and haven't seen any issues with the bond.

Should we make this (disabling rebalancing) the default configuration for Director-deployed overclouds? If not, what is the right solution here that never causes packet drops?
Comment 19 Sadique Puthen 2015-12-21 09:40:19 EST
Dan,

Comments? Let us make sure that Director disables rebalancing so that we don't need to troubleshoot the same problem with another strategic customer who may hit this during deployment.
Comment 20 Dan Sneddon 2015-12-21 15:30:58 EST
(In reply to Thadeu Lima de Souza Cascardo from comment #16)
> Sadique, what does the topology look like from the other end, that is, from
> 172.16.13.51 to the machine running OVS, what network setup do you have on
> that guest/host, what bridges, bonds, interfaces and switches you have until
> you get to the OVS bond?
> 
> We can certainly see that something is causing the traffic to change from
> coming from one port to coming from the other port. The destination MAC
> address is the same. So, why is the switch sending it through the other
> port? I am looking at the pcap files to see if I find a clue. But knowing
> the topology would help.
> 
> Thanks.
> Cascardo.

The way balance-slb balances traffic is to divide the VLANs assigned to the bond among the available members of the bond. Every X milliseconds (determined by the value of "other-config:bond-rebalance-interval=<X>" in the ovs_options: section of the bonding config), the bond will rebalance, causing outbound packets to potentially change from one bond slave to another. The switch will see this as the hosts on that VLAN moving from one port to another. This shouldn't cause any packet loss, since OVS continues to allow inbound traffic on any slave.
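The ovs-vswitchd.conf.db documentation describes balance-slb as balancing flows among bond members based on source MAC address and output VLAN, with periodic rebalancing. The POSIX shell sketch below models only that selection step, so the effect of a rebalance is easy to picture. The cksum hash, the 256-bucket count and the static "bucket mod member count" ownership are simplifying assumptions, not OVS's actual hash or assignment algorithm; slb_bucket and slb_member are hypothetical names.

```shell
#!/bin/sh
# Simplified model of balance-slb transmit selection (NOT OVS's real code):
# (source MAC, VLAN) -> bucket -> bond member.
SLB_BUCKETS=256   # assumed bucket count

# Bucket for a source MAC and VLAN; cksum stands in for OVS's own hash.
slb_bucket() {
  mac=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  h=$(printf '%s/%s' "$mac" "$2" | cksum | cut -d' ' -f1)
  echo $(( h % SLB_BUCKETS ))
}

# Member that transmits for this MAC/VLAN, assuming bucket i is owned by
# member (i mod N). Usage: slb_member <mac> <vlan> <member>...
slb_member() {
  b=$(slb_bucket "$1" "$2")
  shift 2
  [ "$#" -gt 0 ] || return 1
  i=$(( b % $# ))
  for m in "$@"; do
    if [ "$i" -eq 0 ]; then echo "$m"; return 0; fi
    i=$(( i - 1 ))
  done
}
```

In this model a rebalance amounts to handing a bucket to another member, so every MAC/VLAN pair in that bucket starts transmitting on the other slave, which is the move the upstream switch observes. With rebalancing disabled the mapping stays fixed, so how evenly traffic spreads depends on the hash and the initial bucket assignment.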

It sounds to me like in this case the rebalancing was happening fast enough that the switch thought there was port flapping. A host moving back and forth between 2 or more ports is often a sign of a loop or other problem, so some switches see this as an error condition and may try to mitigate the perceived problem. Perhaps the switches can be reconfigured to be less sensitive about what they consider a port flap, or perhaps we could increase the bond-rebalance-interval so that rebalancing occurs less often.

Sadique, what make and model of switches are in use here? I can try to look up the configuration for port flap detection.

I don't know that we want to make changes on the customer network to these settings, especially since bond-rebalance-interval set to zero seems to be working. We should certainly do more testing ourselves of the balance-slb mode.
Comment 22 Sadique Puthen 2015-12-22 02:16:37 EST
Dan,

> It sounds to me like in this case the rebalancing was happening fast enough
> that the switch thought there was port flapping. A host moving back and
> forth between 2 or more ports is often a sign of a loop or other problem, so
> some switches see this as an error condition and may try to mitigate the
> perceived problem. Perhaps the switches can be reconfigured to be less
> sensitive about what it considers a port flap, or perhaps we could increase
> the bond-rebalance-interval so that rebalancing occurred less often.

To reiterate, the physical switch may not be at fault here, and the flapping does not happen at the switch port. It happens inside the OVS bridge br-ex. The instance MAC address should be mapped to the port number of phy-br-ex inside br-ex. When rebalancing happens, the instance MAC address association flaps from the phy-br-ex port number to the port number of one of the slave interfaces.

Shouldn't the OVS bridge be intelligent enough to recognize that a rebalance is happening, and not flap the instance's MAC -> port association to the slave interface?
 
> Sadique, what make and model of switches are in use here? I can try to look
> up the configuration for port flap detection.

The two slaves are virtual NICs on a Cisco UCS, connected to the Cisco fabric and from there to the physical switches. I have asked the consultant for more details to understand this better.

> I don't know that we want to make changes on the customer network to these
> settings, especially since bond-rebalance-interval set to zero seems to be
> working. We should certainly do more testing ourselves of the balance-slb
> mode.

We cannot change this in the customer environment, as the customer is running production with balance-slb rebalancing disabled and is happy with that.

Abhilash, did you get the network diagram for these ucs blades that we requested sometime back?
Comment 24 Flavio Leitner 2015-12-23 14:08:19 EST
(In reply to Sadique Puthen from comment #0)
[...]
> We identified that this is due to port flapping for the test instance mac
> address on the br-ex bridge.
> 
> Instance mac address is fa:16:3e:7f:29:25. ovs-ofctl show br-ex shows port
> mac association as below while working.
> 
> 4 - phy-br-ex
> 5 - eth4
> 6 - eth3
> 
> 4 133 fa:16:3e:7f:29:25
> 
> When it does not work, it shows as below.
> 
> 5 133 fa:16:3e:7f:29:25
> 
> When we kept a watch on "ovs-appctl fdb/show br-ex", we saw the port keeps
> flapping to 5 when it stops working and to 4 when it resumes working.

The instance's MAC address should not flap at all, regardless of SLB rebalancing, because the instance itself is not moving. Of course, if the instance's MAC address moves to port 5 (eth4), the instance becomes unreachable until its MAC entry is flushed out of the table.

What seems to be happening is that a looped-back packet on eth4 coincides with a MAC table flush or entry expiration. When that happens, the OVS bond can learn the instance's MAC from that looped-back packet on eth4, which causes the issue.

Can we monitor the bridge's fdb to see if it's flushing or expiring at the time of the swap?
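One way to check this is to sample the fdb entry for the instance MAC once a second and classify each change: an entry that disappears points to an expiry or flush (the previous Age helps tell which), an Age that drops back points to a refresh by new traffic, and a port change is the move itself. A POSIX shell sketch, assuming the fdb/show columns port, VLAN, MAC, Age and that Age restarts near zero when traffic refreshes the entry; fdb_entry_of, fdb_transition and watch_fdb_entry are hypothetical names.

```shell
#!/bin/sh
# Print "<port> <age>" for the MAC in $1 from `ovs-appctl fdb/show` on stdin.
fdb_entry_of() {
  awk -v mac="$1" 'tolower($3) == tolower(mac) { print $1, $4; exit }'
}

# Classify the change between two samples: prev_port prev_age cur_port cur_age
# (pass empty strings for a missing entry).
fdb_transition() {
  if [ -z "$1" ] && [ -z "$3" ]; then echo absent
  elif [ -z "$3" ]; then echo expired-or-flushed
  elif [ -z "$1" ]; then echo learned
  elif [ "$1" != "$3" ]; then echo moved
  elif [ "$4" -lt "$2" ]; then echo refreshed
  else echo unchanged
  fi
}

# Hypothetical monitor (needs ovs-appctl and the bridge): print learned,
# moved and expired-or-flushed events with a timestamp and the previous age.
watch_fdb_entry() {
  bridge=$1 mac=$2 prev_port= prev_age=
  while :; do
    set -- $(ovs-appctl fdb/show "$bridge" | fdb_entry_of "$mac")
    cur_port=${1:-} cur_age=${2:-}
    t=$(fdb_transition "$prev_port" "$prev_age" "$cur_port" "$cur_age")
    case $t in
      unchanged|absent|refreshed) ;;
      *) printf '%s %s %s port=%s age=%s prev_age=%s\n' \
           "$(date '+%Y-%m-%d %H:%M:%S')" "$mac" "$t" \
           "${cur_port:-none}" "${cur_age:--}" "${prev_age:--}" ;;
    esac
    prev_port=$cur_port prev_age=$cur_age
    sleep 1
  done
}
```

Usage: watch_fdb_entry br-ex fa:16:3e:7f:29:25. An "expired-or-flushed" line whose prev_age is close to the aging time suggests expiry; a small prev_age suggests a flush.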

Thanks,
fbl
Comment 25 Sadique Puthen 2015-12-24 02:26:19 EST
 
> Can we monitor the bridge's fdb to see if it's flushing or expiring at the
> time of the swap?
> 
> Thanks,
> fbl

I am not sure how to monitor this. What we saw while watching it is that it flaps roughly every 5 minutes and then flaps back to the correct port after the same amount of time. I am not sure whether it flaps when the entry expires. What we saw is that it flaps when rebalancing happens, and disabling rebalancing fixed it. We cannot do further tests in the customer environment at this time, as it has moved to production with rebalancing disabled.
Comment 26 Sadique Puthen 2015-12-24 04:31:47 EST
> Can we monitor the bridge's fdb to see if it's flushing or expiring at the
> time of the swap?
> 
> Thanks,
> fbl

You can watch the Bomgar playback at https://remotesupport.redhat.com/session_download?lsid=l%3D5ebf669a48204599b9879569cba3f179%3Bh%3D088dc96124b0b6bd7cf14c2f47edf3df16e1df75%3Bt%3Dsd%3Bm%3Drecording&dl_action=recording&view=1

Start at 21:00 and at 50:00 to see what happens.
Comment 28 Flavio Leitner 2016-01-05 15:43:17 EST
Hi,

I think turning rebalancing off just masks the root cause, because rebalancing only means transmitting on a specific slave. It shouldn't cause any disruption at all.

However, a gratuitous ARP can poison the forwarding database (fdb), causing the whole issue. Therefore, my suggestion is to look at the traffic dump from each bond slave and check for an ARP being received right before the issue reproduces. I am quite sure you will find one on eth4.
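A possible capture for this check, as a sketch: one tcpdump per bond slave, filtered to ARP frames whose Ethernet source is the instance MAC, since OVS learns MAC locations from the Ethernet source address. The output file names are arbitrary. A capture on a slave also records frames transmitted through it, so a match shows where to look rather than proving the frame was received.

```shell
# Terminal 1: capture on eth3.
tcpdump -i eth3 -n -w eth3-arp.pcap 'arp and ether src fa:16:3e:7f:29:25'

# Terminal 2: capture on eth4.
tcpdump -i eth4 -n -w eth4-arp.pcap 'arp and ether src fa:16:3e:7f:29:25'

# After the fdb entry has flapped, stop both captures (Ctrl-C) and read them
# back with link-layer headers (-e) and absolute timestamps (-tttt).
tcpdump -e -n -tttt -r eth3-arp.pcap
tcpdump -e -n -tttt -r eth4-arp.pcap
```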

If so, we need to find the source of the packet. The OVS bond sends those packets out when, for instance, the active slave changes, either from the command line or because the link goes down, but the OVS bond blocks looped-back ARP packets for 5 seconds to avoid exactly this problem. In any case, you can monitor the active slave with the ovs-appctl bond/show command to at least see whether that happens.
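The rule described here can be written as a small decision function. This is a simplified POSIX shell model, not OVS source: it treats the whole bond as a single port named "bond", considers only a frame arriving on a bond slave, and leaves out the other checks a real bond performs. The function name slb_admit, its arguments and the yes/no encoding are assumptions of the sketch.

```shell
#!/bin/sh
# Decide whether a frame arriving on a bond slave is admitted.
#   $1  port where the frame's source MAC is currently learned ("" = unknown)
#   $2  "yes" if the frame is a gratuitous ARP
#   $3  "yes" if that MAC is inside the ~5 s window in which looped-back
#       ARP packets are blocked
# Prints "accept" (the MAC is then learned on the bond) or "drop".
slb_admit() {
  if [ -z "$1" ] || [ "$1" = bond ]; then
    echo accept   # MAC unknown, or already learned on the bond itself
  elif [ "$2" = yes ] && [ "$3" != yes ]; then
    echo accept   # gratuitous ARP outside the window: treated as a host move
  else
    echo drop     # learned on another port: assumed to be a looped-back frame
  fi
}
```

For the instance MAC learned on phy-br-ex, an ordinary looped-back frame on eth4 is dropped (slb_admit phy-br-ex no no), but a looped-back gratuitous ARP arriving outside the window is accepted (slb_admit phy-br-ex yes no). In this model that moves the entry to the bond, which would match the port 5 entry seen in the description.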

But there are a few other reasons, unrelated to the OVS bond, for a gratuitous ARP packet to be sent out. We need to trace that down. Since disabling rebalancing seems to mask the issue, maybe a genuine ARP packet was looped back over a different network path at some point, poisoning the fdb.

Another way to confirm the issue is simply to add a flow that drops ARP coming in on eth4 with the instance MAC address. I guess that is what the OSP ARP spoofing protection is about, though I'm not sure.
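A sketch of that test flow, assuming eth4 is OpenFlow port 5 as in the ovs-ofctl show output in the description. The priority of 100 is an arbitrary value chosen to sit above the bridge's normal-forwarding flow. Flows added by hand like this may be removed when the neutron Open vSwitch agent restarts and reprograms br-ex.

```shell
# Drop ARP frames that arrive on eth4 (OpenFlow port 5) carrying the
# instance MAC as Ethernet source.
ovs-ofctl add-flow br-ex \
    "priority=100,in_port=5,arp,dl_src=fa:16:3e:7f:29:25,actions=drop"

# A growing n_packets counter means such frames are arriving on eth4.
ovs-ofctl dump-flows br-ex "in_port=5,arp,dl_src=fa:16:3e:7f:29:25"

# Remove the test flow afterwards.
ovs-ofctl --strict del-flows br-ex \
    "priority=100,in_port=5,arp,dl_src=fa:16:3e:7f:29:25"
```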

Thanks
fbl
Comment 30 Flavio Leitner 2016-01-28 07:18:34 EST
Hi Sadique,

Can you update us with the current status?
Thanks,
fbl
Comment 31 Sadique Puthen 2016-04-19 01:14:18 EDT
At this time, I am not sure the customer will allow us to do further troubleshooting, as they have moved their full production to run with rebalancing turned off.
