Bug 1279161

Summary: Bridge is not forwarding frames towards a connected tap device
Product: Red Hat Enterprise Linux 6 Reporter: Ido Barkan <ibarkan>
Component: kernel    Assignee: Jakub Sitnicki <jsitnick>
kernel sub component: Bonding QA Contact: Amit Supugade <asupugad>
Status: CLOSED DUPLICATE Docs Contact:
Severity: high    
Priority: high CC: aloughla, atragler, danken, ibarkan, jarod, jpirko, jsitnick, kzhang, mleitner, network-qe, rkhan, sukulkar, tgraf, tspeetje, vyasevic, yliberma
Version: 6.7   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1288828 1408958 (view as bug list) Environment:
Last Closed: 2016-11-17 16:04:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1288828, 1370193    
Attachments:
Description Flags
binary tcpdump output
none
trace for eth0 (first bond member)
none
trace for eth1 (second bond member) none

Description Ido Barkan 2015-11-08 09:47:29 UTC
Description of problem:
A virtual machine cannot connect to anything outside the host via the bridge.
The VM is connected to the physical NIC of the host via a bridge that is known to libvirt. The bridge forwards the ARP requests of the VM, but does not forward the ARP replies back.
Below is the network configuration on the host, followed by tcpdump output on the tap device and on the bridge (10.35.16.244 is the guest IP).

network configuration of the host:
[root@ucs1-b200-2 ~]# virsh -r list
 Id    Name                           State
----------------------------------------------------
 4     jenkins-automation-rpm-vm32    running
 
 [root@ucs1-b200-2 ~]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:25:b5:0a:00:09 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:25:b5:0a:00:09 brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 00:25:b5:0a:00:09 brd ff:ff:ff:ff:ff:ff
5: rhevm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:25:b5:0a:00:09 brd ff:ff:ff:ff:ff:ff
7: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN 
    link/ether 12:10:42:3b:9e:1b brd ff:ff:ff:ff:ff:ff
11: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether fe:1a:4a:23:12:a0 brd ff:ff:ff:ff:ff:ff

[root@ucs1-b200-2 ~]# brctl show
bridge name	bridge id		STP enabled	interfaces
;vdsmdummy;		8000.000000000000	no		
rhevm		8000.0025b50a0009	no		bond0
            							vnet0
            							
[root@ucs1-b200-2 ~]# ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    inet 127.0.0.1/8 scope host lo
5: rhevm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    inet 10.35.19.149/22 brd 10.35.19.255 scope global rhevm
[root@ucs1-b200-2 ~]# brctl show

[root@ucs1-b200-2 ~]# brctl showmacs rhevm | grep fe:1a:4a:23:12:a0
  2	fe:1a:4a:23:12:a0	yes		   0.00
  

[root@ucs1-b200-2 ~]# tcpdump -n -i vnet0 "(host 10.35.16.244) and (icmp or arp)"
tcpdump: WARNING: vnet0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vnet0, link-type EN10MB (Ethernet), capture size 65535 bytes
11:12:11.943033 ARP, Request who-has 10.35.19.120 tell 10.35.16.244, length 28
11:12:11.943065 ARP, Request who-has 10.35.19.120 tell 10.35.16.244, length 42
11:12:12.942992 ARP, Request who-has 10.35.19.120 tell 10.35.16.244, length 28
11:12:12.943022 ARP, Request who-has 10.35.19.120 tell 10.35.16.244, length 42
11:12:13.057004 ARP, Request who-has 10.35.19.254 tell 10.35.16.244, length 28
11:12:13.057037 ARP, Request who-has 10.35.19.254 tell 10.35.16.244, length 42
11:12:13.943049 ARP, Request who-has 10.35.19.120 tell 10.35.16.244, length 28

XXXXXX

[root@ucs1-b200-2 ~]# tcpdump -n -i rhevm "(host 10.35.16.244) and (icmp or arp)"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on rhevm, link-type EN10MB (Ethernet), capture size 65535 bytes
11:12:50.072067 ARP, Request who-has 10.35.19.254 tell 10.35.16.244, length 28
11:12:50.072094 ARP, Request who-has 10.35.19.254 tell 10.35.16.244, length 42
11:12:50.072495 ARP, Reply 10.35.19.254 is-at 00:00:0c:07:ac:00, length 46
11:12:50.535085 ARP, Request who-has 10.35.19.120 tell 10.35.16.244, length 28
11:12:50.535106 ARP, Request who-has 10.35.19.120 tell 10.35.16.244, length 42
11:12:50.535372 ARP, Reply 10.35.19.120 is-at 00:1a:4a:23:13:cb, length 42



Version-Release number of selected component (if applicable):
[root@ucs1-b200-2 ~]# uname -r
2.6.32-573.7.1.el6.x86_64
[root@ucs1-b200-2 ~]# rpm -q libvirt
libvirt-0.10.2-54.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. migrate a guest to this host
2. try to ping 8.8.8.8 from the guest

Actual results:
no outgoing (or incoming) connectivity

Expected results:
the bridge should have L2 connectivity


Additional info:
* pinging the bridge IP directly succeeds.
* the host is a UCS host (but as I see it, the problem is in the bridge itself, and since the bridge sits on a bond, it does not even know about the physical NICs).

Comment 2 Marcelo Ricardo Leitner 2015-11-09 17:39:20 UTC
Which bond mode are you using? Please ensure it's either load balance or LACP.
ARP replies are destined to the original requester's MAC, but some bond modes will overwrite the source MAC for load balancing, which would cause the bridge not to forward the packets back to the guest.
(https://bugzilla.redhat.com/show_bug.cgi?id=1264029)
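
You can check the current mode and hash policy with, for example (bond0 is assumed from your ip link output, adjust if the bond is named differently):

# cat /sys/class/net/bond0/bonding/mode
# cat /proc/net/bonding/bond0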

Comment 3 Ido Barkan 2015-11-12 05:53:55 UTC
(In reply to Marcelo Ricardo Leitner from comment #2)
> Which bond mode are you using? Please ensure it's either load balance or
> LACP.
> ARP replies are destined to original requester MAC but some bond modes will
> overwrite src mac for load balancing, which would cause the bridge to not
> forward the packets back to the guest.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1264029)

[root@ucs1-b200-2 ~]# cat /sys/class/net/bond0/bonding/mode 
802.3ad 4

This is LACP right?

Comment 4 Ido Barkan 2015-11-12 05:56:30 UTC
The guest kernel is 2.6.32-504el6.x86_64

Comment 5 Marcelo Ricardo Leitner 2015-11-12 11:42:08 UTC
(In reply to Ido Barkan from comment #3)
> [root@ucs1-b200-2 ~]# cat /sys/class/net/bond0/bonding/mode 
> 802.3ad 4
> 
> This is LACP right?

Yes, that's LACP, good.

Please attach the binary captures for rhevm and vnet0 for the two tests, pinging the rhevm address directly and also an external one. Try to get both captures at the same time.
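
Something along these lines should do (file names are just examples):

# tcpdump -n -i rhevm -w /tmp/rhevm.pcap "(host 10.35.16.244) and (icmp or arp)" &
# tcpdump -n -i vnet0 -w /tmp/vnet0.pcap "(host 10.35.16.244) and (icmp or arp)" &
# ... run the pings from the guest, then: kill %1 %2

-w writes the raw (binary) capture so we can inspect the full headers afterwards.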

Comment 6 Ido Barkan 2015-11-15 09:00:01 UTC
Also supplying info requested by vyasevic by email (also added to CC):
> Actually since you mentioned that you are using UCS, there are
> vlans involved.  UCS will by default tag any 'untagged' traffic
> with vlan id 0.  This used to cause us all sorts of problems in
> rhel6 and vlan code had to be updated to handle it properly...
> 
> As a pure work-around, you might be able simply 'modprobe 8021q'
> to make this work.

Host:
[root@ucs1-b200-2 ~]# lsmod | grep 8021q
8021q                  20362  0

Guest:
On the guest, this did not solve the problem.

Comment 7 Ido Barkan 2015-11-15 09:18:15 UTC
(In reply to Marcelo Ricardo Leitner from comment #5)
> (In reply to Ido Barkan from comment #3)
> > [root@ucs1-b200-2 ~]# cat /sys/class/net/bond0/bonding/mode 
> > 802.3ad 4
> > 
> > This is LACP right?
> 
> Yes, that's LACP, good.
> 
> Please attach the binary captures for rhevm and vnet0 for the two tests,
> pinging rhevm address directly and also an external one. Try to get both
> captures at the same time.
adding the results of:
[root@ucs1-b200-2 ~]# tcpdump -n -i rhevm -w - "(host 10.35.16.244) and (icmp or arp)" | tee /tmp/trace.txt

- while running from the guest:
'ping 10.35.19.149 & ping 8.8.8.8'

Comment 8 Ido Barkan 2015-11-15 09:20:41 UTC
Created attachment 1094352 [details]
binary tcpdump output

Host( recording traffic):
[root@ucs1-b200-2 ~]# tcpdump -n -i rhevm -w - "(host 10.35.16.244) and (icmp or arp)" | tee /tmp/trace.txt

Guest (generating traffic): 
'ping 10.35.19.149 & ping 8.8.8.8'
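
(The attached file is a raw pcap despite the .txt name; it can be read back with, e.g.:

# tcpdump -e -n -r trace.txt

where -e also prints the link-level headers, i.e. the MAC addresses and any VLAN tags.)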

Comment 9 Vlad Yasevich 2015-11-17 11:50:24 UTC
(In reply to Ido Barkan from comment #4)
> The guest kernel is 2.6.32-504el6.x86_64

That is the base rhel6.6 kernel that has vlan issues.  If you look at
the arp packets you've provided (attachment 1094352 [details]), you'll see that they
are in fact tagged with vlan id 0.  Loading the vlan module on the guest
will work around this issue, but I think there might still be other issues.

You might consider updating the guest kernel.

-vlad

Comment 10 Marcelo Ricardo Leitner 2015-11-17 12:18:12 UTC
I'm thinking a bit differently. For each ARP request, we have 3 packets in there:

a. The request itself from the guest, 42 bytes long
b. The request again, 60 bytes long
c. The reply, 64 bytes long.

Considering it was captured only on the rhevm bridge, the packet in (b) shouldn't exist, and its size and timing make me believe that this is either the NIC mirroring the packet back at us, or there is a network loop somewhere in there.

https://access.redhat.com/solutions/750553
https://access.redhat.com/solutions/774743

As the bridge is being fooled, it thinks that the guest is on the branch that the reply came from, so it does nothing/no forwarding.

You may confirm this situation by doing a capture on the NIC itself this time. (The captures so far were on either rhevm or vnet0, which doesn't cover this situation.)
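
For example (interface names taken from your earlier ip link output), something like:

# tcpdump -e -n -i eth0 "(host 10.35.16.244) and (icmp or arp)"
# tcpdump -e -n -i eth1 "(host 10.35.16.244) and (icmp or arp)"

-e prints the source/destination MACs, so per this theory a reflected request would show up coming back in on a slave with the guest's own MAC as source.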

Note that this doesn't exclude this possible issue with processing vlan 0.

  Marcelo

Comment 11 Vlad Yasevich 2015-11-17 14:42:05 UTC
I think Marcelo is right.  If I had to guess, I'd say there is something wrong
with the bond.  I am thinking that the initial arp is being looped back to the other
bond port and thus fools the bridge.  You could try removing one of the devices from the bond and see if connectivity is restored.  You may still see vlan 0 issues on the guest if you are truly running 2.6.32-504.el6.x86_64.

-vlad

Comment 12 Ido Barkan 2015-11-18 07:29:26 UTC
Created attachment 1095876 [details]
trace for eth0 (first bond member)

trying to confirm Marcelo's theory from Comment 10:

on the host:
[root@ucs1-b200-2 ~]# tcpdump -n -i eth0 -w - "(host 10.35.16.244) and (icmp or arp)" > /tmp/trace-eth0.txt

on the guest:
# ping 10.35.19.149 & ping 8.8.8.8

Comment 13 Ido Barkan 2015-11-18 07:31:04 UTC
Created attachment 1095877 [details]
trace for eth1 (second bond member)

trying to confirm Marcelo's theory from Comment 10:

on the host:
[root@ucs1-b200-2 ~]# tcpdump -n -i eth1 -w - "(host 10.35.16.244) and (icmp or arp)" > /tmp/trace-eth1.txt

on the guest:
# ping 10.35.19.149 & ping 8.8.8.8

Comment 14 Ido Barkan 2015-11-18 07:39:16 UTC
(In reply to Vlad Yasevich from comment #9)
> (In reply to Ido Barkan from comment #4)
> > The guest kernel is 2.6.32-504el6.x86_64
> 
> That's is the base rhel6.6 kernel that has vlan issues.  If you look at
> the arp packets you've provided (attachment 1094352 [details]), you'll see
> that they
> are in fact tagged with vlan id 0.  Loading the vlan module on the guest
> will work around this issue, but I think there might still be other issues.
> 
> You might consider updating the guest kernel.
> 
> -vlad

Vlad, I already tried to modprobe 8021q, which did not remedy the issue (in Comment 6), but that is maybe because we have 2 issues here: duplicated packets + vlan tag 0.

But assuming you guys confirm that the bond/NIC is mirroring the ARP packets back at the bridge (using the additional physical NIC traces), what is the workaround for the packet mirroring?
 - change the bond mode? remove the bond? update the kernel/drivers?

Comment 15 Marcelo Ricardo Leitner 2015-11-18 12:32:31 UTC
Cool, yes, the arp request is being reflected by the switch or the NIC.
Did this server ever work? Sounds like a bad switch config, its ports are not grouped. (Or that SR-IOV thing I shared earlier.)
If you switch to active/backup, it should work, as bonding will drop all incoming packets from the inactive slave.
Before that, please provide the output of cat /proc/net/bonding/bondX;
it will contain information that allows us to diagnose the LACP link.
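
If you decide to try active/backup, on a stock RHEL 6 initscripts setup that would roughly mean the following (vdsm may manage these files, so treat this only as a sketch):

# vi /etc/sysconfig/network-scripts/ifcfg-bond0    # set BONDING_OPTS="mode=active-backup miimon=100"
# service network restart
# cat /proc/net/bonding/bond0                      # confirm the mode now reports active-backup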

Comment 16 Vlad Yasevich 2015-11-18 14:27:46 UTC
(In reply to Ido Barkan from comment #14)
> (In reply to Vlad Yasevich from comment #9)
> > (In reply to Ido Barkan from comment #4)
> > > The guest kernel is 2.6.32-504el6.x86_64
> > 
> > That's is the base rhel6.6 kernel that has vlan issues.  If you look at
> > the arp packets you've provided (attachment 1094352 [details]), you'll see
> > that they
> > are in fact tagged with vlan id 0.  Loading the vlan module on the guest
> > will work around this issue, but I think there might still be other issues.
> > 
> > You might consider updating the guest kernel.
> > 
> > -vlad
> 
> Vlad, I already tried to modprobe 8021q which did not remedy the issue (in
> Comment 6), but that is maybe because we have 2 issues here- duplicated
> packets + vlan tag 0.

In comment 6 you mentioned that you did this on the host.  What I am saying is that after we figure out the bonding issue, you may have to do this in the guest
or upgrade the guest kernel.

> 
> But assuming you guys confirm that the bond/nic is mirroring the arp packets
> back at the bridge (using the additional physical NIC traces), what is the
> workaround for the packet mirroring?
>  - change the bond mode? remove the bond? updated kernel/drivers?

You have a few options:
 1) Remove one of the bond members (rough commands below).  That would resolve
    the issue, but you'd lose fault tolerance.
 2) Change the bond mode.  Mode 1 (active-backup) should work correctly.
 3) Try to understand why the issue persists, by providing all the information
    from /proc/net/bonding/bond0 (like Marcelo asked).  We don't really know if
    this is something that's wrong in the bond driver or not.
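
For option 1, a quick way to pull a slave out of the running bond is via sysfs (eth1 here is just an example taken from the earlier ip link output):

# echo -eth1 > /sys/class/net/bond0/bonding/slaves
# cat /proc/net/bonding/bond0      # verify only one slave is left

For option 2, see Marcelo's BONDING_OPTS note above; changing the mode generally requires taking the bond down first.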

-vlad

Comment 17 Ido Barkan 2015-11-19 04:51:17 UTC
(In reply to Marcelo Ricardo Leitner from comment #15)
> Cool, yes, the arp request is being reflected by the switch or the NIC.
> Did this server ever worked? Sounds like a bad switch config, its ports are
> not grouped. (Or that SR-IOV thing I shared earlier)
> If you switch to active/backup, it should work, as bonding will drop all
> incoming packets from inactive slave.
> Before that, please provide cat /proc/net/bonding/bondX
> it will contain information that allow us to diagnose the LACP link.

[root@ucs1-b200-2 ~]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
	Aggregator ID: 1
	Number of ports: 1
	Actor Key: 1
	Partner Key: 1
	Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth0
MII Status: up
Speed: 10240 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:b5:0a:00:09
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 10240 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:b5:0b:00:09
Aggregator ID: 2
Slave queue ID: 0

Comment 18 Marcelo Ricardo Leitner 2015-11-19 12:19:54 UTC
Those "Aggregator ID"s should be matching. It's saying that the switch thinks each port belongs to a different aggregated port, which is not what bonding is expecting. This is very likely a switch config issue. Please engage which whoever maintains the switch and ask to check that.


Vlad, on the bonding side, we could detect such an issue, print a warning in the kernel log, and reject packets from the slaves that aren't using the right AggID (by right, read: the first one to come up). What do you think?

Comment 19 Ido Barkan 2015-11-23 11:06:17 UTC
Hi guys, thanks for your findings.
So we have done a few tests:
So we have done a few tests:

1- changed bond to mode 1 (to stop a possible loop)
-> el6 guest: no connectivity

2- modprobe 8021q in el6 guest
-> el6 guest: connectivity restored!
-> el7 guest (just migrated to the host): has connectivity!
* el7 guest kernel is 3.10.0-123.el7.x86_64

3- el6 guest: rmmod 8021q
-> el6 guest: connectivity broken
* until now: no surprises, this corresponds directly to the 2 problems (vlan 0 tag + a network loop)

4- restored bond to mode 4 +  modprobe 8021q in el6 guest
-> el6 guest: connectivity _still_ working
-> el7 guest _still_ has connectivity
* this is a surprise!

5- leaving the setup some time (a day or so)
-> el6 guest: no connectivity
-> el7 guest: _still_ has connectivity

* this is what I expected, but I still don't understand how this is time
  related. Maybe it has something to do with learning/aging in the bridge tables, or
  the network loop takes some time to appear.
* and how come the el7 guest still operates after there is a loop?

Comment 20 Marcelo Ricardo Leitner 2015-11-23 15:47:45 UTC
Please s/network loop/bad switch config/ . The loop ends up happening because the switch thinks that each of your NICs belongs to a different bond, so both end up receiving broadcasts, even those generated by this system.

Comment 21 Vlad Yasevich 2015-11-24 19:03:54 UTC
(In reply to Ido Barkan from comment #19)
> 
> 5- leaving the setup some time (a day or so)
> -> el6 guest: no connectivity
> -> el7 guest: _still_ has connectivity
> 
> * this is what I expected, but I don't understand still how this is time
>   related. Maybe i has something with learning/aging in bridge tables or
>   the network loop takes some time to appear.
> * and how come the el7 guest still operates after there is a loop?

Can you provide the info from proc/net/bonding/bond0 on rhel7?

-vlad

Comment 22 Yaniv Liberman 2015-11-26 11:31:28 UTC
Hi Vlad,

Bonding is configured only in the host, not in the guests.

Comment 23 Marcelo Ricardo Leitner 2015-11-27 11:39:28 UTC
Yaniv, I think Vlad wants to check how bonding negotiated the aggregation during the rhel7 test. Note that even if the aggregation is broken, there is a time dependency on it. Just like the broadcast packets from the switch confuse the bridge, any packet from the guest will fix it by "confusing" it again. That is, if the unwanted broadcast gets in late, it's not a problem. Also, if you stop issuing broadcasts for any reason, it won't trigger the issue.
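
If you want to watch this happening, keep an eye on the bridge forwarding table while the guest is pinging, e.g. (replace <guest-mac> with the MAC of the guest's own interface, not the fe:... MAC of vnet0):

# watch -n1 'brctl showmacs rhevm | grep <guest-mac>'

When a reflected broadcast arrives, the port number for that MAC should jump from the vnet0 port to the bond port, which is when replies stop being forwarded to the guest.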

Anyway, have you got in contact with the switch administrator to fix that config?
And RHEL6 kernel, did you update it, so that you wouldn't need to load 8021q module?

Comment 25 Zhenjie Chen 2016-01-20 02:12:06 UTC
Seems a dup of Bug 1258446 - [RHEL6.7][kernel][bonding][bridging] KVM virt-guests can no longer pxe boot. 

kernel-2.6.32-573.8.1.el6 should fix the issue.

Comment 26 Yaniv Liberman 2016-01-20 09:38:49 UTC
(In reply to Zhenjie Chen from comment #25)
> Seems a dup of Bug 1258446 - [RHEL6.7][kernel][bonding][bridging] KVM
> virt-guests can no longer pxe boot. 
> 
> kernel-2.6.32-573.8.1.el6 should fixed the issue.

I see.

Thank you for the input.

I'll look into this, and let you know if it worked.

Comment 27 Marcelo Ricardo Leitner 2016-01-27 11:19:26 UTC
Hi, please reply to comments 23 and 26 when possible. Thanks

Comment 28 Yaniv Liberman 2016-01-28 07:48:01 UTC
(In reply to Marcelo Ricardo Leitner from comment #23)
> Yaniv, I think Vlad wants to check how bonding negotiated the aggregation
> during rhel7 test. Note that even if the aggregation is broken, there is a
> time dependency on it. Just like the broadcast packets from the switch
> confuses the bridge, any packet from the guest will fix it by "confusing" it
> again. That is, if the unwanted broadcast gets in late, it's not a problem.
> Also, if you stop issuing broadcasts for any reason, it won't trigger the
> issue..
> 
> 1. Anyway, have you got in contact with the switch administrator to fix that
> config?
> 2. And RHEL6 kernel, did you update it, so that you wouldn't need to load 8021q
> module?

1. Yes, we reconfigured the relevant UCS servers to 1 NIC and enabled fabric failover, which means that whenever 1 NIC dies, the other one will take its place immediately. According to the UCS technician, this doesn't affect performance all too much. The UCS servers were reinstalled to RHEL 6.7, network drivers have been updated, bonding was removed.

2. I updated the kernel version in RHEL 6.7 to what Zhenjie Chen (comment 25) suggested, version 2.6.32-573.8.1.el6 (including the kernel-firmware package).

(In reply to Marcelo Ricardo Leitner from comment #27)
> Hi, please reply to comment 23 and 26 when possible. Thanks

Done.

After the reconfiguration and upgrades we've made, we're still testing to see if this solved the problems we were experiencing.

All we have to do now is to reactivate them in RHEV-TLV.

Reference: https://engineering.redhat.com/rt/Ticket/Display.html?id=388168

I'll update you guys whenever there's anything significant to update.

Comment 29 Yaniv Liberman 2016-01-31 13:50:10 UTC
Hey guys,

In continuation to comments 26 and 28, I upgraded the kernel and kernel-firmware versions to version 2.6.32-573.12.1.el6.x86_64 in all 3 UCS servers (I couldn't find kernel-2.6.32-573.8.1.el6.x86_64 anywhere, and besides 2.6.32-573.12.1.el6.x86_64 is a newer version so I don't think there'd be a problem).

The network in the UCS servers was reconfigured, bonding was removed, network drivers were updated (more details in comment 28).

The problem still reoccurs.

RHEL 6.7 and below VMs lose connectivity when migrated to the UCS servers, but RHEL 7.0 and above VMs still maintain their connectivity.

The OS in the UCS servers is RHEL 6.7.

I then migrated the 2 VMs (RHEL 6.6 and RHEL 7.0) to a different host, a Dell PowerEdge C6220, where the OS is RHEL 6.7, too, and the kernel version is 2.6.32-573.8.1.el6.x86_64, and both VMs maintain their connectivity there.

I have no idea what's going on at this point... I was told by the UCS technician that was here to reconfigure the UCS servers and remove the bonding config that the firmware versions in our UCS setup (servers, fabric interconnects, etc.) are very outdated, so maybe that's the root cause for these problems... I can't think of anything else, really, even though it was, AFAIR, working properly before.

A RHEL 6.x<->UCS incompatibility, perhaps?

Please let me know your thoughts on this.

Comment 31 Vlad Yasevich 2016-03-08 11:53:41 UTC
(In reply to Yaniv Liberman from comment #29)
> Hey guys,
> 
> In continuation to comments 26 and 28, I upgraded the kernel and
> kernel-firmware versions to version 2.6.32-573.12.1.el6.x86_64 in all 3 UCS
> servers (I couldn't find kernel-2.6.32-573.8.1.el6.x86_64 anywhere, and
> besides 2.6.32-573.12.1.el6.x86_64 is a newer version so I don't think
> there'd be a problem).
> 

You state that the kernel was upgraded on the UCS servers.  What about the VMs?
Are they still running old rhel6 kernels, or were they upgraded to the newer -573.12.1 kernels as well?

The issue with UCS is that it is notorious for adding vlan id 0 to packets
that are sent to the host.  If such packets are for the host itself to consume,
the vlan id 0 has no impact.  If, however, these packets are to be forwarded to the VM, the vlan id 0 is forwarded with them.  This is where old rhel6 kernels have issues.  There are 2 ways to solve them:
   1) load the 802.1q module on the guest so that the vlan header can be processed
      correctly.  Without this module loaded, the vlan header remains in the
      packet and causes packets to be dropped.
   2) upgrade the kernel in the VM.  There was a large effort to improve VLAN
      handling in rhel6.  As a result, with newer kernels, you no longer
      need to load the module to correctly process VLAN 0 packets.


Hope this explains why when you migrate your VMs outside of UCS environment, everything works as it should.
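
To check whether this is what is biting you, a capture inside the guest should show the tag, e.g. (eth0 is assumed to be the guest's interface):

# tcpdump -e -n -i eth0 arp

With -e tcpdump prints the link-level header, so look for "vlan 0" in the decoded lines. If it is there and ping only starts working after "modprobe 8021q", this is the issue. To make the module load persistent across reboots on RHEL 6, one common approach is an executable script in /etc/sysconfig/modules/ (e.g. 8021q.modules) that just runs /sbin/modprobe 8021q.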

Comment 33 Jakub Sitnicki 2016-09-02 12:53:16 UTC
(In reply to Vlad Yasevich from comment #31)
> (In reply to Yaniv Liberman from comment #29)
> > Hey guys,
> > 
> > In continuation to comments 26 and 28, I upgraded the kernel and
> > kernel-firmware versions to version 2.6.32-573.12.1.el6.x86_64 in all 3 UCS
> > servers (I couldn't find kernel-2.6.32-573.8.1.el6.x86_64 anywhere, and
> > besides 2.6.32-573.12.1.el6.x86_64 is a newer version so I don't think
> > there'd be a problem).
> > 
> 
> You state that the kernel was upgraded on UCS servers.  What about the VMs?
> Are they still running old rhel6 kernels or were they upgraded to newer
> -573.12.1 kernels as well.

Yaniv, do you happen to have the info on the kernel the VMs are/were running that Vlad was asking for?

Comment 34 Yaniv Liberman 2016-09-06 07:54:35 UTC
(In reply to Jakub Sitnicki from comment #33)
> (In reply to Vlad Yasevich from comment #31)
> > (In reply to Yaniv Liberman from comment #29)
> > > Hey guys,
> > > 
> > > In continuation to comments 26 and 28, I upgraded the kernel and
> > > kernel-firmware versions to version 2.6.32-573.12.1.el6.x86_64 in all 3 UCS
> > > servers (I couldn't find kernel-2.6.32-573.8.1.el6.x86_64 anywhere, and
> > > besides 2.6.32-573.12.1.el6.x86_64 is a newer version so I don't think
> > > there'd be a problem).
> > > 
> > 
> > You state that the kernel was upgraded on UCS servers.  What about the VMs?
> > Are they still running old rhel6 kernels or were they upgraded to newer
> > -573.12.1 kernels as well.
> 
> Yaniv, do you happen to have the info on the kernel the VMs are/were running
> that Vlad was asking for?

I don't. Sorry.

Comment 35 Jakub Sitnicki 2016-09-14 09:16:39 UTC
Tentative devel_ack+. We're committed to finding the root cause of this issue.

Comment 36 Jakub Sitnicki 2016-10-10 13:38:53 UTC
Hi Yaniv,

Let me recap what I gather from the previous investigation carried out by you, Ido, Marcelo, and Vlad:

1) the issue is not related to bonding, you have reconfigured the affected machines not to use bonding and the problem persists

2) the issue does not reproduce on RHEL7, kernel 3.10.0-123.el7.x86_64 was tested

3) the issue reproduces on a RHEL6.7 guest where the hypervisor is running on a UCS server that is connected to a UCS switch

4) some UCS switch firmware versions are known for tagging traffic sent to the host with VLAN id 0

5) it has not been confirmed yet that loading the 802.1q module in a guest running 2.6.32-573.12.1.el6.x86_64 kernel solves the problem

6) guests running RHEL6.7 with a kernel newer than 573.12.1.el6 have not been tested

7) guests running RHEL6.8 or RHEL6.9 kernels have not been tested either

To do anything here I will need to confirm whether the problem still happens with the latest RHEL6.9 kernel. Currently that is kernel-2.6.32-661.el6 and it can be grabbed from:

http://download-node-02.eng.bos.redhat.com/brewroot/packages/kernel/2.6.32/661.el6/

Yaniv, is the environment where the problem was happening still available?
If so, could you deploy a RHEL6.8 guest there, update the kernel to 2.6.32-661.el6 and check if the problem persists?
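
Roughly, inside the test guest (eth0 and the gateway address are assumptions based on the earlier traces, adjust as needed):

# uname -r                              # expect 2.6.32-661.el6.x86_64
# lsmod | grep 8021q                    # make sure the module is NOT loaded, so we test the kernel itself
# ping -c 3 10.35.19.254
# tcpdump -e -n -i eth0 arp or icmp     # if ping fails, check whether replies arrive with a vlan 0 tag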

I will try to reproduce it in a virtual setup but I don't know yet if mimicking the quirky UCS switch behavior is doable.

I'm not sure if I should be asking you or Ido about further testing. Please let me know if you're no longer involved with this bug.

Comment 37 Jakub Sitnicki 2016-10-12 07:53:17 UTC
Ido, Yaniv is not responding, so maybe you can help with checking if the latest RHEL6.9 kernel (-661.el6) still has the issue when the guest is running on a UCS server?

I'm afraid we're very short on time to find the root cause and backport a fix here because the Kernel Patch Submission Deadline for RHEL6.9 is set for Tue, 2016-10-18.

So far I haven't been able to reproduce the issue myself.

Comment 38 Yaniv Liberman 2016-10-13 06:10:55 UTC
Hi Jakub,

1. We were on Public Holiday on Monday and Tuesday.
2. Ido does not work at Red Hat any more.

Our RHV environment was upgraded to version 4.0.

Please let me know if this matters ASAP, as we are on Public Holiday again next week until October 24.

I need to know if it's possible to check this in RHV 4.0 because if not, we don't have enough time to set up a RHV 3.# / RHEL 6.# environment before we enter our Public Holiday.

Thanks,
Yaniv

Comment 39 Dan Kenigsberg 2016-10-13 06:36:35 UTC
(In reply to Yaniv Liberman from comment #38)
> I need to know if it's possible to check this in RHV 4.0

Yes, it is. We (RHV) care about this issue only if it reproduces on RHV-4 on top of el7. Nothing on the bridge/tap/qemu level has changed between rhev-3.6 and 4.

Comment 40 Yaniv Liberman 2016-10-13 07:47:54 UTC
(In reply to Dan Kenigsberg from comment #39)
> (In reply to Yaniv Liberman from comment #38)
> > I need to know if it's possible to check this in RHV 4.0
> 
> Yes, it is. We (RHV) care about this issue only if it reproduces on RHV-4 on
> top of el7. Nothing on the bridge/tap/qemu level has changed between
> rhev-3.6 and 4.

OK.

I installed RHEL 7.2 (3.10.0-327.el7.x86_64) on a UCS server.

We'll add it to our RHV 4.0 environment as soon as possible, and then install a RHEL 6.9 VM there to check if the problem is reproduced.

Comment 41 Jakub Sitnicki 2016-10-13 09:32:13 UTC
(In reply to Yaniv Liberman from comment #38)
> 1. We were on Public Holiday in Monday and Tuesday.
> 2. Ido does not work in Red Hat any more.

Yaniv, my bad, I didn't know. Looking forward to your test results with RHV 4.0 and a RHEL 6.9 VM. I'm still working on a reproducer, but maybe you will be able to confirm whether RHEL 6.9 still has the bug first. Either way, in the end we would need to test in the UCS environment.

Dan, thank you for chipping in and answering Yaniv's questions.

Comment 42 Yaniv Liberman 2016-10-13 11:08:01 UTC
(In reply to Jakub Sitnicki from comment #41)
> (In reply to Yaniv Liberman from comment #38)
> > 1. We were on Public Holiday in Monday and Tuesday.
> > 2. Ido does not work in Red Hat any more.
> 
> Yaniv, my bad, I didn't know. Looking forward to your test result with RHV
> 4.0 and RHEL 6.9 VM. I'm still working on a reproducer but maybe you will be
> able to confirm if RHEL 6.9 still has the bug first. Either way, in the end
> we would need to test in UCS environment.

No worries.

I've just been told we don't have RHEL 6.9...

Do we have any alternatives or...?

Please advise.

Comment 43 Jakub Sitnicki 2016-10-13 14:08:35 UTC
(In reply to Yaniv Liberman from comment #42)
> (In reply to Jakub Sitnicki from comment #41)
> > (In reply to Yaniv Liberman from comment #38)
> > > 1. We were on Public Holiday in Monday and Tuesday.
> > > 2. Ido does not work in Red Hat any more.
> > 
> > Yaniv, my bad, I didn't know. Looking forward to your test result with RHV
> > 4.0 and RHEL 6.9 VM. I'm still working on a reproducer but maybe you will be
> > able to confirm if RHEL 6.9 still has the bug first. Either way, in the end
> > we would need to test in UCS environment.
> 
> No worries.
> 
> I've just been told we don't have RHEL 6.9...
> 
> Do we have any alternatives or...?
> 
> Please advise.

We should also be able to test with RHEL 6.8 and upgrade the kernel to the latest version from 6.9, if that is an option.

Comment 44 Yaniv Liberman 2016-10-13 14:15:54 UTC
I see. Alright.

As we agreed on IRC, I'll try that when we come back from the Holidays.

I'll keep you posted.

Thanks!

Comment 45 Jakub Sitnicki 2016-10-13 15:42:17 UTC
Yaniv, could you please also gather info on what NICs are being used on the UCS server to provide connectivity to the guests?

# ethtool -i <dev>
# lspci -s <bus-info from ethtool output> -vmm

Thanks,
Jakub

Comment 46 Jakub Sitnicki 2016-10-20 15:40:08 UTC
Yaniv,

I've managed to simulate the quirky Cisco UCS firmware behavior and can confirm that Vlad's suggested solution from comment #31 is what you need when no bonding is involved:

> There are 2 ways to solve them:
>    1) load the 802.1q module on the guest so that vlan header can be
> processed
>       correctly.  without this module loaded, the vlan header remain in the
>       packet and cause packets to be dropped.
>    2) upgrade the kernel in VM.  There was a large effort to improve VLAN
>       vlan handling in rhel6.  As a result, with newer kernels, you no longer
>       need to load the module to correctly process VLAN 0 packets.

As per my tests, when a RHEL 6.6 VM gets an ARP reply or an ICMP Echo reply with a VLAN 0 tag, the outcome is:

* kernel-2.6.32-504.el6 (base RHEL 6.6 kernel) up to 2.6.32-504.21.1.el6, 8021q module not loaded - tagged packets don't get untagged, ping doesn't work,
* kernel-2.6.32-504.el6 (base RHEL 6.6 kernel) up to 2.6.32-504.21.1.el6, 8021q module loaded - tagged packets get untagged, ping works,
* 2.6.32-504.22.1.el6 and above, 8021q module not loaded - tagged packets untagged, ping works.

Hence at the moment I'm inclined to close this as a duplicate of BZ 1135347 - the backport of the new vlan model that Vlad has been referring to. These changes have been backported into 6.6 z-stream in version.

Comment 47 Jakub Sitnicki 2016-10-20 15:42:13 UTC
(In reply to Jakub Sitnicki from comment #46)
> Hence at the moment I'm inclined to close this as a duplicate of BZ 1135347
> - the backport of the new vlan model that Vlad has been referring to. These
> changes have been backported into 6.6 z-stream in version.

.. in version 2.6.32-504.22.1.el6.

Comment 48 Yaniv Liberman 2016-10-26 07:20:14 UTC
(In reply to Jakub Sitnicki from comment #45)
> Yaniv, could you please also gather info on what NICs are being used on the
> UCS server to provide connectivity to the guests?
> 
> # ethtool -i <dev>
> # lspci -s <bus-info from ethtool output> -vmm
> 
> Thanks,
> Jakub

Just to clear this needinfo flag, even though it's probably irrelevant at this point:
--
[root@ucs1-b200-1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp6s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:25:b5:0c:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.35.19.148/22 brd 10.35.19.255 scope global dynamic enp6s0
       valid_lft 42760sec preferred_lft 42760sec
    inet6 2620:52:0:2310:225:b5ff:fe0c:1/64 scope global noprefixroute dynamic 
       valid_lft 2591713sec preferred_lft 604513sec
    inet6 fe80::225:b5ff:fe0c:1/64 scope link 
       valid_lft forever preferred_lft forever

[root@ucs1-b200-1 ~]# ethtool -i enp6s0
driver: enic
version: 2.1.1.83
firmware-version: 2.1(2a)
bus-info: 0000:06:00.0
supports-statistics: yes                                                                                                                                                   
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

[root@ucs1-b200-1 ~]# lspci -s 0000:06:00.0 -vmm
Slot:   06:00.0
Class:  Ethernet controller
Vendor: Cisco Systems Inc
Device: VIC Ethernet NIC
SVendor:        Cisco Systems Inc
SDevice:        VIC 1240 MLOM Ethernet NIC
Rev:    a2
--


(In reply to Jakub Sitnicki from comment #46)
> As per my tests, when RHEL 6.6 VM gets an ARP reply or an ICMP Echo reply
> with with VLAN 0 tag then outcome:
> 
> * kernel-2.6.32-504.el6 (base RHEL 6.6 kernel) up to 2.6.32-504.21.1.el6,
> 8021q module not loaded - tagged packets don't get untagged, ping doesn't
> work,
> * kernel-2.6.32-504.el6 (base RHEL 6.6 kernel) up to 2.6.32-504.21.1.el6,
> 8021q module loaded - tagged packets get untagged, ping works,
> * 2.6.32-504.22.1.el6 and above, 8021q module not loaded - tagged packets
> untagged, ping works.
> 
> Hence at the moment I'm inclined to close this as a duplicate of BZ 1135347
> - the backport of the new vlan model that Vlad has been referring to. These
> changes have been backported into 6.6 z-stream in version.

I see. So kernel version 2.6.32-504.el6 fixes this problem in RHEL 6.6?

If so, then I guess it's safe to assume that in RHEL 6.8-9 it's already working properly, yes?

I haven't tested this yet to confirm, though, but I think it makes sense that it'd work.

Comment 50 Jakub Sitnicki 2016-10-26 08:27:28 UTC
(In reply to Yaniv Liberman from comment #48)
> (In reply to Jakub Sitnicki from comment #45)
> (In reply to Jakub Sitnicki from comment #46)
> > As per my tests, when RHEL 6.6 VM gets an ARP reply or an ICMP Echo reply
> > with with VLAN 0 tag then outcome:
> > 
> > * kernel-2.6.32-504.el6 (base RHEL 6.6 kernel) up to 2.6.32-504.21.1.el6,
> > 8021q module not loaded - tagged packets don't get untagged, ping doesn't
> > work,
> > * kernel-2.6.32-504.el6 (base RHEL 6.6 kernel) up to 2.6.32-504.21.1.el6,
> > 8021q module loaded - tagged packets get untagged, ping works,
> > * 2.6.32-504.22.1.el6 and above, 8021q module not loaded - tagged packets
> > untagged, ping works.
> > 
> > Hence at the moment I'm inclined to close this as a duplicate of BZ 1135347
> > - the backport of the new vlan model that Vlad has been referring to. These
> > changes have been backported into 6.6 z-stream in version.
> 
> I see. So kernel version 2.6.32-504.el6 fixes this problem in RHEL 6.6?

2.6.32-504.22.1.el6 or later from RHEL6.6 z-stream. 2.6.32-504.el6 is buggy.
 
> If so, then I guess it's safe to assume that in RHEL 6.8-9 it's already
> working properly, yes?
> 
> I haven't tested this yet to confirm, though, but I think it makes sense
> that it'd work.

Yes, it is safe to assume. I've tested 2.6.32-663.el6.x86_64 (a recent-ish RHEL6.9 development version) and VLAN 0 tagged frames are handled by the stack.

So, are you okay with me closing this one? Or would you like me to keep it open until you can confirm that the latest guests with the 6.6 z-stream kernel work in the UCS environment?

Comment 51 Yaniv Liberman 2016-10-26 08:40:03 UTC
Thanks! This is good news.

Please keep this open for the time being.

Comment 52 sushil kulkarni 2016-11-17 16:04:20 UTC
This issue was fixed in bz1135347. Marking as dupe.

*** This bug has been marked as a duplicate of bug 1135347 ***