Bug 514492 - Cannot use tagged vlan inside DomU
Summary: Cannot use tagged vlan inside DomU
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Miroslav Rezanina
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 503206 507147
Depends On: 638539
Blocks: 699616
 
Reported: 2009-07-29 11:23 UTC by Kirby Zhou
Modified: 2011-10-21 09:24 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-10-21 09:24:50 UTC
Target Upstream Version:
Embargoed:


Attachments
Logs from issues with VLAN (1.19 KB, application/octet-stream)
2010-07-08 12:06 UTC, Michal Novotny
network-bridge with vlan support (9.20 KB, text/plain)
2010-08-03 09:08 UTC, Miroslav Rezanina

Description Kirby Zhou 2009-07-29 11:23:49 UTC
Description of problem:

I cannot use tagged vlan inside DomU.

Version-Release number of selected component (if applicable):

RHEL-5.3
xen-3.0.3-80.el5_3.3
bridge-utils-1.1-2
kernel-xen-2.6.18-128.1.16.el5
vconfig-1.9-2.1
/lib/modules/2.6.18-128.1.16.el5xen/kernel/drivers/net/bnx2.ko=1.7.9-1

How reproducible:

100%

Steps to Reproduce:
1. Install RHEL-Xen Dom0 with 2 ethernet NIC

2. use /etc/xen/network-bridge-ym instead of /etc/xen/network-bridge in 'xend-config.sxp'
dom0]# cat network-bridge-ym  
#!/bin/bash
/etc/xen/scripts/network-bridge $* vifnum=0 bridge=xenbr0 netdev=eth0
/etc/xen/scripts/network-bridge $* vifnum=1 bridge=xenbr1 netdev=eth1

3. Install a DomU attached to the 2 Xen bridges:
dom0]# cat /etc/xen/op-f5test-djt 
...
vif = [ "mac=00:16:3e:3e:e8:ed,bridge=xenbr0", 
        "mac=00:16:3e:3e:e8:ef,bridge=xenbr1" ]

4. Configure vlan inside DomU
domu]# cat ifcfg-eth1.475 
DEVICE=eth1.475
PHYSDEV=eth1
VLAN=yes
BOOTPROTO=static
IPADDR=220.181.125.31
NETMASK=255.255.255.0
ONBOOT=yes
domu]# ifup eth1.475
Added VLAN with VID == 475 to IF -:eth1:-
domu]# ifconfig eth1.475
eth1.475  Link encap:Ethernet  HWaddr 00:16:3E:3E:E8:EF  
          inet addr:220.181.125.31  Bcast:220.181.125.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

5. ping 220.181.125.254 and do tcpdump
domu]# ping 220.181.125.254
PING 220.181.125.254 (220.181.125.254) 56(84) bytes of data.
From 220.181.125.31 icmp_seq=1 Destination Host Unreachable
From 220.181.125.31 icmp_seq=2 Destination Host Unreachable

domu]# tcpdump -i eth1 -evvn 
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
19:17:05.415749 00:16:3e:3e:e8:ef > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 475, p 0, ethertype ARP, arp who-has 220.181.125.254 tell 220.181.125.31
19:17:05.416942 00:0f:e2:d3:77:5c > 00:16:3e:3e:e8:ef, ethertype ARP (0x0806), length 60: arp reply 220.181.125.254 is-at 00:0f:e2:d3:77:5c
19:17:06.415831 00:16:3e:3e:e8:ef > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 475, p 0, ethertype ARP, arp who-has 220.181.125.254 tell 220.181.125.31
19:17:06.417024 00:0f:e2:d3:77:5c > 00:16:3e:3e:e8:ef, ethertype ARP (0x0806), length 60: arp reply 220.181.125.254 is-at 00:0f:e2:d3:77:5c

domu]# arp
Address                  HWtype  HWaddress           Flags Mask            Iface
220.181.125.254                  (incomplete)                              eth1.475  

dom0]# tcpdump -i vif2.1 -evvn
tcpdump: WARNING: vif2.1: no IPv4 address assigned
tcpdump: listening on vif2.1, link-type EN10MB (Ethernet), capture size 96 bytes
19:22:59.511002 00:16:3e:3e:e8:ef > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 475, p 0, ethertype ARP, arp who-has 220.181.125.254 tell 220.181.125.31
19:22:59.512179 00:0f:e2:d3:77:5c > 00:16:3e:3e:e8:ef, ethertype ARP (0x0806), length 60: arp reply 220.181.125.254 is-at 00:0f:e2:d3:77:5c
19:23:00.508037 00:16:3e:3e:e8:ef > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 475, p 0, ethertype ARP, arp who-has 220.181.125.254 tell 220.181.125.31
19:23:00.509247 00:0f:e2:d3:77:5c > 00:16:3e:3e:e8:ef, ethertype ARP (0x0806), length 60: arp reply 220.181.125.254 is-at 00:0f:e2:d3:77:5c


Actual results:

As the captures show, the ARP reply arrives at eth1@DomU, but it arrives untagged (ethertype ARP 0x0806, with no 802.1Q header), so eth1.475@DomU never processes it. I do not know the reason. Maybe the 802.1Q tag is eaten by Dom0's xenbr1, or maybe it is eaten by the DomU.
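
A quick way to see this symptom in a capture is to count tagged and untagged frames. The following is a minimal sketch and not part of the original report: the helper name count_tagged is hypothetical, and it relies only on the "ethertype 802.1Q (0x8100)" text that tcpdump -e prints in the captures above.

```shell
#!/bin/sh
# Hypothetical helper: count tagged vs. untagged frames in `tcpdump -e`
# output read from stdin. Lines without an "ethertype" field are ignored.
count_tagged() (
    tagged=0
    untagged=0
    while IFS= read -r line; do
        case "$line" in
            # Tagged frames carry the 802.1Q ethertype before the inner one.
            *"ethertype 802.1Q (0x8100)"*) tagged=$((tagged + 1)) ;;
            *"ethertype "*)                untagged=$((untagged + 1)) ;;
        esac
    done
    echo "tagged=$tagged untagged=$untagged"
)
```

For example, `tcpdump -i eth1 -evvn -c 20 | count_tagged` inside the DomU. In the captures above, the ARP requests would count as tagged and the replies as untagged.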

Expected results:

DomU can process 802.1Q tagged vlan correctly.

Additional info:

dom0]# lspci | fgrep Broadcom
02:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c3)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
04:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c3)
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)

Comment 2 Michal Novotny 2010-06-28 13:54:39 UTC
Well, I did try testing it and *maybe* I don't understand VLANs correctly but my configuration is using 2 bridges for both the guests, xenbr0 and virbr0, with the following setup:

domU1:
IP for xenbr0 = 10.34.26.225, MAC addr = 00:16:3E:6B:D5:C0
IP for virbr0 = 192.168.122.115, MAC addr = 00:16:3E:6B:D5:D0

Manual configuration for VLAN in /etc/sysconfig/network-scripts/ifcfg-eth1.475:
DEVICE=eth1.475
PHYSDEV=eth1
VLAN=yes
BOOTPROTO=static
IPADDR=192.168.122.116
NETMASK=255.255.255.0
ONBOOT=yes

domU2:
IP for xenbr0 = 10.34.27.243, MAC addr = 00:16:36:1A:77:27
IP for virbr0 = 192.168.122.106, MAC addr = 00:16:36:1A:77:28

Manual configuration for VLAN in /etc/sysconfig/network-scripts/ifcfg-eth1.476:
DEVICE=eth1.476
PHYSDEV=eth1
VLAN=yes
BOOTPROTO=static
IPADDR=192.168.122.107
NETMASK=255.255.255.0
ONBOOT=yes

domU# tcpdump -i eth1 -evvn
15:21:09.488374 00:16:36:1a:77:28 > Broadcast, ethertype ARP (0x0806), length 60: arp who-has 192.168.122.116 tell 192.168.122.106
15:21:10.488200 00:16:36:1a:77:28 > Broadcast, ethertype ARP (0x0806), length 60: arp who-has 192.168.122.116 tell 192.168.122.106

domU# tcpdump -i eth1
13:45:49.788663 IP 192.168.122.106.34421 > 192.168.122.1.domain:  53453+ PTR? 106.122.168.192.in-addr.arpa. (46)
13:45:49.788922 IP 192.168.122.1 > 192.168.122.106: ICMP host 192.168.122.1 unreachable - admin prohibited, length 82

domU# tcpdump -i eth1.475 -evvn 
eth1: Promiscuous mode enabled.
tcpdump: listening on eth1.475, link-type EN10MB (Ethernet), capture size 96 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel
# 

dom0# tcpdump -i vif($DOM_ID).1 -evvn
15:22:20.198503 00:16:36:1a:77:28 > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 192.168.122.116 tell 192.168.122.106
15:22:21.198399 00:16:36:1a:77:28 > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 192.168.122.116 tell 192.168.122.106

dom0# tcpdump -i virbr0 -evvn
15:30:15.746080 00:16:3e:6b:d5:d0 > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 192.168.122.107 tell 192.168.122.116
15:30:15.746364 00:16:36:1a:77:28 > 00:16:3e:6b:d5:d0, ethertype ARP (0x0806), length 42: arp reply 192.168.122.107 is-at 00:16:36:1a:77:28

Nevertheless, is my configuration correct, and should it work without Xen? I tested a similar setup on dom0 but was unable to get it working there either.

Thanks,
Michal

Comment 3 Kirby Zhou 2010-06-29 04:24:00 UTC
Of course vlan 475 and vlan 476 cannot contact each other. If they could, it would not be a vlan.

Each vlan has its own vlan tag (475, 476, etc.). Only machines with the same vlan tag are in the same vlan, and only machines in the same vlan can see each other. If A and B are in different vlans, they must reach each other through a gateway.
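
The membership rule can be sketched as a small check on interface names. This is an illustration only: the helper names vlan_id_of and same_vlan are hypothetical, and the sketch assumes devices are named PARENT.VID (as eth1.475 and eth1.476 are in this report), which is a naming convention rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helpers; they look only at the device name, not at the
# kernel's VLAN configuration.
vlan_id_of() {
    case "$1" in
        *.*) echo "${1##*.}" ;;   # text after the last dot, e.g. 475
        *)   return 1 ;;          # no dot: not a VLAN device name
    esac
}

# Succeeds only when both names are VLAN devices with the same VLAN id.
same_vlan() (
    a=$(vlan_id_of "$1") || exit 1
    b=$(vlan_id_of "$2") || exit 1
    [ "$a" = "$b" ]
)
```

With the two guest configurations from comment 2, `same_vlan eth1.475 eth1.476` fails, which matches the observation that the guests cannot reach each other.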

Comment 4 Michal Novotny 2010-06-29 10:42:45 UTC
Kirby,
are you saying that the VLAN IDs have to be the same on both machines for them to be able to contact each other, and that with VLAN IDs 475 and 476 they can never contact each other? I.e. that I need to set up the same VLAN ID on both the guests?

Thanks,
Michal

Comment 5 Kirby Zhou 2010-06-29 11:57:11 UTC
You got it, the 2 vlan ids must match. And you can only ping from one domU to another; you cannot ping from dom0 to domU, because dom0 is not in the VLAN.

BTW: if you test 2 guests on 2 hosts, the ethernet switch between the 2 hosts must be configured with tagged vlan enabled. A direct cable link without a switch may be much easier to test.

Comment 6 Michal Novotny 2010-06-29 13:30:30 UTC
(In reply to comment #5)
> You got it, the 2 vlan id must match. And you can only ping from one domU to
> another, cannot ping from dom0 to domU, because dom0 is not in VLAN.
> 
> BTW: if you test 2 guest from 2 host, the ethernet switch between the 2 host
> must be configured with tagged vlan enabled. Maybe ad direct cable link without
> switch is much more easy to test.    

So VLAN interface shouldn't be accessible from dom0? That way my setup is wrong since I'm able to access the interface from within dom0.

Michal

Comment 7 Kirby Zhou 2010-06-29 17:37:26 UTC
In the expected situation, the domU VLAN interface shouldn't be accessible from dom0 through a non-VLAN interface. And dom0's bridge should pass the tagged VLAN packets through to the outer network.

I suggest you set up a simple scenario for testing and understanding VLAN. Take 2 physical machines linked by a cable, without a switch, configure the VLAN interface eth0.XXX on the bare metal systems without virtualization, and do not bind any ip on eth0. They will pass tagged VLAN packets to each other, and they can see each other if and only if the 2 systems use the same VLAN id. A switch may sometimes block tagged VLAN packets, so a direct cable link simplifies the situation.
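
As a rough sketch of that bare-metal test, the helper below prints the commands for one machine instead of running them, so it can be reviewed without root. The function name vlan_setup_cmds and the example address are assumptions; the modprobe, vconfig and ip invocations are the same ones used elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the commands that
# create DEV.VID and give it an address. Run the output as root to apply.
vlan_setup_cmds() (
    dev=$1    # parent device, e.g. eth0 (leave it without an IP)
    vid=$2    # VLAN id, must be identical on both machines
    cidr=$3   # address for the VLAN interface, e.g. 9.9.9.11/24
    echo "modprobe 8021q"
    echo "vconfig add ${dev} ${vid}"
    echo "ip addr add ${cidr} dev ${dev}.${vid}"
    echo "ip link set ${dev}.${vid} up"
)
```

Running `vlan_setup_cmds eth0 475 9.9.9.11/24` on one machine and the same with 9.9.9.12/24 on the other gives the two command lists to apply.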


You can check http://en.wikipedia.org/wiki/IEEE_802.1Q for more details.

Comment 8 Kirby Zhou 2010-06-29 17:46:35 UTC
My dom0's eth1 is connected to a 'Trunk Port' of switch which forwards and receives tagged frames. So dom0 can send and receive tagged frames through its eth1 nic.

Then, xenbr1 is created based on eth1 of dom0, and the eth1 of domU is a part of xenbr1. 

I want the domU to be able to send and receive tagged frames through its eth1, but it fails.

Comment 9 Michal Novotny 2010-07-02 06:02:38 UTC
Kirby,
so basically what you need to make VLAN working for the guests is to set VLANs with same VLAN IDs to 2 guests and the second guest (B) should be accessible from the first one (A) using guest's B IP address, i.e. in my example using 192.168.122.107 from guest A (IP = 192.168.122.116) and the path should be:

 guest A ethX VLAN -> vif(A).0 -> xenbr -> vif(B).0 -> guest B ethX VLAN

and this should work fine. Is my understanding correct ? Why do I require 2 bridges at all ? Wouldn't one with appropriate VLAN settings on both guests be enough ?

Thanks,
Michal

Comment 10 Kirby Zhou 2010-07-02 07:25:42 UTC
You do not need 2 bridges. But in the end you should test on the actual network.


guest A ethX VLAN -> vif(A).0 -> xenbr -> eth0(Host A) -> ethernet -> eth0(Host B) -> xenbr -> vif(B).0 -> guest B ethX VLAN.

and

guest A ethX VLAN -> vif(A).0 -> xenbr -> eth0(Host A) -> ethernet -> host C ethX VLAN.


In some buggy situations, host A would send untagged packets to the outer ethernet.

Comment 11 Michal Novotny 2010-07-02 07:33:40 UTC
Well Kirby, the thing here is that my guests don't work with 2 bridges configured like in comment #0, since they get stuck indefinitely while bringing up the second vNIC. Do I really need to use 2 physical machines now? This is pretty bad, since I have access to only one physical machine right now.

Michal

Comment 12 Kirby Zhou 2010-07-02 09:15:40 UTC
Hi Michal, you can first fix your 'comment #0' configuration to use identical vlan ids. It should then work; that is the expected behavior. Otherwise, there is a bug or a missing feature.

And '2 bridges' is not necessary, but should not cause a problem either.

Comment 13 Kirby Zhou 2010-07-02 09:22:34 UTC
Maybe a physical machine with 2 or more physical NICs is suitable for testing: you can connect 2 of its NICs to each other as a loop.

eth0--switch ( for you ssh access )
eth1--eth2

Comment 14 Michal Novotny 2010-07-02 10:24:17 UTC
Kirby,
I did try to test setting up 2 bridges for both the Xen HVM guests and I was unable to see any VLAN packets being sent :( According to the Wikipedia page there should be 32 additional bits in the ethernet frame with ethertype set to 0x8100 but according to the dump there's no such type. Apparently I'm doing something wrong when setting up the configuration since the machines are accessible even through dom0 which shouldn't be right AFAIK. The configuration is same for both the guests, having xenbr0 (LAN bridge) and virbr0 (localhost DHCP/bridge). But I've noticed that the MAC address for both the devices (normal & VLAN) are the same:

eth1      Link encap:Ethernet  HWaddr 00:16:36:1A:72:20  
          inet addr:192.168.122.106  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::216:36ff:fe1a:7220/64 Scope:Link

eth1.475  Link encap:Ethernet  HWaddr 00:16:36:1A:72:20  
          inet addr:192.168.122.110  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::216:36ff:fe1a:7220/64 Scope:Link

Although I've tried to set the HWADDR in the VLAN configuration to some other (I changed the last 20 to 30) it did nothing and it's still ending with 20. Nevertheless the module for VLAN networking is loaded:

# dmesg | grep -i vlan
802.1Q VLAN Support v1.8 Ben Greear <greearb>
# lsmod | grep 8021
8021q                  57425  0 
# 

When I was starting the guests, they were showing following:

Bringing up interface eth1.475:  Added VLAN with VID == 475 to IF -:eth1:-
[  OK  ]

So I guess the setup should be ok. Or I am doing something wrong ?

Michal

Comment 15 Kirby Zhou 2010-07-07 08:07:31 UTC
Yes, by default the MAC address is the same as the parent's. I can use 'ip link' to set its address; maybe you should bring eth1.475 down first?

[root@xen-727057^10.227 ~]# ip link
3: eth1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:16:3e:72:70:58 brd ff:ff:ff:ff:ff:ff
4: eth1.475@eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop 
    link/ether 00:16:3e:72:70:59 brd ff:ff:ff:ff:ff:ff
[root@xen-727057^10.227 ~]# ip link set dev eth1.475 address  00:16:3e:72:70:58
[root@xen-727057^10.227 ~]# ip link
3: eth1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:16:3e:72:70:58 brd ff:ff:ff:ff:ff:ff
4: eth1.475@eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop 
    link/ether 00:16:3e:72:70:58 brd ff:ff:ff:ff:ff:ff

I think you should remove the ip address assigned to eth1 (192.168.122.106), or assign eth1.475 an ip in a different subnet from eth1.

My simple case:
DomA/eth1.475 at Host0
  modprobe 8021q
  vconfig add eth1 475
  ip addr add 9.9.9.9/24 dev eth1.475
  ip link set eth1.475 up
DomB/eth1.475 at Host0
  9.9.9.10
Host0/eth1.475
  9.9.9.11
Host1/eth1.475
  9.9.9.12

at DomB
target     arping -I eth1.475   ping -I eth1.475
9.9.9.9    ok                   ok
9.9.9.10   fail                 ok
9.9.9.11   fail                 fail
9.9.9.12   fail                 fail


While arping Host1, do tcpdump at Host0 and Host1:


And there is a related bug: tcpdump cannot deal with 802.1q vlan tag.
https://bugzilla.redhat.com/show_bug.cgi?id=498981

Comment 16 Kirby Zhou 2010-07-07 08:15:39 UTC
Wait one minute.
After doing 'vconfig rem peth1.475' at Host0, the bug seems to have disappeared with RHEL-5.5. Now I can ping 9.9.9.12 from both 9.9.9.9 and 9.9.9.10 (9.9.9.11 is dropped; forget it for now).

Comment 17 Michal Novotny 2010-07-07 16:06:56 UTC
So is it working? In fact I was unable to make it working for VLAN isolation at all :(

Michal

Comment 18 Kirby Zhou 2010-07-07 16:54:23 UTC
Yes, it is working now.

In your situation, I think the problem is the ip subnet: you have put eth1@domu and eth1.475@domu in the same subnet (192.168.122/24). I suggest you unassign the ip of eth1@domu.
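
The overlap described here can be checked mechanically. The sketch below is only an illustration: ip_to_int and same_subnet are hypothetical helper names, the code uses plain POSIX sh arithmetic, and it assumes dotted-quad IPv4 addresses whose octets have no leading zeros.

```shell
#!/bin/sh
# Hypothetical helper: convert a.b.c.d to a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1                      # split "a.b.c.d" into $1..$4
    echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
)

# Succeeds when the two addresses fall in the same subnet for the given
# prefix length, e.g. same_subnet 192.168.122.106 192.168.122.110 24
same_subnet() (
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
)
```

With the addresses from comment 14, eth1 (192.168.122.106) and eth1.475 (192.168.122.110) share one /24, so the check succeeds and flags the overlap. An address such as 9.9.9.9 on eth1.475 would not overlap.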

Comment 19 Michal Novotny 2010-07-08 11:34:07 UTC
(In reply to comment #15)
...
> 
> My simple case:
> DomA/eth1.475 at Host0
>   modprobe 8021q
>   vconfig add eth1 475
>   ip addr add 9.9.9.9/24 dev eth1.475
>   ip link set eth1.475 up
> DomB/eth1.475 at Host0
>   9.9.9.10
> Host0/eth1.475
>   9.9.9.11
> Host1/eth1.475
>   9.9.9.12
> 
> at DomB
>         arping -I eth1.475   ping -I eth1.475
> 9.9.9.9      ok                ok
> 9.9.9.10    fail               ok
> 9.9.9.11    fail              fail
> 9.9.9.12    fail              fail
> 
> 

Well, what do you mean by DomA, DomB, Host0 and Host1 ? Since both DomA and DomB are on the Host0 (which is the first host/physical machine I think) we don't need Host1 at all, do we?

Nevertheless I tried setting up a different subnet for the VLAN network, but had no luck: I was unable to ping any of the virtual machines. My configuration was one dom0 (subnet 192.168.122.1/24 for the eth1 device and subnet 192.168.121.1/24 for the VLAN with ID 475) running 2 HVM VMs in the same network (with IP addresses 192.168.121.110 and 192.168.121.116, both connected to virbr0), with no VLAN on the host machine, in order to isolate the host machine (dom0) from those guests on the VLAN level.

I followed the steps in the 'DomA/eth1.475 at Host0' paragraph, but I'm unable to ping the machine using the VLAN interface.

# tcpdump -evvn -i eth1.475 [for ping 9.9.9.9 - from 9.9.9.10]
07:27:32.145159 00:16:36:1a:72:20 > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 9.9.9.9 tell 9.9.9.10
... this is being repeated until I terminate 'ping' resulting into "Destination Host Unreachable" ...

# arping -I eth1.475 9.9.9.9
ARPING 9.9.9.9 from 9.9.9.10 eth1.475
Sent 14 probes (14 broadcast(s))
Received 0 response(s)

and the tcpdump for this is:
07:30:22.991733 00:16:36:1a:72:20 > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 9.9.9.9 (Broadcast) tell 9.9.9.10
... also repeated until I terminate 'arping' ...

> While arping Host1, do tcpdump at Host0 and Host1:
> 
> 
> And there is a related bug: tcpdump cannot deal with 802.1q vlan tag.
> https://bugzilla.redhat.com/show_bug.cgi?id=498981    

...

> Wait one mintue.
> After doing 'vconfig rem peth1.475' at Host0, the bug seems disappeared with
> RHEL-5.5. Now I can ping 9.9.9.12 from both 9.9.9.9 and 9.9.9.10 (9.9.9.11 is
> dropped, now forget it).    

Do you mean RHEL-5.5 host only or RHEL-5.5 both host and the guests ?

My host is RHEL-5.5 and both the guests are RHEL-5.3.

...

> Yes, it is working now.
>
> In your situation, I think the problem is the ip subnet, you had made 
> eth1@domu and eth1.475@domu in the same subnet (192.168.122/24). I suggest 
> you unassign the ip of eth1@domu.

I did try setting up the different subnet but no luck to make it working.

Michal

Comment 20 Kirby Zhou 2010-07-08 11:50:43 UTC
I use Host1 to verify that the VLAN tag of the packet is kept by Host0.

It seems the packet is not sent to DomB.

Could you please do 'ip addr' at Dom0/DomA/DomB and 'brctl show' at Dom0?

Comment 21 Michal Novotny 2010-07-08 12:06:48 UTC
Created attachment 430326 [details]
Logs from issues with VLAN

Those are the logs from domA,domB and dom0 . The setup was done as described in previous comment. Any ideas?

Thanks,
Michal

Comment 22 Michal Novotny 2010-07-08 12:27:11 UTC
(In reply to comment #21)
> Created an attachment (id=430326) [details]
> Logs from issues with VLAN
> 
> Those are the logs from domA,domB and dom0 . The setup was done as described in
> previous comment. Any ideas?
> 
> Thanks,
> Michal    

Well, I did triage using 2 physical machines now but I was unable to make it working. First, I was thinking that the problem may be the absence of broadcast address so I added the address to be 9.9.9.255 but it's still not working.

host0# arp
Address                  HWtype  HWaddress           Flags Mask            Iface
9.9.9.12                         (incomplete)                              eth1.475

host1# arp
Address                  HWtype  HWaddress           Flags Mask            Iface
9.9.9.11                         (incomplete)                              eth0.475

The setup was done using:

   modprobe 8021q
   vconfig add eth1 475
   ip addr add 9.9.9.{11|12}/24 broadcast 9.9.9.255 dev eth1.475
   ip link set eth1.475 up

on those physical machines, one was having 9.9.9.11 and the second one was having 9.9.9.12 . I was running tcpdump on both the machines and when I did ping from 9.9.9.11 to 9.9.9.12 no packets were on tcpdump window of 9.9.9.12 and on 9.9.9.11 tcpdump window there was only the following repeated message until I terminated tcpdump:

14:24:45.015416 00:e0:81:b0:2f:dc > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 9.9.9.11 tell 9.9.9.12

The arp output above was acquired using this configuration, so I don't know what may be wrong :( I don't have any experience with VLAN networks yet, so I apologize, but I'm unable to get VLAN working and your help would be appreciated, Kirby.

Thanks,
Michal

Comment 23 Kirby Zhou 2010-07-08 12:51:01 UTC
I have seen a tap device in the dom0's log. So did you use HVM instead of PV guest? Maybe it is a problem.

And with 2 phy Machines, what about the link between them?

Comment 24 Michal Novotny 2010-07-08 12:58:04 UTC
(In reply to comment #23)
> I have seen a tap device in the dom0's log. So did you use HVM instead of PV
> guest? Maybe it is a problem.
> 
> And with 2 phy Machines, what about the link between them?    

Yeah, I'm using 2 HVM guests. Nevertheless I'm doing the same with 2 physical machines and it's not working at all, even between the 2 physical machines. Is it possible that the router/switch is dropping the VLAN-tagged frames, since the setup from comment #22 is not working?

Thanks,
Michal

Comment 25 Kirby Zhou 2010-07-08 13:02:07 UTC
As I noted in comment #5:

The switch between the 2 host must be configured with tagged vlan enabled.
A direct cable link without switch is much more easy to test.


Thanks
Kirby

Comment 26 Michal Novotny 2010-07-08 13:31:38 UTC
(In reply to comment #25)
> I have noticed in comment #5:
> 
> The switch between the 2 host must be configured with tagged vlan enabled.
> A direct cable link without switch is much more easy to test.
> 
> 
> Thanks
> Kirby    

You're right Kirby. I've been talking to our network guys and the VLAN tagging is disabled due to security reasons except the lab machines so I'm going to do the direct cable link connection or reserve 2 lab machines for the testing.

Thanks,
Michal

Comment 27 Michal Novotny 2010-07-08 15:40:23 UTC
(In reply to comment #26)
> (In reply to comment #25)
> > I have noticed in comment #5:
> > 
> > The switch between the 2 host must be configured with tagged vlan enabled.
> > A direct cable link without switch is much more easy to test.
> > 
> > 
> > Thanks
> > Kirby    
> 
> You're right Kirby. I've been talking to our network guys and the VLAN tagging
> is disabled due to security reasons except the lab machines so I'm going to do
> the direct cable link connection or reserve 2 lab machines for the testing.
> 
> Thanks,
> Michal    

I've been trying to reserve 2 lab machines for testing which took some time but still the same results - when pinging from 9.9.9.11 to 9.9.9.12:

11:33:11.688481 00:18:71:8c:16:22 > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 9.9.9.12 tell 9.9.9.11
... repeated until I terminated ping ...

so unfortunately I'm stuck with this one. Those machines should be on the same switch according to their location, but I'm unable to confirm it since the machines are remote and I don't have physical access to them. It's possible that they are not on the same switch, but VLAN tagging should be enabled in the lab, so I think a direct connection is the only thing we could try now.

Michal

Comment 29 Miroslav Rezanina 2010-07-09 09:36:10 UTC
Retesting with a direct connection shows that arp packets really can't reach the .475 vlan interface.

However, with a single bridge/guest device, vlan is working correctly.

Comment 30 Michal Novotny 2010-07-09 11:54:03 UTC
(In reply to comment #29)
> Retesting with direct connection shows that arp packets really can't reach the
> .475 vlan interface. 
> 
> However, with single bridge/guest device vlan is working corectly.    

Thanks for your testing. Could you please provide me your configuration that seems to be working? Did you test with both physical and virtual machines? According to your comment it seems that physical machine with just one single bridge makes VLAN working correctly as well as virtual machine with just one guest device with VLAN set appropriately.

Michal

Comment 31 Michal Novotny 2010-07-09 14:17:11 UTC
Well, we've been investigating this further. According to the detail page for our NIC (Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express), located at [1], we found out that both Intel and Broadcom NICs strip the VLAN tags from the packets (and that's why we don't see the VLAN frames in Ethereal/Wireshark). Unfortunately, changing that configuration requires the B57udiag utility (which runs from a FreeDOS live CD and is downloadable from the Broadcom site), as I found on the VLAN stripping info page located at [2]. I was unable to access the CD-ROM device using the FreeDOS CD-ROM drivers; apparently my CD-ROM drive is not supported, and therefore I can't set the IPMI and ASF modes as described in the comment at [2].

Also, some information about setting this on Intel NICs will be necessary, since our laptops are equipped with the Intel Corporation 82567LM Gigabit Network Connection and the workstation has the Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express. Since the network administrators don't allow setting up tagged VLANs on the network, the only testing we can do is over a direct link connection, so we can either use 2 similar physical machines (with the same NICs) or we have to find a way to disable VLAN Tag Stripping on both Intel and Broadcom NICs.

I've been investigating the source code of the Broadcom-TG3 driver. VLAN registration there is done by the tg3_vlan_rx_register() function, which is called only when TG3_VLAN_TAG_USED is defined, and which updates the RX_MODE_KEEP_VLAN_TAG bit in the RX_MODE register. Unfortunately I can't confirm that this bit is being set and that the VLAN is being registered (there are no messages written to /var/log/messages or dmesg AFAIK).

When I was googling I saw there should be some tool called baspcfg for Linux systems, which is basically the Broadcom Advanced Server Program configuration tool, but I was unable to find a downloadable source/binary for it. It should support setting up VLAN the correct way, with no VLAN Tag Stripping.

Michal

[1] http://www.broadcom.com/collateral/pg/5722-PG101-R.pdf, page 69 (VLAN Tag Strip)
[2] http://www.linuxforums.org/forum/debian-linux-help/108199-broadcom-57xx-vlan-info-striping.html

Comment 32 Kirby Zhou 2010-07-09 16:28:52 UTC
(In reply to comment #31)

FYI

The BCM5755 documentation says:

The VLAN tag is automatically stripped from the IEEE 802.1q compliant packet at reception and then placed in a receive buffer descriptor’s two byte VLAN tag field. The flag field has the BD_FLAGS_VLAN_TAG bit set when a valid VLAN packet is received. Once the packet has been serviced by the host software, these fields should be zeroed out.

The BCM5709 documentation says:

KEEP_VLAN_TAG 
Setting this bit forces the RX MAC to keep the VLAN tag in the data delivered to the RX MBUF area. This bit should only be set for debugging reasons. This bit affects all packets regardless of RX Parser packet sorting logic.

And my BCM5709 works with tagged VLAN without any special setting.
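
To make the "two byte VLAN tag field" concrete, here is a small sketch that decodes a 16-bit 802.1Q Tag Control Information value into its priority, DEI/CFI and VLAN id parts. It assumes the descriptor field holds the standard TCI; the exact Broadcom descriptor layout is not given in this thread, and decode_tci is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: split a 16-bit 802.1Q TCI (decimal or 0x-prefixed hex)
# into priority (3 bits), DEI/CFI (1 bit) and VLAN id (12 bits).
decode_tci() (
    tci=$(( $1 ))
    echo "vid=$(( tci & 0xFFF )) dei=$(( (tci >> 12) & 1 )) pcp=$(( tci >> 13 ))"
)
```

For the frames in comment 0 ("vlan 475, p 0"), the TCI would be 0x01db, and `decode_tci 0x01db` prints vid=475 dei=0 pcp=0.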

Comment 33 Michal Novotny 2010-07-09 17:24:15 UTC
Interesting: when I set up a VLAN bridge on dom0 and VLAN in one guest and tried pinging the host machine from the guest, it was not returning the MAC address, and the following message kept repeating until I stopped ping in the guest:

17:54:37.191297 arp who-has 9.9.9.11 tell 9.9.9.12
17:55:16.464643 arp who-has 9.9.9.11 tell 9.9.9.12

But when I did arping instead of ping I was getting the MAC address correctly:

Unicast reply from 9.9.9.11 [00:23:7D:53:64:7E]  0.837ms
Unicast reply from 9.9.9.11 [00:23:7D:53:64:7E]  0.872ms

but MAC address 00:23:7D:53:64:7E seems to be assigned to the eth1 device on the host instead of the VLAN bridge itself. The IP 9.9.9.11 does correspond to the VLAN interface on the host machine, but arp shows an incomplete entry for the interface with VLAN 475. The same IP address also exists on the ethernet device with no VLAN set, and that one works fine (but only for arping).

Comment 34 Michal Novotny 2010-07-09 17:26:35 UTC
(In reply to comment #32)
> (In reply to comment #31)
> 
> FYI
> 
> BCM5755 said that:
> 
> The VLAN tag is automatically stripped from the IEEE 802.1q compliant packet at
> reception and then placed in a receive buffer descriptor’s two byte VLAN tag
> field. The flag field has the BD_FLAGS_VLAN_TAG bit set when a valid VLAN
> packet is received. Once the packet has been serviced by the host software,
> these fields should be zeroed out.
> 

Yeah, I'm having BCM5755 and the problem is that I don't know how to set the BD_FLAGS_VLAN_TAG since the b57udiag utility comes with FreeDOS on the liveCD but it can't see my DVD-ROM drive at all so I can't use the procedure as described on http://www.linuxforums.org/forum/debian-linux-help/108199-broadcom-57xx-vlan-info-striping.html page.

Michal

> BCM5709 said that:
> 
> KEEP_VLAN_TAG 
> Setting this bit forces the RX MAC to keep the VLAN tag in the data delivered
> to the RX MBUF area. This bit should only be set for debugging reasons. This
> bit affects all packets regardless of RX Parser packet sorting logic.
> 
> And My BCM5709 works with tagged VLAN without any special setting.

Comment 35 Kirby Zhou 2010-07-10 05:08:03 UTC
I mean that you DO NOT need to set BD_FLAGS_VLAN_TAG. The BCM5709 has the same register as the BCM5755. I did nothing with it, and I can use VLAN.

However, no register would keep its value across a reboot. So using FreeDOS to set the register for linux is not a good idea.

Comment 36 Miroslav Rezanina 2010-07-12 08:26:44 UTC
Ok...I found the problem.

If you use the ifcfg- script, the vlan is created on top of the physical interface. However, when xend starts, this interface is renamed to peth. If we set up the vlan after xend is started and the virtual network interfaces are set up, vlan works for the guest. This is true even with two NICs, but the numbers are switched in the guest (guest eth1 == dom0 eth0).

This means a fix is needed in the network-bridge script to support vlan.
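
One way to spot the situation described here is to look at which parent a VLAN device is attached to. The sketch below parses lines in the `ip link` format shown in comment 15 (for example "4: eth1.475@eth1: ..."). The helper names are hypothetical, and the rule that a parent named peth* means "created before xend renamed the NIC" is an assumption drawn from this comment, not a documented check.

```shell
#!/bin/sh
# Hypothetical helper: print the parent of a VLAN device from one line of
# `ip link` output; fail if the line is not a NAME@PARENT device.
vlan_parent() {
    name=${1#*: }        # drop the leading index, e.g. "4: "
    name=${name%%:*}     # keep only "eth1.475@eth1"
    case "$name" in
        *@*) echo "${name#*@}" ;;
        *)   return 1 ;;
    esac
}

# Hypothetical helper: flag VLAN devices whose parent is a renamed peth* NIC.
check_vlan_parent() {
    parent=$(vlan_parent "$1") || return 0   # not a VLAN device: say nothing
    case "$parent" in
        peth*) echo "renamed-physical $parent" ;;
        *)     echo "ok $parent" ;;
    esac
}
```

On dom0 this could be fed from `ip -o link show`, one line at a time. A "renamed-physical" result would suggest removing the device with vconfig rem and recreating it after xend has started.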

Comment 37 Michal Novotny 2010-07-12 09:18:15 UTC
(In reply to comment #35)
> I mean that: You DO NOT need to set BD_FLAGS_VLAN_TAG. BCM5709 owns the same
> register linke BCM5755. I did nothing with it, and I can play VLAN.
> 
> However, no register WOULD keep the value while rebooting. So using FreeDOS to
> set the register for linux is not a good idea.    

Oh, ok. Nevertheless I wasn't able to make it working even when trying direct link connection. Since Mirek was able to make it working for direct link connection and also that he found the problem he took this one and he's currently working on this one.

Michal

Comment 38 Kirby Zhou 2010-07-13 07:26:53 UTC
(In reply to comment #36)
> Ok...I found the problem.
> If you use the ifcfg- script, vlan is created upon physical interface. However,
> as xend starts, this interface is renamed to peth. If we setup vlan after xend
> is started and virtual network interfaces setuped, vlan is working for guest.
> This is true even with two nic interfaces, but there are switched numbers in
> guest (guest eth1 == dom0 eth0). 
> This means there's needed fix in network-bridge script to support vlan.    



So, there are at least 3 situations that need to be dealt with?

1. Dom0 DO NOT know the VLAN, and DomU takes VLAN by itself.
on dom0: xenbr0 = eth0@dom0+eth0@domu
on domu: eth0.475 is based on eth0.

2. Dom0 DO join the VLAN, and DomU takes VLAN by itself.
on dom0: xenbr0 = eth0@dom0+eth0@domu
on domu: eth0.475 is based on eth0.
on dom0: eth0.475 is based on what?

3. Dom0 DO join the VLAN, and DomU takes VLAN through Dom0.
on dom0: xenbr0 = eth0@dom0+eth0@domu
on dom0: xenbr1 = eth0.475@dom0+eth1@domu

Comment 39 Miroslav Rezanina 2010-07-13 08:18:44 UTC
(In reply to comment #38)
> So, there are at least 3 situations needs to deal with?

You're right. These are three possible situations. 

> 
> 1. Dom0 DO NOT know the VLAN, and DomU takes VLAN by itself.
> on dom0: xenbr0 = eth0@dom0+eth0@domu
> on domu: eth0.475 is based on eth0.
> 
I have to do additional testing. This could work, as there's no VLAN in dom0 that could be attached to the wrong device.

> 2. Dom0 DO join the VLAN, and DomU takes VLAN by itself.
> on dom0: xenbr0 = eth0@dom0+eth0@domu
> on domu: eth0.475 is based on eth0.
> on dom0: eth0.475 is based on what?
> 

This is the situation I tested. It works if the dom0 eth0.475 is based on eth0, the virtual interface created by the network-bridge script. If we use an ifcfg- script in /etc/sysconfig/network-scripts, the VLAN is created before xend handles the interface and so it is based on the peth0 device.

> 3. Dom0 DO join the VLAN, and DomU takes VLAN through Dom0.
> on dom0: xenbr0 = eth0@dom0+eth0@domu
> on dom0: xenbr1 = eth0.475@dom0+eth1@domu    

I will test this too, but this should already be working, as the network-bridge script takes this option into account.
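
To see which device a dom0 VLAN ended up on (eth0 versus peth0), the kernel's VLAN table can be inspected. The following is a minimal sketch, assuming the usual `name | id | real device` layout of /proc/net/vlan/config; the function name `vlan_parent` and its optional second argument are hypothetical and exist only for this example.

```shell
#!/bin/bash
# Sketch only: print the real device a VLAN interface is based on.
# $1 = VLAN device name (e.g. eth0.475)
# $2 = table to read; defaults to the kernel's /proc/net/vlan/config
vlan_parent() {
    # Data rows look like "eth0.475 | 475 | peth0"; header rows never
    # have the VLAN device name as their first field.
    awk -v dev="$1" '$1 == dev { print $NF }' "${2:-/proc/net/vlan/config}"
}
```

For example, `vlan_parent eth0.475` printing peth0 would indicate the VLAN was created before xend renamed the interface.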

Comment 40 Miroslav Rezanina 2010-07-26 11:25:19 UTC
(In reply to comment #39)
> (In reply to comment #38)
> > So, there are at least 3 situations needs to deal with?
> 
> You're right. These are three possible situations. 
> 
> > 
> > 1. Dom0 DO NOT know the VLAN, and DomU takes VLAN by itself.
> > on dom0: xenbr0 = eth0@dom0+eth0@domu
> > on domu: eth0.475 is based on eth0.
> > 
> I have to do additional testing. This could work as there's no vlan in dom0
> that can be incorrectly connected.
> 

Yes, this works correctly even without a fix for this problem.

> > 2. Dom0 DO join the VLAN, and DomU takes VLAN by itself.
> > on dom0: xenbr0 = eth0@dom0+eth0@domu
> > on domu: eth0.475 is based on eth0.
> > on dom0: eth0.475 is based on what?
> > 
> 
> This is situation I tested. It works, if the dom0 eth0.475 is based on eth0 -
> virtual interface created by network-bridge script. If we use ifcfg- script in
> /etc/sysconfig/network-script, vlan is created before xend handling and so it
> is  based on peth0 device.
> 
> > 3. Dom0 DO join the VLAN, and DomU takes VLAN through Dom0.
> > on dom0: xenbr0 = eth0@dom0+eth0@domu
> > on dom0: xenbr1 = eth0.475@dom0+eth1@domu    
> 
> I will test this too, but this should be already working as network-bridge
> script take this option into account.    

This is working too.

Comment 41 Miroslav Rezanina 2010-08-03 09:08:06 UTC
Created attachment 436219 [details]
network-bridge with vlan support

This is a network-bridge script with support for VLANs based on a bridged interface.

Comment 42 Miroslav Rezanina 2010-08-03 09:10:02 UTC
Hi Kirby,
can you please test the network-bridge script attached in comment #41? It should set up the network with VLANs correctly.

Comment 43 Miroslav Rezanina 2010-08-18 11:55:22 UTC
*** Bug 507147 has been marked as a duplicate of this bug. ***

Comment 44 Miroslav Rezanina 2010-08-18 12:01:52 UTC
*** Bug 503206 has been marked as a duplicate of this bug. ***

Comment 45 Kirby Zhou 2010-08-18 17:05:26 UTC
(In reply to comment #42)
> Hi Kirby,
> can you please test network-bridge script attached in comment #41. It should
> correctly setup network with vlans.

It seems to work.
But sometimes some route information through the VLAN interface is lost.

This is because get_ip_info cannot extract all route information; it can only extract the default gateway.
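
One way to work around the lost routes by hand is to save every route on the device before the bridge is set up and replay them afterwards. This is a rough sketch under assumptions, not a change to get_ip_info: the function name `routes_to_cmds` is hypothetical, and it expects the output of `ip route show dev <dev>` on standard input.

```shell
#!/bin/bash
# Sketch only: turn saved "ip route show dev <dev>" output (read from stdin)
# into "ip route add" commands that can be replayed later.
# $1 = device the routes should be re-added on
routes_to_cmds() {
    local dev=$1
    local route
    while read -r route; do
        # Skip empty lines; every other line is one complete route.
        if [ -n "$route" ]; then
            echo "ip route add $route dev $dev"
        fi
    done
}
```

A possible use is `ip route show dev eth0.475 | routes_to_cmds eth0.475 > /tmp/routes.sh` before the bridge comes up, then reviewing and running the saved commands once the network is back.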

Comment 46 Miroslav Rezanina 2010-08-19 06:00:20 UTC
Yeah, that's possible. We suggest not using automatic scripting when a complicated network setup is used, since the script cannot know what behavior you expect. We cannot guarantee that information is transferred correctly if you are using extended settings (such as special routing rules).

Comment 48 Miroslav Rezanina 2010-09-29 09:55:27 UTC
This is blocked by bz #638539. Until that is solved, the patch cannot be accepted and committed.

Comment 50 Miroslav Rezanina 2010-11-30 06:29:17 UTC
As the kernel bug that prevents vconfig rem and vconfig add from being called without a delay between them is not handled for 5.6, this is moved to 5.7.
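
Until the kernel issue is resolved, a script that removes and re-adds a VLAN could retry the add with a pause in between. The helper below is only a sketch of that idea, not the approach taken in the attached patch; the name `retry_with_delay`, the retry count, and the delay are all hypothetical.

```shell
#!/bin/bash
# Sketch only: run a command until it succeeds, sleeping between attempts.
# $1 = maximum number of attempts, $2 = delay in seconds, rest = command
retry_with_delay() {
    local tries=$1
    local delay=$2
    shift 2
    local attempt=1
    while ! "$@"; do
        # Give up once the attempt budget is used.
        if [ "$attempt" -ge "$tries" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, `vconfig rem eth0.475` followed by `retry_with_delay 5 1 vconfig add eth0 475` would try the add up to five times, one second apart.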

Comment 52 RHEL Program Management 2011-01-11 20:17:12 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 53 RHEL Program Management 2011-01-12 15:20:35 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 56 Miroslav Rezanina 2011-10-21 09:24:50 UTC
Closing this bz as there are still problems with default VLAN handling. Anyway, VLANs are part of a more complicated setup that is best solved by a modified setup script fitted to the concrete configuration, rather than by basic default support.

