Description of problem:

When an instance is launched with an interface on a flat provider (external) network, the DHCP offer arrives at the qbr bridge on the qvb interface but never reaches the tap device: in qbr, the interface MAC address is learned on the wrong port. Communication on a tenant (internal) network works perfectly fine. The behavior is seen on either a flat or an isolated network, with the controller and compute switch ports tagged under one VLAN.

# brctl showmacs qbr132fa062-6d
port no  mac addr            is local?  ageing timer
  1      3e:a7:c0:86:39:70   yes          0.00
  1      3e:a7:c0:86:39:70   yes          0.00
  1      fa:16:3e:e1:85:7e   no           0.70
  2      fe:16:3e:e1:85:7e   yes          0.00
  2      fe:16:3e:e1:85:7e   yes          0.00

As a result, the packets are not forwarded any further and are looped back to qvb.

If the IP is set manually, communication works, but it is very unstable because the MAC address switches randomly between the ports:

# brctl showmacs qbr185db8c7-46 | grep "16:3e:93:3b:61"
  1      fa:16:3e:93:3b:61   no           3.29
  2      fe:16:3e:93:3b:61   yes          0.00
  2      fe:16:3e:93:3b:61   yes          0.00
# brctl showmacs qbr185db8c7-46 | grep "16:3e:93:3b:61"
  2      fa:16:3e:93:3b:61   no           0.32
  2      fe:16:3e:93:3b:61   yes          0.00
  2      fe:16:3e:93:3b:61   yes          0.00
# brctl showmacs qbr185db8c7-46 | grep "16:3e:93:3b:61"
  2      fa:16:3e:93:3b:61   no           0.61
  2      fe:16:3e:93:3b:61   yes          0.00
  2      fe:16:3e:93:3b:61   yes          0.00

64 bytes from 172.19.243.117: icmp_seq=53 ttl=64 time=0.085 ms
64 bytes from 172.19.243.117: icmp_seq=54 ttl=64 time=0.086 ms
64 bytes from 172.19.243.117: icmp_seq=55 ttl=64 time=0.082 ms  <---
64 bytes from 172.19.243.117: icmp_seq=68 ttl=64 time=0.190 ms  <---
64 bytes from 172.19.243.117: icmp_seq=69 ttl=64 time=0.066 ms
64 bytes from 172.19.243.117: icmp_seq=70 ttl=64 time=0.095 ms
64 bytes from 172.19.243.117: icmp_seq=71 ttl=64 time=0.064 ms

Workaround: make the bridge work in "hub" mode instead of "switch" mode by setting the ageing time to 0, which disables the MAC learning table so that all packets are forwarded to all ports of the bridge:

# brctl setageing qbr132fa062-6d 0

Check:

# brctl showstp qbr132fa062-6d | grep "ageing time"
  ageing time          0.00

Version-Release number of selected component (if applicable):
openstack-neutron-7.0.4-3.el7ost.noarch
openstack-neutron-bigswitch-agent-2015.3.8-1.el7ost.noarch
openstack-neutron-bigswitch-lldp-2015.3.8-1.el7ost.noarch
openstack-neutron-common-7.0.4-3.el7ost.noarch
openstack-neutron-lbaas-7.0.0-2.el7ost.noarch
openstack-neutron-metering-agent-7.0.4-3.el7ost.noarch
openstack-neutron-ml2-7.0.4-3.el7ost.noarch
openstack-neutron-openvswitch-7.0.4-3.el7ost.noarch
python-neutron-7.0.4-3.el7ost.noarch
python-neutron-lbaas-7.0.0-2.el7ost.noarch
python-neutronclient-3.1.0-1.el7ost.noarch

How reproducible:
Always.

Steps to Reproduce:
1. Launch an instance with an interface on the provider network.
2. The instance fails to get a DHCP offer or to communicate.
3.

Actual results:
No or only partial communication.

Expected results:
The interface gets an IP and can communicate normally.

Additional info:
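For reference, a minimal sketch of applying the same workaround to every qbr bridge on a compute node. The loop and the qbr prefix match are assumptions about the local setup (the standard hybrid-plug naming), not part of the reported reproducer:

# Put every qbr bridge on this host into flood ("hub") mode by disabling MAC learning.
# Sketch only: assumes bridges created by the hybrid plug driver are named qbrXXXX.
for br in $(brctl show | awk 'NR > 1 && $1 ~ /^qbr/ {print $1}'); do
    brctl setageing "$br" 0
    brctl showstp "$br" | grep "ageing time"
done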
Please provide an SOS report from a node experiencing this issue. If you could verify it's the most up-to-date version of sosreport, that would be helpful. Can you describe how this environment was installed, and please confirm that we're talking about the standard ML2+OVS?
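(For reference, a non-interactive run on the affected node is usually enough; the key point is to run it while the DHCP failure is actually occurring. This is a sketch, not a prescribed plugin set:)

# Run as root on the affected compute node while the problem is reproducing.
sosreport --batch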
We used the sosreport from git://github.com/sosreport/sos.git. The reports are available in collab-shell under /cases/01684953. The environment was installed using director, with 1 controller and 1 compute. Yes, ML2+OVS is used.
(In reply to Ondrej from comment #6)
> We used the git://github.com/sosreport/sos.git sosreport. They are available
> in collab-shell under /cases/01684953. The environment was installed using
> director, 1 controller and 1 compute. Yes, ML2+OVS used.

Can you show the output for 'neutron net-show %s' for the problematic provider network?
(In reply to Ondrej from comment #6)
> We used the git://github.com/sosreport/sos.git sosreport. They are available
> in collab-shell under /cases/01684953. The environment was installed using
> director, 1 controller and 1 compute. Yes, ML2+OVS used.

Also, looking at the SOS report on the compute node, it looks like there are no VMs running? We'd need an SOS report taken *while the problem manifests*.
(In reply to Assaf Muller from comment #8)
> (In reply to Ondrej from comment #6)
> > We used the git://github.com/sosreport/sos.git sosreport. They are available
> > in collab-shell under /cases/01684953. The environment was installed using
> > director, 1 controller and 1 compute. Yes, ML2+OVS used.
>
> Also looking at the SOS report on the compute node it looks like there are
> no VMs running? We'd need an SOS report taken *while the problem manifests*.

I found sosreport-20160818-134310, which has 2 tap devices on the compute node. We still need to know which Neutron network is problematic and the output of its 'neutron net-show'.
Hi, providing the info:

# neutron net-show ext-net
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | True                                 |
| id                        | 0113ec13-832d-48ea-9cc6-f28b43252d69 |
| mtu                       | 0                                    |
| name                      | ext-net                              |
| port_security_enabled     | True                                 |
| provider:network_type     | flat                                 |
| provider:physical_network | datacentre                           |
| provider:segmentation_id  |                                      |
| qos_policy_id             |                                      |
| router:external           | True                                 |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   | 974f0fcd-71a1-4959-90db-58b5f2b03abe |
| tenant_id                 | fedcdd99268f4cd8b218cec7949e8b0e     |
+---------------------------+--------------------------------------+
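Since ext-net is a flat network on physical_network 'datacentre', it is also worth cross-checking on the compute node how that physnet is mapped to an OVS bridge. A sketch, assuming a director-style layout where the mapping lives under /etc/neutron/plugins/ml2/ and the external bridge is br-ex (both the path and the bridge name are assumptions, not confirmed by the report):

# Find the physnet-to-bridge mapping used by the OVS agent (config path may differ per deployment).
grep -r bridge_mappings /etc/neutron/plugins/ml2/
# Confirm the physical NIC is attached to the mapped bridge (e.g. br-ex) and that
# the VM's qvo interface is plugged into br-int.
ovs-vsctl show
ovs-vsctl list-ports br-int | grep qvo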
(In reply to Ondrej from comment #6)
> We used the git://github.com/sosreport/sos.git sosreport. They are available
> in collab-shell under /cases/01684953. The environment was installed using
> director, 1 controller and 1 compute. Yes, ML2+OVS used.

Can you also show 'neutron port-list' and 'neutron port-show' for every relevant VM and DHCP port?
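A sketch for gathering that in one pass, using the ext-net ID shown above (the --network_id filter and the cliff-style -f/-c output options are assumptions about this client version):

# List the ports on the problematic network, then dump full details for each one.
neutron port-list --network_id=0113ec13-832d-48ea-9cc6-f28b43252d69
for port in $(neutron port-list --network_id=0113ec13-832d-48ea-9cc6-f28b43252d69 -f value -c id); do
    neutron port-show "$port"
done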
(In reply to Ondrej from comment #6)
> We used the git://github.com/sosreport/sos.git sosreport. They are available
> in collab-shell under /cases/01684953. The environment was installed using
> director, 1 controller and 1 compute. Yes, ML2+OVS used.

Can you also attach an unfiltered tcpdump on the problematic tap device?
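For the capture, something along these lines would show at which hop of the hybrid-plug path the DHCP offer disappears. The interface suffix below is taken from the qbr132fa062-6d bridge shown earlier, and the tap/qvb/qvo prefixes follow the standard naming convention; adjust to the actual port on the node:

# Unfiltered captures on each hop of the path for one port: tap (VM side),
# qvb (bridge side of the veth pair), qvo (OVS side). Stop them with 'kill %1 %2 %3'.
tcpdump -ne -i tap132fa062-6d -w /tmp/tap132fa062-6d.pcap &
tcpdump -ne -i qvb132fa062-6d -w /tmp/qvb132fa062-6d.pcap &
tcpdump -ne -i qvo132fa062-6d -w /tmp/qvo132fa062-6d.pcap &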