Description of problem:

We are using RHOSP7. Following is the problem we found:

1. The DHCP namespace is on overcloud-controller-0:

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns
qdhcp-30caeaf5-ef0a-4d36-8e45-8b8e7cbc3a33

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec qdhcp-30caeaf5-ef0a-4d36-8e45-8b8e7cbc3a33 ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

tap2fb0d546-8b: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 1.1.1.2  netmask 255.255.255.0  broadcast 1.1.1.255
        inet6 fe80::f816:3eff:fe16:9a01  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:16:9a:01  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 648 (648.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

2. The OVS bridges on overcloud-controller-0 are as follows:

[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl show
81f41d74-2d75-4c8a-81ff-e6dfb4462bfe
    Bridge br-ex
        Port "vlan203"
            tag: 203
            Interface "vlan203"
                type: internal
        Port "bond1"
            Interface "eth2"
            Interface "eth3"
        Port br-ex
            Interface br-ex
                type: internal
        Port "vlan202"
            tag: 202
            Interface "vlan202"
                type: internal
        Port "vlan201"
            tag: 201
            Interface "vlan201"
                type: internal
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port "vlan100"
            tag: 100
            Interface "vlan100"
                type: internal
    Bridge br-bond
        Port br-bond
            Interface br-bond
                type: internal
        Port phy-br-bond
            Interface phy-br-bond
                type: patch
                options: {peer=int-br-bond}
    Bridge br-int
        fail_mode: secure
        Port "tap2fb0d546-8b"
            tag: 1
            Interface "tap2fb0d546-8b"
                type: internal
        Port int-br-bond
            Interface int-br-bond
                type: patch
                options: {peer=phy-br-bond}
        Port br-int
            Interface br-int
                type: internal
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
    ovs_version: "2.3.1-git3282e51"

3. bond1 looks alright:

[heat-admin@overcloud-controller-0 ~]$ sudo ovs-appctl lacp/show bond1
---- bond1 ----
        status: active negotiated
        sys_id: a0:36:9f:67:d4:a8
        sys_priority: 65534
        aggregation key: 1
        lacp_time: fast

slave: eth2: current attached
        port_id: 1
        port_priority: 65535
        may_enable: true

        actor sys_id: a0:36:9f:67:d4:a8
        actor sys_priority: 65534
        actor port_id: 1
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 5c:16:c7:02:00:01
        partner sys_priority: 32768
        partner port_id: 37
        partner port_priority: 32768
        partner key: 3
        partner state: activity timeout aggregation synchronized collecting distributing

slave: eth3: current attached
        port_id: 2
        port_priority: 65535
        may_enable: true

        actor sys_id: a0:36:9f:67:d4:a8
        actor sys_priority: 65534
        actor port_id: 2
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 5c:16:c7:02:00:01
        partner sys_priority: 32768
        partner port_id: 237
        partner port_priority: 32768
        partner key: 3
        partner state: activity timeout aggregation synchronized collecting distributing
4. eth3 of bond1 receives the DHCP requests:

[heat-admin@overcloud-controller-0 ~]$ sudo tcpdump -i eth3 -e -n -v port 67 or port 68
tcpdump: WARNING: eth3: no IPv4 address assigned
tcpdump: listening on eth3, link-type EN10MB (Ethernet), capture size 65535 bytes
17:01:42.006142 fa:16:3e:47:60:d5 > Broadcast, ethertype 802.1Q (0x8100), length 336: vlan 50, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 318)
    0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:47:60:d5, length 290, xid 0xe959f348, Flags [none]
          Client-Ethernet-Address fa:16:3e:47:60:d5
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Discover
            Client-ID Option 61, length 7: ether fa:16:3e:47:60:d5
            MSZ Option 57, length 2: 576
            Parameter-Request Option 55, length 9:
              Subnet-Mask, Default-Gateway, Domain-Name-Server, Hostname
              Domain-Name, MTU, BR, NTP
              Classless-Static-Route
            Vendor-Class Option 60, length 12: "udhcp 1.20.1"
            Hostname Option 12, length 6: "cirros"
17:02:42.073216 fa:16:3e:47:60:d5 > Broadcast, ethertype 802.1Q (0x8100), length 336: vlan 50, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 318)
    0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:47:60:d5, length 290, xid 0xe959f348, secs 60, Flags [none]
          Client-Ethernet-Address fa:16:3e:47:60:d5
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Discover
            Client-ID Option 61, length 7: ether fa:16:3e:47:60:d5
            MSZ Option 57, length 2: 576
            Parameter-Request Option 55, length 9:
              Subnet-Mask, Default-Gateway, Domain-Name-Server, Hostname
              Domain-Name, MTU, BR, NTP
              Classless-Static-Route
            Vendor-Class Option 60, length 12: "udhcp 1.20.1"
            Hostname Option 12, length 6: "cirros"
17:03:42.135464 fa:16:3e:47:60:d5 > Broadcast, ethertype 802.1Q (0x8100), length 336: vlan 50, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 318)
    0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:47:60:d5, length 290, xid 0xe959f348, secs 120, Flags [none]
          Client-Ethernet-Address fa:16:3e:47:60:d5
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Discover
            Client-ID Option 61, length 7: ether fa:16:3e:47:60:d5
            MSZ Option 57, length 2: 576
            Parameter-Request Option 55, length 9:
              Subnet-Mask, Default-Gateway, Domain-Name-Server, Hostname
              Domain-Name, MTU, BR, NTP
              Classless-Static-Route
            Vendor-Class Option 60, length 12: "udhcp 1.20.1"
            Hostname Option 12, length 6: "cirros"

5. However, at the same time, bond1 does not see the above DHCP requests:

[heat-admin@overcloud-controller-0 ~]$ sudo tcpdump -i bond1 -e -n -v port 67 or port 68
tcpdump: WARNING: bond1: no IPv4 address assigned
tcpdump: listening on bond1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

6. Of course, the DHCP namespace does not see the DHCP requests either:

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec qdhcp-30caeaf5-ef0a-4d36-8e45-8b8e7cbc3a33 tcpdump -i tap2fb0d546-8b -n -e -v port 67 or port 68
tcpdump: listening on tap2fb0d546-8b, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
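As a quick cross-check, tcpdump's pcap filter can match the 802.1Q tag explicitly when capturing on a bond slave; a minimal sketch, assuming the same interface and VLAN seen in the captures above:

# Capture only DHCP traffic carrying the 802.1Q tag for VLAN 50 on the
# bond slave; the "vlan 50" pcap primitive matches tagged frames, so this
# confirms the requests arrive tagged on the wire
sudo tcpdump -i eth3 -e -n -v 'vlan 50 and (port 67 or port 68)'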
*** Bug 1252215 has been marked as a duplicate of this bug. ***
*** Bug 1252218 has been marked as a duplicate of this bug. ***
The behavior you are seeing, where the packets do not appear on the "bond1" interface, is expected. OVS makes these packets available on the "br-ex" and "vlan*" interfaces. A packet trace on the physical port shows the actual packets crossing that interface, no matter which VLAN they are in. If you look at each captured frame, you will notice that the VLAN tag is noted for each frame; I see the DHCP requests are coming in on VLAN 50.

I would like some additional information about the environment. Can you fill in the following?

1. What is the value of "network_vlan_ranges =" in the /etc/neutron/plugin.ini file?

2. What is the output of this command for the particular network where the DHCP requests are coming from?

# neutron net-show --fields provider:segmentation_id <network name>

3. Why is there a "br-bond" bridge, and why does it not have any ports attached?

At this point, I will probably need to see copies of the network-environment.yaml, controller.yaml (NIC config), and compute.yaml (NIC config). Can you attach those to this ticket? Thank you in advance.
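For reference, a minimal shell sketch for gathering the requested details on the controller; the paths follow the standard RHOSP7 layout, and "net1" is a placeholder for the actual network name:

# 1. Check the configured VLAN ranges
grep network_vlan_ranges /etc/neutron/plugin.ini

# 2. Show the VLAN segmentation ID Neutron assigned to the tenant network
#    (run with the overcloud credentials sourced)
source ~/overcloudrc
neutron net-show --fields provider:segmentation_id net1

# 3. List any ports attached to the br-bond bridge (empty output means
#    no ports are attached)
sudo ovs-vsctl list-ports br-bond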
Thanks a lot, Dan. That setup is gone, so I'll provide what I have.

1. What is the value of "network_vlan_ranges =" in the /etc/neutron/plugin.ini file?

network_vlan_ranges = vlan:50:90

2. neutron net-show --fields provider:segmentation_id <network name>

The setup is gone. However, here is what I did: 1) created the first project, 2) created the first network, 3) started the first instance. The instance was brought up on a compute node, and a DHCP port was brought up on a controller node. The network is using VLAN 50. The VM instance sends out DHCP requests and, as you already saw, these requests reach the controller node.

3. The yaml files are listed below:

network-environment.yaml
https://bigswitch.box.com/shared/static/glqof7bbgclv7sne7t5njg05d4b4seii.yaml
controller.yaml
https://bigswitch.box.com/shared/static/a5lm4geey305l90bouq94npm4dx4bf6b.yaml
compute.yaml
https://bigswitch.box.com/shared/static/pasagxr0b6c1ql0xzq1ldaiq6s1zt2y5.yaml

4. The command I'm using is:

openstack overcloud deploy -e /home/stack/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  --neutron-bridge-mappings datacentre:br-ex,vlan:br-bond \
  --neutron-network-type vlan --neutron-network-vlan-ranges vlan:50:90 \
  --neutron-disable-tunneling --compute-scale 1 --control-scale 3 \
  --ceph-storage-scale 0 --plan overcloud --control-flavor control \
  --compute-flavor compute --ntp-server 0.rhel.pool.ntp.org --debug
I think we may have a bug in the bonded NIC templates. This will require some validation. Can you check the value of this parameter in /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini:

bridge_mappings =

My concern is that we may need to match the bridge name on the compute and controller hosts, and it looks like the configuration defaults to different bridge names on each role (br-ex on the controller, br-bond on the compute).

If this turns out to be the root cause, a workaround is to set the --neutron-physical-bridge to br-bond at deployment, then set the bridge mappings accordingly:

openstack overcloud deploy \
  --neutron-network-type vlan \
  --neutron-physical-bridge br-bond \
  --neutron-bridge-mappings datacentre:br-bond \
  --neutron-network-vlan-ranges datacentre:<start>:<end> \
  [...]
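For illustration, a sketch of what a consistent mapping would look like in ovs_neutron_plugin.ini on both roles; the "datacentre" label follows the deploy command above, and this is an assumption about the intended end state rather than a capture from the affected system:

# /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini (both roles)
[ovs]
# The physical-network label on the left must match the label used in
# --neutron-network-vlan-ranges; the bridge on the right must be the
# bridge that actually holds the bond on this host. If the two roles
# map the label to different (or missing) bridges, the OVS agent wires
# its patch ports to a bridge with no physical uplink and tagged
# traffic never reaches br-int.
bridge_mappings = datacentre:br-bond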
Got it. We are using the GA code to set up a new environment now and will verify as soon as we can. Thanks Dan!
I found a couple of issues with the network-environment.yaml; it's missing some parameters that were added after Beta 2. These changes wouldn't affect tenant VLAN communication, but might affect floating IPs. I have modified the network-environment.yaml and attached it to this ticket. The changes were:

resource_registry:
  # Port assignment for the Redis VIP on isolated network (defaults to Internal API)
  OS::TripleO::Controller::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/vip.yaml

# Ensures that the floating IPs are placed on br-int for VLAN when using Tuskar
parameters:
  # Set this to "br-bond" if using native VLAN for floating IPs
  Controller-1::NeutronExternalNetworkBridge: "''"

parameter_defaults:
  # Ensure that the floating IPs are placed on br-int for VLAN when using Heat
  # Set this to "br-bond" if using native VLAN for floating IPs
  NeutronExternalNetworkBridge: "''"
Created attachment 1061656 [details] network-environment.yaml (modified)
Upstream fix proposed for this bug: https://review.openstack.org/#/c/213861/
The system that is using the GA bits is alive, and I'm still seeing the same problem. The command I'm using is:

openstack overcloud deploy -e /home/stack/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  --neutron-bridge-mappings datacentre:br-bond --neutron-network-type vlan \
  --neutron-physical-bridge br-bond --neutron-network-vlan-ranges datacentre:50:90 \
  --neutron-disable-tunneling --compute-scale 1 --control-scale 1 \
  --ceph-storage-scale 0 --templates /home/stack/templates/my-overcloud \
  --control-flavor control --compute-flavor compute \
  --ntp-server 0.rhel.pool.ntp.org --debug

After deployment, the bridge on the controller node is still br-ex, while the bridge on the compute node is br-bond. I just saw the fix. If I make the same change to compute.yaml and deploy the overcloud again, will that fix this problem?

[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl show
0953f59e-2dc5-4968-926b-e54ad35d6d82
    Bridge br-bond
        Port phy-br-bond
            Interface phy-br-bond
                type: patch
                options: {peer=int-br-bond}
        Port br-bond
            Interface br-bond
                type: internal
    Bridge br-int
        fail_mode: secure
        Port int-br-bond
            Interface int-br-bond
                type: patch
                options: {peer=phy-br-bond}
        Port br-int
            Interface br-int
                type: internal
        Port "tap6eb97be3-c9"
            tag: 1
            Interface "tap6eb97be3-c9"
    Bridge br-ex
        Port "vlan100"
            tag: 100
            Interface "vlan100"
                type: internal
        Port "vlan203"
            tag: 203
            Interface "vlan203"
                type: internal
        Port br-ex
            Interface br-ex
                type: internal
        Port "vlan202"
            tag: 202
            Interface "vlan202"
                type: internal
        Port "vlan201"
            tag: 201
            Interface "vlan201"
                type: internal
        Port "bond1"
            Interface "p1p1"
            Interface "p1p2"
    ovs_version: "2.3.1-git3282e51"

[heat-admin@overcloud-compute-0 ~]$ sudo ovs-vsctl show
f6644566-f5e5-4dcb-99ec-b5bbe9ac2178
    Bridge br-int
        fail_mode: secure
        Port int-br-bond
            Interface int-br-bond
                type: patch
                options: {peer=phy-br-bond}
        Port "qvo15e26b66-2b"
            tag: 2
            Interface "qvo15e26b66-2b"
        Port br-int
            Interface br-int
                type: internal
    Bridge br-bond
        Port "vlan202"
            tag: 202
            Interface "vlan202"
                type: internal
        Port "vlan201"
            tag: 201
            Interface "vlan201"
                type: internal
        Port phy-br-bond
            Interface phy-br-bond
                type: patch
                options: {peer=int-br-bond}
        Port "bond1"
            Interface "eth3"
            Interface "eth2"
        Port br-bond
            Interface br-bond
                type: internal
    ovs_version: "2.3.1-git3282e51"
(In reply to bigswitch from comment #14)
> After deployment, the bridge on controller node is still at br-ex. The
> bridge on compute is at br-bond. I just saw the fix. If I make the same
> change to compute.yaml and deploy overcloud again, will that fix this
> problem?

Manually applying the fix proposed upstream will result in both the controller and the compute having the bond placed on "br-ex". In that case, you will not want to use the --neutron-physical-bridge parameter when deploying, and you can leave the default bridge mappings. The command line would then become:

openstack overcloud deploy -e /home/stack/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  --neutron-network-type vlan --neutron-network-vlan-ranges datacentre:50:90 \
  --neutron-disable-tunneling --compute-scale 1 --control-scale 1 \
  --ceph-storage-scale 0 --templates /home/stack/templates/my-overcloud \
  --control-flavor control --compute-flavor compute \
  --ntp-server 0.rhel.pool.ntp.org --debug

If you manually apply the upstream fix to the templates and use the modified command line, that should produce the desired result.
Thanks Dan! It works this time :)
To clarify the problem and workaround: the bond-with-vlans templates had an error in the compute.yaml template; the bridge name did not match the controller's. When creating templates, make sure that controller.yaml and compute.yaml both have the bridge name set to {get_input: bridge_name}, which resolves to br-ex by default (see the sketch below). Do not use the bond-with-vlans templates as-is.
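For illustration, a sketch of the relevant stanza as it should appear in both NIC config templates; the bond member details (nic2/nic3, BondInterfaceOvsOptions) are representative of the bond-with-vlans layout rather than copied from the fixed template:

network_config:
  -
    # Must be identical in controller.yaml and compute.yaml;
    # {get_input: bridge_name} resolves to br-ex by default
    type: ovs_bridge
    name: {get_input: bridge_name}
    members:
      -
        type: ovs_bond
        name: bond1
        ovs_options: {get_param: BondInterfaceOvsOptions}
        members:
          -
            type: interface
            name: nic2
            primary: true
          -
            type: interface
            name: nic3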
To further explain the Redis VIP parameter: when deploying network isolation with the standard set of networks, it is possible to just include environments/network-isolation.yaml. That will enable all 6 networks and will create ports on the various networks for each role. This file also includes a parameter to set where the Redis VIP will listen.

When using a custom set of networks/ports with network isolation (not the standard 6 VLANs), you instead enable the networks and ports in the network-environment.yaml. If you are not including environments/network-isolation.yaml, then you will need to include this line in the resource_registry: section of the network-environment.yaml:

# Port assignment for Redis VIP on isolated network (defaults to Internal API)
OS::TripleO::Controller::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/vip.yaml

That will place the Redis VIP on one of the isolated networks (Internal API by default, but this can be overridden in the ServiceNetMap). If you don't include that entry, the Redis VIP will be on the provisioning network, which is not recommended for production.
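A sketch of how such a ServiceNetMap override might look in network-environment.yaml; the RedisNetwork key name is an assumption based on the default ServiceNetMap in the tripleo-heat-templates of this era, so verify it against your template version:

parameter_defaults:
  ServiceNetMap:
    # internal_api is the default placement for the Redis VIP; change
    # this to another isolated network name to relocate the VIP
    RedisNetwork: internal_api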
Setting the severity to urgent won't change anything as far as when it's delivered. This bug is currently ON_QA, which means it has been patched by a developer and our quality engineering group is testing the fix. This bug also has a 'blocker' flag on it, which means it is critical and must be part of the 7.1 release.
stack@instack:~>>> rpm -qa | grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-0.8.6-69.el7ost.noarch

stack@instack:~>>> grep bridge /usr/share/openstack-tripleo-heat-templates/network/config/bond-with-vlans/controller.yaml
  Software Config to drive os-net-config with 2 bonded nics on a bridge
              type: ovs_bridge
              name: {get_input: bridge_name}

stack@instack:~>>> grep bridge /usr/share/openstack-tripleo-heat-templates/network/config/bond-with-vlans/compute.yaml
  Software Config to drive os-net-config with 2 bonded nics on a bridge
              type: ovs_bridge
              name: {get_input: bridge_name}

stack@instack:~>>> nova list
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks            |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| a6aa6760-f38a-4c49-932e-d4e747a804b0 | overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=192.0.2.21 |
| a8f6d189-bc7d-4e2a-b6df-f9284c462dd7 | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.23 |
| c33c581b-ea01-4c25-9a76-b8549f145476 | overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.0.2.22 |
| a68e0955-3ad1-4916-ae48-9aa2f3ca2457 | overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.0.2.24 |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+

stack@instack:~>>> ssh heat-admin@192.0.2.21 'sudo ip link show dev br-ex'
4: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT
    link/ether 00:6f:da:f4:b4:cd brd ff:ff:ff:ff:ff:ff

stack@instack:~>>> ssh heat-admin@192.0.2.23 'sudo ip link show dev br-ex'
4: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT
    link/ether 00:2b:4c:38:66:74 brd ff:ff:ff:ff:ff:ff
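As a side note, the same check can be run across every overcloud node in one pass; a small sketch (not part of the original verification), with the IP list taken from the nova list output above:

# Confirm that every node now places the bond on the same bridge (br-ex);
# empty or differing output on any node would indicate a mismatch
for ip in 192.0.2.21 192.0.2.22 192.0.2.23 192.0.2.24; do
    echo "== $ip =="
    ssh heat-admin@$ip 'sudo ovs-vsctl list-ports br-ex'
done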
Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1862