Bug 1249128

Summary: [Director] can't ping tenant vm in a virt env (no net isolation) with overcloud tenant vlan networking
Product: Red Hat OpenStack
Reporter: Marios Andreou <mandreou>
Component: documentation
Assignee: RHOS Documentation Team <rhos-docs>
Status: CLOSED EOL
QA Contact: RHOS Documentation Team <rhos-docs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.0 (Kilo)
CC: dsneddon, gfidente, kbasil, mburns, rhel-osp-director-maint, srevivo
Target Milestone: y2
Keywords: Documentation, ZStream
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: When deploying with Neutron VLAN-mode tenant networking, the tenant VLANs must be on an OVS bridge.

Consequence: When deploying, a NIC template that uses a bridge needs to be used. If network isolation is not used, then we need to change which template gets used for the compute nodes (and possibly the controllers).

Workaround (if any): If the NIC configuration is like this:

NIC: Provisioning (Native VLAN)
NIC: External (Native VLAN), Tenant VLAN Range (tagged)

then we can simply load the net-config-bridge.yaml for the compute node:

Step 1) Create a network-environment.yaml with only the following content:

resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: /usr/share/openstack-tripleo-heat-templates/net-config-bridge.yaml

Step 2) Deploy with the following command-line parameters (e.g. with a tenant VLAN range of 100-200):

openstack overcloud deploy \
--neutron-network-type vlan \
--neutron-public-interface <NIC> \
--neutron-bridge-mappings datacentre:br-ex \
--neutron-network-vlan-ranges datacentre:100:200 \
--neutron-disable-tunneling \
-e /home/stack/network-environment.yaml

Some users may want to separate the external and tenant VLAN interfaces. To do that, the net-config-bridge.yaml has to be modified (see attached file net-config-2-bridges.yaml). The tenant VLAN interface should be placed on a new bridge (e.g. br-tenant), and the extra NIC added to that bridge. For example, to use the attached file for both controller and compute:

resource_registry:
  OS::TripleO::Controller::Net::SoftwareConfig: net-config-2-bridges.yaml
  OS::TripleO::Compute::Net::SoftwareConfig: net-config-2-bridges.yaml

The new bridge does not need an IP address, so it is set to use_dhcp: false. That creates an unnumbered bridge interface that can then be included in Neutron with different CLI parameters:

openstack overcloud deploy \
--neutron-network-type vlan \
--neutron-public-interface <NIC> \
--neutron-bridge-mappings datacentre:br-ex,tenant:br-tenant \
--neutron-network-vlan-ranges tenant:100:200 \
--neutron-disable-tunneling \
-e /home/stack/network-environment.yaml

Result: VLAN-mode tenant networks can be used without network isolation as long as: the tenant VLAN interface is placed on an OVS bridge; the Neutron bridge mappings are included for that bridge; and the tenant VLAN range is identified. The NIC config template creates the bridge, and the CLI parameters define the bridge mapping and VLAN ranges. Make sure to use the correct value for --neutron-public-interface; this corresponds to the NIC where the external network (br-ex) is attached.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-08-17 12:30:41 UTC
Type: Bug
Attachments:
neutron config and ovs state
neutron config and ovs state (devtest)
net-config-2-bridges.yaml

Description Marios Andreou 2015-07-31 14:54:51 UTC
Created attachment 1058070 [details]
neutron config and ovs state

Description of problem:

Can't ping an overcloud instance when deploying with tenant VLANs, e.g.:

openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --neutron-network-type vlan --neutron-disable-tunneling

This may be due to my setup of the overcloud tenant networking; hence this bug, as a sanity check and to track the issue if there is one.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --neutron-network-type vlan --neutron-disable-tunneling

2. Try the ping test as in [1] below.
 

Actual results:
Cannot ping the instance's floating IP.

Expected results:
The ping test should complete.

Additional info:

I launch an instance and try to ping it on a floating ip address like [1] - fails (Unreachable). This test completes fine for me when I instead use gre or vxlan tunnels for the overcloud tenant networking. I am pretty confident that we are configuring the right things in neutron (see attachment). 

I am not so sure about the state of the switch on the compute node (but lgtm on controller, see attachment - in fact can ping the tenant router happily from the undercloud just fine) - I've attached the state of the switch on both nodes when the instance has been successfully spawned. Are we missing an interface on br-ex on the compute node? Otherwise I can't see how the instance is reachable (hence the issue).

I can ping the router interface fine from the undercloud. Pinging the instance:

From 192.0.2.46 icmp_seq=1 Destination Host Unreachable

Can anyone else recreate this? Am I doing it wrong with the overcloud tenant networking (is there some VLAN magic I'm missing)?

Appreciate any feedback.

Comment 3 Marios Andreou 2015-07-31 14:56:11 UTC
[1] 
# Upload a test image, create networks, boot an instance, attach a floating IP, open ICMP/SSH, then ping.
glance image-create --name user --is-public True --disk-format qcow2 --container-format bare --file fedora-user.qcow2
NETWORK_CIDR='10.0.0.0/8'
OVERCLOUD_NAMESERVER='8.8.8.8'
FLOATING_IP_CIDR='192.0.2.0/24'
FLOATING_IP_START='192.0.2.45'
FLOATING_IP_END='192.0.2.64'
BM_NETWORK_GATEWAY='192.0.2.1'

NETWORK_JSON=$(mktemp)
jq "." <<EOF > $NETWORK_JSON
{
    "float": {
        "cidr": "$NETWORK_CIDR",
        "name": "default-net",
        "nameserver": "$OVERCLOUD_NAMESERVER"
    },
    "external": {
        "name": "ext-net",
        "cidr": "$FLOATING_IP_CIDR",
        "allocation_start": "$FLOATING_IP_START",
        "allocation_end": "$FLOATING_IP_END",
        "gateway": "$BM_NETWORK_GATEWAY"
    }
}
EOF
setup-neutron -n $NETWORK_JSON
neutron net-list

NET_ID=$(neutron net-list -f csv --quote none | grep default-net | cut -d, -f1)

if ! nova keypair-show default 2>/dev/null; then tripleo user-config; fi
nova boot --poll --key-name default --flavor m1.demo --image user --nic net-id=$NET_ID demo
PRIVATEIP=$(nova list | grep demo | awk -F"default-net=" '{print $2}' | awk '{print $1}')
tripleo wait_for 10 5 neutron port-list -f csv -c id --quote none \| grep id
PORT=$(neutron port-list | grep $PRIVATEIP | cut -d'|' -f2)
FLOATINGIP=$(neutron floatingip-create ext-net --port-id "${PORT//[[:space:]]/}" | awk '$2=="floating_ip_address" {print $4}')
SECGROUPID=$(nova secgroup-list | grep default | cut -d ' ' -f2)
neutron security-group-rule-create $SECGROUPID --protocol icmp \
    --direction ingress --port-range-min 8 || true
neutron security-group-rule-create $SECGROUPID --protocol tcp \
    --direction ingress --port-range-min 22 --port-range-max 22 || true
    
(wait for address to associate ok in nova list):
ping  $FLOATINGIP

Comment 4 Giulio Fidente 2015-07-31 16:21:31 UTC
The only thing I can notice from the attached output is that compute and controller seem to have interfaces plugged on different vlans (tag:1 controller, tag:2 compute). Trying to reproduce with devtest.

Comment 5 Marios Andreou 2015-07-31 16:27:09 UTC
FYI (running this now): just launched an instance and floating IP as above. The tags seem OK (the mismatch above could be copy/paste from different runs, @gfidente).

CONTROL:
Every 2.0s: ovs-vsctl show                                              Fri Jul 31 12:26:37 2015

200fb223-fc01-4873-a4e6-029a47d2c38c
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "qr-fbdb0b97-bc"
            tag: 1
            Interface "qr-fbdb0b97-bc"
                type: internal
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port "tapc16a2d86-75"
            tag: 1
            Interface "tapc16a2d86-75"
                type: internal
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
        Port "qg-a693a4dd-38"
            Interface "qg-a693a4dd-38"
                type: internal
        Port "eth0"
            Interface "eth0"
    ovs_version: "2.3.1-git3282e51"



COMPUTE 

Every 2.0s: ovs-vsctl show                                              Fri Jul 31 12:26:48 2015

e34bdcd6-473c-4b86-b8d0-40f56c466779
    Bridge br-int
        fail_mode: secure
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port "qvo067be755-3f"
            tag: 1
            Interface "qvo067be755-3f"
        Port br-int
            Interface br-int
                type: internal
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.3.1-git3282e51"

Comment 6 Giulio Fidente 2015-07-31 17:02:47 UTC
Created attachment 1058100 [details]
neutron config and ovs state (devtest)

The issue reproduces when deploying with devtest, even though in that case both controller and compute had the interfaces plugged on the same VLAN tag (tag: 1); that is probably irrelevant.

The overcloud instance does not seem to be able to acquire an IP via DHCP.
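
A quick way to confirm the DHCP failure (a sketch, not from the attachments; the instance name "demo" and the qdhcp namespace lookup are illustrative):

nova console-log demo | grep -i dhcp                      # instance-side view: did dhclient ever get a lease?
# On the controller, watch for DHCP traffic inside the tenant network's dhcp namespace:
sudo ip netns list | grep qdhcp
sudo ip netns exec qdhcp-<NET_ID> tcpdump -ni any port 67 or port 68

If no DHCP requests show up in the namespace, the packets are being dropped somewhere between the compute node and the controller.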

Comment 7 Marios Andreou 2015-08-03 12:14:35 UTC
Was able to ping the VM by moving (for the compute node) the IP address from eth0 to br-ex and adding eth0 to the bridge (i.e. like we do for the controller). Still poking/verifying/tidying up; will update the bug again later.
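
For reference, a minimal sketch of that manual change on the compute node (run it from the console, since connectivity over eth0 drops while the address moves; substitute the node's real address/prefix):

ADDR=$(ip -4 addr show eth0 | awk '/inet /{print $2}')   # current address/prefix on eth0
sudo ovs-vsctl --may-exist add-port br-ex eth0           # plug the physical NIC into br-ex
sudo ip addr flush dev eth0                              # drop the address from the NIC...
sudo ip addr add $ADDR dev br-ex                         # ...and move it to the bridge
sudo ip link set br-ex up

The proper fix is to let the NIC config template do this, as described in the workaround below.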

Comment 8 Marios Andreou 2015-08-03 14:59:13 UTC
The problem is that for the VLAN case, since there is no tunneling bridge defined, there is no way for OVS on the compute nodes to speak to the outside world. Whilst poking I realised that we can move the eth0 interface onto the br-ex bridge (along with its MAC and IP address), like we do for the controller, so that instances can talk to the outside world (from the integration bridge, and then via br-ex, which now has access).

Workaround for now if you want to deploy with Neutron tenant VLANs: write the following to a local file (call it whatever you want, but change the name accordingly in the deploy command). I have defined a local file called 'vlan_env.yaml' with the following contents:

resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: net-config-bridge.yaml

You also need to copy the net-config-bridge.yaml definition to wherever the vlan_env.yaml file has been created:

cp /usr/share/openstack-tripleo-heat-templates/net-config-bridge.yaml ./

Then deploy specifying the vlan options, and passing the custom environment file:

openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --neutron-network-type vlan --neutron-disable-tunneling -e vlan_env.yaml

I was able to spawn an overcloud instance using the notes above at comment 3. I was able to ping to/from the VM, two VMs could ping each other, and both could reach the outside world.

State of the switches (note eth0 on br-ex of the compute node, as on the controller):

COMPUTE:
Every 2.0s: ovs-vsctl show                                                                                                Mon Aug  3 10:42:26 2015

52643364-4136-4eef-b84b-e93686f2274c
    Bridge br-int
        fail_mode: secure
        Port "qvof889a1ac-92"
            tag: 2
            Interface "qvof889a1ac-92"
        Port br-int
            Interface br-int
                type: internal
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port "eth0"
            Interface "eth0"
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.3.1-git3282e51"

CONTROL:
Every 2.0s: ovs-vsctl show                                                                                                                           Mon Aug  3 10:41:08 2015

c460a217-afd1-4c2b-baa3-39166695016c
    Bridge br-ex
        Port "qg-f2893472-97"
            Interface "qg-f2893472-97"
                type: internal
        Port "eth0"
            Interface "eth0"
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-int
        fail_mode: secure
        Port "qr-9c47af37-9d"
            tag: 1
            Interface "qr-9c47af37-9d"
                type: internal
        Port "tap484c7b56-e2"
            tag: 1
            Interface "tap484c7b56-e2"
                type: internal
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.3.1-git3282e51"

Comment 9 Dan Sneddon 2015-08-03 15:20:27 UTC
I'm not sure this is anything but a doc bug. The expectation has always been that net-config-bridge.yaml would be used for compute nodes when using VLAN mode tenant networking without network isolation. That requirement comes from upstream.

Comment 10 Marios Andreou 2015-08-03 15:28:47 UTC
Great, thanks for the sanity check Dan! The trouble is that with the Tuskar deployment workflow we can't override the resource registry other than with an environment file, so at the very least we need to document this (otherwise the expectation is that specifying --tenant-network-types as vlan is enough and you're good to go).

Comment 11 chris alfonso 2015-08-03 19:01:26 UTC
Keith, Just want to check in here with you. This looks like we can say we require network isolation for VLAN-mode networking.

Comment 12 Dan Sneddon 2015-08-03 20:06:57 UTC
(In reply to chris alfonso from comment #11)
> Keith, Just want to check in here with you. This looks like we can say we
> require network isolation for VLAN-mode networking.

I think we will require network isolation for some use cases here, but there is a simple use case where this should work (for a POC, for instance):

The network switches will have to be configured like this:

NIC: Provisioning (Native VLAN)
NIC: External (Native VLAN), Tenant VLAN Range (tagged)

Any NIC can be configured for either role, but the tenant VLANs and external network must share the br-ex bridge.

The deployment can then make use of the NIC with the tenant VLANs like this:

Step 1) Create a network-environment.yaml with only the following content:

resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: net-config-noop.yaml

Step 2) Deploy with the following command-line parameters (e.g. with a tenant VLAN range of 100-200):

openstack overcloud deploy \
--neutron-network-type vlan \
--neutron-public-interface <NIC> \
--neutron-bridge-mappings datacentre:br-ex \
--neutron-network-vlan-ranges datacentre:100:200 \
--neutron-disable-tunneling

I suspect that some users will want to separate the external and tenant VLAN interfaces. In order to do that, the net-config-bridge.yaml will have to be modified. The tenant VLAN interface should be placed on a new bridge (e.g. br-tenant), and the extra NIC added to that bridge. Disable IP on the new bridge by setting use_dhcp: false. That will create an unnumbered bridge interface that can then be included with different CLI parameters:


openstack overcloud deploy \
--neutron-network-type vlan \
--neutron-public-interface <NIC> \
--neutron-bridge-mappings datacentre:br-ex,tenant:br-tenant \
--neutron-network-vlan-ranges tenant:100:200 \
--neutron-disable-tunneling

Comment 13 Dan Sneddon 2015-08-03 20:13:08 UTC
I have attached a sample file named "net-config-2-bridges.yaml" which shows an example of using a dedicated interface (nic3 in the example) for tenant networking without network isolation.
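
For illustration (a sketch of the relevant fragment, not the attached file verbatim, assuming the os-net-config network_config schema used by the tripleo-heat-templates NIC configs), the dedicated tenant bridge would look roughly like this in the template:

  - type: ovs_bridge
    name: br-tenant
    use_dhcp: false          # unnumbered bridge; the tenant VLANs do not need an IP here
    members:
      - type: interface
        name: nic3           # dedicated tenant interface; adjust to the actual hardware
        primary: true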

This file would be used for both controller and compute nodes, so you would need a network-environment.yaml with the following content:

resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: net-config-2-bridges.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: net-config-2-bridges.yaml

(This assumes that the net-config-2-bridges.yaml file is in /home/stack, along with the network-environment.yaml file)

Then deploy with:

openstack overcloud deploy \
--neutron-network-type vlan \
--neutron-public-interface <NIC> \
--neutron-bridge-mappings datacentre:br-ex,tenant:br-tenant \
--neutron-network-vlan-ranges tenant:100:200 \
--neutron-disable-tunneling \
-e /home/stack/network-environment.yaml

Comment 14 Dan Sneddon 2015-08-03 20:15:10 UTC
(In reply to Dan Sneddon from comment #12)

One quick note here. When using a network-environment.yaml file to declare the NIC configuration template to use, it should be either a full path, or the file should be placed in the same directory as the network-environment.yaml. Here is the corrected resource_registry entry when using network-environment.yaml:

resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig:
    /usr/share/openstack-tripleo-heat-templates/net-config-noop.yaml

Comment 15 Dan Sneddon 2015-08-03 20:15:33 UTC
Created attachment 1058888 [details]
net-config-2-bridges.yaml

Comment 16 Dan Sneddon 2015-08-03 20:26:18 UTC
(In reply to Dan Sneddon from comment #12)

Comments #12 and #13 incorrectly included the 'net-config-noop.yaml' file, when I meant to include the 'net-config-bridge.yaml'. I updated the doc text to reflect the correct template.

To recap, when using only two interfaces (ctlplane and external/tenant), just include the net-config-bridge.yaml with the full path for the Compute template and deploy with the datacentre:br-ex bridge mappings.

When using three interfaces (ctlplane, external, tenant), use the attached net-config-2-bridges.yaml (or similar) to create a separate OVS bridge for a dedicated adapter on both the compute and controller. Then deploy with the tenant:br-tenant bridge mapping.

When the network gets more complicated than that (external on VLAN, external shared with provisioning, etc.) then we need network isolation to support those features.

Comment 17 chris alfonso 2015-08-31 16:53:51 UTC
Dan Macpherson, Is this a candidate for the advanced deployment section?

Comment 18 Scott Lewis 2018-08-17 12:30:41 UTC
OSP 7 has reached its retirement, please see https://access.redhat.com/errata/RHBA-2018:2362

Comment 19 Red Hat Bugzilla 2023-09-14 03:02:54 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days