Bug 1249128
Summary: | [Director] can't ping tenant vm in a virt env (no net isolation) with overcloud tenant vlan networking | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Marios Andreou <mandreou> | ||||||||
Component: | documentation | Assignee: | RHOS Documentation Team <rhos-docs> | ||||||||
Status: | CLOSED EOL | QA Contact: | RHOS Documentation Team <rhos-docs> | ||||||||
Severity: | unspecified | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 7.0 (Kilo) | CC: | dsneddon, gfidente, kbasil, mburns, rhel-osp-director-maint, srevivo | ||||||||
Target Milestone: | y2 | Keywords: | Documentation, ZStream | ||||||||
Target Release: | 7.0 (Kilo) | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Known Issue | |||||||||
Doc Text: |
Cause: When deploying with Neutron VLAN-mode tenant networking, the tenant VLANs must be on an OVS bridge.
Consequence: A NIC configuration template that places the tenant interface on a bridge must be used when deploying. If network isolation is not used, the template used for the compute nodes (and possibly the controllers) must be changed.
Workaround (if any): If the NIC configuration is like this:
NIC: Provisioning (Native VLAN)
NIC: External (Native VLAN), Tenant VLAN Range (tagged)
Then we can simply load the net-config-bridge.yaml for the compute node:
Step 1) Create a network-environment.yaml with only the following content:
resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: /usr/share/openstack-tripleo-heat-templates/net-config-bridge.yaml
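For orientation, this is a rough sketch of the kind of configuration a bridge-based template such as net-config-bridge.yaml applies; the file shipped in openstack-tripleo-heat-templates is authoritative, and the interface name below is a placeholder:

```yaml
# Hypothetical sketch of an os-net-config style bridge template body.
# "nic1" is a placeholder interface name; the shipped template may
# use Heat parameters and different member names.
network_config:
  - type: ovs_bridge
    name: br-ex
    use_dhcp: true           # the bridge takes over the NIC's addressing
    members:
      - type: interface
        name: nic1
        primary: true        # the bridge inherits this NIC's MAC address
```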
Step 2) Deploy with the following command-line parameters (e.g. with a tenant VLAN range of 100-200):
openstack overcloud deploy \
--neutron-network-type vlan \
--neutron-public-interface <NIC> \
--neutron-bridge-mappings datacentre:br-ex \
--neutron-network-vlan-ranges datacentre:100:200 \
--neutron-disable-tunneling \
-e /home/stack/network-environment.yaml
I suspect that some users will want to separate the external and tenant VLAN interfaces. In order to do that, the net-config-bridge.yaml will have to be modified (see attached file net-config-2-bridges.yaml). The tenant VLAN interface should be placed on a new bridge (e.g. br-tenant), and the extra NIC added to that bridge. For example, to use the attached file for both controller and compute:
resource_registry:
  OS::TripleO::Controller::Net::SoftwareConfig: net-config-2-bridges.yaml
  OS::TripleO::Compute::Net::SoftwareConfig: net-config-2-bridges.yaml
The new bridge does not need an IP, so it will be set to use_dhcp: false. That will create an unnumbered bridge interface that can then be included in Neutron with different CLI parameters:
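As a sketch of that modification (assuming standard os-net-config syntax; the NIC name is a placeholder), the added bridge section would look roughly like:

```yaml
# Hypothetical addition to the NIC template: a second, unnumbered OVS
# bridge carrying the tenant VLAN trunk. "nic3" is a placeholder.
- type: ovs_bridge
  name: br-tenant
  use_dhcp: false       # no IP: Neutron tags/untags VLANs on this bridge
  members:
    - type: interface
      name: nic3
```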
openstack overcloud deploy \
--neutron-network-type vlan \
--neutron-public-interface <NIC> \
--neutron-bridge-mappings datacentre:br-ex,tenant:br-tenant \
--neutron-network-vlan-ranges tenant:100:200 \
--neutron-disable-tunneling \
-e /home/stack/network-environment.yaml
Result: VLAN-mode tenant networks can be used without network isolation as long as: the tenant VLAN interface is placed on an OVS bridge; the Neutron bridge mappings are included for that bridge; and the tenant VLAN range is identified. The NIC config template will create the bridge, and the CLI parameters will define the bridge mapping and VLAN ranges. Make sure to use the correct value for --neutron-public-interface; it must be the NIC to which the external network (br-ex) is attached.
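The mapping and range arguments above follow a fixed syntax: comma-separated physnet:bridge pairs, and physnet:min_vlan:max_vlan. This is only an illustrative, standalone sketch of how those strings decompose (not Neutron's actual parsing code), using the values from this example:

```shell
# Illustrative only: how the CLI argument strings used above decompose.
mappings="datacentre:br-ex,tenant:br-tenant"
ranges="tenant:100:200"

# --neutron-bridge-mappings: comma-separated physnet:bridge pairs
IFS=',' read -r -a pairs <<< "$mappings"
for pair in "${pairs[@]}"; do
  physnet="${pair%%:*}"    # text before the first ':' = physical network name
  bridge="${pair#*:}"      # text after the first ':'  = OVS bridge it maps to
  echo "physnet $physnet maps to bridge $bridge"
done

# --neutron-network-vlan-ranges: physnet:min_vlan:max_vlan
IFS=':' read -r physnet vmin vmax <<< "$ranges"
echo "physnet $physnet allocates tenant VLAN IDs $vmin-$vmax"
```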
|
Story Points: | --- | ||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2018-08-17 12:30:41 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
[1] Ping test used:

glance image-create --name user --is-public True --disk-format qcow2 \
    --container-format bare --file fedora-user.qcow2

NETWORK_CIDR='10.0.0.0/8'
OVERCLOUD_NAMESERVER='8.8.8.8'
FLOATING_IP_CIDR='192.0.2.0/24'
FLOATING_IP_START='192.0.2.45'
FLOATING_IP_END='192.0.2.64'
BM_NETWORK_GATEWAY='192.0.2.1'

NETWORK_JSON=$(mktemp)
jq "." <<EOF > $NETWORK_JSON
{
    "float": {
        "cidr": "$NETWORK_CIDR",
        "name": "default-net",
        "nameserver": "$OVERCLOUD_NAMESERVER"
    },
    "external": {
        "name": "ext-net",
        "cidr": "$FLOATING_IP_CIDR",
        "allocation_start": "$FLOATING_IP_START",
        "allocation_end": "$FLOATING_IP_END",
        "gateway": "$BM_NETWORK_GATEWAY"
    }
}
EOF
setup-neutron -n $NETWORK_JSON

neutron net-list
NET_ID=$(neutron net-list -f csv --quote none | grep default-net | cut -d, -f1)
if ! nova keypair-show default 2>/dev/null; then tripleo user-config; fi
nova boot --poll --key-name default --flavor m1.demo --image user --nic net-id=$NET_ID demo
PRIVATEIP=$(nova list | grep demo | awk -F"default-net=" '{print $2}' | awk '{print $1}')
tripleo wait_for 10 5 neutron port-list -f csv -c id --quote none \| grep id
PORT=$(neutron port-list | grep $PRIVATEIP | cut -d'|' -f2)
FLOATINGIP=$(neutron floatingip-create ext-net --port-id "${PORT//[[:space:]]/}" | awk '$2=="floating_ip_address" {print $4}')
SECGROUPID=$(nova secgroup-list | grep default | cut -d ' ' -f2)
neutron security-group-rule-create $SECGROUPID --protocol icmp \
    --direction ingress --port-range-min 8 || true
neutron security-group-rule-create $SECGROUPID --protocol tcp \
    --direction ingress --port-range-min 22 --port-range-max 22 || true

(wait for the address to associate OK in nova list, then):
ping $FLOATINGIP

The only thing I can notice from the attached output is that the compute and controller seem to have interfaces plugged on different VLANs (tag: 1 on the controller, tag: 2 on the compute). Trying to reproduce with devtest. FYI (running this now), just launched an instance and floating IP as above.
The tags seem OK (could be copy/paste from different runs above @gfidente), like:

CONTROL:
Every 2.0s: ovs-vsctl show    Fri Jul 31 12:26:37 2015

200fb223-fc01-4873-a4e6-029a47d2c38c
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "qr-fbdb0b97-bc"
            tag: 1
            Interface "qr-fbdb0b97-bc"
                type: internal
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port "tapc16a2d86-75"
            tag: 1
            Interface "tapc16a2d86-75"
                type: internal
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
        Port "qg-a693a4dd-38"
            Interface "qg-a693a4dd-38"
                type: internal
        Port "eth0"
            Interface "eth0"
    ovs_version: "2.3.1-git3282e51"

COMPUTE:
Every 2.0s: ovs-vsctl show    Fri Jul 31 12:26:48 2015

e34bdcd6-473c-4b86-b8d0-40f56c466779
    Bridge br-int
        fail_mode: secure
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port "qvo067be755-3f"
            tag: 1
            Interface "qvo067be755-3f"
        Port br-int
            Interface br-int
                type: internal
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.3.1-git3282e51"

Created attachment 1058100 [details]
neutron config and ovs state (devtest)
The issue reproduces when deploying with devtest, even though in that case both the controller and compute had their interfaces plugged on the same VLAN tag (tag: 1); that is probably irrelevant.
The overcloud instance does not seem to be able to acquire its IP via DHCP.
Was able to ping the VM by moving (for the compute node) the IP address from eth0 to br-ex and adding eth0 to the bridge (i.e. like we do for the controller). Still poking/verifying/tidying up; will update the bug again later.

The problem is that for the VLAN case, since there is no tunneling bridge defined, there is no way for OVS on the compute nodes to speak to the outside world. While poking I realised that we can move the eth0 interface onto the br-ex bridge (along with its MAC and IP address), like we do for the controller, so that instances can talk to the outside world (from the integration bridge, and then via br-ex, which now has access).

Workaround for now if you want to deploy with Neutron tenant VLANs. You need to write the following to a local file (call it whatever you want, but change the name accordingly in the deploy command). I have defined a local file called 'vlan_env.yaml' with the following contents:

resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: net-config-bridge.yaml

You also need to copy the net-config-bridge.yaml definition to wherever the vlan_env.yaml file has been created:

cp /usr/share/openstack-tripleo-heat-templates/net-config-bridge.yaml ./

Then deploy specifying the VLAN options and passing the custom environment file:

openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 \
    --neutron-network-type vlan --neutron-disable-tunneling -e vlan_env.yaml

I was able to spawn an overcloud instance using the notes above at comment 3. I was able to ping to/from the VM; two VMs could ping each other, and the outside world.
State of the switches (see eth0 on the br-ex of the compute node, as on the controller):

COMPUTE:
Every 2.0s: ovs-vsctl show    Mon Aug 3 10:42:26 2015

52643364-4136-4eef-b84b-e93686f2274c
    Bridge br-int
        fail_mode: secure
        Port "qvof889a1ac-92"
            tag: 2
            Interface "qvof889a1ac-92"
        Port br-int
            Interface br-int
                type: internal
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port "eth0"
            Interface "eth0"
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.3.1-git3282e51"

CONTROL:
Every 2.0s: ovs-vsctl show    Mon Aug 3 10:41:08 2015

c460a217-afd1-4c2b-baa3-39166695016c
    Bridge br-ex
        Port "qg-f2893472-97"
            Interface "qg-f2893472-97"
                type: internal
        Port "eth0"
            Interface "eth0"
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-int
        fail_mode: secure
        Port "qr-9c47af37-9d"
            tag: 1
            Interface "qr-9c47af37-9d"
                type: internal
        Port "tap484c7b56-e2"
            tag: 1
            Interface "tap484c7b56-e2"
                type: internal
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.3.1-git3282e51"

I'm not sure this is anything but a doc bug. The expectation has always been that net-config-bridge.yaml would be used for compute nodes when using VLAN-mode tenant networking without network isolation. That requirement comes from upstream.

Great, thanks for the sanity check Dan! The trouble is that with the Tuskar deployment workflow we can't override the resource registry other than with an environment file, so at the very least we need to document this (otherwise the expectation is that you specify --tenant-network-types as vlan and you're good to go).

Keith, just want to check in here with you. This looks like we can say we require network isolation for VLAN-mode networking.

(In reply to chris alfonso from comment #11)
> Keith, Just want to check in here with you. This looks like we can say we
> require network isolation for VLAN-mode networking.

I think we will require network isolation for some use cases here, but there is a simple use case where this should work (for a POC, for instance). The network switches will have to be configured like this:

NIC: Provisioning (Native VLAN)
NIC: External (Native VLAN), Tenant VLAN Range (tagged)

Any NIC can be configured for either role, but the tenant VLANs and external network must share the br-ex bridge. The deployment can then make use of the NIC with the tenant VLANs like this:

Step 1) Create a network-environment.yaml with only the following content:

resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: net-config-noop.yaml

Step 2) Deploy with the following command-line parameters (e.g. with a tenant VLAN range of 100-200):

openstack overcloud deploy \
    --neutron-network-type vlan \
    --neutron-public-interface <NIC> \
    --neutron-bridge-mappings datacentre:br-ex \
    --neutron-network-vlan-ranges datacentre:100:200 \
    --neutron-disable-tunneling

I suspect that some users will want to separate the external and tenant VLAN interfaces. In order to do that, the net-config-bridge.yaml will have to be modified. The tenant VLAN interface should be placed on a new bridge (e.g. br-tenant), and the extra NIC added to that bridge. Disable IP on the new bridge by setting use_dhcp: false. That will create an unnumbered bridge interface that can then be included with different CLI parameters:

openstack overcloud deploy \
    --neutron-network-type vlan \
    --neutron-public-interface <NIC> \
    --neutron-bridge-mappings datacentre:br-ex,tenant:br-tenant \
    --neutron-network-vlan-ranges tenant:100:200 \
    --neutron-disable-tunneling

I have attached a sample file named "net-config-2-bridges.yaml" which shows an example of using a dedicated interface (nic3 in the example) for tenant networking without network isolation. This file would be used for both controller and compute nodes, so you would need a network-environment.yaml with the following content:

resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: net-config-2-bridges.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: net-config-2-bridges.yaml

(This assumes that the net-config-2-bridges.yaml file is in /home/stack, along with the network-environment.yaml file.)

Then deploy with:

openstack overcloud deploy \
    --neutron-network-type vlan \
    --neutron-public-interface <NIC> \
    --neutron-bridge-mappings datacentre:br-ex,tenant:br-tenant \
    --neutron-network-vlan-ranges tenant:100:200 \
    --neutron-disable-tunneling \
    -e /home/stack/network-environment.yaml

(In reply to Dan Sneddon from comment #12)

One quick note here. When using a network-environment.yaml file to declare the NIC configuration template to use, it should be either a full path, or the file should be placed in the same directory as the network-environment.yaml. Here is the corrected resource_registry entry when using network-environment.yaml:

resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: /usr/share/openstack-tripleo-heat-templates/net-config-noop.yaml

Created attachment 1058888 [details]
net-config-2-bridges.yaml
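The attachment itself is not reproduced in this bug. As a rough, hedged sketch of what a two-bridge template like net-config-2-bridges.yaml might contain, assuming standard os-net-config syntax (the NIC names are placeholders):

```yaml
# Hypothetical two-bridge layout: external traffic on br-ex, tenant
# VLAN trunk on an unnumbered br-tenant with a dedicated NIC.
# "nic2"/"nic3" are placeholder names; the real attachment is authoritative.
network_config:
  - type: ovs_bridge
    name: br-ex
    use_dhcp: true
    members:
      - type: interface
        name: nic2
        primary: true
  - type: ovs_bridge
    name: br-tenant
    use_dhcp: false       # unnumbered; carries tagged tenant VLANs only
    members:
      - type: interface
        name: nic3
```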
(In reply to Dan Sneddon from comment #12)

Comments #12 and #13 incorrectly included the 'net-config-noop.yaml' file, when I meant to include the 'net-config-bridge.yaml'. I updated the doc text to reflect the correct template.

To recap: when using only two interfaces (ctlplane and external/tenant), just include the net-config-bridge.yaml with the full path for the Compute template and deploy with the datacentre:br-ex bridge mappings. When using three interfaces (ctlplane, external, tenant), use the attached net-config-2-bridges.yaml (or similar) to create a separate OVS bridge for a dedicated adapter on both the compute and controller nodes, then deploy with the tenant:br-tenant bridge mapping. When the network gets more complicated than that (external on a VLAN, external shared with provisioning, etc.), network isolation is needed to support those features.

Dan Macpherson, is this a candidate for the advanced deployment section?

OSP 7 has reached its retirement; please see https://access.redhat.com/errata/RHBA-2018:2362

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days |
Created attachment 1058070 [details]
neutron config and ovs state

Description of problem:
Can't ping an overcloud instance when deploying with tenant VLANs like:

openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 \
    --neutron-network-type vlan --neutron-disable-tunneling

This may be due to my setup of the overcloud tenant networking, hence this bug as a sanity check and to track the issue, if there is one.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --neutron-network-type vlan --neutron-disable-tunneling
2. Try the ping test like [1] below.

Actual results:
Cannot ping the instance's floating IP.

Expected results:
The ping test should complete.

Additional info:
I launch an instance and try to ping it on a floating IP address like [1]; it fails (Unreachable). This test completes fine for me when I instead use gre or vxlan tunnels for the overcloud tenant networking. I am pretty confident that we are configuring the right things in Neutron (see attachment). I am not so sure about the state of the switch on the compute node (but it looks good to me on the controller, see attachment; in fact I can ping the tenant router happily from the undercloud just fine). I've attached the state of the switch on both nodes when the instance has been successfully spawned. Are we missing an interface on br-ex on the compute node? Otherwise I can't see how the instance would be reachable (hence the issue). I can ping the router interface fine from the undercloud.

Pinging the instance:
From 192.0.2.46 icmp_seq=1 Destination Host Unreachable

Can anyone else recreate this? Am I doing it wrong with the overcloud tenant networking (is there some VLAN magic I'm missing)? Appreciate any feedback.