Bug 1562810 - The parameter bridge_name does not evaluate to br-ex by default when used inside the ceph-storage nic configuration file which ultimately causes the deployment to fail
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Emilien Macchi
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-02 14:47 UTC by Punit Kundal
Modified: 2018-04-18 23:46 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-17 18:45:53 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1569263 0 medium CLOSED [RFE] Add Ceph-Optimized NIC Config Examples 2023-09-15 01:27:11 UTC

Description Punit Kundal 2018-04-02 14:47:00 UTC
Description of problem:

When creating nic configuration templates for overcloud nodes, a parameter named bridge_name is available. It is expected to evaluate to a default value of br-ex.

This works fine when used in controller.yaml and compute.yaml. 

But when the same parameter is used in the ceph-storage.yaml nic configuration template, it evaluates to null.

os-net-config then fails on the overcloud ceph-storage nodes, which in turn causes the overall deployment to fail.
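For context, the nic configuration templates consume this token through get_input. Below is an abridged sketch of the pattern used in the stable/newton example templates; the surrounding resource layout is paraphrased from memory, so verify it against your own templates:

```yaml
resources:
  OsNetConfigImpl:
    type: OS::Heat::StructuredConfig
    properties:
      group: os-apply-config
      config:
        os_net_config:
          network_config:
            - type: ovs_bridge
              name: {get_input: bridge_name}   # resolves to null on ceph-storage nodes
              use_dhcp: false
              members:
                - type: interface
                  name: nic2
                  primary: true
```

When the role's heat template never supplies the bridge_name input value, get_input resolves to null, which matches the "name": null seen in the config.json dump in this report.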


Below is a snippet of the traceback that is observed in the /var/log/messages file:

+++
Apr  2 06:40:10 localhost os-collect-config: ++ os-apply-config --key os_net_config --type raw --key-default ''
Apr  2 06:40:11 localhost os-collect-config: + NET_CONFIG='{"network_config": [{"routes": [{"ip_netmask": "169.254.169.254/32", "next_hop": "192.0.2.1"}], "use_dhcp": false, "type": "interface", "addresses": [{"ip_netmask": "192.0.2.20/24"}], "name": "nic1"}, {"dns_servers": ["192.168.122.1"], "addresses": [{"ip_netmask": "192.168.122.45/24"}], "members": [{"type": "interface", "name": "nic2", "primary": true}], "routes": [{"default": true, "next_hop": "192.168.122.1"}], "use_dhcp": false, "type": "ovs_bridge", "name": null}, {"use_dhcp": false, "type": "ovs_bridge", "name": "br-isolated", "members": [{"type": "interface", "name": "nic3", "primary": true}, {"type": "vlan", "addresses": [{"ip_netmask": "172.168.124.61/24"}], "vlan_id": 20}, {"type": "vlan", "addresses": [{"ip_netmask": "172.168.128.60/24"}], "vlan_id": 40}]}]}'
Apr  2 06:40:11 localhost os-collect-config: + '[' -n '{"network_config": [{"routes": [{"ip_netmask": "169.254.169.254/32", "next_hop": "192.0.2.1"}], "use_dhcp": false, "type": "interface", "addresses": [{"ip_netmask": "192.0.2.20/24"}], "name": "nic1"}, {"dns_servers": ["192.168.122.1"], "addresses": [{"ip_netmask": "192.168.122.45/24"}], "members": [{"type": "interface", "name": "nic2", "primary": true}], "routes": [{"default": true, "next_hop": "192.168.122.1"}], "use_dhcp": false, "type": "ovs_bridge", "name": null}, {"use_dhcp": false, "type": "ovs_bridge", "name": "br-isolated", "members": [{"type": "interface", "name": "nic3", "primary": true}, {"type": "vlan", "addresses": [{"ip_netmask": "172.168.124.61/24"}], "vlan_id": 20}, {"type": "vlan", "addresses": [{"ip_netmask": "172.168.128.60/24"}], "vlan_id": 40}]}]}' ']'
Apr  2 06:40:11 localhost os-collect-config: + trap configure_safe_defaults EXIT
Apr  2 06:40:11 localhost os-collect-config: + os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes
Apr  2 06:40:11 localhost dhclient[9798]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6 (xid=0x44462575)
Apr  2 06:40:11 localhost os-collect-config: [2018/04/02 06:40:11 AM] [INFO] Using config file at: /etc/os-net-config/config.json
Apr  2 06:40:11 localhost os-collect-config: [2018/04/02 06:40:11 AM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
Apr  2 06:40:11 localhost os-collect-config: [2018/04/02 06:40:11 AM] [INFO] Ifcfg net config provider created.
Apr  2 06:40:11 localhost os-collect-config: [2018/04/02 06:40:11 AM] [INFO] nic3 mapped to: eth2
Apr  2 06:40:11 localhost os-collect-config: [2018/04/02 06:40:11 AM] [INFO] nic2 mapped to: eth1
Apr  2 06:40:11 localhost os-collect-config: [2018/04/02 06:40:11 AM] [INFO] nic1 mapped to: eth0
Apr  2 06:40:11 localhost os-collect-config: [2018/04/02 06:40:11 AM] [INFO] adding interface: eth0
Apr  2 06:40:11 localhost os-collect-config: [2018/04/02 06:40:11 AM] [INFO] adding custom route for interface: eth0
Apr  2 06:40:11 localhost os-collect-config: Traceback (most recent call last):
Apr  2 06:40:11 localhost os-collect-config: File "/usr/bin/os-net-config", line 10, in <module>
Apr  2 06:40:11 localhost os-collect-config: sys.exit(main())
Apr  2 06:40:11 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 184, in main
Apr  2 06:40:11 localhost os-collect-config: obj = objects.object_from_json(iface_json)
Apr  2 06:40:11 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/objects.py", line 42, in object_from_json
Apr  2 06:40:11 localhost os-collect-config: return OvsBridge.from_json(json)
Apr  2 06:40:11 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/objects.py", line 478, in from_json
Apr  2 06:40:11 localhost os-collect-config: name = _get_required_field(json, 'name', 'OvsBridge')
Apr  2 06:40:11 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/objects.py", line 78, in _get_required_field
Apr  2 06:40:11 localhost os-collect-config: raise InvalidConfigException(msg)
Apr  2 06:40:11 localhost os-collect-config: os_net_config.objects.InvalidConfigException: OvsBridge JSON objects require 'name' to be configured.
Apr  2 06:40:11 localhost os-collect-config: + RETVAL=1
Apr  2 06:40:11 localhost os-collect-config: + [[ 1 == 2 ]]
Apr  2 06:40:11 localhost os-collect-config: + [[ 1 != 0 ]]
Apr  2 06:40:11 localhost os-collect-config: + echo 'ERROR: os-net-config configuration failed.'
Apr  2 06:40:11 localhost os-collect-config: ERROR: os-net-config configuration failed.
Apr  2 06:40:11 localhost os-collect-config: + exit 1
Apr  2 06:40:11 localhost os-collect-config: + configure_safe_defaults
Apr  2 06:40:11 localhost os-collect-config: + [[ 1 == 0 ]]
Apr  2 06:40:11 localhost os-collect-config: + cat
Apr  2 06:40:11 localhost os-collect-config: ++ ls /sys/class/net
Apr  2 06:40:11 localhost os-collect-config: ++ grep -v '^lo$'
+++

Below is the /etc/os-net-config/config.json file that gets written for the ceph-storage nodes:

+++
[root@overcloud-cephstorage-0 ~]# cat /etc/os-net-config/config.json | python -m json.tool
{
    "network_config": [
        {
            "addresses": [
                {
                    "ip_netmask": "192.0.2.20/24"
                }
            ],
            "name": "nic1",
            "routes": [
                {
                    "ip_netmask": "169.254.169.254/32",
                    "next_hop": "192.0.2.1"
                }
            ],
            "type": "interface",
            "use_dhcp": false
        },
        {
            "addresses": [
                {
                    "ip_netmask": "192.168.122.45/24"
                }
            ],
            "dns_servers": [
                "192.168.122.1"
            ],
            "members": [
                {
                    "name": "nic2",
                    "primary": true,
                    "type": "interface"
                }
            ],
            "name": null,  << value is set to null, instead it should be br-ex
            "routes": [
                {
                    "default": true,
                    "next_hop": "192.168.122.1"
                }
            ],
            "type": "ovs_bridge",
            "use_dhcp": false
        },
        {
            "members": [
                {
                    "name": "nic3",
                    "primary": true,
                    "type": "interface"
                },
                {
                    "addresses": [
                        {
                            "ip_netmask": "172.168.124.61/24"
                        }
                    ],
                    "type": "vlan",
                    "vlan_id": 20
                },
                {
                    "addresses": [
                        {
                            "ip_netmask": "172.168.128.60/24"
                        }
                    ],
                    "type": "vlan",
                    "vlan_id": 40
                }
            ],
            "name": "br-isolated",
            "type": "ovs_bridge",
            "use_dhcp": false
        }
    ]
}
+++


Version-Release number of selected component (if applicable):


How reproducible:

Always reproducible for ceph-storage nodes.


Steps to Reproduce:

Actual results:
bridge_name evaluates to null for ceph-storage.yaml nic configuration template

Expected results:
bridge_name should evaluate to br-ex by default when used with ceph-storage.yaml 


Additional info:

I will attach the templates that I used for this deployment along with the /var/log/messages file from one of the ceph storage nodes.

Please let me know if any other info is needed here.

FYI: I have seen this behaviour on RHOS 11 too.

Comment 2 Bob Fournier 2018-04-17 13:13:04 UTC
In OSP-10, it looks like the $THT/puppet implementation differs with regard to bridge_name between controller/compute and ceph, probably because br-ex isn't used by ceph in any of the example reference templates.

Controller - br-ex explicitly used for bridge_name:
https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/puppet/controller-role.yaml#L267

Compute - bridge_name comes from NeutronPhysicalBridge
https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/puppet/compute-role.yaml#L248
which is set to br-ex:
https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/puppet/compute-role.yaml#L35

CephStorage - bridge name is not set under NetworkDeployment:
https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/puppet/cephstorage-role.yaml#L226

Can you try adding the bridge_name setting under NetworkDeployment in a copy of
/puppet/cephstorage-role.yaml and including that in the deployment?

      input_values:
        bridge_name: br-ex
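A rough sketch of where that setting would land in the copied role template; the surrounding resource layout is paraphrased from stable/newton cephstorage-role.yaml, so treat the property names as assumptions and check them against your version:

```yaml
  NetworkDeployment:
    type: OS::TripleO::SoftwareDeployment
    properties:
      name: NetworkDeployment
      config: {get_resource: NetworkConfig}
      server: {get_resource: CephStorage}
      actions: {get_param: NetworkDeploymentActions}
      input_values:
        bridge_name: br-ex   # makes get_input resolve to br-ex instead of null
```

This mirrors how compute-role.yaml supplies bridge_name (there via the NeutronPhysicalBridge parameter) to its NetworkDeployment resource.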

BTW, this is resolved in OSP-12 and later by a generic puppet implementation using Jinja:
https://github.com/openstack/tripleo-heat-templates/blob/stable/pike/puppet/role.role.j2.yaml#L451

Comment 3 Dan Sneddon 2018-04-17 18:45:53 UTC
(In reply to Bob Fournier from comment #2)
> In OSP-10, it looks like the $THT/puppet implementation is different with
> regards to bridge_name between controller/compute and ceph, probably because
> br-ex isn't used by ceph in any of the example reference templates.

Ceph nodes don't need, and shouldn't have, a br-ex bridge. In fact, Ceph nodes shouldn't have any bridges.

The bridges in OSP are used by Neutron. Neutron uses the physical bridges such as br-ex to attach to physical networks. The bridge performs tasks like tagging/untagging VLAN tags or encapsulating/decapsulating VXLAN, and allows multiple VIFs to be attached to the same physical network interface.

Ceph nodes don't run any Neutron code, and don't need bridges for any reason. Adding a bridge to a Ceph node will only result in slight excess CPU utilization for no benefit.

If you look at the example NIC config templates for the Ceph nodes, you will see there are no bridges present. If you want to attach a Ceph node to the external network, it should be directly attached.

Example of attaching the external network to NIC3 on the native VLAN (untagged):

              - type: interface
                name: nic3
                use_dhcp: false
                addresses:
                - ip_netmask:
                    get_param: ExternalIpSubnet

Example of attaching the external network to NIC3 on a trunked VLAN (tagged):

              - type: interface
                name: nic3
                use_dhcp: false
              - type: vlan
                device: nic3
                vlan_id:
                  get_param: ExternalNetworkVlanID
                addresses:
                - ip_netmask:
                    get_param: ExternalIpSubnet

If you need further assistance in creating working Ceph NIC config templates, please feel free to respond which will reopen this ticket.

Comment 4 Dan Sneddon 2018-04-17 19:49:48 UTC
Just to clarify my earlier comments, some of the example templates for Ceph servers do have bridges, but they aren't required for Ceph servers. In the case of the bond-with-vlans templates, we use an OVS bridge to manage the bond. In the case of the single-nic-vlans templates, we also use a bridge, but in this case OVS is used for VLAN tagging.

It is not required for the Ceph servers to have a bridge, however, so the above examples show how to attach the External network to a Ceph server without a bridge.

Comment 5 Dan Sneddon 2018-04-17 21:35:05 UTC
Bridge names may also be specified explicitly, rather than using the "bridge_name" token:

        - type: ovs_bridge
          name: br-ex

Comment 6 Punit Kundal 2018-04-18 05:41:50 UTC
Hello Dan and Bob,

Thanks for your replies. It has made things pretty clear for me. 

However, if the ceph-storage nodes do not require any bridges at all, then the deployment documentation should state this and provide a working nic-configuration example.

The examples that are mentioned currently at [1] do not differentiate in any way between ceph-storage nodes and controllers/computes. 

One of the major challenges for customers who deploy RHOSP without engaging GPS is creating templates. Front-line support folks cannot provide them with fully customized templates, and this prolongs deployment times because customers who don't engage GPS often can't work with the template examples that we have.

Additionally, the default template examples under /usr/share/openstack-tripleo-heat-templates/network/config use bridges for nodes in all the default roles except for the multi-nic configuration.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/advanced_overcloud_customization/sect-isolating_networks#sect-Creating_Custom_Interface_Templates

I have also tested specifying the br-ex bridge name explicitly; it works without any issues and the deployment completes.

Thank you again for the valuable inputs, it has made my understanding much clearer.

Regards,
Punit

Comment 7 Dan Sneddon 2018-04-18 23:46:49 UTC
Hello Punit,

I appreciate your feedback about Ceph network config. We have never created NIC config templates that were optimized for Ceph. I have created an RFE bug to track implementation and documentation:

https://bugzilla.redhat.com/show_bug.cgi?id=1569263

