Created attachment 1779004 [details]
The nohup output file is also attached to show the logs in more detail.

Description of problem:
After executing the upgrade command with no tags on a Ceph node, the network config files vanished from that node.

Version-Release number of selected component (if applicable):

How reproducible:
During the RHOSP 13 to 16 FFU, the Ceph node upgrade fails because openvswitch is in an inactive state.

Steps to Reproduce:
We executed the following command for the Ceph node upgrade:

nohup openstack overcloud upgrade run --stack overcloud --limit overcloud-cephstorage-0 -y &

The Ceph node upgrade failed with the following error:

os_net_config.ConfigurationError: Failure(s) occurred when applying configuration

Below is the "ip a" output of the Ceph node:

[root@overcloud-cephstorage-0 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 14:18:77:43:ab:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.221/24 brd 192.168.100.255 scope global dynamic em1
       valid_lft 81907sec preferred_lft 81907sec
    inet6 fe80::1618:77ff:fe43:abc0/64 scope link
       valid_lft forever preferred_lft forever
3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 14:18:77:43:ab:c1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1618:77ff:fe43:abc1/64 scope link
       valid_lft forever preferred_lft forever
4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 14:18:77:43:ab:c2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1618:77ff:fe43:abc2/64 scope link
       valid_lft forever preferred_lft forever
5: em4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 14:18:77:43:ab:c3 brd ff:ff:ff:ff:ff:ff
6: p1p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a0:36:9f:ec:4c:44 brd ff:ff:ff:ff:ff:ff
7: p1p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a0:36:9f:ec:4c:46 brd ff:ff:ff:ff:ff:ff
8: p4p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a0:36:9f:d3:ce:48 brd ff:ff:ff:ff:ff:ff
9: p4p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a0:36:9f:d3:ce:4a brd ff:ff:ff:ff:ff:ff

Below is the "ip r" output of the affected node:

[root@overcloud-cephstorage-0 ~]# ip r
default via 192.168.100.34 dev em1
192.168.100.0/24 dev em1 proto kernel scope link src 192.168.100.221

Actual results:
The upgrade was failing because the ovs-vswitchd service was inactive during the Ceph node upgrade, which in turn broke networking on the node.

[root@overcloud-cephstorage-0 ~]# systemctl list-unit-files | grep -i ovs
ovs-delete-transient-ports.service    static
ovs-vswitchd.service                  static
ovsdb-server.service                  static

[root@overcloud-cephstorage-0 ~]# systemctl status ovs-vswitchd.service
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: inactive (dead)

After activating the OVS service, the Ceph node upgrade completed successfully.
[root@overcloud-cephstorage-0 ~]# systemctl status ovs-vswitchd.service
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2021-04-29 15:09:47 UTC; 17s ago
  Process: 96792 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random ${OVS_USER_OPT} s>
  Process: 96789 ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages (code=exited, status=0/SUCCESS)
  Process: 96786 ExecStartPre=/bin/sh -c /usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages (code=exited, status=0/SUCCESS)
 Main PID: 96844 (ovs-vswitchd)
    Tasks: 1 (limit: 822668)
   Memory: 20.4M
   CGroup: /system.slice/ovs-vswitchd.service
           └─96844 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswit>

Apr 29 15:09:47 overcloud-cephstorage-0 systemd[1]: Starting Open vSwitch Forwarding Unit...
Apr 29 15:09:47 overcloud-cephstorage-0 ovs-ctl[96792]: Inserting openvswitch module [  OK  ]
Apr 29 15:09:47 overcloud-cephstorage-0 ovs-ctl[96792]: Starting ovs-vswitchd [  OK  ]
Apr 29 15:09:47 overcloud-cephstorage-0 ovs-vsctl[96851]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait add Open_vSwitch . exter>
Apr 29 15:09:47 overcloud-cephstorage-0 ovs-ctl[96792]: Enabling remote OVSDB managers [  OK  ]
Apr 29 15:09:47 overcloud-cephstorage-0 systemd[1]: Started Open vSwitch Forwarding Unit.
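The manual workaround above (starting OVS before re-running the upgrade) can be scripted so each node is checked first. This is only a sketch of the workaround, not a supported TripleO mechanism; `ensure_ovs_active` is a hypothetical helper name:

```shell
# Hypothetical helper: start Open vSwitch on a node if it is not already
# running, so os-net-config can attach ports to the OVS bridge.
ensure_ovs_active() {
  local unit="ovs-vswitchd.service"
  if systemctl is-active --quiet "$unit"; then
    echo "$unit already active"
  else
    # openvswitch.service pulls in ovsdb-server and ovs-vswitchd.
    systemctl start openvswitch.service
    echo "started openvswitch.service"
  fi
}
```

This could be run on each ceph/compute node (e.g. over ssh as heat-admin) before invoking `openstack overcloud upgrade run` for that node.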
Unset noout flag ------------------------------------------------------- 14.97s
tripleo-podman : Purge /var/lib/docker ---------------------------------- 8.87s
tripleo-podman : Uninstall Docker rpm ----------------------------------- 5.46s
Gathering Facts --------------------------------------------------------- 3.32s
Render all_nodes data as group_vars for overcloud ----------------------- 2.53s
Gathering Facts --------------------------------------------------------- 2.04s
tripleo-podman : Check docker service state ----------------------------- 1.38s
tripleo-podman : Check if docker has some data -------------------------- 0.96s
tripleo-podman : Refresh hardware facts --------------------------------- 0.90s
tripleo-podman : Clean podman images ------------------------------------ 0.38s
tripleo-podman : Clean podman images ------------------------------------ 0.38s
include_tasks ----------------------------------------------------------- 0.35s
tripleo-podman : Clean podman volumes ----------------------------------- 0.33s
include_tasks ----------------------------------------------------------- 0.33s
Stop docker ------------------------------------------------------------- 0.33s
Purge everything about docker on the host ------------------------------- 0.28s
Unset noout flag -------------------------------------------------------- 0.27s
include_tasks ----------------------------------------------------------- 0.27s
include_tasks ----------------------------------------------------------- 0.25s
Stop docker ------------------------------------------------------------- 0.23s

Updated nodes - overcloud-cephstorage-0 Success
2021-04-29 21:05:17.529 528361 INFO tripleoclient.v1.overcloud_upgrade.UpgradeRun [-] Completed Overcloud Upgrade Run for overcloud-cephstorage-0 with playbooks ['upgrade_steps_playbook.yaml', 'deploy_steps_playbook.yaml', 'post_upgrade_steps_playbook.yaml']
2021-04-29 21:05:17.535 528361 INFO osc_lib.shell [-] END return value: None

I am getting the same error on a compute node as well. I just wanted to know whether there is any mechanism to keep the openvswitch service active without manual intervention, so that the RHOSP upgrade is not hampered by this state.

Expected results:
The openvswitch service should be active during the upgrade.

Additional info:
RHEL Version: Red Hat Enterprise Linux release 8.2 (Ootpa)
RHOSP Version: 16.2

Below is the upgrade prepare command used:

nohup openstack overcloud upgrade prepare --templates /home/stack/openstack-tripleo-heat-templates-rendered_16 -r /home/stack/templates/roles_data.yaml -n /home/stack/templates/network_data.yaml -e /home/stack/containers-prepare-parameter.yaml -e /home/stack/templates/upgrades-environment.yaml -e /home/stack/templates/rhsm.yml -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/network-isolation.yaml -e /home/stack/templates/network-environment.yaml -e /home/stack/templates/node-info.yaml -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/services/neutron-sriov.yaml -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/services/neutron-ovs.yaml -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/ceph-ansible/ceph-ansible.yaml -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/cinder-backup.yaml -e /home/stack/templates/storage-config.yaml -e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/host-config-and-reboot.yaml --libvirt-type kvm --ntp-server pool.ntp.org -v -y &

I am using the following Red Hat document for the RHOSP 13 to 16.1 upgrade:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index
Hi OVS team, please advise; I am waiting for your suggestions. Let me know if you need any other information.
Can you attach the NIC config templates? This looks like a configuration issue or an environmental issue, because the logs suggest there was an error with em2 and em3 not getting an IP assigned by DHCP:

"[2021/04/29 02:03:51 PM] [ERROR] Failure(s) occurred when applying configuration",
"[2021/04/29 02:03:51 PM] [ERROR] stdout: ",
"Determining IP information for em2... failed.",
", stderr: WARN      : [ifup] You are using 'ifup' script provided by 'network-scripts', which are now deprecated.",
"WARN      : [ifup] 'network-scripts' will be removed in one of the next major releases of RHEL.",
"WARN      : [ifup] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.",
"",
"[2021/04/29 02:03:51 PM] [ERROR] stdout: ",
"Determining IP information for em3... failed.",
", stderr: WARN      : [ifup] You are using 'ifup' script provided by 'network-scripts', which are now deprecated.",
"WARN      : [ifup] 'network-scripts' will be removed in one of the next major releases of RHEL.",
"WARN      : [ifup] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.",
"",
"Traceback (most recent call last):",
"  File \"/bin/os-net-config\", line 10, in <module>",
"    sys.exit(main())",
"  File \"/usr/lib/python3.6/site-packages/os_net_config/cli.py\", line 349, in main",
"    activate=not opts.no_activate)",
"  File \"/usr/lib/python3.6/site-packages/os_net_config/impl_ifcfg.py\", line 1806, in apply",
"    raise os_net_config.ConfigurationError(message)",
"os_net_config.ConfigurationError: Failure(s) occurred when applying configuration",
"+ RETVAL=1",
"+ set -e",
"+ [[ 1 == 2 ]]",
"+ [[ 1 != 0 ]]",
"+ echo 'ERROR: configuration of safe defaults failed.'"
Created attachment 1784160 [details] ceph-storage.yaml file
Created attachment 1784161 [details] controller.yaml file
Created attachment 1784162 [details] computesriov.yaml file
(In reply to Giulio Fidente from comment #3)
> can you attach the NIC config templates; this looks like a configuration
> issue or an environmental issue because the logs suggest there was an error
> with em2 and em3 not getting the IP assigned by DHCP
> 
> "[2021/04/29 02:03:51 PM] [ERROR] Failure(s) occurred when applying
> configuration",
> "[2021/04/29 02:03:51 PM] [ERROR] stdout: ",
> "Determining IP information for em2... failed.",
> ", stderr: WARN      : [ifup] You are using 'ifup' script provided
> by 'network-scripts', which are now deprecated.",
> "WARN      : [ifup] 'network-scripts' will be removed in one of the
> next major releases of RHEL.",
> "WARN      : [ifup] It is advised to switch to 'NetworkManager'
> instead - it provides 'ifup/ifdown' scripts as well.",
> "",
> "[2021/04/29 02:03:51 PM] [ERROR] stdout: ",
> "Determining IP information for em3... failed.",
> ", stderr: WARN      : [ifup] You are using 'ifup' script provided
> by 'network-scripts', which are now deprecated.",
> "WARN      : [ifup] 'network-scripts' will be removed in one of the
> next major releases of RHEL.",
> "WARN      : [ifup] It is advised to switch to 'NetworkManager'
> instead - it provides 'ifup/ifdown' scripts as well.",
> "",
> "Traceback (most recent call last):",
> "  File \"/bin/os-net-config\", line 10, in <module>",
> "    sys.exit(main())",
> "  File \"/usr/lib/python3.6/site-packages/os_net_config/cli.py\",
> line 349, in main",
> "    activate=not opts.no_activate)",
> "  File
> \"/usr/lib/python3.6/site-packages/os_net_config/impl_ifcfg.py\", line 1806,
> in apply",
> "    raise os_net_config.ConfigurationError(message)",
> "os_net_config.ConfigurationError: Failure(s) occurred when applying
> configuration",
> "+ RETVAL=1",
> "+ set -e",
> "+ [[ 1 == 2 ]]",
> "+ [[ 1 != 0 ]]",
> "+ echo 'ERROR: configuration of safe defaults failed.'"

Hi Giulio Fidente,
I have attached the nic-config files for the controller, computesriov, and ceph-storage roles. Let me know if any other info is required from my side.
Hi, there seems to be a validation issue with the NIC template config/indentation:

"[2021/04/29 02:00:28 PM] [WARNING] Config file failed schema validation at network_config/1:",
"    {'dns_servers': ['8.8.8.8', '8.8.4.4'], 'domain': [], 'members': [{'bonding_options': 'bond_mode=active-backup', 'members': [{'name': 'em2', 'primary': True, 'type': 'interface'}, {'name': 'em3', 'type': 'interface'}], 'name': 'bond1', 'ovs_options': None, 'type': 'ovs_bond'}, {'addresses': [{'ip_netmask': '192.168.23.161/24'}], 'type': 'vlan', 'vlan_id': 23}, {'addresses': [{'ip_netmask': '192.168.24.200/24'}], 'type': 'vlan', 'vlan_id': 24}], 'name': 'br-bond', 'type': 'ovs_bridge', 'nic_mapping': None, 'persist_mapping': False} is not valid under any of the given schemas",
"    Sub-schemas tested and not matching:",
"    - items/oneOf/ovs_bridge/members/items/oneOf: {'bonding_options': 'bond_mode=active-backup', 'members': [{'name': 'em2', 'primary': True, 'type': 'interface'}, {'name': 'em3', 'type': 'interface'}], 'name': 'bond1', 'ovs_options': None, 'type': 'ovs_bond'} is not valid under any of the given schemas",
"    -- items/oneOf/ovs_bridge/members/items/oneOf/ovs_bond/additionalProperties: Additional properties are not allowed ('bonding_options' was unexpected)",
"    -- items/oneOf/ovs_bridge/members/items/oneOf/ovs_bond/ovs_options/oneOf: None is not valid under any of the given schemas",
"    --- items/oneOf/ovs_bridge/members/items/oneOf/ovs_bond/ovs_options/oneOf/ovs_options_string/type: 'None' is not of type 'string'",
"    --- items/oneOf/ovs_bridge/members/items/oneOf/ovs_bond/ovs_options/oneOf/param/oneOf: None is not valid under any of the given schemas",
"    ---- items/oneOf/ovs_bridge/members/items/oneOf/ovs_bond/ovs_options/oneOf/param/oneOf/0/type: 'None' is not of type 'object'",
"    ---- items/oneOf/ovs_bridge/members/items/oneOf/ovs_bond/ovs_options/oneOf/param/oneOf/1/type: 'None' is not of type 'object'",

We're looking into that to try to find the exact problem.
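To illustrate what the validator is complaining about, here is a minimal, self-contained sketch (not the real os-net-config validator, whose full rules live in schema.yaml): it reproduces only the two failures reported above, namely the unexpected 'bonding_options' key on an ovs_bond and 'ovs_options' being None instead of a string. The allowed-key set is an illustrative assumption, not the real schema.

```python
# Minimal sketch of the two schema failures reported above; NOT the real
# os-net-config validator. ALLOWED_OVS_BOND_KEYS is an illustrative subset.
ALLOWED_OVS_BOND_KEYS = {"type", "name", "ovs_options", "members"}

def check_ovs_bridge(bridge):
    """Return a list of human-readable schema violations for an ovs_bridge."""
    errors = []
    for member in bridge.get("members", []):
        if member.get("type") != "ovs_bond":
            continue
        name = member.get("name", "?")
        for key in member:
            if key not in ALLOWED_OVS_BOND_KEYS:
                # Mirrors: "Additional properties are not allowed"
                errors.append("ovs_bond %s: unexpected property %r" % (name, key))
        if "ovs_options" in member and not isinstance(member["ovs_options"], (str, dict)):
            # Mirrors: "'None' is not of type 'string'"
            errors.append("ovs_bond %s: ovs_options must be a string or param, got %r"
                          % (name, member["ovs_options"]))
    return errors

# The failing fragment from the log above, as parsed data:
bridge = {
    "type": "ovs_bridge", "name": "br-bond",
    "members": [{
        "type": "ovs_bond", "name": "bond1",
        "ovs_options": None,
        "bonding_options": "bond_mode=active-backup",
        "members": [{"name": "em2", "type": "interface", "primary": True},
                    {"name": "em3", "type": "interface"}],
    }],
}
for err in check_ovs_bridge(bridge):
    print(err)
```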
Hi,

Looking at the ceph-storage.yaml file, it appears you're using bonding_options, which is intended to be used with Linux bonds:

      - type: ovs_bridge
        name: br-bond
        dns_servers:
          get_param: DnsServers
        domain:
          get_param: DnsSearchDomains
        members:
        - type: ovs_bond
          name: bond1
          ovs_options: null
          bonding_options:
            get_param: BondInterfaceOvsOptions
          members:
          - type: interface
            name: em2
            primary: true
          - type: interface
            name: em3

The correct way to do this with an ovs_bond would be:

      - type: ovs_bridge
        name: br-bond
        dns_servers:
          get_param: DnsServers
        domain:
          get_param: DnsSearchDomains
        members:
        - type: ovs_bond
          name: bond1
          ovs_options:
            get_param: BondInterfaceOvsOptions
          members:
          - type: interface
            name: em2
            primary: true
          - type: interface
            name: em3

The schema is defined here, and we can see that bonding_options is only used for the linux_bond and linux_team interface types:
https://github.com/openstack/os-net-config/blob/stable/train/os_net_config/schema.yaml#L1165-L1181

Whereas ovs_bond uses ovs_options:
https://github.com/openstack/os-net-config/blob/stable/train/os_net_config/schema.yaml#L606-L623

Here are some examples from the default network config files for reference:

linux_bond example:
https://github.com/openstack/tripleo-heat-templates/blob/stable/train/network/config/bond-with-vlans/role.role.j2.yaml#L195-L200

ovs_bond example:
https://github.com/openstack/tripleo-heat-templates/blob/stable/train/network/config/bond-with-vlans/role.role.j2.yaml#L159-L164

I believe that is the reason the schemas are not matching and you're getting those errors.
I would recommend the following changes:

- Convert from OVS bond to Linux bond

- Add "device: bond1" to the VLANs that are attached to the bond

- Remove the OVS bridge (there is no reason for it if you are using Linux bonds)

- Remove the line that has "ovs_options: null"

Alternately, you can remove the "bonding_options:" line and put the "get_param: BondInterfaceOvsOptions" under the "ovs_options:" line (without the "null").

So it would look like this:

  network_config:
    - type: interface
      name: em1
      use_dhcp: false
      dns_servers:
        get_param: DnsServers
      domain:
        get_param: DnsSearchDomains
      addresses:
        - ip_netmask:
            list_join:
              - /
              - - get_param: ControlPlaneIp
                - get_param: ControlPlaneSubnetCidr
      routes:
        - ip_netmask: 169.254.169.254/32
          next_hop:
            get_param: EC2MetadataIp
        - default: true
          next_hop:
            get_param: ControlPlaneDefaultRoute
    - type: linux_bond
      name: bond1
      bonding_options:
        get_param: BondInterfaceOvsOptions
      members:
        - type: interface
          name: em2
          primary: true
        - type: interface
          name: em3
    - type: vlan
      device: bond1
      vlan_id:
        get_param: StorageNetworkVlanID
      addresses:
        - ip_netmask:
            get_param: StorageIpSubnet
    - type: vlan
      device: bond1
      vlan_id:
        get_param: StorageMgmtNetworkVlanID
      addresses:
        - ip_netmask:
            get_param: StorageMgmtIpSubnet
Note that in order to apply the network configuration, you will have to have NetworkDeploymentActions set with "UPDATE" in the list in an environment file:

parameter_defaults:
  NetworkDeploymentActions: ["CREATE","UPDATE"]

Then run a stack update; on subsequent stack updates you shouldn't need to set NetworkDeploymentActions.
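As a concrete example, the one-off environment file could be created like this (the file name network-deployment-actions.yaml is an arbitrary choice, not a TripleO convention):

```shell
# Write a one-off environment file that tells TripleO to re-apply the NIC
# configuration on the next stack update (not only on node creation).
cat > network-deployment-actions.yaml <<'EOF'
parameter_defaults:
  NetworkDeploymentActions: ["CREATE","UPDATE"]
EOF
```

It would then be passed with an extra `-e network-deployment-actions.yaml` on the deploy/update command, and dropped again after the update completes so node reconfiguration stays opt-in.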
(In reply to Dan Sneddon from comment #10)
> I would recommend the following changes:
> 
> - Convert from OVS bond to Linux bond
> 
> - Add "device: bond1" to the VLANs that are attached to the bond
> 
> - Remove OVS bridge (there is no reason for it if you are using Linux bonds)
> 
> - Remove line that has: "ovs_options: null"
> 
> 
> Alternately, you can remove the "bonding_options:" line and put the
> "get_param: BondInterfaceOvsOptions" under the ovs_options: line (without
> the "null").
> 
> So it would look like this:
> 
>   network_config:
>     - type: interface
>       name: em1
>       use_dhcp: false
>       dns_servers:
>         get_param: DnsServers
>       domain:
>         get_param: DnsSearchDomains
>       addresses:
>         - ip_netmask:
>             list_join:
>               - /
>               - - get_param: ControlPlaneIp
>                 - get_param: ControlPlaneSubnetCidr
>       routes:
>         - ip_netmask: 169.254.169.254/32
>           next_hop:
>             get_param: EC2MetadataIp
>         - default: true
>           next_hop:
>             get_param: ControlPlaneDefaultRoute
>     - type: linux_bond
>       name: bond1
>       bonding_options:
>         get_param: BondInterfaceOvsOptions
>       members:
>         - type: interface
>           name: em2
>           primary: true
>         - type: interface
>           name: em3
>     - type: vlan
>       vlan_id:
>         get_param: StorageNetworkVlanID
>       addresses:
>         - ip_netmask:
>             get_param: StorageIpSubnet
>     - type: vlan
>       vlan_id:
>         get_param: StorageMgmtNetworkVlanID
>       addresses:
>         - ip_netmask:
>             get_param: StorageMgmtIpSubnet

Hi Dan,
I will apply the changes you suggested and verify again during the RHOSP fast forward upgrade. One thing that comes to mind: why am I not getting this error for the controller nodes? I am using the same bonding_options parameter in the controller nic-configs file as well, like below:

      - type: ovs_bridge
        name: br-ex
        dns_servers:
          get_param: DnsServers
        domain:
          get_param: DnsSearchDomains
        members:
        - type: ovs_bond
          name: bond1
          ovs_options: null
          bonding_options:
            get_param: BondInterfaceOvsOptions
          members:
          - type: interface
            name: em2
            primary: true
          - type: interface
            name: em3