The following detailed log analysis explains why this feature is necessary.

Because RHOSP7 has no workflow that lets the user configure the uplinks, the neutron-bsn-lldp service has to be smart enough to figure out which links are the uplinks to send LLDP out of. We consider a link an uplink if all three of the following conditions hold:

1) the link is a physical link and is up (managed by network-online.service)
2) the link is attached to OVS (managed by os-collect-config.service)
3) the link does not have an IP address (managed by os-collect-config.service)

As a result, neutron-bsn-lldp.service should start AFTER network-online.service and os-collect-config.service have started; otherwise neutron-bsn-lldp cannot decide which links are uplinks. However, os-collect-config.service not only handles 2) and 3), it also starts OpenStack services that require IP connectivity. The problem is that without LLDP being sent out properly, the fabric cannot provide IP connectivity, so the ordering is circular.

If we put "Wants=network-online.target" and "After=syslog.target network.target network-online.target" into neutron-bsn-lldp.service, the log below shows the services starting in the following order: bring up links -> start lldp service -> attach uplinks to ovs.

Oct 7 18:01:26 localhost NetworkManager[604]: <info> (p1p1): link connected
Oct 7 18:01:26 localhost NetworkManager[604]: <info> (p1p2): link connected
Oct 7 18:01:33 localhost systemd: Started bsn lldp.
Oct 7 18:02:34 localhost ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --fake-iface add-bond br-ex bond1 p1p1 p1p2 bond_mode=balance-tcp lacp=active other-config:lacp-fallback-ab=true other-config:lacp-time=fast
Oct 7 18:02:36 localhost kernel: device bond1 entered promiscuous mode
Oct 7 18:02:36 localhost systemd: Started DHCP interface bond1.
Oct 7 18:02:36 localhost NetworkManager[604]: <info> (bond1): link connected

This order is wrong and should not work.
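For clarity, the three-condition uplink check can be sketched as below. This is only an illustration of the heuristic described above, not the actual bsnlldp implementation; the Link fields and helper name are made up for the example.

```python
# Illustrative sketch of the uplink heuristic (hypothetical names, not
# the real bsnlldp code): a link is an uplink iff it is a physical link
# that is up, is attached to OVS, and carries no IP address.
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    is_physical: bool   # condition 1: a real NIC that is...
    is_up: bool         # ...up (brought up via network-online.service)
    on_ovs: bool        # condition 2: attached to OVS (os-collect-config)
    has_ip: bool        # condition 3: must be False for an uplink

def is_uplink(link: Link) -> bool:
    """All three conditions must hold for a link to count as an uplink."""
    return link.is_physical and link.is_up and link.on_ovs and not link.has_ip

# Matching the log above: p1p1 is a bare physical slave of bond1 on
# br-ex (uplink); bond1 itself holds the DHCP address, so it is not.
links = [
    Link("p1p1", is_physical=True, is_up=True, on_ovs=True, has_ip=False),
    Link("bond1", is_physical=False, is_up=True, on_ovs=True, has_ip=True),
]
uplinks = [l.name for l in links if is_uplink(l)]
```

The point of conditions 2) and 3) is exactly why the ordering matters: both facts are only established once os-collect-config.service has run.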
However, the reason it works in most cases is https://github.com/stackforge/networking-bigswitch/blob/master/bsnstacklib/bsnlldp/bsnlldp.py#L331-L334, where the neutron-bsn-lldp service keeps polling for uplinks until it finds at least one. But if an uplink temporarily fails to be attached to OVS, LLDP is never sent via that uplink. The following is an example:

Oct 7 18:02:47 localhost os-collect-config: [2015/10/07 06:02:47 PM] [INFO] running ifup on interface: p1p1
Oct 7 18:02:48 localhost os-collect-config: [2015/10/07 06:02:48 PM] [INFO] running ifup on interface: p1p2
Oct 7 18:02:48 localhost os-collect-config: [2015/10/07 06:02:48 PM] [INFO] Running ovs-appctl bond/set-active-slave ('bond1', 'p1p1')
Oct 7 18:02:48 localhost os-collect-config: Traceback (most recent call last):
Oct 7 18:02:48 localhost os-collect-config: File "/usr/bin/os-net-config", line 10, in <module>
Oct 7 18:02:48 localhost os-collect-config: sys.exit(main())
Oct 7 18:02:48 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 187, in main
Oct 7 18:02:48 localhost os-collect-config: activate=not opts.no_activate)
Oct 7 18:02:48 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/impl_ifcfg.py", line 312, in apply
Oct 7 18:02:48 localhost os-collect-config: self.bond_primary_ifaces[bond])
Oct 7 18:02:48 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/__init__.py", line 146, in ovs_appctl
Oct 7 18:02:48 localhost os-collect-config: self.execute(msg, '/bin/ovs-appctl', action, *parameters)
Oct 7 18:02:48 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/__init__.py", line 108, in execute
Oct 7 18:02:48 localhost os-collect-config: processutils.execute(cmd, *args, **kwargs)
Oct 7 18:02:48 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 266, in execute
Oct 7 18:02:48 localhost os-collect-config: cmd=sanitized_cmd)
Oct 7 18:02:48 localhost os-collect-config: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Oct 7 18:02:48 localhost os-collect-config: Command: /bin/ovs-appctl bond/set-active-slave bond1 p1p1
Oct 7 18:02:48 localhost os-collect-config: Exit code: 2
Oct 7 18:02:48 localhost os-collect-config: Stdout: u''
Oct 7 18:02:48 localhost os-collect-config: Stderr: u'cannot make disabled slave active\novs-appctl: ovs-vswitchd: server returned an error\n'
Oct 7 18:02:48 localhost os-collect-config: + RETVAL=1
Oct 7 18:02:48 localhost os-collect-config: + [[ 1 == 2 ]]
Oct 7 18:02:48 localhost os-collect-config: + [[ 1 != 0 ]]
Oct 7 18:02:48 localhost os-collect-config: + echo 'ERROR: os-net-config configuration failed.'
Oct 7 18:02:48 localhost os-collect-config: ERROR: os-net-config configuration failed.
Oct 7 18:02:48 localhost os-collect-config: + exit 1
Oct 7 18:02:48 localhost os-collect-config: [2015-10-07 18:02:48,413] (os-refresh-config) [ERROR] during configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/configure.d']' returned non-zero exit status 1]
Oct 7 18:02:48 localhost os-collect-config: [2015-10-07 18:02:48,413] (os-refresh-config) [ERROR] Aborting...
Oct 7 18:02:48 localhost os-collect-config: 2015-10-07 18:02:48.416 7470 ERROR os-collect-config [-] Command failed, will not cache new data. Command 'os-refresh-config' returned non-zero exit status 1
Oct 7 18:02:48 localhost os-collect-config: 2015-10-07 18:02:48.416 7470 WARNING os-collect-config [-] Sleeping 30.00 seconds before re-exec.
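The retry behaviour referenced above (bsnlldp.py#L331-L334) can be sketched roughly as follows. This is a simplified illustration, not the actual code: the service polls until at least one uplink is found and then proceeds, which is exactly why a link whose OVS attachment fails at that moment never gets LLDP.

```python
# Rough sketch of the "keep looking until at least one uplink" loop
# (hypothetical helper, not the real bsnlldp implementation).
import time

def wait_for_uplinks(get_uplinks, poll_interval=1.0, max_tries=None):
    """Poll get_uplinks() until it returns a non-empty list.

    Failure mode described in the log above: if p1p1 attaches to OVS
    but p1p2's attachment fails transiently, this returns with only
    p1p1, and LLDP is never sent out of p1p2.
    """
    tries = 0
    while True:
        uplinks = get_uplinks()
        if uplinks:
            return uplinks
        tries += 1
        if max_tries is not None and tries >= max_tries:
            return []  # gave up without finding any uplink
        time.sleep(poll_interval)

# Simulated discovery: uplinks only become visible on the third poll,
# as if os-collect-config attached them to OVS while we were waiting.
polls = []
def fake_discovery():
    polls.append(1)
    return [] if len(polls) < 3 else ["p1p1"]

found = wait_for_uplinks(fake_discovery, poll_interval=0)
```

The sketch makes the race explicit: the loop only guarantees "at least one uplink", not "all uplinks", so a fix needs proper systemd ordering rather than more retries.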
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
This Bugzilla has been removed from the release and needs to be reviewed and triaged for another target release.
If this bug is still applicable, we can create a new bug so that we can work on this further in the next release. Otherwise, I'll assume that workarounds have been found.
Closing this out due to lack of manpower, lower priority, and the fact that workarounds exist.
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.