Bug 1269036 - RFE: Support the workflow that automatically detects nics and lets user customize bonds [NEEDINFO]
Status: NEW
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Bob Fournier
Shai Revivo
Keywords: FutureFeature
Depends On:
Blocks:
Reported: 2015-10-06 01:39 EDT by bigswitch
Modified: 2017-09-25 22:05 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
dsneddon: needinfo? (rhosp-bugs-internal)


Attachments: None
Comment 4 bigswitch 2015-10-08 04:33:24 EDT
The following detailed log analysis explains why this feature is necessary.

Because RHOSP7 does not provide a workflow that lets the user configure the uplinks, the neutron-bsn-lldp service has to be smart enough to figure out which links are the uplinks on which to send out LLDP. We consider a link an uplink if all three of the following conditions hold:
1) the link is physical and is up (managed by network-online.service)
2) the link is attached to ovs (managed by os-collect-config.service)
3) the link does not have an IP address (managed by os-collect-config.service)
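As a sketch, the three conditions above can be combined into a single predicate. The `is_uplink` function and the link dicts below are hypothetical illustrations of the decision logic, not the actual data model the service reads from the OS:

```python
def is_uplink(link):
    """Return True if a link qualifies as an uplink per the three
    conditions above (hypothetical sketch of the decision logic)."""
    return (
        link.get("physical", False)            # 1) physical link...
        and link.get("up", False)              # ...that is up
        and link.get("attached_to_ovs", False) # 2) attached to ovs
        and link.get("ip_address") is None     # 3) no IP address
    )

# Illustrative link state: p1p1 matches all three conditions,
# bond1 fails condition 1) and 3).
links = [
    {"name": "p1p1", "physical": True, "up": True,
     "attached_to_ovs": True, "ip_address": None},
    {"name": "bond1", "physical": False, "up": True,
     "attached_to_ovs": True, "ip_address": "10.0.0.5"},
]
uplinks = [l["name"] for l in links if is_uplink(l)]
```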

As a result, neutron-bsn-lldp.service should start AFTER network-online.service and os-collect-config.service have started; otherwise, neutron-bsn-lldp cannot determine which links are the uplinks.
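In systemd terms, that ordering constraint could be sketched in the unit file as follows (a hypothetical fragment, not the shipped neutron-bsn-lldp.service):

```ini
[Unit]
# Hypothetical ordering: start only after networking is online and
# os-collect-config has run, per the constraint described above.
Wants=network-online.target
After=network-online.target os-collect-config.service
```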

However, os-collect-config.service not only performs steps 2) and 3), it also starts OpenStack services that require IP connectivity. The problem is that unless LLDP is properly sent out, the fabric cannot provide IP connectivity.

If we put "Wants=network-online.target" and "After=syslog.target network.target network-online.target" into neutron-bsn-lldp.service, the log below shows that the services start in the following order:
bring up links -> start lldp service -> attach uplinks to ovs.

Oct  7 18:01:26 localhost NetworkManager[604]: <info>  (p1p1): link connected
Oct  7 18:01:26 localhost NetworkManager[604]: <info>  (p1p2): link connected
Oct  7 18:01:33 localhost systemd: Started bsn lldp.
Oct  7 18:02:34 localhost ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --fake-iface add-bond br-ex bond1 p1p1 p1p2 bond_mode=balance-tcp lacp=active other-config:lacp-fallback-ab=true other-config:lacp-time=fast
Oct  7 18:02:36 localhost kernel: device bond1 entered promiscuous mode
Oct  7 18:02:36 localhost systemd: Started DHCP interface bond1.
Oct  7 18:02:36 localhost NetworkManager[604]: <info>  (bond1): link connected

This order is wrong and should not work. The reason it works in most cases is https://github.com/stackforge/networking-bigswitch/blob/master/bsnstacklib/bsnlldp/bsnlldp.py#L331-L334, where the neutron-bsn-lldp service keeps looking for uplinks until it finds at least one.
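That retry behaviour can be sketched as a polling loop. `wait_for_uplinks`, `get_uplinks`, and the parameters below are illustrative assumptions, not the actual bsnlldp code:

```python
import time

def wait_for_uplinks(get_uplinks, poll_seconds=1, max_polls=None):
    """Poll get_uplinks() until it returns at least one uplink
    (a sketch of the retry behaviour described above)."""
    polls = 0
    while True:
        uplinks = get_uplinks()
        if uplinks:
            return uplinks
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return []  # give up after max_polls attempts
        time.sleep(poll_seconds)
```

With this kind of loop, the service tolerates uplinks appearing late, but a link that never gets attached to ovs never carries LLDP, which is exactly the failure mode shown in the traceback below.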

However, if an uplink temporarily fails to be attached to ovs, LLDP will not be sent via that uplink. The following is an example:

Oct  7 18:02:47 localhost os-collect-config: [2015/10/07 06:02:47 PM] [INFO] running ifup on interface: p1p1
Oct  7 18:02:48 localhost os-collect-config: [2015/10/07 06:02:48 PM] [INFO] running ifup on interface: p1p2
Oct  7 18:02:48 localhost os-collect-config: [2015/10/07 06:02:48 PM] [INFO] Running ovs-appctl bond/set-active-slave ('bond1', 'p1p1')
Oct  7 18:02:48 localhost os-collect-config: Traceback (most recent call last):
Oct  7 18:02:48 localhost os-collect-config: File "/usr/bin/os-net-config", line 10, in <module>
Oct  7 18:02:48 localhost os-collect-config: sys.exit(main())
Oct  7 18:02:48 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 187, in main
Oct  7 18:02:48 localhost os-collect-config: activate=not opts.no_activate)
Oct  7 18:02:48 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/impl_ifcfg.py", line 312, in apply
Oct  7 18:02:48 localhost os-collect-config: self.bond_primary_ifaces[bond])
Oct  7 18:02:48 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/__init__.py", line 146, in ovs_appctl
Oct  7 18:02:48 localhost os-collect-config: self.execute(msg, '/bin/ovs-appctl', action, *parameters)
Oct  7 18:02:48 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/os_net_config/__init__.py", line 108, in execute
Oct  7 18:02:48 localhost os-collect-config: processutils.execute(cmd, *args, **kwargs)
Oct  7 18:02:48 localhost os-collect-config: File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 266, in execute
Oct  7 18:02:48 localhost os-collect-config: cmd=sanitized_cmd)
Oct  7 18:02:48 localhost os-collect-config: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Oct  7 18:02:48 localhost os-collect-config: Command: /bin/ovs-appctl bond/set-active-slave bond1 p1p1
Oct  7 18:02:48 localhost os-collect-config: Exit code: 2
Oct  7 18:02:48 localhost os-collect-config: Stdout: u''
Oct  7 18:02:48 localhost os-collect-config: Stderr: u'cannot make disabled slave active\novs-appctl: ovs-vswitchd: server returned an error\n'
Oct  7 18:02:48 localhost os-collect-config: + RETVAL=1
Oct  7 18:02:48 localhost os-collect-config: + [[ 1 == 2 ]]
Oct  7 18:02:48 localhost os-collect-config: + [[ 1 != 0 ]]
Oct  7 18:02:48 localhost os-collect-config: + echo 'ERROR: os-net-config configuration failed.'
Oct  7 18:02:48 localhost os-collect-config: ERROR: os-net-config configuration failed.
Oct  7 18:02:48 localhost os-collect-config: + exit 1
Oct  7 18:02:48 localhost os-collect-config: [2015-10-07 18:02:48,413] (os-refresh-config) [ERROR] during configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/configure.d']' returned non-zero exit status 1]
Oct  7 18:02:48 localhost os-collect-config: [2015-10-07 18:02:48,413] (os-refresh-config) [ERROR] Aborting...
Oct  7 18:02:48 localhost os-collect-config: 2015-10-07 18:02:48.416 7470 ERROR os-collect-config [-] Command failed, will not cache new data. Command 'os-refresh-config' returned non-zero exit status 1
Oct  7 18:02:48 localhost os-collect-config: 2015-10-07 18:02:48.416 7470 WARNING os-collect-config [-] Sleeping 30.00 seconds before re-exec.
Comment 6 Mike Burns 2016-04-07 16:54:03 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 9 Red Hat Bugzilla Rules Engine 2017-02-06 10:26:47 EST
This bugzilla has been removed from the release and needs to be reviewed and triaged for another target release.
Comment 10 Dan Sneddon 2017-02-06 14:37:34 EST
If this bug is still applicable, we can create a new bug so that we can work on this further in the next release. Otherwise, I'll assume that workarounds have been found.
