Bug 1250743 - RFE: Add LLDP service to overcloud nodes (for compatibility with Big Switch bonding)
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: y1
Target Release: 7.0 (Kilo)
Assignee: Marios Andreou
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-08-05 21:13 UTC by Dan Sneddon
Modified: 2016-04-18 06:51 UTC

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-27 15:33:04 UTC
Target Upstream Version:
Embargoed:


Attachments
the modified send_lldp script for use with the virt-customize workaround (5.19 KB, text/x-python)
2015-08-20 17:54 UTC, Marios Andreou
the modified send_lldp.service file for use with the virt-customize workaround (242 bytes, text/plain)
2015-08-20 17:56 UTC, Marios Andreou

Description Dan Sneddon 2015-08-05 21:13:59 UTC
Description of problem:
The Big Switch SDN solution uses LLDP to identify the systems attached to each port. Bonds are created dynamically based on the system description in the LLDP packets sent to the switch by the host.

Version-Release number of selected component (if applicable):
GA

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud
2.
3.

Actual results:
lldpad is not installed or configured on the overcloud nodes

Expected results:
lldpad should be installed, but a custom system description needs to be used as well (which Big Switch will supply)

Additional info:
This will require several changes:
  * Add lldpad RPM to overcloud image
  * Configure the system description string to Big Switch specifications (will probably require a mix of Heat template and Puppet manifest work)

Comment 3 bigswitch 2015-08-05 22:39:06 UTC
The script we use to send LLDP packets is available at
https://bigswitch.box.com/shared/static/0hj2cadff9l572ri2wworasq8nuykkuh

The systemd service file is available at
https://bigswitch.box.com/shared/static/3i92z7fho68totv7gfp3chdun42zkj5a.service

In that file:
%(uname)s is the FQDN of the compute node (uname -n),
the MAC address in --system-desc is the Big Switch OUI,
%(uplinks)s is the list of 10G uplink interfaces.

One example is: 
[heat-admin@overcloud-compute-0 ~]$ ps aux | grep lldp
root     21119  0.0  0.0 185072  7804 ?        Ss   00:10   0:00 python /bin/send_lldp --system-desc 5c:16:c7:00:00:00 --system-name overcloud-compute-0.localdomain -i 10 --network_interface p1p1,p1p2

Please let us know if there's any change we need to make to properly package it.
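
To make the placeholder substitution concrete, here is a minimal Python sketch. The template string is an assumption modelled on the ps output above, not a copy of the actual service file, and render_command is a hypothetical helper name.

```python
import os

# Hypothetical command template, modelled on the ps output in this comment.
# The real send_lldp.service file may word this differently.
COMMAND_TEMPLATE = (
    "python /bin/send_lldp --system-desc 5c:16:c7:00:00:00 "
    "--system-name %(uname)s -i 10 --network_interface %(uplinks)s"
)


def render_command(uname, uplinks):
    """Fill the %(uname)s and %(uplinks)s placeholders in the template."""
    return COMMAND_TEMPLATE % {"uname": uname, "uplinks": ",".join(uplinks)}


if __name__ == "__main__":
    # os.uname()[1] is the node name, the same value `uname -n` prints.
    print(render_command(os.uname()[1], ["p1p1", "p1p2"]))
```

With uname set to overcloud-compute-0.localdomain and uplinks p1p1 and p1p2, this renders the command line shown in the example above.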

Comment 4 Marios Andreou 2015-08-20 17:54:54 UTC
Created attachment 1065353
the modified send_lldp script for use with the virt-customize workaround

Comment 5 Marios Andreou 2015-08-20 17:56:14 UTC
Created attachment 1065354
the modified send_lldp.service file for use with the virt-customize workaround

Comment 6 Marios Andreou 2015-08-20 18:02:24 UTC
Hi Xin et al., as promised, here is the quick workaround that I mentioned on the call. I hope it can be useful immediately. As we discussed, I moved the config into the send_lldp script itself rather than keeping it in the systemd file. The script definitely needs improvement (my additions, I mean; this is just a quick proof of concept). One outstanding task is the logic for getting the appropriate interfaces at run time: see the get_10g_devices() function in the script, which just returns ['eth0'] for now. As we discussed, one potential solution is to 'discover' these from the system, for example from the ifconfig output if that is sufficient (TBD). To be clear, I have attached modified versions of both files, and these are the ones you need to use in the process below.
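
One possible shape for that discovery step, as a sketch only: instead of parsing ifconfig output, it reads the link speed that Linux exposes under /sys/class/net/<interface>/speed (in Mb/s). The sysfs_net and min_speed_mbps parameters are assumptions added for illustration, and this is not the logic in the attached script.

```python
import os


def get_10g_devices(sysfs_net="/sys/class/net", min_speed_mbps=10000):
    """Return names of interfaces whose reported speed is at least 10 Gb/s.

    Assumption: the kernel reports link speed in Mb/s in the per-interface
    'speed' file. Interfaces without a usable value are skipped.
    """
    devices = []
    if not os.path.isdir(sysfs_net):
        return devices
    for name in sorted(os.listdir(sysfs_net)):
        if name == "lo":
            continue
        try:
            with open(os.path.join(sysfs_net, name, "speed")) as f:
                speed = int(f.read().strip())
        except (IOError, OSError, ValueError):
            # No speed file, link down, or a virtual device: skip it.
            continue
        if speed >= min_speed_mbps:
            devices.append(name)
    return devices


if __name__ == "__main__":
    print(get_10g_devices())
```

A link that is down may report no speed at all, so a real implementation would need to decide whether to retry or fall back to a configured list.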

The steps below use 'virt-customize' (sudo yum install -y libguestfs-tools) to 'install' the two files (service and script) and to enable the service at startup, in place, on the overcloud-full.qcow2 image. This is the image that all nodes boot to eventually become compute/control/whatever, so a change here will be available on all nodes. If you aren't familiar with the location of the overcloud-full.qcow2 image on your system, please let me know (and/or jistr, if you need it tomorrow when I'm away) and we can work it out. I am not sure if you'll be familiar with working with the overcloud images; I guess you will at least have downloaded them as part of your setup.

In any case, the aim is that once we have modified this image, we re-upload it into glance so it is used in your deployments. If you are doing the upload as part of your normal deploy process then you should follow these steps before uploading to glance for the first time. 

Otherwise (you have already uploaded the images previously), there is a process described below for deleting the existing images and reuploading them, HOWEVER please proceed with caution - you need to know the location of your images, in particular overcloud-full.qcow2 (obviously, this is the one we're patching) but also deploy-ramdisk-ironic.initramfs and .kernel - I am assuming you are following a procedure like https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Obtaining_Images_for_Overcloud_Nodes.html where you are downloading these at some point as part of the setup (in which case you'll know where they are and can work with them here). 

I hope this is helpful as a start. If it is, on Monday we should also work out how to collaborate on these files instead of attaching them here (maybe we set up a temp github repo or something? though the intent is just to get this into a form that works 'for now' while we figure out a better solution). I look forward to comments/thoughts.

thanks! marios

### WORKAROUND:

# Do the following after you have built (or otherwise obtained) the overcloud-full.qcow2 image, but BEFORE it is uploaded into glance for your deployment:
# First back up overcloud-full.qcow2 in case we break everything:
#
cp overcloud-full.qcow2 overcloud-full.qcow.BACKUP

# You need the two modified files I have attached, send_lldp and send_lldp.service in the same directory as the overcloud-full.qcow2:
# Make sure permissions are correct
#
chmod 644 send_lldp.service
chmod 755 send_lldp

# Put the two files into the image and enable the service for startup:
# NOTE: "virt-customize" is provided by libguestfs-tools, sudo yum install -y libguestfs-tools should be enough
#
virt-customize --upload send_lldp:/bin/send_lldp --upload send_lldp.service:/lib/systemd/system/send_lldp.service --run-command "systemctl enable send_lldp.service" -a overcloud-full.qcow2 

# Now you can upload the images to glance. If this is the first time you are doing so, just run:
#
# openstack overcloud image upload

# HOWEVER, if you already have images uploaded from earlier, you need to delete them first (they will get re-uploaded). For the deletion you can use:
#
###for img in $(glance image-list | grep active | awk '{print $2}'); do glance image-delete $img; done
#
# and then you can: 
#
# openstack overcloud image upload

# initially you can just boot this image directly for testing purposes
#
nova boot --flavor baremetal --image overcloud-full --key-name default test

# After the instance is ACTIVE, ssh cloud-user@nova-ip should get you in.
# Log in and confirm the service is loaded and running ok:
#
service send_lldp status -l

I also tried this "send_lldp modified" image with a simple deployment of one compute and one control. Below you can see the service active on the nodes and a tcpdump frame of the LLDP from each node:

openstack overcloud deploy --templates --control-scale 1 --compute-scale 1

The overcloud-controller looks like:
[root@overcloud-controller-0 heat-admin]# service send_lldp status -l
Redirecting to /bin/systemctl status  -l send_lldp.service
send_lldp.service - send lldp
   Loaded: loaded (/usr/lib/systemd/system/send_lldp.service; enabled)
   Active: active (running) since Thu 2015-08-20 13:09:51 EDT; 14s ago
 Main PID: 797 (python)
   CGroup: /system.slice/send_lldp.service
           └─797 python /bin/send_lldp --system-desc 5c:16:c7:00:00:00 --system-name %(uname)s -i 10

Aug 20 13:09:51 localhost.localdomain systemd[1]: Starting send lldp...
Aug 20 13:09:51 localhost.localdomain systemd[1]: Started send lldp.

13:11:01.788591 LLDP, length 87
        Chassis ID TLV (1), length 17
          Subtype Local (7): Big Cloud Fabric
          0x0000:  0742 6967 2043 6c6f 7564 2046 6162 7269
          0x0010:  63
        Port ID TLV (2), length 5
          Subtype Interface alias (1): eth0
          0x0000:  0165 7468 30
        Time to Live TLV (3), length 2: TTL 120s
          0x0000:  0078
        System Name TLV (5), length 34: overcloud-controller-0.localdomain
          0x0000:  6f76 6572 636c 6f75 642d 636f 6e74 726f
          0x0010:  6c6c 6572 2d30 2e6c 6f63 616c 646f 6d61
          0x0020:  696e
        System Description TLV (6), length 17
          5c:16:c7:00:00:00
          0x0000:  3563 3a31 363a 6337 3a30 303a 3030 3a30
          0x0010:  30
        End TLV (0), length 0


[root@overcloud-compute-0 heat-admin]# service send_lldp status -l
Redirecting to /bin/systemctl status  -l send_lldp.service
send_lldp.service - send lldp
   Loaded: loaded (/usr/lib/systemd/system/send_lldp.service; enabled)
   Active: active (running) since Thu 2015-08-20 13:09:55 EDT; 2min 36s ago
 Main PID: 791 (python)
   CGroup: /system.slice/send_lldp.service
           └─791 python /bin/send_lldp --system-desc 5c:16:c7:00:00:00 --system-name %(uname)s -i 10

Aug 20 13:09:55 localhost.localdomain systemd[1]: Starting send lldp...
Aug 20 13:09:55 localhost.localdomain systemd[1]: Started send lldp.


13:14:35.848013 LLDP, length 84
	Chassis ID TLV (1), length 17
	  Subtype Local (7): Big Cloud Fabric
	  0x0000:  0742 6967 2043 6c6f 7564 2046 6162 7269
	  0x0010:  63
	Port ID TLV (2), length 5
	  Subtype Interface alias (1): eth0
	  0x0000:  0165 7468 30
	Time to Live TLV (3), length 2: TTL 120s
	  0x0000:  0078
	System Name TLV (5), length 31: overcloud-compute-0.localdomain
	  0x0000:  6f76 6572 636c 6f75 642d 636f 6d70 7574
	  0x0010:  652d 302e 6c6f 6361 6c64 6f6d 6169 6e
	System Description TLV (6), length 17
	  5c:16:c7:00:00:00
	  0x0000:  3563 3a31 363a 6337 3a30 303a 3030 3a30
	  0x0010:  30
	End TLV (0), length 0
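
For reference, the TLV layout in the two captures above can be reproduced with a short Python sketch. This is an illustration of the encoding (a 7-bit type and 9-bit length header followed by the value), not the code in the attached send_lldp script; the function names are hypothetical, and the Ethernet header (destination MAC and ethertype 0x88cc) is left out.

```python
import struct


def tlv(tlv_type, value):
    """Encode one LLDP TLV: 7-bit type, 9-bit length, then the value bytes."""
    if len(value) > 511:
        raise ValueError("TLV value does not fit in the 9-bit length field")
    return struct.pack("!H", (tlv_type << 9) | len(value)) + value


def build_lldpdu(system_name, interface,
                 system_desc="5c:16:c7:00:00:00",
                 chassis_id="Big Cloud Fabric", ttl=120):
    """Build an LLDPDU with the same TLVs as the captures above."""
    return b"".join([
        tlv(1, b"\x07" + chassis_id.encode("ascii")),  # Chassis ID, subtype 7 (local)
        tlv(2, b"\x01" + interface.encode("ascii")),   # Port ID, subtype 1 (interface alias)
        tlv(3, struct.pack("!H", ttl)),                # Time to Live, in seconds
        tlv(5, system_name.encode("ascii")),           # System Name
        tlv(6, system_desc.encode("ascii")),           # System Description
        tlv(0, b""),                                   # End of LLDPDU
    ])


if __name__ == "__main__":
    pdu = build_lldpdu("overcloud-controller-0.localdomain", "eth0")
    print(len(pdu))
```

Each TLV contributes a 2-byte header plus its value, which is how the controller frame comes to 87 bytes and the compute frame, with its shorter system name, to 84.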

Comment 7 bigswitch 2015-08-20 18:08:14 UTC
Cool! let us quickly try this workaround today. Will keep you posted. Thanks a lot :)

Comment 8 Aditya Vaja 2015-08-20 18:15:17 UTC
Thanks Marios! I have a github repo where I'll be updating files. Feel free to fork. I'll integrate your changes and also update it to get interface information from the system. Here's the link to the repo: https://github.com/wolverineav/python-networking-bigswitch-bsnlldp

*send_lldp is now bsnlldp.

Comment 9 bigswitch 2015-08-22 22:28:23 UTC
The LLDP rpm package is ready. The package automatically figures out the host FQDN and NIC names.

Spec URL: 
wget https://bigswitch.box.com/shared/static/6z9kkxoyx77r3hymyfpu105a6hbariiw.spec -O python-networking-bigswitch-bsnlldp.spec

SRPM URL: 
wget https://bigswitch.box.com/shared/static/3bvtnxxn1jawz5p6sjsckkrrux7kz53s.rpm -O python-networking-bigswitch-bsnlldp-3.0.0-1.el7.centos.src.rpm

RPM
wget https://bigswitch.box.com/shared/static/ims16o3h295788fqwo7hadia4niqaxg1.rpm -O python-networking-bigswitch-bsnlldp-3.0.0-1.el7.centos.noarch.rpm

We need to start the review process for this package as well.

Comment 10 bigswitch 2015-08-22 22:42:04 UTC
The key question is: this LLDP rpm package has nothing to do with OpenStack. It is only needed so that the Big Switch solution can automatically form bonds. Do we still have to have it reviewed in RDO and then RHOSP?

Comment 11 Dan Sneddon 2015-08-27 15:28:39 UTC
Just to note, the original ask was for lldpad, but that came from a BigSwitch engineer who was unaware that BigSwitch had their own Python script to send LLDP packets.

We may add lldpad as a service to future OpenStack images. If and when we do, we will need to add a boolean so that the lldpad service can be turned off when using BigSwitch.

Comment 12 Mike Burns 2015-08-27 15:33:04 UTC
So for now, I am closing this bug, since the inclusion is not really needed at this point.

