Description of problem:
The Big Switch SDN solution uses LLDP to identify the systems attached to each port. Bonds are created dynamically based on the system description in the LLDP packets sent to the switch by the host.

Version-Release number of selected component (if applicable): GA

How reproducible: 100%

Steps to Reproduce:
1. Deploy overcloud

Actual results:
lldpad is not installed or configured on the overcloud nodes

Expected results:
lldpad should be installed, and a custom system description (which Big Switch will supply) needs to be used as well

Additional info:
This will require several changes:
* Add the lldpad RPM to the overcloud image
* Configure the system description string to Big Switch specifications (will probably require a mix of Heat template and Puppet manifest work)
The lldp script we use to send out LLDP is available at https://bigswitch.box.com/shared/static/0hj2cadff9l572ri2wworasq8nuykkuh

The systemd service file is available at https://bigswitch.box.com/shared/static/3i92z7fho68totv7gfp3chdun42zkj5a.service, where %(uname)s is the FQDN of the compute node (uname -n), the MAC address in --system-desc is the Big Switch OUI, and %(uplinks)s is the list of 10G uplinks. One example:

[heat-admin@overcloud-compute-0 ~]$ ps aux | grep lldp
root     21119  0.0  0.0 185072  7804 ?  Ss  00:10  0:00 python /bin/send_lldp --system-desc 5c:16:c7:00:00:00 --system-name overcloud-compute-0.localdomain -i 10 --network_interface p1p1,p1p2

Please let us know if there's any change we need to make to properly package it.
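As a sketch of how the placeholders in that service file expand: the %(uname)s / %(uplinks)s names come from the description above, but the exact template string here is an assumption (the real service file is only available at the Box link).

```python
# Sketch: expanding the %(uname)s / %(uplinks)s placeholders from the
# send_lldp.service template. The template string below is an assumption;
# the real file is at the Box link above.
EXEC_START_TEMPLATE = (
    "python /bin/send_lldp --system-desc 5c:16:c7:00:00:00 "
    "--system-name %(uname)s -i 10 --network_interface %(uplinks)s"
)

def render_exec_start(uname, uplinks):
    """Fill in the FQDN (uname -n) and the comma-separated 10G uplinks."""
    return EXEC_START_TEMPLATE % {"uname": uname, "uplinks": ",".join(uplinks)}

# Matches the example process seen on overcloud-compute-0 above:
print(render_exec_start("overcloud-compute-0.localdomain", ["p1p1", "p1p2"]))
```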
Created attachment 1065353 [details] the modified send_lldp script for use with the virt-customize workaround
Created attachment 1065354 [details] the modified send_lldp.service file for use with the virt-customize workaround
Hi Xin et al, as promised, here is the quick workaround that I mentioned on the call. I hope it can be useful immediately. As we discussed, I moved the config into the send_lldp script itself rather than into the systemd file - the script definitely needs improvement (my additions, I mean; this is just a quick proof of concept).

One outstanding task here is the logic for getting the appropriate interfaces at run time - see the get_10g_devices() method in the script, which just returns ['eth0'] for now. As we discussed, one potential solution is to 'discover' these from the system, for example from the ifconfig output if that is sufficient (TBD). To be clear, I attach modified versions of these scripts, which you need to use in the process below.

The steps below use 'virt-customize' (sudo yum install -y libguestfs-tools) to 'install' the two files (service and script) and enable the service at startup, in place, on the overcloud-full.qcow2 image. This is the image that all nodes boot to eventually become compute/control/whatever, so a change here will be available on all nodes. If you aren't familiar with the location of the overcloud-full.qcow2 image on your system, please let me (and/or jistr, if you need it tomorrow when I'm away) know and we can work it out. I am not sure how familiar you are with working with the overcloud images; I guess you will at least have downloaded them as part of your setup. In any case, the aim is that once we have modified this image, we re-upload it into glance so it is used in your deployments. If you are doing the upload as part of your normal deploy process, then you should follow these steps before uploading to glance for the first time.
Otherwise (you have already uploaded the images previously), there is a process described below for deleting the existing images and re-uploading them. HOWEVER, please proceed with caution - you need to know the location of your images, in particular overcloud-full.qcow2 (obviously, this is the one we're patching) but also deploy-ramdisk-ironic.initramfs and .kernel. I am assuming you are following a procedure like https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Obtaining_Images_for_Overcloud_Nodes.html where you are downloading these at some point as part of the setup (in which case you'll know where they are and can work with them here).

I hope this is helpful as a start. If it is, on Monday we should also work out how to collaborate on these files instead of attaching them here (maybe we set up a temp github repo or something? though the intent is just to get this into a form that works 'for now' while we figure out a better solution). I look forward to comments/thoughts. thanks!
marios

### WORKAROUND:

# Do the following after you have built (or otherwise obtained) the
# overcloud-full.qcow2 image, but BEFORE it is uploaded into glance for
# your deployment.

# First back up overcloud-full.qcow2 in case we break everything:
cp overcloud-full.qcow2 overcloud-full.qcow2.BACKUP

# You need the two modified files I have attached, send_lldp and
# send_lldp.service, in the same directory as overcloud-full.qcow2.

# Make sure permissions are correct:
chmod 644 send_lldp.service
chmod 755 send_lldp

# Put the two files into the image and enable the service at startup.
# NOTE: "virt-customize" is provided by libguestfs-tools;
# "sudo yum install -y libguestfs-tools" should be enough.
virt-customize --upload send_lldp:/bin/send_lldp --upload send_lldp.service:/lib/systemd/system/send_lldp.service --run-command "systemctl enable send_lldp.service" -a overcloud-full.qcow2

# Now you can upload the images to glance. If this is the first time you
# are doing so, just:
openstack overcloud image upload

# HOWEVER, if you already have images uploaded from earlier, you need to
# delete them first (they will get reloaded). For the deletion you can use:
for img in $(glance image-list | grep active | awk '{print $2}'); do glance image-delete $img; done

# and then you can:
openstack overcloud image upload

# Initially you can just boot this image directly for testing purposes:
nova boot --flavor baremetal --image overcloud-full --key-name default test

# After the instance is ACTIVE, "ssh cloud-user@nova-ip" should get you in.
# Log in and confirm the service is loaded and running OK:
service send_lldp status -l

I also tried this "send_lldp modified" image with a simple deployment of one compute and one control.
Below you can see the service active on the nodes and a tcpdump frame of the LLDP from each node:

openstack overcloud deploy --templates --control-scale 1 --compute-scale 1

On overcloud-controller-0 it looks like:

[root@overcloud-controller-0 heat-admin]# service send_lldp status -l
Redirecting to /bin/systemctl status -l send_lldp.service
send_lldp.service - send lldp
   Loaded: loaded (/usr/lib/systemd/system/send_lldp.service; enabled)
   Active: active (running) since Thu 2015-08-20 13:09:51 EDT; 14s ago
 Main PID: 797 (python)
   CGroup: /system.slice/send_lldp.service
           └─797 python /bin/send_lldp --system-desc 5c:16:c7:00:00:00 --system-name %(uname)s -i 10

Aug 20 13:09:51 localhost.localdomain systemd[1]: Starting send lldp...
Aug 20 13:09:51 localhost.localdomain systemd[1]: Started send lldp.

13:11:01.788591 LLDP, length 87
    Chassis ID TLV (1), length 17
      Subtype Local (7): Big Cloud Fabric
      0x0000:  0742 6967 2043 6c6f 7564 2046 6162 7269
      0x0010:  63
    Port ID TLV (2), length 5
      Subtype Interface alias (1): eth0
      0x0000:  0165 7468 30
    Time to Live TLV (3), length 2: TTL 120s
      0x0000:  0078
    System Name TLV (5), length 34: overcloud-controller-0.localdomain
      0x0000:  6f76 6572 636c 6f75 642d 636f 6e74 726f
      0x0010:  6c6c 6572 2d30 2e6c 6f63 616c 646f 6d61
      0x0020:  696e
    System Description TLV (6), length 17
      5c:16:c7:00:00:00
      0x0000:  3563 3a31 363a 6337 3a30 303a 3030 3a30
      0x0010:  30
    End TLV (0), length 0

And on overcloud-compute-0:

[root@overcloud-compute-0 heat-admin]# service send_lldp status -l
Redirecting to /bin/systemctl status -l send_lldp.service
send_lldp.service - send lldp
   Loaded: loaded (/usr/lib/systemd/system/send_lldp.service; enabled)
   Active: active (running) since Thu 2015-08-20 13:09:55 EDT; 2min 36s ago
 Main PID: 791 (python)
   CGroup: /system.slice/send_lldp.service
           └─791 python /bin/send_lldp --system-desc 5c:16:c7:00:00:00 --system-name %(uname)s -i 10

Aug 20 13:09:55 localhost.localdomain systemd[1]: Starting send lldp...
Aug 20 13:09:55 localhost.localdomain systemd[1]: Started send lldp.
13:14:35.848013 LLDP, length 84
    Chassis ID TLV (1), length 17
      Subtype Local (7): Big Cloud Fabric
      0x0000:  0742 6967 2043 6c6f 7564 2046 6162 7269
      0x0010:  63
    Port ID TLV (2), length 5
      Subtype Interface alias (1): eth0
      0x0000:  0165 7468 30
    Time to Live TLV (3), length 2: TTL 120s
      0x0000:  0078
    System Name TLV (5), length 31: overcloud-compute-0.localdomain
      0x0000:  6f76 6572 636c 6f75 642d 636f 6d70 7574
      0x0010:  652d 302e 6c6f 6361 6c64 6f6d 6169 6e
    System Description TLV (6), length 17
      5c:16:c7:00:00:00
      0x0000:  3563 3a31 363a 6337 3a30 303a 3030 3a30
      0x0010:  30
    End TLV (0), length 0
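As a cross-check on the frames above: the System Description TLV payload in both dumps is simply the ASCII Big Switch OUI string. A small sketch of decoding it (the two-byte header packing follows the standard IEEE 802.1AB 7-bit-type / 9-bit-length layout):

```python
# Sketch: decode the System Description TLV seen in the tcpdump output.
# LLDP TLVs carry a 7-bit type and 9-bit length in the first two bytes
# (IEEE 802.1AB), followed by the value.
def parse_tlv_header(two_bytes):
    word = int.from_bytes(two_bytes, "big")
    return word >> 9, word & 0x1FF  # (type, length)

# Payload bytes of the System Description TLV from both frames above:
payload = bytes.fromhex("35633a31363a63373a30303a30303a3030")
print(payload.decode("ascii"))  # prints "5c:16:c7:00:00:00"
```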
Cool! Let us quickly try this workaround today. Will keep you posted. Thanks a lot :)
Thanks Marios! I have a github repo where I'll be updating files. Feel free to fork. I'll integrate your changes and also update it to get interface information from the system. Here's the link to the repo: https://github.com/wolverineav/python-networking-bigswitch-bsnlldp *send_lldp is now bsnlldp.
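For the "get interface information from the system" piece mentioned above, one possible approach is sketched below. The sysfs-based speed lookup and the 10000 Mb/s threshold are assumptions about how the 10G uplinks could be identified, not the final bsnlldp implementation:

```python
# Sketch of the interface-discovery logic: pick out the 10G uplinks by
# reading each NIC's link speed from sysfs. The 10000 Mb/s threshold and
# the sysfs approach are assumptions, not the final bsnlldp logic.
import os

def pick_10g_devices(speeds_mbps):
    """Return interface names whose reported speed is 10 Gb/s.

    speeds_mbps: mapping of interface name -> speed in Mb/s
    (or None when the speed is unknown, e.g. link down).
    """
    return sorted(name for name, speed in speeds_mbps.items()
                  if speed == 10000)

def read_sysfs_speeds(root="/sys/class/net"):
    """Collect speeds from /sys/class/net/<iface>/speed, best effort."""
    speeds = {}
    for iface in os.listdir(root):
        try:
            with open(os.path.join(root, iface, "speed")) as f:
                speeds[iface] = int(f.read().strip())
        except (OSError, ValueError):
            speeds[iface] = None  # virtual device or link down
    return speeds

# get_10g_devices() in the attached script could then become:
#     return pick_10g_devices(read_sysfs_speeds())
```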
The LLDP RPM package is ready. This package automatically figures out the host FQDN and NIC names.

Spec URL:
wget https://bigswitch.box.com/shared/static/6z9kkxoyx77r3hymyfpu105a6hbariiw.spec -O python-networking-bigswitch-bsnlldp.spec

SRPM URL:
wget https://bigswitch.box.com/shared/static/3bvtnxxn1jawz5p6sjsckkrrux7kz53s.rpm -O python-networking-bigswitch-bsnlldp-3.0.0-1.el7.centos.src.rpm

RPM URL:
wget https://bigswitch.box.com/shared/static/ims16o3h295788fqwo7hadia4niqaxg1.rpm -O python-networking-bigswitch-bsnlldp-3.0.0-1.el7.centos.noarch.rpm

We need to start the review process for this package as well.
The key question is: this LLDP RPM package has nothing to do with OpenStack. It is just needed for the Big Switch solution to automatically form bonds. Do we still have to have it reviewed in RDO and then RHOSP?
Just to note, the original ask was for lldpad, but that came from a BigSwitch engineer who was unaware that BigSwitch had their own Python script to send LLDP packets. We may add lldpad as a service to future OpenStack images. If and when we do, we will need to add a boolean so that the lldpad service can be turned off when using BigSwitch.
Closing this bug for now, since the inclusion is not really needed at this point.