Description of problem:
As described in the upstream bug at https://bugs.launchpad.net/tripleo/+bug/1669714, wherever openvswitch 2.6 becomes available, any attempt to perform a major upgrade or minor update on those nodes is subject to node connectivity problems and ultimately requires a reboot. We need to remove the special-case handling that was previously required when we were going from 2.4 to 2.5.
Update: it seems we should *still* carry a special-case upgrade for openvswitch, specifically for ovs 2.5.0-14 - I've decided to use the same bug in an attempt to minimize the inevitable confusion here :(
Please see the discussion at https://bugzilla.redhat.com/show_bug.cgi?id=1424945#c11 for more information, but essentially the workaround is the same as the one we previously had, with the addition of the '--notriggerun' flag for the package update.
I have just posted https://review.openstack.org/450607 ("Add special case upgrade from openvswitch 2.5.0-14") and matbu has this covered for the ansible steps at https://review.openstack.org/#/c/434346/
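For reference, this is roughly the shape of the workaround (a minimal sketch only - the reviews above are authoritative, and the detection condition and paths here are illustrative):

# Sketch of the special-case upgrade. Assumes yumdownloader (yum-utils) is
# available and that we are coming from openvswitch 2.5.0-14.
if rpm -q openvswitch | grep -q "^openvswitch-2.5.0-14"; then
    echo "Manual upgrade of openvswitch - ovs-2.5.0-14 detected"
    mkdir -p /root/OVS_UPGRADE
    pushd /root/OVS_UPGRADE
    # Download the target openvswitch rpm rather than letting yum install it,
    # so that we control exactly how the rpm transaction is run.
    yumdownloader --destdir . openvswitch
    # --nopostun skips the %postun scriptlet and --notriggerun skips the
    # %triggerun scriptlet; both would otherwise restart openvswitch and
    # take node networking down mid-upgrade.
    rpm -U --replacepkgs --nopostun --notriggerun ./openvswitch-*.rpm
    popd
fi

The trade-off is that the old openvswitch keeps running until the node is rebooted, at which point the new version is loaded.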
moving back to assigned.
Just changed the title... the bug was originally tracking removal of the openvswitch workaround, which we did. Now we are using the same bug to re-add the workaround along with the extra flag.
Adding the puppet-vswitch patch that ensures puppet works with DPDK openvswitch 2.6.
Pointing to puppet-vswitch ocata.
Pointing the openvswitch exception to stable/ocata.
Everything merged in stable/ocata.
I would like to verify this bug but it's not clear to me what the proper way to do the verification is. During the OSP10->11 upgrade we should run 'rpm -U --replacepkgs --notriggerun --nopostun $ovs_package' since we're upgrading from 2.5.0-14. Is it enough to check that this command was run during the upgrade and that instance connectivity is not disrupted? Is there any additional step that needs to be run to make sure openvswitch was upgraded correctly? Thanks!
hey marius, yeah checking that the command was executed would be great. Really though, the verification here is the absence of any network/interface issues during the upgrade. It would be great to also confirm before/after versions of openvswitch - something like "I had no issues going from ovs 2.5.x to 2.6.x, and afaics it ran the --nopostun rpm install" would be ideal.
Will leave the needinfo in case the network team wants to add to this.
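For anyone repeating this, the verification boils down to something like the following (a sketch; 192.0.2.10 stands in for the floating IP of a running instance, pinged from a host outside the overcloud):

# Before the upgrade: record the running openvswitch version.
ovs-vsctl show | grep ovs_version

# During the upgrade: watch for dropped pings to a running instance.
ping 192.0.2.10

# After the upgrade: confirm the special-case path ran - the downloaded rpm
# is present, yum did NOT update the core openvswitch package itself, and
# the new package version is installed.
ls /root/OVS_UPGRADE/
grep openvswitch /var/log/yum.log
rpm -qa | grep ^openvswitch
ovs-vsctl show | grep ovs_version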
On controllers after running major-upgrade-composable-steps.yaml:
[root@overcloud-controller-0 ~]# rpm -qa | grep ^openvswitch
Checking /var/log/yum.log we can see that the openvswitch package itself hasn't been updated via yum (only the python bindings and the neutron agent were):
[root@overcloud-controller-0 ~]# grep openvswitch /var/log/yum.log
Apr 25 16:22:40 Updated: python-openvswitch-2.6.1-10.git20161206.el7fdp.noarch
Apr 25 16:23:30 Updated: 1:openstack-neutron-openvswitch-10.0.1-1.el7ost.noarch
Instead, the rpm downloaded by the special-case workaround is present:
[root@overcloud-controller-0 ~]# ls /root/OVS_UPGRADE/openvswitch-2.6.1-10.git20161206.el7fdp.x86_64.rpm
OVS 2.5 is still loaded, which is expected: the --nopostun/--notriggerun flags deliberately skip the scriptlets that would restart the service, so the new version is only picked up on reboot:
[root@overcloud-controller-0 ~]# ovs-vsctl show | grep ovs_version
No network connectivity issues showed up during this step.
After one of the controllers is rebooted we can see that the new OVS version is loaded:
[root@overcloud-controller-1 heat-admin]# ovs-vsctl show | grep ovs_version
Tunnels are set up: http://paste.openstack.org/show/607894/
All the agents are up:
On the compute node we can see the special-case upgrade of openvswitch in the log:
Tue Apr 25 13:22:59 EDT 2017 upgrade-non-controller.sh Executing /root/tripleo_upgrade_node.sh on 192.168.0.21
Manual upgrade of openvswitch - ovs-2.5.0-14 or restart in postun detected
Attempting to downloading latest openvswitch with yumdownloader
Loaded plugins: product-id
Repository rhelosp-fdp-pending is listed more than once in the configuration
--> Running transaction check
---> Package openvswitch.x86_64 0:2.6.1-10.git20161206.el7fdp will be installed
--> Finished Dependency Resolution
Updating openvswitch-2.6.1-10.git20161206.el7fdp.x86_64.rpm with --nopostun --notriggerun
Once the upgrade has finished:
[root@overcloud-compute-0 ~]# rpm -qa | grep ^openvswitch
[root@overcloud-compute-0 ~]# ovs-vsctl show | grep ovs_version
The instance running on this node is still reachable.
After the compute node reboot:
[root@overcloud-compute-0 heat-admin]# ovs-vsctl show | grep ovs_version
The compute node can take new workloads, which are reachable.
Agents look good on all nodes:
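For reference, the agent check here is presumably along these lines (hypothetical invocation; the exact client depends on the version deployed):

source ~/overcloudrc
# Every agent should report alive (':-)' in the 'alive' column).
neutron agent-list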
Given that I wasn't able to hit any issues related to the openvswitch package upgrade, I am moving this bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.