Bug 1431108 - ovs 2.5 to 2.6 tripleo upgrade&update need to special case openvswitch upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 11.0 (Ocata)
Assignee: Marios Andreou
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks: 1431115
Reported: 2017-03-10 12:12 UTC by Marios Andreou
Modified: 2017-05-17 20:06 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-6.0.0-4.el7ost puppet-vswitch-6.3.0-2.el7ost
Doc Text:
Clone Of:
Clones: 1431115
Environment:
Last Closed: 2017-05-17 20:06:32 UTC



Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2017:1245 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 11.0 Bug Fix and Enhancement Advisory | 2017-05-17 23:01:50 UTC
OpenStack gerrit | 441184 | None | None | None | 2017-03-10 12:16:05 UTC
OpenStack gerrit | 451231 | None | None | None | 2017-04-06 10:56:55 UTC
OpenStack gerrit | 452524 | None | None | None | 2017-04-02 20:41:12 UTC
OpenStack gerrit | 453197 | None | None | None | 2017-04-04 15:53:12 UTC
Launchpad | 1669714 | None | None | None | 2017-03-10 12:14:54 UTC

Description Marios Andreou 2017-03-10 12:12:45 UTC
Description of problem:

As described in the upstream bug at https://bugs.launchpad.net/tripleo/+bug/1669714, wherever openvswitch 2.6 becomes available, any attempt to perform a major upgrade or minor update on those nodes is subject to node connectivity problems, and the nodes ultimately need a reboot. We need to remove the special case handling that was previously required when we were going from 2.4 to 2.5.

Comment 2 Marios Andreou 2017-03-28 09:00:31 UTC
Update: it seems we should *still* carry a special case upgrade for openvswitch, specifically for ovs 2.5.0-14. I've decided to use the same bug in an attempt to minimize the inevitable confusion here :(

Please see the discussion at https://bugzilla.redhat.com/show_bug.cgi?id=1424945#c11 for more information but essentially the workaround is the same as the one we previously had, with the addition of the '--notriggerun' flag for the package update.

I have just posted https://review.openstack.org/450607 ("Add special case upgrade from openvswitch 2.5.0-14"), and matbu has the corresponding change for the ansible steps at https://review.openstack.org/#/c/434346/

Moving back to assigned.
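
For readers following along, the workaround described in this comment can be sketched roughly as below. This is an illustration pieced together from the commands and log lines quoted in this bug, not the script shipped in tripleo-heat-templates. The function names, the DRY_RUN switch and the default work directory argument are assumptions made for the example.

```shell
# Sketch only: names and the DRY_RUN switch are illustrative assumptions.

# Return 0 when the installed package string is the affected 2.5.0-14 build.
needs_special_case_ovs_upgrade() {
    case "$1" in
        openvswitch-2.5.0-14*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Download the new openvswitch rpm and update it with the old package's
# %postun and %triggerun scriptlets suppressed.
special_case_ovs_upgrade() {
    local installed="$1" workdir="${2:-OVS_UPGRADE}"
    if ! needs_special_case_ovs_upgrade "$installed"; then
        echo "Skipping manual upgrade of openvswitch"
        return 0
    fi
    run mkdir -p "$workdir"
    run yumdownloader --destdir "$workdir" openvswitch
    local pkg
    for pkg in "$workdir"/openvswitch-*.rpm; do
        # Skip the unexpanded glob when nothing was downloaded.
        [ -e "$pkg" ] || continue
        run rpm -U --replacepkgs --notriggerun --nopostun "$pkg"
    done
}
```

Calling `DRY_RUN=1 special_case_ovs_upgrade "$(rpm -q openvswitch)"` on a node prints the commands without executing them, which may be a safer first step than running the update directly.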

Comment 3 Marios Andreou 2017-03-30 14:27:48 UTC
Just changed the title. The bug originally tracked removal of the openvswitch workaround, which we did. Now we are using the same bug to re-add the workaround along with the extra flag.

Comment 4 Sofer Athlan-Guyot 2017-04-04 08:26:31 UTC
Adding the puppet-vswitch patch that ensures puppet works with dpdk openvswitch 2.6.

Comment 5 Sofer Athlan-Guyot 2017-04-04 15:53:12 UTC
Point to puppet-vswitch ocata.

Comment 6 Sofer Athlan-Guyot 2017-04-06 10:56:56 UTC
Point openvswitch exception to stable/ocata.

Comment 7 Sofer Athlan-Guyot 2017-04-07 08:02:47 UTC
Everything merged in stable/ocata.

Comment 12 Marius Cornea 2017-04-24 09:58:12 UTC
I would like to verify this bug, but it's not clear to me what the proper way to do the verification is. During the OSP10->11 upgrade we should run 'rpm -U --replacepkgs --notriggerun --nopostun $ovs_package', as we're upgrading from 2.5.0-14. Is it enough to check that this command was run during the upgrade and that instance connectivity is not disrupted? Is there any additional step that needs to be run to make sure openvswitch was upgraded correctly? Thanks!

Comment 13 Marios Andreou 2017-04-24 10:00:53 UTC
Hey Marius, yeah, checking that the command was executed would be great. Really though, the verification here is the absence of any network/interface issues during the upgrade. It would be great to also confirm the before/after versions of openvswitch, i.e. a report like "I had no issues going from ovs 2.5.x to 2.6.x and, as far as I can see, it ran the --nopostun rpm install" would be ideal.

Will leave the needinfo in case the network team wants to add to this.
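
One way to capture the before/after versions mentioned above is to compare the installed package version with the version the running daemon reports. The helpers below are a hypothetical sketch that parses the output of 'rpm -q openvswitch' and 'ovs-vsctl show'. The function names and message wording are assumptions, not part of any TripleO tooling.

```shell
# Sketch only: helper names and messages are illustrative assumptions.

# Extract the upstream version (e.g. 2.6.1) from an rpm package string.
ovs_pkg_version() {
    echo "$1" | sed -n 's/^openvswitch-\([0-9][0-9.]*\)-.*/\1/p'
}

# Extract the running version from `ovs-vsctl show` output read on stdin.
ovs_running_version() {
    sed -n 's/^[[:space:]]*ovs_version:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Report whether the running daemon matches the installed package.
# $1: package string, $2: text of `ovs-vsctl show`.
ovs_upgrade_state() {
    local pkg running
    pkg=$(ovs_pkg_version "$1")
    running=$(printf '%s\n' "$2" | ovs_running_version)
    if [ "$pkg" = "$running" ]; then
        echo "running $running matches installed package"
    else
        echo "installed $pkg, still running $running (new version not loaded yet)"
    fi
}
```

On a node this could be invoked as `ovs_upgrade_state "$(rpm -q openvswitch)" "$(ovs-vsctl show)"`, once right after the upgrade step and once after the node has been rebooted.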

Comment 15 Marius Cornea 2017-04-25 17:59:03 UTC
On controllers after running major-upgrade-composable-steps.yaml:

[root@overcloud-controller-0 ~]# rpm -qa | grep ^openvswitch
openvswitch-2.6.1-10.git20161206.el7fdp.x86_64

Checking the yum.log, we can see that the openvswitch package itself hasn't been updated via yum:

[root@overcloud-controller-0 ~]# grep openvswitch /var/log/yum.log 
Apr 25 16:22:40 Updated: python-openvswitch-2.6.1-10.git20161206.el7fdp.noarch
Apr 25 16:23:30 Updated: 1:openstack-neutron-openvswitch-10.0.1-1.el7ost.noarch

[root@overcloud-controller-0 ~]# ls /root/OVS_UPGRADE/openvswitch-2.6.1-10.git20161206.el7fdp.x86_64.rpm 
/root/OVS_UPGRADE/openvswitch-2.6.1-10.git20161206.el7fdp.x86_64.rpm
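
The yum.log check above can be narrowed so that only the core openvswitch package is matched, ignoring python-openvswitch and openstack-neutron-openvswitch. The function below is a small sketch of that idea. Its name is made up for this example, and the pattern assumes the "Updated:" / "Installed:" line format shown in the log excerpt above.

```shell
# Sketch only: the function name is an illustrative assumption.

# Print yum.log lines that record an install or update of the core
# openvswitch package. An optional epoch prefix such as "1:" is allowed.
# $1: path to the log file (defaults to /var/log/yum.log).
core_ovs_yum_updates() {
    grep -E '(Updated|Installed): ([0-9]+:)?openvswitch-[0-9]' "${1:-/var/log/yum.log}"
}
```

An empty result is consistent with the package having been updated through the manual rpm path rather than through yum.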

OVS 2.5 is still loaded:

[root@overcloud-controller-0 ~]# ovs-vsctl show | grep ovs_version
    ovs_version: "2.5.0"

No network connectivity issues showed up during this step.

After rebooting one of the controllers, we can see the new OVS version is loaded:
[root@overcloud-controller-1 heat-admin]# ovs-vsctl show | grep ovs_version
    ovs_version: "2.6.1"

Tunnels are set up: http://paste.openstack.org/show/607894/

All the agents are up:
http://paste.openstack.org/show/607895/

On the compute node, we can see the special case upgrade of openvswitch in the log:

Tue Apr 25 13:22:59 EDT 2017 upgrade-non-controller.sh Executing /root/tripleo_upgrade_node.sh on 192.168.0.21
 "nova_compute",
openvswitch-2.5.0-14.git20160727.el7fdp.x86_64
Manual upgrade of openvswitch - ovs-2.5.0-14 or restart in postun detected
/home/heat-admin/OVS_UPGRADE /home/heat-admin
Attempting to downloading latest openvswitch with yumdownloader
Loaded plugins: product-id
Repository rhelosp-fdp-pending is listed more than once in the configuration
--> Running transaction check
---> Package openvswitch.x86_64 0:2.6.1-10.git20161206.el7fdp will be installed
--> Finished Dependency Resolution
Updating openvswitch-2.6.1-10.git20161206.el7fdp.x86_64.rpm with --nopostun --notriggerun
/home/heat-admin

Once upgrade has finished:

[root@overcloud-compute-0 ~]# rpm -qa | grep ^openvswitch
openvswitch-2.6.1-10.git20161206.el7fdp.x86_64
[root@overcloud-compute-0 ~]# ovs-vsctl show | grep ovs_version
    ovs_version: "2.5.0"

Instance running on this node is still reachable.

After compute node reboot:
[root@overcloud-compute-0 heat-admin]# ovs-vsctl show | grep ovs_version
    ovs_version: "2.6.1"

The compute node can take new workloads which are reachable.

Agents look good on all nodes:
http://paste.openstack.org/show/607899/

Given that I wasn't able to hit any issues related to the openvswitch package upgrade I am moving this bug to verified state.

Comment 16 errata-xmlrpc 2017-05-17 20:06:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245

