Description of problem:
During an upgrade from OSP-9 to OSP-10, the upgrade of the openvswitch package removes all IPs on the controller, making the upgrade fail.

Version-Release number of selected component (if applicable):
openvswitch-2.4.0-1.el7.x86_64 to openvswitch-2.5.0-3.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Simply upgrading the package should show the problem.

Actual results:
All IPs on the controller are cleaned up:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP qlen 1000
    link/ether 00:a8:94:21:61:39 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2a8:94ff:fe21:6139/64 scope link
       valid_lft forever preferred_lft forever
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether aa:b4:d7:24:62:c5 brd ff:ff:ff:ff:ff:ff
12: vlan10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether d6:4a:74:67:1c:e9 brd ff:ff:ff:ff:ff:ff
13: vlan20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether fa:93:78:4b:31:f9 brd ff:ff:ff:ff:ff:ff
14: vlan40: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether da:b9:ee:92:85:7e brd ff:ff:ff:ff:ff:ff
15: vlan50: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 1a:ea:62:67:31:e7 brd ff:ff:ff:ff:ff:ff
16: vlan30: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether de:4c:10:bd:96:f3 brd ff:ff:ff:ff:ff:ff
17: br-ex: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 00:a8:94:21:61:39 brd ff:ff:ff:ff:ff:ff
18: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether b6:ef:9a:fe:44:42 brd ff:ff:ff:ff:ff:ff
19: br-tun: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether d6:12:60:1f:a9:45 brd ff:ff:ff:ff:ff:ff

Expected results:
Successful upgrade of openvswitch.

Additional info:
Part of the yum upgrade relevant to the problem:

  Updating : openvswitch-2.5.0-3.el7.x86_64    149/476
  ...
  Cleanup  : openvswitch-2.4.0-1.el7.x86_64    474/476
  ^^^ at this point the ssh connection died

A simple "systemctl restart network" from the console makes everything work again.
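For anyone debugging this, a quick diagnostic sketch (plain rpm/ip commands; the interface names in the grep are just the ones from the output above). The "Cleanup" step of a yum upgrade runs the old package's %postun scriptlet, so a service stop/restart hidden in there would explain the bridges losing their IPs:

  # Dump the scriptlets of the installed openvswitch package and look
  # for a systemctl stop/restart in the postuninstall section:
  rpm -q --scripts openvswitch

  # Record the addresses on the OVS bridges before upgrading, so you
  # can compare after the update:
  ip -o addr show | grep -E 'br-ex|br-int|br-tun|vlan' > /root/ovs-ips-before.txt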
Upstream bug: https://bugs.launchpad.net/neutron/+bug/1514056
Still not sure the above patch solves the problem. I upgraded OSP-9 using https://review.openstack.org/#/c/348889/. After setup, all my bridges but br-ex are in secure mode:

ovs-vsctl show | grep -B1 secure
        Bridge br-int
            fail_mode: secure
--
        Bridge br-tun
            fail_mode: secure

but the upgrade fails with lost connectivity. After restoring it, I did:

yum downgrade openvswitch

It installed openvswitch-2.0.0-7.el7.x86_64 and everything was fine. After that I did:

yum install ftp://ftp.icm.edu.pl/vol/rzm5/linux-slc/centos/7.1.1503/cloud/x86_64/openstack-kilo/common/openvswitch-2.4.0-1.el7.x86_64.rpm

and everything was OK. Then I did a yum upgrade; I had to do a "systemctl restart network" to get this output:

Running transaction
  Updating   : openvswitch-2.5.0-3.el7.x86_64    1/2
  Cleanup    : openvswitch-2.4.0-1.el7.x86_64    2/2
  Verifying  : openvswitch-2.5.0-3.el7.x86_64    1/2
  Verifying  : openvswitch-2.4.0-1.el7.x86_64    2/2

Updated:
  openvswitch.x86_64 0:2.5.0-3.el7

Without it the connection was lost. So it's either a new "feature" in openvswitch 2.5 or the spec of the rpm which is not good.

Trying to set br-ex to secure (ovs-vsctl set-fail-mode br-ex secure) is a no-go, as there is no controller associated with br-ex (if I understand everything correctly). The net result of running the above command is ... immediate loss of connectivity.

For the time being I'm going to pin openvswitch to 2.4 during the upgrade.
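For reference, a quick way to check the fail mode per bridge (a sketch using standard ovs-vsctl commands; the bridge names are taken from the output above):

  # Print the configured fail_mode for each bridge; empty output means
  # the default standalone mode, which is why br-ex shows nothing here.
  for br in br-ex br-int br-tun; do
      echo -n "$br fail_mode: "
      ovs-vsctl get-fail-mode "$br"
  done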
I confirm that it has nothing to do with the upstream patch mentioned by Lukas. Pinning openvswitch "solves" it:

--- extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh.orig	2016-08-12 06:01:20.900145477 -0400
+++ extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh	2016-08-12 10:45:55.870145477 -0400
@@ -145,7 +145,9 @@
 yum -y install python-zaqarclient # needed for os-collect-config
+yum -y install yum-plugin-versionlock
+yum versionlock openvswitch
 yum -y -q update
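For anyone applying the pin by hand instead of through the script, it boils down to this (a minimal sketch using the yum-versionlock plugin; run on each node before the upgrade):

  # Pin the installed openvswitch so "yum update" skips it.
  yum -y install yum-plugin-versionlock
  yum versionlock add openvswitch

  # ... run the OSP-9 -> OSP-10 package upgrade as usual ...
  yum -y -q update

  # Once a fixed openvswitch build is available, drop the pin:
  yum versionlock delete 'openvswitch*'    # or: yum versionlock clear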
Note, this happens on all upgraded nodes, so:

upgrade-non-controller.sh --upgrade overcloud-objectstorage-0
....
  Cleanup : 1:librados2-0.94.5-14.el7cp.x86_64      475/481
  Cleanup : parted-3.1-23.el7.x86_64                476/481
  Cleanup : gperftools-libs-2.4-7.el7.x86_64        477/481
  Cleanup : python-pandas-0.17.0-1.el7ost.x86_64    478/481
  Cleanup : openvswitch-2.4.0-1.el7.x86_64          479/481
Write failed: Broken pipe

will hang at the cleanup stage of openvswitch and then fail. The same pinning must go into all the upgrade scripts in tripleo-heat-templates (a sketch follows after this list):

extraconfig/tasks/major_upgrade_ceph_storage.sh
extraconfig/tasks/major_upgrade_compute.sh
extraconfig/tasks/major_upgrade_object_storage.sh
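A hypothetical one-liner to patch all three scripts the same way as the controller script above (assumes GNU sed and that each script contains a literal "yum -y -q update" line; verify the result before using it):

  # Insert the versionlock lines before the update in each script,
  # keeping a backup of the originals as *.orig.
  for f in extraconfig/tasks/major_upgrade_ceph_storage.sh \
           extraconfig/tasks/major_upgrade_compute.sh \
           extraconfig/tasks/major_upgrade_object_storage.sh; do
      sed -i.orig 's/^yum -y -q update$/yum -y install yum-plugin-versionlock\nyum versionlock openvswitch\n&/' "$f"
  done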
Looks like a duplicate of bug 1371840 (or the other way around) to me.
Hi, yep, it is. So is this a WONTFIX as well? Is pinning the acceptable solution for the OSP-9 to OSP-10 upgrade? When will the 2.6 version be available?
Closing as dupe, let's keep the related discussion in the main bug.

*** This bug has been marked as a duplicate of bug 1371840 ***
Hey, reopening this one because the duplicate at https://bugzilla.redhat.com/show_bug.cgi?id=1371840 is marked as 'wontfix'... I'll use this bug to track the workaround we will carry in the upgrade/update to deal with the openvswitch update.
also retargeting to tripleo-heat-templates since we'll be carrying a workaround for the issue there
changed the upstream review to point to newton @ https://review.openstack.org/#/c/389753/ (master merged a little while ago)
Please note that the full context around how we came to use this workaround is in BZ 1371840 and also BZ 1385096
We still need to backport this to mitaka and liberty ...
Adding a note for reference... there is a related BZ at https://bugzilla.redhat.com/show_bug.cgi?id=1388675 with its own upstream bug and review (linked there), which is a follow-on from the fix landed here (the fix there adds --replacepkgs in case the latest ovs was already installed, and fixes a syntax nit with the ceph upgrade script).
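For context, the general shape of the workaround carried in the upgrade scripts is roughly the following (a sketch reconstructed from the reviews above, not a verbatim copy; the exact flags and package handling may differ from what merged):

  # Upgrade openvswitch via rpm directly, skipping the old package's
  # %postun scriptlet so the service is not restarted mid-upgrade
  # (the restart is what drops the IPs on the bridges).
  mkdir -p OVS_UPGRADE && pushd OVS_UPGRADE
  yumdownloader openvswitch          # from yum-utils
  # --replacepkgs lets this succeed even if the latest ovs is already
  # installed; --nopostun/--notriggerun skip the offending scriptlets.
  rpm -U --replacepkgs --nopostun --notriggerun ./openvswitch-*.rpm
  popd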
also adding a link to https://review.openstack.org/#/c/390792/ since you need both reviews for the 'complete' ovs upgrade workaround.
marios - is that another duplicate of one of these:

https://bugzilla.redhat.com/show_bug.cgi?id=1386299
https://bugzilla.redhat.com/show_bug.cgi?id=1385096
(In reply to Omri Hochman from comment #17)
> marios - is that another duplicate of one of these:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1386299

^^^ No, I don't think so, though I did think they might be related at one point (we have discussed this on lifecycle scrum). In BZ 1386299 the problem appears when you reboot. Here the problem was just upgrading openvswitch from 2.4 to 2.5 (no reboot needed; IPs disappeared just by doing the yum update).

> https://bugzilla.redhat.com/show_bug.cgi?id=1385096

^^^ No, though they definitely *are* related. BZ 1385096 is tracking the problem we are 'fixing' here against openvswitch. In this bug we worked around the problem with the review linked in external trackers. Thanks.
Deployed RHOS 9 latest. Upgraded to RHOS 10 with the latest puddle (2016-11-14.1). I no longer see this issue.

openvswitch-2.5.0-14.git20160727.el7fdp.x86_64
python-openvswitch-2.5.0-14.git20160727.el7fdp.noarch
openstack-neutron-openvswitch-9.1.0-4.el7ost.noarch
adding one more fix needed here @ https://review.openstack.org/#/c/401195/ - waiting on CI to merge to stable/newton, then we can move this back to POST
https://review.openstack.org/#/c/401195/ landed on stable/newton - moving to POST
Verified with openstack-tripleo-heat-templates-5.1.0-6.el7ost.noarch

After the upgrade to OSP-10, I rebooted all the OC nodes and checked that all overcloud nodes are still reachable after the reboot.
(In reply to Omri Hochman from comment #25)
> Verified with openstack-tripleo-heat-templates-5.1.0-6.el7ost.noarch
>
> After the upgrade to OSP-10, I rebooted all the OC nodes and checked that
> all overcloud nodes are still reachable after the reboot.

Just to be clear, this BZ is about the openvswitch 2.4-to-2.5 upgrade, which causes nodes to lose their IPs (the reboot one was also openvswitch, but a different issue). However, the fact that you successfully upgraded without problems (i.e. nodes don't lose IPs during the yum update on a given node) is enough to verify here.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html