|Summary:||Upgrade of openvswitch-2.4.0-1.el7 makes ip disappears. (osp10)|
|Product:||Red Hat OpenStack||Reporter:||Sofer Athlan-Guyot <sathlang>|
|Component:||openstack-tripleo-heat-templates||Assignee:||Marios Andreou <mandreou>|
|Status:||CLOSED ERRATA||QA Contact:||Omri Hochman <ohochman>|
|Version:||10.0 (Newton)||CC:||aloughla, apevec, chrisw, jcoufal, jschluet, lbezdick, mandreou, mburns, mlammon, rhel-osp-director-maint, rhos-maint, sathlang, srevivo|
|Target Milestone:||rc||Keywords:||Reopened, Triaged|
|Target Release:||10.0 (Newton)|
|Fixed In Version:||openstack-tripleo-heat-templates-5.1.0-6.el7ost||Doc Type:||If docs needed, set a value|
|Doc Text:||Story Points:||---|
|:||1388543 1388546 (view as bug list)||Environment:|
|Last Closed:||2016-12-14 15:49:41 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Bug Depends On:|
|Bug Blocks:||1337794, 1388543, 1388546, 1394322|
Description Sofer Athlan-Guyot 2016-08-05 16:11:39 UTC
Description of problem:
During the upgrade of OSP-9 to OSP-10, upgrading the openvswitch package removes all IPs on the controller, which makes the upgrade fail.

Version-Release number of selected component (if applicable):
openvswitch-2.4.0-1.el7.x86_64 to openvswitch-2.5.0-3.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Well, I guess that just upgrading the package should show the problem.

Actual results:
All IPs are cleaned up on the controller:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP qlen 1000
    link/ether 00:a8:94:21:61:39 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2a8:94ff:fe21:6139/64 scope link
       valid_lft forever preferred_lft forever
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether aa:b4:d7:24:62:c5 brd ff:ff:ff:ff:ff:ff
12: vlan10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether d6:4a:74:67:1c:e9 brd ff:ff:ff:ff:ff:ff
13: vlan20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether fa:93:78:4b:31:f9 brd ff:ff:ff:ff:ff:ff
14: vlan40: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether da:b9:ee:92:85:7e brd ff:ff:ff:ff:ff:ff
15: vlan50: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 1a:ea:62:67:31:e7 brd ff:ff:ff:ff:ff:ff
16: vlan30: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether de:4c:10:bd:96:f3 brd ff:ff:ff:ff:ff:ff
17: br-ex: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 00:a8:94:21:61:39 brd ff:ff:ff:ff:ff:ff
18: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether b6:ef:9a:fe:44:42 brd ff:ff:ff:ff:ff:ff
19: br-tun: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether d6:12:60:1f:a9:45 brd ff:ff:ff:ff:ff:ff

Expected results:
successful upgrade of openvswitch

Additional info:
Part of the yum upgrade relevant to the problem:

Updating : openvswitch-2.5.0-3.el7.x86_64 149/476
...
Cleanup : openvswitch-2.4.0-1.el7.x86_64 474/476
^^^ at this point the ssh connection died

A simple systemctl restart network from the console makes everything work again.
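For reference, the console-side recovery described above could be sketched roughly as follows. This is only an illustration, assuming the iproute2 `ip` tool and the legacy `network` service used in this report; both function names are hypothetical and are not part of any upgrade script.

```shell
# Count IPv4 addresses on non-loopback interfaces.
# Reads the one-line-per-address output of `ip -o -4 addr show` on stdin,
# where field 2 is the interface name and field 3 is "inet".
count_non_loopback_ipv4() {
    awk '$3 == "inet" && $2 != "lo" { n++ } END { print n + 0 }'
}

# Hypothetical helper: restart networking only when every non-loopback
# IPv4 address has disappeared, as in the `ip addr` output above.
restore_network_if_addresses_lost() {
    if [ "$(ip -o -4 addr show | count_non_loopback_ipv4)" -eq 0 ]; then
        echo "no non-loopback IPv4 address left, restarting network" >&2
        systemctl restart network
    fi
}
```

Since ssh is already gone by the time the addresses vanish, restore_network_if_addresses_lost would have to be run from the node's console.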
Comment 2 Lukas Bezdicka 2016-08-08 14:46:16 UTC
Upstream bug: https://bugs.launchpad.net/neutron/+bug/1514056
Comment 3 Sofer Athlan-Guyot 2016-08-10 13:48:36 UTC
Still not sure the above patch solves the problem. I upgraded osp-9 using https://review.openstack.org/#/c/348889/. After setup, all my bridges but br-ex are in secure mode:

ovs-vsctl show | grep -B1 secure
    Bridge br-int
        fail_mode: secure
--
    Bridge br-tun
        fail_mode: secure

but the upgrade fails with lost connectivity. After restoring it, I did:

yum downgrade openvswitch

It installed openvswitch-2.0.0-7.el7.x86_64 and everything was fine. After that I did:

yum install ftp://ftp.icm.edu.pl/vol/rzm5/linux-slc/centos/7.1.1503/cloud/x86_64/openstack-kilo/common/openvswitch-2.4.0-1.el7.x86_64.rpm

And everything was ok. Then I did

yum upgrade

I had to do a systemctl restart network to get this output:

Running transaction
  Updating : openvswitch-2.5.0-3.el7.x86_64 1/2
  Cleanup : openvswitch-2.4.0-1.el7.x86_64 2/2
  Verifying : openvswitch-2.5.0-3.el7.x86_64 1/2
  Verifying : openvswitch-2.4.0-1.el7.x86_64 2/2

Updated:
  openvswitch.x86_64 0:2.5.0-3.el7

Without it the connection was lost. So it's either a new "feature" in openvswitch-2.5 or the spec of the rpm which is not good.

Trying to set br-ex to secure (ovs-vsctl set-fail-mode br-ex secure) is a no go as there is no controller associated with br-ex (if I understand everything correctly). The net result of running the above command is ... immediate loss of connectivity.

For the time being I'm going to pin openvswitch to 2.4 during the upgrade.
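The grep above only shows bridges that are in secure mode. A small sketch that lists every bridge together with its fail mode might look like this. It assumes the `ovs-vsctl show` layout seen in the output above (a "Bridge NAME" line, optionally followed by a "fail_mode: MODE" line); the function name is hypothetical, and "unset" is just a label chosen here for bridges that have no fail_mode line.

```shell
# Print "BRIDGE MODE" for each bridge found in `ovs-vsctl show` output
# read from stdin. Bridges without a fail_mode line are reported as "unset".
bridge_fail_modes() {
    awk '
        function emit() {
            if (name != "") print name, (mode == "" ? "unset" : mode)
        }
        $1 == "Bridge"     { emit(); name = $2; gsub(/"/, "", name); mode = "" }
        $1 == "fail_mode:" { mode = $2 }
        END                { emit() }
    '
}
```

It would be used as `ovs-vsctl show | bridge_fail_modes`; on the setup described in this comment, br-int and br-tun would be expected to show as secure and br-ex as unset.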
Comment 4 Sofer Athlan-Guyot 2016-08-12 14:48:29 UTC
I confirm that it has nothing to do with the upstream patch mentioned by Lukas. Pinning openvswitch "solves" it.

--- extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh.orig 2016-08-12 06:01:20.900145477 -0400
+++ extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh 2016-08-12 10:45:55.870145477 -0400
@@ -145,7 +145,6 @@
 yum -y install python-zaqarclient # needed for os-collect-config
+yum -y install yum-plugin-versionlock
+yum versionlock openvswitch
 yum -y -q update
Comment 5 Sofer Athlan-Guyot 2016-08-26 18:27:00 UTC
Note, this happens on all upgraded nodes, so:

upgrade-non-controller.sh --upgrade overcloud-objectstorage-0
....
Cleanup : 1:librados2-0.94.5-14.el7cp.x86_64 475/481
Cleanup : parted-3.1-23.el7.x86_64 476/481
Cleanup : gperftools-libs-2.4-7.el7.x86_64 477/481
Cleanup : python-pandas-0.17.0-1.el7ost.x86_64 478/481
Cleanup : openvswitch-2.4.0-1.el7.x86_64 479/481
Write failed: Broken pipe

It hangs at the cleanup stage of openvswitch and then fails.

The same pinning must go into all upgrade scripts in tripleo heat templates:

extraconfig/tasks/major_upgrade_ceph_storage.sh
extraconfig/tasks/major_upgrade_compute.sh
extraconfig/tasks/major_upgrade_object_storage.sh
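To see which of the upgrade scripts still lack the pin, something like the following could be run from a tripleo-heat-templates checkout. It is a rough sketch: the function name is hypothetical, and it simply looks for the literal `yum versionlock openvswitch` line from the diff in comment 4.

```shell
# Print every script path that does not contain the versionlock pin.
# A path that cannot be read is reported as well, since grep fails on it.
scripts_missing_pin() {
    for script in "$@"; do
        if ! grep -q 'yum versionlock openvswitch' "$script" 2>/dev/null; then
            echo "$script"
        fi
    done
}
```

For example: `scripts_missing_pin extraconfig/tasks/major_upgrade_*.sh`. An empty result would mean every script matched by the glob already carries the pin.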
Comment 6 Panu Matilainen 2016-09-01 07:23:35 UTC
Looks like duplicate of bug 1371840 (or the other way around) to me.
Comment 7 Sofer Athlan-Guyot 2016-09-01 16:43:49 UTC
Hi, yep, it is. So is this a WONTFIX as well? Is pinning the acceptable solution for the OSP-9 to OSP-10 upgrade? When will the 2.6 version be available?
Comment 8 Panu Matilainen 2016-09-14 12:59:42 UTC
Closing as dupe, let's keep the related discussion in the main bug.

*** This bug has been marked as a duplicate of bug 1371840 ***
Comment 9 Marios Andreou 2016-10-20 14:17:00 UTC
Hey, reopening this one because the duplicate at https://bugzilla.redhat.com/show_bug.cgi?id=1371840 is marked as 'wontfix'... I'll use this bug to track the workaround we will carry in the upgrade/update to deal with the openvswitch update.
Comment 10 Marios Andreou 2016-10-20 14:19:49 UTC
also retargeting to tripleo-heat-templates since we'll be carrying a workaround for the issue there
Comment 11 Marios Andreou 2016-10-21 16:44:08 UTC
changed the upstream review to point to newton @ https://review.openstack.org/#/c/389753/ (master merged a little while ago)
Comment 12 Marios Andreou 2016-10-21 16:49:12 UTC
Please note that the full context around how we came to use this workaround is in BZ 1371840 and also BZ 1385096
Comment 13 Marios Andreou 2016-10-25 15:19:49 UTC
We still need to backport this to mitaka and liberty ...
Comment 15 Marios Andreou 2016-10-31 13:49:08 UTC
Adding a note for reference... there is a related BZ at https://bugzilla.redhat.com/show_bug.cgi?id=1388675 with its own upstream bug and review (linked there) which is a follow on from the fix landed here (the fix there adds the --replacepkgs in case latest ovs was already installed and fixes a syntax nit with the ceph upgrade script)
Comment 16 Marios Andreou 2016-10-31 13:57:21 UTC
also adding a link to https://review.openstack.org/#/c/390792/ since you need both reviews for the 'complete' ovs upgrade workaround.
Comment 17 Omri Hochman 2016-11-07 21:55:24 UTC
marios - is that another duplicate of one of these : https://bugzilla.redhat.com/show_bug.cgi?id=1386299 - https://bugzilla.redhat.com/show_bug.cgi?id=1385096
Comment 18 Marios Andreou 2016-11-08 07:52:58 UTC
(In reply to Omri Hochman from comment #17)
> marios - is that another duplicate of one of these :
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1386299 -

^^^ No I don't think so, though I did think they may be related at one point (we have discussed this on lifecycle scrum). In BZ 1386299 the problem appears when you reboot. Here the problem was just upgrading openvswitch from 2.4 to 2.5 (no reboot needed, IPs disappeared just by doing the yum update).

> https://bugzilla.redhat.com/show_bug.cgi?id=1385096

^^^ No, though they definitely *are* related. BZ 1385096 is tracking the problem we are 'fixing' here against openvswitch. In this bug we worked around the problem with the review linked in external trackers.

thanks
Comment 20 mlammon 2016-11-15 19:06:08 UTC
Deployed RHOS 9 latest
Upgraded to RHOS 10 with latest puddle (2016-11-14.1)

I no longer see this issue.

openvswitch-2.5.0-14.git20160727.el7fdp.x86_64
python-openvswitch-2.5.0-14.git20160727.el7fdp.noarch
openstack-neutron-openvswitch-9.1.0-4.el7ost.noarch
Comment 22 Marios Andreou 2016-11-23 18:20:39 UTC
adding one more fix needed here @ https://review.openstack.org/#/c/401195/ ; waiting on CI to merge to stable/newton, then we can move this back to POST
Comment 23 Marios Andreou 2016-11-24 10:16:09 UTC
https://review.openstack.org/#/c/401195/ landed on newton, moving to POST
Comment 25 Omri Hochman 2016-11-29 21:07:31 UTC
Verified with openstack-tripleo-heat-templates-5.1.0-6.el7ost.noarch

After the upgrade to osp10, I rebooted all the OC nodes and checked that all overcloud nodes are still reachable after returning from reboot.
Comment 26 Marios Andreou 2016-11-30 08:35:58 UTC
(In reply to Omri Hochman from comment #25)
> Verified with openstack-tripleo-heat-templates-5.1.0-6.el7ost.noarch
>
>
> After upgrade to osp10, I've rebooted all the OC nodes and check that all
> overcloud nodes are still reachable after return from reboot.

Just to be clear, this BZ is about the openvswitch 2.4-2.5 upgrade which causes nodes to lose IPs (the reboot one was also openvswitch but a different issue). However, the fact that you successfully upgraded w/out problem (i.e. nodes don't lose IPs during the yum update on a given node) is enough to verify here.
Comment 28 errata-xmlrpc 2016-12-14 15:49:41 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html