Bug 1364540
Summary: | Upgrade of openvswitch-2.4.0-1.el7 makes ip disappears. (osp10) | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Sofer Athlan-Guyot <sathlang> | |
Component: | openstack-tripleo-heat-templates | Assignee: | Marios Andreou <mandreou> | |
Status: | CLOSED ERRATA | QA Contact: | Omri Hochman <ohochman> | |
Severity: | medium | Docs Contact: | ||
Priority: | urgent | |||
Version: | 10.0 (Newton) | CC: | aloughla, apevec, chrisw, jcoufal, jschluet, lbezdick, mandreou, mburns, mlammon, rhel-osp-director-maint, rhos-maint, sathlang, srevivo | |
Target Milestone: | rc | Keywords: | Reopened, Triaged | |
Target Release: | 10.0 (Newton) | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | openstack-tripleo-heat-templates-5.1.0-6.el7ost | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1388543 1388546 | Environment: | |
Last Closed: | 2016-12-14 15:49:41 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1337794, 1388543, 1388546, 1394322 |
Description
Sofer Athlan-Guyot
2016-08-05 16:11:39 UTC
Upstream bug: https://bugs.launchpad.net/neutron/+bug/1514056

I am still not sure the above patch solves the problem. I upgraded osp-9 using https://review.openstack.org/#/c/348889/. After setup, all my bridges except br-ex are in secure mode:

    ovs-vsctl show | grep -B1 secure
        Bridge br-int
            fail_mode: secure
    --
        Bridge br-tun
            fail_mode: secure

but the upgrade fails with lost connectivity. After restoring connectivity, I ran:

    yum downgrade openvswitch

This installed openvswitch-2.0.0-7.el7.x86_64 and everything was fine. Then I ran:

    yum install ftp://ftp.icm.edu.pl/vol/rzm5/linux-slc/centos/7.1.1503/cloud/x86_64/openstack-kilo/common/openvswitch-2.4.0-1.el7.x86_64.rpm

and everything was still OK. Then I ran:

    yum upgrade

I had to run systemctl restart network to get this output:

    Running transaction
      Updating   : openvswitch-2.5.0-3.el7.x86_64    1/2
      Cleanup    : openvswitch-2.4.0-1.el7.x86_64    2/2
      Verifying  : openvswitch-2.5.0-3.el7.x86_64    1/2
      Verifying  : openvswitch-2.4.0-1.el7.x86_64    2/2
    Updated:
      openvswitch.x86_64 0:2.5.0-3.el7

Without it, the connection was lost. So it is either a new "feature" in openvswitch-2.5 or a problem in the RPM spec.

Trying to set br-ex to secure (ovs-vsctl set-fail-mode br-ex secure) is a no-go, as there is no controller associated with br-ex (if I understand everything correctly). The net result of running that command is an immediate loss of connectivity.

For the time being I'm going to pin openvswitch to 2.4 during the upgrade.

I confirm that it has nothing to do with the upstream patch mentioned by Lukas. Pinning openvswitch "solves" it.
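For anyone repeating the fail_mode check from the description, the grep can be replaced by a small filter that lists every bridge with its fail mode, including bridges that have none set (as br-ex here). This is only a sketch: list_fail_modes is a hypothetical helper name, and it assumes the usual indented layout of ovs-vsctl show output with "Bridge <name>" and "fail_mode: <mode>" lines.

```shell
# list_fail_modes (hypothetical helper): read `ovs-vsctl show` output on stdin
# and print one "<bridge> <fail_mode>" line per bridge, in order of appearance.
# Bridges without a fail_mode line are reported as "unset".
list_fail_modes() {
    awk '
        $1 == "Bridge" {
            gsub(/"/, "", $2)            # some versions quote the bridge name
            bridge = $2
            mode[bridge] = "unset"
            order[++n] = bridge
        }
        $1 == "fail_mode:" && bridge != "" { mode[bridge] = $2 }
        END { for (i = 1; i <= n; i++) print order[i], mode[order[i]] }
    '
}
```

On a node it would be used as: ovs-vsctl show | list_fail_modes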
--- extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh.orig 2016-08-12 06:01:20.900145477 -0400
+++ extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh 2016-08-12 10:45:55.870145477 -0400
@@ -145,7 +145,6 @@
 yum -y install python-zaqarclient # needed for os-collect-config
+yum -y install yum-plugin-versionlock
+yum versionlock openvswitch
 yum -y -q update

Note that this happens on all upgraded nodes, for example:

    upgrade-non-controller.sh --upgrade overcloud-objectstorage-0
    ....
      Cleanup : 1:librados2-0.94.5-14.el7cp.x86_64 475/481
      Cleanup : parted-3.1-23.el7.x86_64 476/481
      Cleanup : gperftools-libs-2.4-7.el7.x86_64 477/481
      Cleanup : python-pandas-0.17.0-1.el7ost.x86_64 478/481
      Cleanup : openvswitch-2.4.0-1.el7.x86_64 479/481
    Write failed: Broken pipe

The upgrade hangs at the cleanup stage of openvswitch and then fails. The same pinning must go into all upgrade scripts in the tripleo heat templates:

    extraconfig/tasks/major_upgrade_ceph_storage.sh
    extraconfig/tasks/major_upgrade_compute.sh
    extraconfig/tasks/major_upgrade_object_storage.sh

Looks like a duplicate of bug 1371840 (or the other way around) to me.

Hi, yes, it is. So is this a WONTFIX as well? Is pinning the acceptable solution for the OSP-9 to OSP-10 upgrade? When will the 2.6 version be available?

Closing as dupe, let's keep the related discussion in the main bug.

*** This bug has been marked as a duplicate of bug 1371840 ***

Hey, reopening this one because the duplicate at https://bugzilla.redhat.com/show_bug.cgi?id=1371840 is marked as 'wontfix'... I'll use this bug to track the workaround we will carry in the upgrade/update to deal with the openvswitch update.
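Since the version pin from the patch in this thread has to be repeated in several upgrade scripts, it may be easier to reason about as one function. The following is only a sketch, not the code that was merged: pin_ovs_and_update and the RUN variable are hypothetical names, while the three yum commands are the ones from the patch. Setting RUN=echo prints the commands instead of executing them, which allows a dry run on a node where losing connectivity would be costly.

```shell
# RUN is a hypothetical dry-run hook: leave it empty to execute the commands,
# or set RUN=echo to only print them.
RUN=${RUN:-}

# pin_ovs_and_update (hypothetical helper): hold openvswitch at its installed
# version, then update every other package.
pin_ovs_and_update() {
    $RUN yum -y install yum-plugin-versionlock   # provides "yum versionlock"
    $RUN yum versionlock openvswitch             # pin the installed version
    $RUN yum -y -q update                        # update everything else
}
```

Nothing runs until the function is called. A dry run would be RUN=echo followed by pin_ovs_and_update; a real run needs root on the node being upgraded.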
Also retargeting to tripleo-heat-templates, since we'll be carrying a workaround for the issue there.

Changed the upstream review to point to newton @ https://review.openstack.org/#/c/389753/ (master merged a little while ago). Please note that the full context around how we came to use this workaround is in BZ 1371840 and also BZ 1385096. We still need to backport this to mitaka and liberty ...

Adding a note for reference... there is a related BZ at https://bugzilla.redhat.com/show_bug.cgi?id=1388675 with its own upstream bug and review (linked there), which is a follow-on from the fix landed here (the fix there adds --replacepkgs in case the latest ovs was already installed, and fixes a syntax nit in the ceph upgrade script).

Also adding a link to https://review.openstack.org/#/c/390792/ since you need both reviews for the 'complete' ovs upgrade workaround.

marios - is that another duplicate of one of these:
https://bugzilla.redhat.com/show_bug.cgi?id=1386299
https://bugzilla.redhat.com/show_bug.cgi?id=1385096

(In reply to Omri Hochman from comment #17)
> marios - is that another duplicate of one of these :
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1386299 -

^^^ No, I don't think so, though I did think they might be related at one point (we have discussed this on lifecycle scrum). In BZ 1386299 the problem appears when you reboot. Here the problem was just upgrading openvswitch from 2.4 to 2.5 (no reboot needed, IPs disappeared just by doing the yum update).

> https://bugzilla.redhat.com/show_bug.cgi?id=1385096

^^^ No, though they definitely *are* related. BZ 1385096 is tracking the problem we are 'fixing' here against openvswitch. In this bug we worked around the problem with the review linked in external trackers.

thanks

Deployed RHOS 9 latest.
Upgraded to RHOS 10 with latest puddle (2016-11-14.1).
I no longer see this issue.
openvswitch-2.5.0-14.git20160727.el7fdp.x86_64
python-openvswitch-2.5.0-14.git20160727.el7fdp.noarch
openstack-neutron-openvswitch-9.1.0-4.el7ost.noarch

Adding the one more fix needed here @ https://review.openstack.org/#/c/401195/. Waiting on CI to merge to stable/newton, then we can move this back to POST.

https://review.openstack.org/#/c/401195/ landed in newton, moving to POST.

Verified with openstack-tripleo-heat-templates-5.1.0-6.el7ost.noarch

After the upgrade to osp10, I rebooted all the OC nodes and checked that all overcloud nodes were still reachable after returning from reboot.

(In reply to Omri Hochman from comment #25)
> Verified with openstack-tripleo-heat-templates-5.1.0-6.el7ost.noarch
>
>
> After upgrade to osp10, I've rebooted all the OC nodes and check that all
> overcloud nodes are still reachable after return from reboot.

Just to be clear, this BZ is about the openvswitch 2.4-2.5 upgrade, which causes nodes to lose IPs (the reboot one was also openvswitch, but a different issue). However, the fact that you successfully upgraded without problems (i.e. nodes don't lose IPs during the yum update on a given node) is enough to verify here.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html