Bug 1385096

Summary: [fdBeta] yum update of openvswitch 2.5 from 2.4 causes restart of openvswitch service that can disrupt network connectivity
Product: Red Hat Enterprise Linux 7 Reporter: Perry Myers <pmyers>
Component: openvswitch Assignee: Flavio Leitner <fleitner>
Status: CLOSED CURRENTRELEASE QA Contact: Rick Alongi <ralongi>
Severity: high Docs Contact:
Priority: high    
Version: 7.3 CC: amuller, atragler, fleitner, kzhang, nyechiel, rkhan
Target Milestone: rc   
Target Release: 7.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openvswitch-2.5.0-16.git20160727.el7fdb Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1397045 Environment:
Last Closed: 2017-01-12 17:31:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1397045    

Description Perry Myers 2016-10-14 18:14:02 UTC
Description of problem:
This bug is related to bug 1371840 as that bug is an example of where the yum update of openvswitch from 2.4 to 2.5 can disrupt network connectivity.

Right now the openvswitch spec file contains a %postun scriptlet that restarts the openvswitch daemon via systemctl. On yum update, this causes the running openvswitch 2.4 process to be stopped and the 2.5 process to be started.

If your system is running RHEL 7.2.z, this means a "yum update" would take you to RHEL 7.3 userspace but you would still be running RHEL 7.2.z kernel until you reboot. In many operational environments, the yum update and reboot steps are decoupled to be scheduled as part of planned downtime (think hypervisor running many VMs where those VMs all need to be live migrated before you can reboot).

This means for an extended period of time, you'd be running ovs 2.5 on RHEL 7.3 userspace with RHEL 7.2.z kernel.

In addition, we've already seen one bug (bug 1371840) that results in a loss of network connectivity on yum update of openvswitch and restart of the service. This is due to HOTPLUG being intentionally disabled on RHOSP nodes (because of issues with bonds, I believe; see https://bugzilla.redhat.com/show_bug.cgi?id=1371840#c18).

The workaround discussed in that bug would be to first shut down OVS 2.4, then update to OVS 2.5, then start OVS 2.5. However, this workaround also causes a loss of network connectivity (albeit a brief one) during the period when the daemon is shut down.

Perhaps a less invasive way to do the ovs 2.5 yum update would be to _not_ restart the service in %postun. This would be conceptually similar to updating the qemu-kvm package, which (of course) does not restart every qemu-kvm process that is spawned by libvirt.

One downside to this would be that the service would not be restarted by default after updates from OVS 2.5 to OVS 2.5.z packages, meaning operators would always have to restart it manually to pick up the update.

Is there perhaps a way to say in the spec %postun: "If the previous version was 2.4, do not restart the service. But if the previous version was 2.5+, then restart the service"?

From a user PoV this might be bad, because the behavior is different depending on what was installed before. 
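
The version-dependent rule proposed above could look roughly like the following shell sketch. It covers the decision logic only: the function name should_restart and the way the previous version reaches the scriptlet are assumptions made for illustration, and nothing here is taken from the actual openvswitch spec file.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a scriptlet should restart the
# service, given the version that was installed before the update.
# Prints "restart" or "skip". The name and the 2.5 threshold handling
# are illustrative assumptions, not the real spec file contents.
should_restart() {
    prev_version=$1
    major=${prev_version%%.*}   # "2.4.0" -> "2"
    rest=${prev_version#*.}     # "2.4.0" -> "4.0"
    minor=${rest%%.*}           # "4.0"   -> "4"
    if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 5 ]; }; then
        echo restart
    else
        echo skip
    fi
}

should_restart 2.4.0   # prints "skip"
should_restart 2.5.0   # prints "restart"
```

The sketch also makes the drawback visible: the same update yields different behavior depending only on what was installed before.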

If bug 1371840 can be fixed so that there is no network disruption during RHOSP upgrades, then the %postun to restart can be left in, so long as the combination of OVS 2.5 + RHEL 7.2.z kernel + RHEL 7.3 userspace is supportable and will technically work.

Note: I considered just adding this info to bug 1371840 since it's very related to that discussion, but fbl said we might want to track this discussion separately for now. Later depending on the outcome of the discussion, this bug might end up being CLOSED->DUPLICATE perhaps.

Version-Release number of selected component (if applicable):
openvswitch-2.5.0-5.git20160628.el7fdb.x86_64

How reproducible:
Every time on RHOSP nodes (OSP9 to 10 upgrade process)

Steps to Reproduce:
1. Install OSP9 cluster
2. Enable FD Beta Channel and yum update nodes to OVS 2.5 and RHEL 7.3

Actual results:
Loss of network connectivity is seen due to bug 1371840 but furthermore, the user is now running OVS 2.5 and RHEL 7.2.z kernel, which may not be a desired combination.

Expected results:
No loss of network connectivity, and running a supported combination of OVS and kernel.

Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1371840

Comment 1 Flavio Leitner 2016-10-18 13:17:56 UTC
Unfortunately there is no way to guarantee that the network is functional while restarting OVS.  For instance, internal ports are TAP devices which are connected to the user-space daemon being restarted.  Also, when using DPDK ports, all ports are in user-space, and restarting the service means that all DPDK ports will be removed at some point.

Having said that, the package will be fixed to not restart the service during the upgrade.  The side effect is that security fixes won't take effect until a manual service restart or system reboot.

Comment 4 Flavio Leitner 2016-10-18 15:28:31 UTC
There is a catch here: even if we stop restarting the service in the RPM package, upgrading from existing packages will continue to be an issue because the RPM section doing the restart is the old package's %postun.

This is the RPM order during the upgrade:
 %pretrans of new package
 %pre of new package
 (package install)
 %post of new package
 %triggerin of other packages (set off by installing new package)
 %triggerin of new package (if any are true)
 %triggerun of old package (if it's set off by uninstalling the old package)
 %triggerun of other packages (set off by uninstalling old package)
 %preun of old package (removal of old package)
>%postun of old package
 %triggerpostun of old package (if it's set off by uninstalling the old package)
 %triggerpostun of other packages (if they're set off by uninstalling the old package)
 %posttrans of new package


and I cannot find a way for the new package to prevent the %postun of the old package from being executed.

If you're using rpm directly, then using -U --nopostun would work around the issue.
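
For context, a %postun scriptlet can tell an upgrade from an erase by its first argument, which RPM sets to the number of instances of the package that remain installed. A minimal sketch of that check follows. The function name postun_action is hypothetical, and it only prints the decision: the exact systemctl invocation used by the old package is not shown in this report.

```shell
#!/bin/sh
# Sketch of the decision inside a %postun body. RPM passes the number
# of instances of the package that remain installed as $1:
#   0    -> the package is being erased
#   >= 1 -> the package is being upgraded
# postun_action is a hypothetical name. It prints the decision instead
# of calling systemctl, so it is safe to run anywhere.
postun_action() {
    remaining=$1
    if [ "$remaining" -ge 1 ]; then
        # Upgrade path: this is where the old package restarts the service.
        echo restart
    else
        # Erase path: nothing is left to restart.
        echo none
    fi
}

postun_action 1   # prints "restart"
postun_action 0   # prints "none"
```

Because this code lives in the package that is already installed, a newer package cannot change it. The restart on the upgrade path still happens once when moving away from an affected build.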

Comment 6 Flavio Leitner 2016-11-21 13:20:49 UTC
This is committed on FDBeta and there is no relation with RHEL-7 flags.
I'm cloning this one for FDProd to be included in the next batch.

Comment 10 Flavio Leitner 2017-01-12 17:31:45 UTC
The bugfix for this is already released, but due to bz#1403958, there will be at least one more restart during the upgrade.

So, I am closing this one as the fix is already there, and leaving it to the other bug to fix any remaining issues.