Bug 1571346 - Openvswitch conflict with third party SDN in upgrade process
Summary: Openvswitch conflict with third party SDN in upgrade process
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
: 3.7.z
Assignee: Vadim Rutkovsky
QA Contact: Gan Huang
Depends On:
Reported: 2018-04-24 14:59 UTC by Alejandro Coma
Modified: 2018-06-27 07:59 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2018-06-27 07:59:12 UTC
Target Upstream Version:

Attachments (Terms of Use)

Links:
Red Hat Product Errata RHBA-2018:2009 (last updated 2018-06-27 07:59:48 UTC)

Description Alejandro Coma 2018-04-24 14:59:15 UTC
Description of problem:
We've hit an issue when upgrading an OCP cluster with a third-party SDN (NSX-T) from 3.7.23 to 3.7.42.
During the upgrade process, the ansible installer performs a "systemctl stop openvswitch" in the task "Stop node and openvswitch services" (https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/openshift_node_upgrade/tasks/main.yml#L15-L22), which conflicts with the SDN implementation and causes all network connectivity to be lost on the affected nodes.

In addition to the above, we reviewed the ansible code and found that the openvswitch package is upgraded unconditionally, even when a third-party SDN with a different openvswitch version is in use, as in our case. (https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/openshift_node_upgrade/tasks/main.yml#L113-L119)
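The pattern described above can be sketched roughly as follows (a hedged reconstruction, not the exact upstream tasks; task names mirror the linked role, module layout is an assumption):

```yaml
# Sketch of the unconditional behavior reported above (hypothetical,
# not the verbatim role code): openvswitch is stopped and upgraded
# with no guard for clusters running a third-party SDN.
- name: Stop node and openvswitch services
  systemd:
    name: openvswitch
    state: stopped

- name: Upgrade openvswitch
  package:
    name: openvswitch
    state: latest
```

On a node whose networking is provided by a third-party SDN (e.g. NSX-T) that carries its own openvswitch, stopping or upgrading this service out from under it is what drops connectivity.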

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:
In a 3.7.23 OCP cluster with NSX-T plugin, try to upgrade to 3.7.42

Steps to Reproduce:

Actual results:
See description.

Expected results:
The existing OVS installation is not modified in any way, since its version depends on the third-party SDN.

Additional info:

Comment 6 Vadim Rutkovsky 2018-05-02 13:15:00 UTC
Created https://github.com/openshift/openshift-ansible/pull/8228 to fix this - RPMs are installed/upgraded and services are stopped/started only when openshift_use_sdn is set
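In other words, the openvswitch tasks are gated on the SDN flag, roughly like this (a sketch of the approach, not the exact PR diff; the default handling is an assumption):

```yaml
# Sketch of the fix in PR 8228 (hypothetical layout): openvswitch is
# only stopped/upgraded when the built-in OpenShift SDN is in use,
# i.e. when openshift_use_sdn evaluates to true.
- name: Stop openvswitch service
  systemd:
    name: openvswitch
    state: stopped
  when: openshift_use_sdn | default(true) | bool

- name: Upgrade openvswitch
  package:
    name: openvswitch
    state: latest
  when: openshift_use_sdn | default(true) | bool
```

With a third-party SDN, `openshift_use_sdn=false` causes these tasks to be skipped, which matches the skipped tasks observed in the verification run below in comment 8.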

Comment 7 Vadim Rutkovsky 2018-05-07 09:23:31 UTC
Fix is available in openshift-ansible-3.7.46-1

Comment 8 Gan Huang 2018-05-08 09:06:05 UTC
Test steps:

1. Install v3.7.23 OCP with flannel enabled (QE doesn't have an NSX-T test environment at hand, so we used the third-party network plugin flannel instead)


2. Reproduce with openshift-ansible-3.7.43-1; the openvswitch service was restarted and upgraded:

TASK [openshift_node_upgrade : Upgrade openvswitch] ****************************

TASK [openshift_node_upgrade : Stop node and openvswitch services]

3. Use openshift-ansible-3.7.46-1 to confirm the fix:

TASK [openshift_node_upgrade : Stop openvswitch service] ***********************

TASK [openshift_node_upgrade : Upgrade openvswitch] ****************************

TASK [openshift_node_upgrade : Start openvswitch service] **********************

TASK [openshift_node_upgrade : Ensure openvswitch service is stopped] **********

All of the tasks above were skipped; moving to VERIFIED.

Fixed in openshift-ansible-3.7.46-1.

Comment 10 errata-xmlrpc 2018-06-27 07:59:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

