Description of problem:
We hit an issue when upgrading an OCP cluster that uses a third-party SDN (NSX-T) from 3.7.23 to 3.7.42. During the upgrade, the Ansible installer ran "systemctl stop openvswitch" in the task "Stop node and openvswitch services" (https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/openshift_node_upgrade/tasks/main.yml#L15-L22). This conflicts with the SDN implementation and causes the affected nodes to lose all network connectivity.

In addition, after reviewing the Ansible code we found that the openvswitch package would be upgraded even when a third-party SDN that relies on a different openvswitch version is in use, as in our case (https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/openshift_node_upgrade/tasks/main.yml#L113-L119).

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:
In a 3.7.23 OCP cluster with the NSX-T plugin, try to upgrade to 3.7.42.

Steps to Reproduce:
1.
2.
3.

Actual results:
See description.

Expected results:
The existing OVS is not modified in any way, as it depends on the third-party SDN.

Additional info:
Created https://github.com/openshift/openshift-ansible/pull/8228 to fix this. With it, the openvswitch RPMs are installed/upgraded and the services are stopped/started only when openshift_use_sdn is set.
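To make the gating concrete, here is a hypothetical Python model of the behavior described above. It is not the playbook code from PR 8228. It assumes the flag is read from the inventory and interpreted the way Ansible's `bool` filter treats common values. The task names are copied from the verification log later in this bug, and treating an unset flag as enabled (the default OpenShift SDN case) is an assumption.

```python
# Hypothetical model of the fix: openvswitch-related upgrade tasks run only
# when the SDN flag is truthy. This is NOT the actual openshift-ansible code;
# it simplifies every task to a plain run/skip decision.

OVS_TASKS = [
    "Stop openvswitch service",
    "Upgrade openvswitch",
    "Start openvswitch service",
    "Ensure openvswitch service is stopped",
]

# Values Ansible's `bool` filter treats as true (compared case-insensitively).
TRUTHY = {"true", "yes", "1", "on"}


def as_bool(value) -> bool:
    """Loosely mimic Ansible's `| bool` filter for inventory values."""
    if isinstance(value, bool):
        return value
    if value is None:
        return False
    return str(value).strip().lower() in TRUTHY


def planned_ovs_tasks(inventory: dict, flag: str = "openshift_use_sdn",
                      default: bool = True) -> dict:
    """Return {task_name: "run" | "skip"} for the openvswitch tasks.

    `default` (assumption) applies when the flag is absent from the inventory.
    """
    enabled = as_bool(inventory.get(flag, default))
    return {task: ("run" if enabled else "skip") for task in OVS_TASKS}


if __name__ == "__main__":
    # A third-party SDN inventory: every openvswitch task should be skipped.
    print(planned_ovs_tasks({"openshift_use_sdn": "false"}))
```

The exact variable name the playbook checks should be confirmed against the PR itself; the model only takes it as a parameter.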
Fix is available in openshift-ansible-3.7.46-1
Test steps:
1. Install OCP v3.7.23 with flannel enabled (QE doesn't have an NSX-T test environment at hand, so we use the third-party network plugin flannel instead):
openshift_use_openshift_sdn=false
openshift_use_flannel=true
2. Try to reproduce with openshift-ansible-3.7.43-1. We can see that the openvswitch package was upgraded and the service restarted:
TASK [openshift_node_upgrade : Upgrade openvswitch] ****************************
changed:
TASK [openshift_node_upgrade : Stop node and openvswitch services]
changed:
3. Use openshift-ansible-3.7.46-1 to check whether it is fixed:
TASK [openshift_node_upgrade : Stop openvswitch service] ***********************
skipping
TASK [openshift_node_upgrade : Upgrade openvswitch] ****************************
skipping
TASK [openshift_node_upgrade : Start openvswitch service] **********************
skipping
TASK [openshift_node_upgrade : Ensure openvswitch service is stopped] **********
skipping
All the tasks above were skipped, so moving to VERIFIED. Fixed in openshift-ansible-3.7.46-1.
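The check in steps 2 and 3 is done by reading the ansible-playbook output by eye. A small helper like the sketch below could automate it, assuming the default Ansible stdout format ("TASK [role : name]" headers followed by "ok:", "changed:", "skipping:", "failed:" or "fatal:" lines). The function names are hypothetical, and it simply flags any task whose name mentions openvswitch and was not skipped on every host.

```python
import re
from collections import defaultdict

# Header line of a task in default ansible-playbook output.
TASK_RE = re.compile(r"^TASK \[(?P<name>[^\]]+)\]")
# Per-host result line that follows a task header.
STATUS_RE = re.compile(r"^(?P<status>ok|changed|skipping|failed|fatal)\b")


def task_statuses(log_text: str) -> dict:
    """Map each task name to the set of result statuses seen for it."""
    statuses = defaultdict(set)
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = TASK_RE.match(line)
        if header:
            current = header.group("name").strip()
            continue
        status = STATUS_RE.match(line)
        if status and current is not None:
            statuses[current].add(status.group("status"))
    return dict(statuses)


def ovs_tasks_not_skipped(log_text: str) -> list:
    """Return openvswitch-related tasks that ran (or failed) on any host."""
    return sorted(
        name
        for name, seen in task_statuses(log_text).items()
        if "openvswitch" in name and seen != {"skipping"}
    )
```

For example, after `ansible-playbook ... | tee upgrade.log`, an empty result from `ovs_tasks_not_skipped(open("upgrade.log").read())` corresponds to the "all skipped" outcome in step 3.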
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2009