
Bug 1392324

Summary:	Regression: Upgrade from 3.2 to 3.3 fails with could not get EgressNetworkPolicies
Product:	OpenShift Container Platform	Reporter:	Takayoshi Kimura <tkimura>
Component:	Cluster Version Operator	Assignee:	Devan Goodwin <dgoodwin>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Anping Li <anli>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	3.3.0	CC:	aos-bugs, dgoodwin, jokerman, mmccomas
Target Milestone:	---
Target Release:	3.3.1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: An error was identified in the upgrade playbooks regarding the ordering of service restart and reconciliation of roles/SCCs. Consequence: Master services could fail to restart in some scenarios, potentially causing EgressNetworkPolicy errors when later upgrading nodes against masters that were not yet running the new version. Fix: Ordering issues corrected. Result: Master services should now restart reliably before reconciliation and node upgrade.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-01-04 15:40:10 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
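The Doc Text describes an ordering constraint: master services must be restarted after their packages are upgraded and before role/SCC reconciliation and node upgrade. As a minimal sketch, assuming hypothetical step names (these are not the real openshift-ansible task or play names), the constraint can be expressed as a precedence check over an ordered list of steps:

```python
# Hypothetical step names for illustration only; the actual
# openshift-ansible playbooks are structured differently.
REQUIRED_BEFORE = {
    # step -> steps that must already have completed
    "restart_master_services": {"upgrade_master_packages"},
    "reconcile_roles_and_sccs": {"upgrade_master_packages", "restart_master_services"},
    "upgrade_nodes": {"upgrade_master_packages", "restart_master_services"},
}


def ordering_violations(steps):
    """Return (step, missing_prerequisite) pairs for every step that
    runs before one of its prerequisites has completed."""
    done = set()
    violations = []
    for step in steps:
        for prereq in sorted(REQUIRED_BEFORE.get(step, ())):
            if prereq not in done:
                violations.append((step, prereq))
        done.add(step)
    return violations


# Illustrative sequences: masters upgraded but never restarted, versus
# the corrected order.
buggy = ["upgrade_master_packages", "reconcile_roles_and_sccs", "upgrade_nodes"]
fixed = ["upgrade_master_packages", "restart_master_services",
         "reconcile_roles_and_sccs", "upgrade_nodes"]
print(ordering_violations(buggy))
# [('reconcile_roles_and_sccs', 'restart_master_services'), ('upgrade_nodes', 'restart_master_services')]
print(ordering_violations(fixed))  # []
```

In the buggy sequence, node upgrade proceeds against masters still serving the old API, which matches the failure reported below.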

Description Takayoshi Kimura 2016-11-07 08:21:47 UTC

Description of problem:

The following bug still happens:

Bug 1382380 - Upgrade from 3.2 to 3.3 fails with could not get EgressNetworkPolicies

Version-Release number of selected component (if applicable):

atomic-openshift-utils-3.3.41-1.git.0.a1a327b.el7.noarch

How reproducible:

Always. One customer hit this, and I can reproduce the issue in my lab as well.

Steps to Reproduce:
1. Upgrade 3.2 to 3.3 using /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade.yml

Actual results:

atomic-openshift-node restarted without 3.3 master:

Nov 07 17:19:08 tkimura-oseha-01.usersys.redhat.com atomic-openshift-node[25808]: F1107 17:19:08.480417   25808 node.go:343] error: SDN node startup failed: could not get EgressNetworkPolicies: the server could not find the requested resource
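The message after the journal prefix appears to follow the glog header layout used by OpenShift/Kubernetes components of that era: a severity letter (I, W, E, F) fused with the month and day, a timestamp, the process id, and `file:line]`. The leading `F` marks a fatal error, which is why the node process exits. A minimal sketch for spotting this failure signature in journal output, assuming lines shaped like the one above (the function name is hypothetical):

```python
import re

# glog-style header: severity + MMDD, time, pid, file:line], then message.
GLOG_RE = re.compile(
    r"(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<pid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)"
)

FAILURE_SIGNATURE = "could not get EgressNetworkPolicies"


def find_egress_failure(journal_line):
    """Return the parsed glog fields if the line is a fatal
    EgressNetworkPolicies startup failure, else None."""
    m = GLOG_RE.search(journal_line)
    if m and m.group("sev") == "F" and FAILURE_SIGNATURE in m.group("msg"):
        return m.groupdict()
    return None


line = ("Nov 07 17:19:08 tkimura-oseha-01.usersys.redhat.com "
        "atomic-openshift-node[25808]: F1107 17:19:08.480417   25808 "
        "node.go:343] error: SDN node startup failed: could not get "
        "EgressNetworkPolicies: the server could not find the requested resource")
hit = find_egress_failure(line)
print(hit["src"])  # node.go:343
```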

Expected results:

No failure

Additional info:

The upstream fix is:

https://github.com/openshift/openshift-ansible/pull/2593/files

It is included in the atomic-openshift-utils changelog, but the node restart code still exists.

Comment 2 Devan Goodwin 2016-11-07 14:35:32 UTC

I think this might actually be https://github.com/openshift/openshift-ansible/pull/2637, which was fixed in 3.4 but did not get backported to 3.3. In this scenario your failure *was* during node upgrade, not during master upgrade as in the bug/PR you linked above. However, it appears that despite being upgraded, the masters never actually got restarted.

I will try to reproduce and submit a PR for backporting.

Comment 3 Devan Goodwin 2016-11-07 17:36:30 UTC

I can't quite reproduce the failure, but I can reproduce the condition where master API is not restarted before proceeding to node upgrade.

This was actually already backported and will be available in openshift-ansible-3.3.42-1 or later (one version beyond the one in which this was reported).
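To check whether an installed build is at or beyond the fixed version, a simplified comparison of version-release strings is enough within the 3.3 stream. This is a sketch under stated assumptions: it is not RPM's full `rpmvercmp` algorithm, it only looks at the dotted numeric version and the leading integer of the release, and it assumes the atomic-openshift-utils package shares openshift-ansible's version numbering. The function names are hypothetical.

```python
import re

FIXED_IN = "3.3.42-1"  # from the comment above


def parse_vr(version_release):
    """Split e.g. '3.3.41-1.git.0.a1a327b.el7' into ((3, 3, 41), 1).
    Simplified: ignores everything after the leading release number."""
    version, _, release = version_release.partition("-")
    ver = tuple(int(part) for part in version.split("."))
    m = re.match(r"\d+", release)
    rel = int(m.group()) if m else 0
    return ver, rel


def includes_fix(installed):
    """True if the installed version-release is >= FIXED_IN."""
    return parse_vr(installed) >= parse_vr(FIXED_IN)


print(includes_fix("3.3.41-1.git.0.a1a327b.el7"))  # False (reported build)
print(includes_fix("3.3.42-1"))                    # True
```

Comparing tuples lexicographically keeps the logic short; for anything beyond a quick sanity check, the `rpm` tooling's own comparison should be preferred.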

Comment 4 Takayoshi Kimura 2016-11-08 00:48:12 UTC

FYI, I use an HA setup for the reproducer: 3 master/etcd hosts and 2 node hosts.

Comment 5 Anping Li 2016-11-08 14:52:36 UTC

No EgressNetworkPolicies error, so moving the bug to VERIFIED.

Comment 6 Scott Dodson 2017-01-04 15:40:10 UTC

This was fixed in openshift-ansible-3.3.42-1