Bug 1382380

Summary:	Upgrade from 3.2 to 3.3 fails with could not get EgressNetworkPolicies
Product:	OpenShift Container Platform	Reporter:	Steven Walter <stwalter>
Component:	Cluster Version Operator	Assignee:	Devan Goodwin <dgoodwin>
Status:	CLOSED ERRATA	QA Contact:	Anping Li <anli>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.3.0	CC:	anli, aos-bugs, dgoodwin, jialiu, jokerman, mmccomas
Target Milestone:	---
Target Release:	3.3.1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: The node service was incorrectly restarted after the master RPM packages were upgraded. Consequence: In some environments this could produce a version mismatch between the node service and the not-yet-restarted master service, causing the upgrade to fail. Fix: The incorrect node restart was removed and the logic was reordered so that masters are upgraded and restarted before the node upgrade and restart begin. Result: The upgrade now completes successfully.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-10-27 16:13:51 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Steven Walter 2016-10-06 14:08:52 UTC

Description of problem:
Upgrade from 3.2 to 3.3 fails during node restart

Version-Release number of selected component (if applicable):

RHEL 7.2 - openshift rpms 3.3.22 installed.

Another customer:
openshift-ansible-lookup-plugins-3.3.28-1.git.0.762256b.el7.noarch
atomic-openshift-sdn-ovs-3.3.0.34-1.git.0.83f306f.el7.x86_64
openshift-ansible-3.3.28-1.git.0.762256b.el7.noarch
atomic-openshift-3.3.0.34-1.git.0.83f306f.el7.x86_64
openshift-ansible-docs-3.3.28-1.git.0.762256b.el7.noarch
tuned-profiles-atomic-openshift-node-3.3.0.34-1.git.0.83f306f.el7.x86_64
openshift-ansible-filter-plugins-3.3.28-1.git.0.762256b.el7.noarch
atomic-openshift-utils-3.3.28-1.git.0.762256b.el7.noarch
atomic-openshift-node-3.3.0.34-1.git.0.83f306f.el7.x86_64
atomic-openshift-clients-3.3.0.34-1.git.0.83f306f.el7.x86_64
openshift-ansible-roles-3.3.28-1.git.0.762256b.el7.noarch
atomic-openshift-master-3.3.0.34-1.git.0.83f306f.el7.x86_64
openshift-ansible-playbooks-3.3.28-1.git.0.762256b.el7.noarch

Another customer:
openshift-ansible-playbooks-3.3.22-1.git.0.6c888c2.el7.noarch.

How reproducible:
Not yet reproduced

Steps to Reproduce:
1. Install 3.2
2. Upgrade to 3.3
3. It seems this might only occur when masters are supposed to be schedulable

Actual results:
atomic-openshift-node[79829]: F1005 12:04:26.820427   79829 node.go:343] error: SDN node startup failed: could not get EgressNetworkPolicies: the server could not find the requested resource
atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
systemd[1]: Failed to start Atomic OpenShift Node.

Expected results:
The node service starts and the upgrade completes.

I'm working to get full ansible output if possible
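
The failure signature quoted above can be checked for on other nodes. This is a minimal sketch, assuming the node service logs to the journal; the function name has_egress_policy_error is hypothetical, and the matched string is copied from the log line in this report.

```shell
# Hypothetical helper: succeeds if the log text on stdin contains the
# SDN startup failure quoted in this report.
has_egress_policy_error() {
    grep -q 'SDN node startup failed: could not get EgressNetworkPolicies'
}

# Example (assumes the node logs to the journal under this unit name):
#   journalctl -u atomic-openshift-node --no-pager | has_egress_policy_error \
#       && echo "node hit the EgressNetworkPolicies startup failure"
```

A match only shows that the node failed with this message; it does not by itself confirm the cause.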

Comment 1 Steven Walter 2016-10-06 14:14:09 UTC

Additionally, restarting the master services seems to resolve the issue. I am still working to verify that the install playbook can be re-run successfully (i.e. the upgrade actually completes)

Comment 2 Steven Walter 2016-10-10 13:48:46 UTC

Re-running the install works after restarting the services. Customer has provided ansible output showing the install complete, so the workaround is more or less confirmed.

systemctl restart atomic-openshift-api
systemctl restart atomic-openshift-controllers
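
The workaround above can be wrapped so that it stops at the first failed restart. This is only a sketch: restart_in_order and the SYSTEMCTL override are hypothetical names introduced here, and the unit names in the usage line are copied from this comment and may differ on a given install.

```shell
# Hypothetical helper: restart the given units in order, stopping at the
# first failure. Set SYSTEMCTL=echo for a dry run that only prints the
# commands instead of running them.
restart_in_order() {
    for unit in "$@"; do
        ${SYSTEMCTL:-systemctl} restart "$unit" || {
            echo "failed to restart $unit" >&2
            return 1
        }
    done
}

# Example, using the unit names given in this comment:
#   restart_in_order atomic-openshift-api atomic-openshift-controllers
```

Before running it, the actual unit names on the host can be confirmed with systemctl list-units 'atomic-openshift*'.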

Comment 4 Devan Goodwin 2016-10-12 18:10:10 UTC

I was unable to reproduce but with the logfile Steven provided I found a likely fix: https://github.com/openshift/openshift-ansible/pull/2593

Comment 6 Anping Li 2016-10-14 06:04:22 UTC

After the upgrade, the PID of the atomic-openshift-node service is the same as before, so the service was not restarted. It should be restarted.
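
A sketch of the PID comparison described above, assuming the check is done with systemctl show; the helper names main_pid and was_restarted and the SYSTEMCTL override are hypothetical.

```shell
# Hypothetical helper: print the main PID of a unit.
# "systemctl show -p MainPID <unit>" prints a line of the form MainPID=<n>.
main_pid() {
    ${SYSTEMCTL:-systemctl} show -p MainPID "$1" | cut -d= -f2
}

# Hypothetical helper: succeeds if the PID after the upgrade ($2) differs
# from the PID before it ($1) and the service is running (PID is not 0).
was_restarted() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$2" != "0" ] && [ "$1" != "$2" ]
}

# Example:
#   before=$(main_pid atomic-openshift-node)
#   ... run the upgrade ...
#   after=$(main_pid atomic-openshift-node)
#   was_restarted "$before" "$after" || echo "node service was not restarted"
```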

Comment 9 Devan Goodwin 2016-10-14 14:20:05 UTC

Easy enough to reproduce on both masters and nodes. The restart removed by the earlier fix was apparently the only node restart performed during upgrade when nothing changed in /etc/sysconfig/atomic-openshift-node. (There is nothing version-specific in that file, so often nothing will change.)
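
To illustrate why a restart that depends only on a config change is not enough, here is a sketch of a restart decision that also considers the package version. It is not the content of the linked pull request; needs_restart and its arguments are hypothetical.

```shell
# Hypothetical helper: decide whether the node service needs a restart.
#   $1 = copy of the sysconfig file taken before the upgrade
#   $2 = current sysconfig file
#   $3 = package version before the upgrade
#   $4 = package version after the upgrade
# Prints "yes" or "no". A file that cannot be compared counts as changed.
needs_restart() {
    if ! cmp -s "$1" "$2"; then
        echo yes
    elif [ "$3" != "$4" ]; then
        # The config is unchanged but the binaries were upgraded.
        echo yes
    else
        echo no
    fi
}
```

With a check based on the config file alone, an upgrade that leaves /etc/sysconfig/atomic-openshift-node untouched would skip the restart; the version comparison covers that case.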

Comment 10 Devan Goodwin 2016-10-14 15:09:24 UTC

This was a good catch, thanks Anping.

https://github.com/openshift/openshift-ansible/pull/2604

Comment 13 errata-xmlrpc 2016-10-27 16:13:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2122