Bug 1382380 - Upgrade from 3.2 to 3.3 fails with could not get EgressNetworkPolicies
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Upgrade
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.3.1
Assigned To: Devan Goodwin
QA Contact: Anping Li
Docs Contact:
Depends On:
Blocks:
Reported: 2016-10-06 10:08 EDT by Steven Walter
Modified: 2016-10-27 12:13 EDT
CC: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The node service was incorrectly restarted after the master RPM packages were upgraded. Consequence: In some environments, a version mismatch could occur between the restarted node service and the not-yet-restarted master service, causing the upgrade to fail. Fix: The incorrect node restart was removed, and the logic was reordered to ensure masters are upgraded and restarted before the node upgrade/restart proceeds. Result: The upgrade now completes successfully.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-27 12:13:51 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID: Red Hat Product Errata RHBA-2016:2122
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform atomic-openshift-utils bug fix update
Last Updated: 2016-10-27 16:11:30 EDT

Description Steven Walter 2016-10-06 10:08:52 EDT
Description of problem:
Upgrade from 3.2 to 3.3 fails during node restart

Version-Release number of selected component (if applicable):

RHEL 7.2 - openshift rpms 3.3.22 installed.

Another customer:
openshift-ansible-lookup-plugins-3.3.28-1.git.0.762256b.el7.noarch
atomic-openshift-sdn-ovs-3.3.0.34-1.git.0.83f306f.el7.x86_64
openshift-ansible-3.3.28-1.git.0.762256b.el7.noarch
atomic-openshift-3.3.0.34-1.git.0.83f306f.el7.x86_64
openshift-ansible-docs-3.3.28-1.git.0.762256b.el7.noarch
tuned-profiles-atomic-openshift-node-3.3.0.34-1.git.0.83f306f.el7.x86_64
openshift-ansible-filter-plugins-3.3.28-1.git.0.762256b.el7.noarch
atomic-openshift-utils-3.3.28-1.git.0.762256b.el7.noarch
atomic-openshift-node-3.3.0.34-1.git.0.83f306f.el7.x86_64
atomic-openshift-clients-3.3.0.34-1.git.0.83f306f.el7.x86_64
openshift-ansible-roles-3.3.28-1.git.0.762256b.el7.noarch
atomic-openshift-master-3.3.0.34-1.git.0.83f306f.el7.x86_64
openshift-ansible-playbooks-3.3.28-1.git.0.762256b.el7.noarch

Another customer:
openshift-ansible-playbooks-3.3.22-1.git.0.6c888c2.el7.noarch

How reproducible:
Not yet reproduced

Steps to Reproduce:
1. Install 3.2
2. Upgrade to 3.3
3. The failure appears to occur only when the masters are configured to be schedulable

Actual results:

atomic-openshift-node[79829]: F1005 12:04:26.820427   79829 node.go:343] error: SDN node startup failed: could not get EgressNetworkPolicies: the server could not find the requested resource
atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
systemd[1]: Failed to start Atomic OpenShift Node.

Expected results:

The upgrade completes and the node service starts successfully.

I'm working to get the full ansible output if possible.
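The failure mode suggests the master API was still on the old version and did not yet serve the EgressNetworkPolicies resource the upgraded node asks for at startup. A minimal diagnostic sketch, assuming the `oc` client is available and logged in with cluster-admin rights; the JSON scraping of the discovery endpoint is illustrative and may need adjusting for your API layout:

```shell
#!/bin/sh
# Hypothetical check: does the master currently serve EgressNetworkPolicies?
# If not, the node service will fail at startup exactly as in the log above.

# Pure helper: succeed if a resource name appears in a listing (one per line).
has_resource() {
    printf '%s\n' "$1" | grep -qi "^$2\$"
}

check_master_api() {
    # Ask the master for the resources it serves under the OpenShift v1 API.
    resources=$(oc get --raw /oapi/v1 2>/dev/null \
        | grep -o '"name":"[a-z]*"' | cut -d'"' -f4)
    if has_resource "$resources" "egressnetworkpolicies"; then
        echo "master serves EgressNetworkPolicies"
    else
        echo "master does NOT serve EgressNetworkPolicies; restart master services"
    fi
}
```

Run `check_master_api` on a master before restarting the node service; if the resource is missing, the masters are still on the old version.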
Comment 1 Steven Walter 2016-10-06 10:14:09 EDT
Additionally, restarting the master services seems to resolve the issue. I am still working to verify that the install playbook can be re-run successfully (i.e., the upgrade actually completes).
Comment 2 Steven Walter 2016-10-10 09:48:46 EDT
Re-running the install works after restarting the services. Customer has provided ansible output showing the install complete, so the workaround is more or less confirmed.

systemctl restart atomic-openshift-api
systemctl restart atomic-openshift-controllers
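The workaround above can be wrapped into a small script. This is a sketch under assumptions: a single RPM-installed master with systemd-managed services, and the default openshift-ansible RPM install location; the playbook path is illustrative and may differ in your checkout.

```shell
#!/bin/sh
# Sketch of the workaround from comments 1-2: restart the master services so
# they come up on the new version, then re-run the upgrade playbook.

restart_master_services() {
    systemctl restart atomic-openshift-api
    systemctl restart atomic-openshift-controllers
}

rerun_upgrade() {
    # Path assumes the openshift-ansible RPM layout; adjust for your install.
    ansible-playbook -i /etc/ansible/hosts \
        /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade.yml
}
```

Usage on the master: `restart_master_services && rerun_upgrade`.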
Comment 4 Devan Goodwin 2016-10-12 14:10:10 EDT
I was unable to reproduce but with the logfile Steven provided I found a likely fix: https://github.com/openshift/openshift-ansible/pull/2593
Comment 6 Anping Li 2016-10-14 02:04:22 EDT
After the upgrade, the atomic-openshift-node service has the same PID as before, which means it was never restarted. The service should be restarted.
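The check behind this comment can be sketched as follows, assuming systemd manages the node service: record the service's main PID before the upgrade and compare it afterwards; an unchanged PID means the service never restarted.

```shell
#!/bin/sh
# Sketch: detect whether atomic-openshift-node was restarted during upgrade.

node_pid() {
    # Avoids `--value`, which older systemd releases on RHEL 7 lack.
    systemctl show -p MainPID atomic-openshift-node | cut -d= -f2
}

# Pure helper: succeed only if the PID changed (i.e. the service restarted).
pid_changed() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}
```

Usage: `before=$(node_pid)`, run the upgrade, `after=$(node_pid)`, then `pid_changed "$before" "$after" || echo "node service was not restarted"`.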
Comment 9 Devan Goodwin 2016-10-14 10:20:05 EDT
Easy enough to reproduce on both masters and nodes. This was apparently the only node restart performed during the upgrade, and it only ran if something changed in /etc/sysconfig/atomic-openshift-node. (There is nothing version-specific in that file, so often nothing changes and the node is never restarted.)
Comment 10 Devan Goodwin 2016-10-14 11:09:24 EDT
This was a good catch, thanks Anping.

https://github.com/openshift/openshift-ansible/pull/2604
Comment 13 errata-xmlrpc 2016-10-27 12:13:51 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2122
