Bug 1570777

Summary: Error while upgrading from 3.7.9 observed in the task - [Upgrade all storage] (docs)
Product: OpenShift Container Platform Reporter: Sanket N <snalawad>
Component: Networking Assignee: Ben Bennett <bbennett>
Status: CLOSED ERRATA QA Contact: Meng Bo <bmeng>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.7.1 CC: aos-bugs, bbennett, danw, jokerman, mkhan, mmccomas, wmeng
Target Milestone: ---   
Target Release: 3.7.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Network policy objects can't be upgraded between 3.7.z versions before 3.7.23.
Consequence: The [Upgrade all storage] task fails with an "updates to networkpolicy spec are forbidden" error.
Fix: The documentation has been changed to explain how to work around the problem, and the error has been fixed in 3.7.23.
Result: You can upgrade.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-27 07:59:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sanket N 2018-04-23 11:54:39 UTC
Description of problem:

The task [Upgrade all storage] executes (oc adm --config=/etc/origin/master/admin.kubeconfig migrate storage --include=* --confirm), which fails to migrate some of the network policy objects in the OCP v3.7.9 environment.


Version-Release number of the following components:

rpm -q openshift-ansible : openshift-ansible-3.7.42-1.git.2.9ee4e71.el7.noarch
rpm -q ansible           : ansible-2.4.2.0-2.el7.noarch

ansible --version        : ansible 2.4.2.0
            config file = /etc/ansible/ansible.cfg
            configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
            ansible python module location = /usr/lib/python2.7/site-packages/ansible
            executable location = /usr/bin/ansible
            python version = 2.7.5 (default, Aug  2 2016, 04:20:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]




Steps to Reproduce:
1. Cluster v3.7.9 with network plugin as openshift-ovs-networkpolicy
2. Create network policy object

cat np.yaml 

apiVersion: extensions/v1beta1
kind: NetworkPolicy
metadata:
  generation: 1
  name: allow-all
spec:
  ingress:
  - {}
  podSelector: {}
<EOF>

3. Run the command 
   oc adm --config=/etc/origin/master/admin.kubeconfig migrate storage --include=*  --confirm


Actual results:
[root@vm251-93 ~]# oc adm --config=/etc/origin/master/admin.kubeconfig migrate storage --include=*  --confirm
E0423 16:04:43.881409 error:     -n policy networkpolicies/allow-all: NetworkPolicy.networking.k8s.io "allow-all" is invalid: spec: Forbidden: updates to networkpolicy spec are forbidden.
summary: total=1061 errors=1 ignored=0 unchanged=1010 migrated=49
info: to rerun only failing resources, add --include=networkpolicies
error: 1 resources failed to migrate


Expected results:

This is the result from OCP v3.7.23 with the same networkpolicy object.

[root@vm253-114 ~]# oc adm --config=/etc/origin/master/admin.kubeconfig migrate storage --include=* --confirm
summary: total=1597 errors=0 ignored=0 unchanged=1581 migrated=16



Additional info:

It seems v3.7.9 is giving the error to the NP objects but the same is not observed in the later version as tested on  v3.7.23 and v3.9.14
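
For comparing runs across versions, the error count can be pulled out of the summary line. The following is only a minimal sketch: the helper name migrate_error_count is hypothetical, and it assumes the "summary:" line keeps the format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc): reads `oc adm migrate storage`
# output on stdin and prints the N from "errors=N" on the summary line.
# Assumes the summary format shown in this report.
migrate_error_count() {
    sed -n 's/^summary: .*errors=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example use against a live cluster (left commented out here):
# oc adm --config=/etc/origin/master/admin.kubeconfig migrate storage \
#     '--include=*' --confirm 2>&1 | migrate_error_count
```

The function prints nothing if no summary line is present, so an empty result should be treated as "unknown" rather than as zero errors.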

Comment 3 Dan Winship 2018-04-26 18:50:48 UTC
> [root@vm251-93 ~]# oc adm --config=/etc/origin/master/admin.kubeconfig migrate storage --include=*  --confirm
> E0423 16:04:43.881409 error:     -n policy networkpolicies/allow-all: NetworkPolicy.networking.k8s.io "allow-all" is invalid: spec: Forbidden: updates to networkpolicy spec are forbidden.

So I'm guessing it's trying to copy an object from extensions/v1beta1 NetworkPolicy to networking/v1 NetworkPolicy, except that because they're both treated as the same thing internally, it effectively ends up trying to modify the existing object, which isn't allowed.

The fix might be to just remove NetworkPolicy from the list of things that need to be migrated?

Comment 4 Ben Bennett 2018-04-26 20:00:13 UTC
Mo: Do you have any idea what we need to do here?

Comment 5 Mo 2018-04-26 20:05:05 UTC
The fix is to correctly handle the no-op update in validation.  The migrate command is correct as-is.

Comment 6 Dan Winship 2018-04-27 08:06:15 UTC
> It seems v3.7.9 is giving the error to the NP objects but the same is not
> observed in the later version as tested on  v3.7.23 and v3.9.14

Uh, wait, if it's fixed in 3.7.23 then what else can we do? We can't retroactively change 3.7.9.

Comment 7 Mo 2018-04-27 13:22:52 UTC
If you are on 3.X.Y1 and it got fixed in a later 3.X.Y2 release, then the "solution" is to upgrade your binaries to that 3.X.Y2 release before upgrading to 3.Z.  If that is not possible, then you simply run the migrate command before upgrading and let it fail on that resource (it will still process everything else).  Then skip the pre-upgrade migration task during the upgrade.
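
A rough sketch of the "run it and let it fail on that resource" check follows. It is not an official procedure. The function name only_networkpolicy_failures is hypothetical, and it assumes per-resource failures are logged as lines starting with "E<digits>" and containing " error:", as in the output quoted in this bug.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc): reads `oc adm migrate storage`
# output on stdin. Returns 0 only if at least one per-resource failure
# was logged and every such failure refers to a networkpolicies/ object.
only_networkpolicy_failures() {
    # Per-resource failure lines are assumed to look like
    # "E0423 16:04:43.881409 error: ... networkpolicies/allow-all: ..."
    failures=$(grep -E '^E[0-9]+ .* error:' || true)
    [ -n "$failures" ] || return 1
    # Any failure line that does not mention networkpolicies/ is unexpected.
    if printf '%s\n' "$failures" | grep -qv 'networkpolicies/'; then
        return 1
    fi
    return 0
}

# Example use before the upgrade (left commented out here):
# out=$(oc adm --config=/etc/origin/master/admin.kubeconfig migrate storage \
#     '--include=*' --confirm 2>&1)
# printf '%s\n' "$out" | only_networkpolicy_failures \
#     && echo "only NetworkPolicy objects failed; known issue on this version"
```

If the check fails, some other resource did not migrate and that should be looked at before skipping the pre-upgrade migration task.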

Comment 8 Ben Bennett 2018-04-30 13:44:16 UTC
Mo: Thanks for clarifying... is there anything we can do besides document this in the support knowledgebase?

Comment 9 Mo 2018-04-30 15:04:49 UTC
(In reply to Ben Bennett from comment #8)
> Mo: Thanks for clarifying... is there anything we can do besides document
> this in the support knowledgebase?

I suggest adding something like [1] for 3.7 with the specifics of this issue.

[1] https://docs.openshift.org/3.6/install_config/upgrading/upgrading_known_issues.html

Comment 10 Ben Bennett 2018-04-30 19:47:08 UTC
Documented in https://github.com/openshift/openshift-docs/pull/8986

Comment 11 Ben Bennett 2018-05-09 19:52:09 UTC
Docs opened https://github.com/openshift/openshift-docs/pull/9054

Comment 12 Ben Bennett 2018-05-15 15:55:29 UTC
This was changed and went in as https://github.com/openshift/openshift-docs/pull/9054

Comment 13 Meng Bo 2018-05-18 09:03:07 UTC
The documentation has been added. Verifying the bug.

Comment 15 errata-xmlrpc 2018-06-27 07:59:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2009