Bug 2063265 - [backport 4.10] Suggest to change ztp upgrade workflow to deploy TALO at the end
Summary: [backport 4.10] Suggest to change ztp upgrade workflow to deploy TALO at the end
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Telco Edge
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.10.z
Assignee: Jim Ramsay
QA Contact: yliu1
URL:
Whiteboard:
Depends On: 2057678
Blocks:
Reported: 2022-03-11 16:51 UTC by OpenShift BugZilla Robot
Modified: 2022-04-18 23:26 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-21 12:40:37 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift-kni cnf-features-deploy pull 1012 | 0 | None | open | [release-4.10] Bug 2063265: ztp: Update the ArgoCD upgrade procedure | 2022-03-12 01:01:13 UTC
Red Hat Product Errata | RHBA-2022:0928 | 0 | None | None | None | 2022-03-21 12:40:41 UTC

Description OpenShift BugZilla Robot 2022-03-11 16:51:32 UTC
+++ This bug was initially created as a clone of Bug #2057678 +++

Description of problem:
In the following doc, TALO is deployed before the ZTP workflow (including the ArgoCD apps and PGT structures) is updated:
https://github.com/openshift-kni/cnf-features-deploy/blob/44aa7ebc675dff2a09f5afbdc784e48fdf51624e/ztp/gitops-subscriptions/argocd/Upgrade.md

The problem is that no CGU is created automatically, because the existing policies from the 4.9 deployment are all Compliant and have no wave numbers; once the ZTP workflow is updated, all the policies become NonCompliant. (Also note that because the clusters were deployed via 4.9 ZTP, they are NOT removed from managedclusters even if the gitops apps are deleted, so TALO takes action right away.)

If we deploy (or restart) TALO after the ZTP workflow is updated to 4.10, a CGU is created automatically to apply the 4.10 structural changes such as wave annotations, the installplanapproval strategy, etc.
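
To illustrate the timing issue described above, here is a minimal Python sketch. It is a toy model, not TALO's actual code: the `Policy` class, the `cgu_needed` function, and the policy names are hypothetical, and the only behavior assumed is what this report describes (no CGU when every policy is Compliant at the moment TALO looks at the cluster).

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str        # hypothetical policy name, for illustration only
    compliant: bool  # compliance state at the moment TALO looks at it


def cgu_needed(policies):
    """Toy stand-in for the check described in this report:
    a CGU is only worth creating if some policy is NonCompliant."""
    return any(not p.compliant for p in policies)


# Policies left over from the 4.9 deployment: all Compliant.
policies_49 = [Policy("common-config", True), Policy("group-du-config", True)]

# After the ZTP workflow is updated to 4.10, the same policies are NonCompliant.
policies_410 = [Policy(p.name, False) for p in policies_49]

# TALO deployed before the ZTP update sees only Compliant policies.
talo_first = cgu_needed(policies_49)
# TALO deployed (or restarted) after the update sees NonCompliant policies.
talo_last = cgu_needed(policies_410)

print("TALO first -> CGU created:", talo_first)
print("TALO last  -> CGU created:", talo_last)
```

In this model the outcome depends only on what TALO sees when it starts, which is why the order of the steps matters.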

Version-Release number of selected component (if applicable):
4.10

How reproducible:
Always (probably)

Steps to Reproduce:
1. Deploy some 4.9 spoke clusters using the 4.9 ZTP workflow
2. Run the following steps as per the doc at https://github.com/openshift-kni/cnf-features-deploy/blob/44aa7ebc675dff2a09f5afbdc784e48fdf51624e/ztp/gitops-subscriptions/argocd/Upgrade.md
2.1 Delete gitops apps
2.2 Deploy TALO
2.3 Update PGT to 4.10 structure
2.4 Update ZTP apps to 4.10  

Actual results:
All policies became NonCompliant and no CGU was created.

Expected results:
A ztp-install CGU is created automatically and applies the 4.10 ZTP changes such as wave annotations.

Additional info:

Suggest changing the ZTP update workflow to the following:
1 Update ZTP apps to 4.10 (if we do this first, the policies will be inform by default, so we don't need to worry about automatic changes in step 2)
2 Update PGT to 4.10 structure
3 Deploy TALO (or restart TALO)

Or, if deleting the ArgoCD apps is necessary, we can do this:
1 Delete gitops apps
2 Update PGT to 4.10 structure
3 Update ZTP apps to 4.10  
4 Deploy TALO (or restart TALO)
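
The ordering constraint behind both suggested workflows can be sketched as a small check. This is only an illustration: the step identifiers and the `talo_deployed_last` helper are hypothetical names, not part of ZTP or TALO.

```python
def talo_deployed_last(steps):
    """Return True if 'deploy_talo' comes after both 'update_pgt'
    and 'update_ztp_apps' in the given list of step names."""
    position = {step: i for i, step in enumerate(steps)}
    required = ("update_pgt", "update_ztp_apps", "deploy_talo")
    if any(step not in position for step in required):
        return False
    return position["deploy_talo"] > max(
        position["update_pgt"], position["update_ztp_apps"]
    )


# Order from the original Upgrade.md, as listed in the steps to reproduce.
original = ["delete_gitops_apps", "deploy_talo", "update_pgt", "update_ztp_apps"]

# The two orders suggested in this report.
suggested = ["update_ztp_apps", "update_pgt", "deploy_talo"]
suggested_with_delete = [
    "delete_gitops_apps", "update_pgt", "update_ztp_apps", "deploy_talo",
]

for name, steps in [
    ("original", original),
    ("suggested", suggested),
    ("suggested_with_delete", suggested_with_delete),
]:
    print(name, talo_deployed_last(steps))
```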

--- Additional comment from yliu1 on 2022-02-24 15:58:10 UTC ---

If all the old policies were Compliant, no CGU is created when TALO is deployed before ZTP and PGT are updated. The TALO logs are below.
If some old policies were NonCompliant, a CGU is created against the old policies, and that CGU will likely fail: the old policies were already set to enforce, so if they did not become compliant on their own, TALO won't make a difference either.

[kni@provisionhost-0-0 ~]$ oc logs -n openshift-cluster-group-upgrades cluster-group-upgrades-controller-manager-75bcc7484d-7sqnd manager
I0223 18:30:18.459515       1 request.go:668] Waited for 1.022655491s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/view.open-cluster-management.io/v1beta1?timeout=32s
2022-02-23T18:30:21.816Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
2022-02-23T18:30:21.825Z	INFO	setup	starting manager
I0223 18:30:21.826051       1 leaderelection.go:243] attempting to acquire leader lease openshift-cluster-group-upgrades/9a2365a3.openshift.io...
2022-02-23T18:30:21.826Z	INFO	controller-runtime.manager	starting metrics server	{"path": "/metrics"}
I0223 18:30:21.853021       1 leaderelection.go:253] successfully acquired lease openshift-cluster-group-upgrades/9a2365a3.openshift.io
2022-02-23T18:30:21.853Z	INFO	controller-runtime.manager.controller.managedclusterForCGU	Starting EventSource	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster", "source": "kind source: /, Kind="}
2022-02-23T18:30:21.853Z	INFO	controller-runtime.manager.controller.managedclusterForCGU	Starting EventSource	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster", "source": "kind source: /, Kind="}
2022-02-23T18:30:21.853Z	INFO	controller-runtime.manager.controller.managedclusterForCGU	Starting Controller	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster"}
2022-02-23T18:30:21.853Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"ConfigMap","namespace":"openshift-cluster-group-upgrades","name":"9a2365a3.openshift.io","uid":"346066c2-ffe2-4389-8030-c212c94fc09d","apiVersion":"v1","resourceVersion":"2490961"}, "reason": "LeaderElection", "message": "cluster-group-upgrades-controller-manager-75bcc7484d-7sqnd_bd2d64c6-c4d8-492e-a337-f6c3e921d657 became leader"}
2022-02-23T18:30:21.854Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"Lease","namespace":"openshift-cluster-group-upgrades","name":"9a2365a3.openshift.io","uid":"2a9546b1-6e25-4f2e-be37-cbb6f2f70543","apiVersion":"coordination.k8s.io/v1","resourceVersion":"2490967"}, "reason": "LeaderElection", "message": "cluster-group-upgrades-controller-manager-75bcc7484d-7sqnd_bd2d64c6-c4d8-492e-a337-f6c3e921d657 became leader"}
2022-02-23T18:30:21.854Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting EventSource	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "source": "kind source: /, Kind="}
2022-02-23T18:30:21.854Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting EventSource	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "source": "kind source: apps.open-cluster-management.io/v1, Kind=PlacementRule"}
2022-02-23T18:30:21.854Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting EventSource	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "source": "kind source: policy.open-cluster-management.io/v1, Kind=PlacementBinding"}
2022-02-23T18:30:21.854Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting EventSource	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "source": "kind source: policy.open-cluster-management.io/v1, Kind=Policy"}
2022-02-23T18:30:21.854Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting Controller	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade"}
2022-02-23T18:30:22.055Z	INFO	controller-runtime.manager.controller.managedclusterForCGU	Starting workers	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster", "worker count": 1}
2022-02-23T18:30:22.055Z	INFO	controllers.ManagedClusterForCGU	Reconciling managedCluster to create clusterGroupUpgrade	{"Request.Name": "helix21-2"}
2022-02-23T18:30:22.056Z	INFO	controllers.ManagedClusterForCGU	cluster is ready	{"Name": "helix21-2"}
2022-02-23T18:30:22.056Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting workers	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "worker count": 1}
2022-02-23T18:30:22.157Z	INFO	controllers.ManagedClusterForCGU	No policies need to be managed by ClusterGroupUpgrade operator
2022-02-23T18:30:22.157Z	INFO	controllers.ManagedClusterForCGU	Reconciling managedCluster to create clusterGroupUpgrade	{"Request.Name": "helix21-3"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	cluster is ready	{"Name": "helix21-3"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	No policies need to be managed by ClusterGroupUpgrade operator
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	Reconciling managedCluster to create clusterGroupUpgrade	{"Request.Name": "local-cluster"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	cluster is ready	{"Name": "local-cluster"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	WARN: No child policies found for cluster	{"Name": "local-cluster"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	Reconciling managedCluster to create clusterGroupUpgrade	{"Request.Name": "helix21-0"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	cluster is ready	{"Name": "helix21-0"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	No policies need to be managed by ClusterGroupUpgrade operator
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	Reconciling managedCluster to create clusterGroupUpgrade	{"Request.Name": "helix21-1"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	cluster is ready	{"Name": "helix21-1"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	No policies need to be managed by ClusterGroupUpgrade operator
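
For longer logs, the per-cluster outcome can be pulled out with a short script. The following Python sketch assumes the log fields are tab-separated as in the excerpt above and that the ManagedClusterForCGU controller handles one cluster at a time (the log shows "worker count": 1). The `summarize` function is a hypothetical helper, not part of TALO.

```python
import json


def summarize(log_text):
    """Map each reconciled cluster name to the last outcome message
    logged for it by controllers.ManagedClusterForCGU."""
    outcomes = {}
    current = None
    for line in log_text.splitlines():
        fields = line.split("\t")
        # Expected layout: timestamp, level, logger, message[, JSON payload]
        if len(fields) < 4 or fields[2] != "controllers.ManagedClusterForCGU":
            continue
        message = fields[3]
        payload = json.loads(fields[4]) if len(fields) > 4 else {}
        if message.startswith("Reconciling managedCluster"):
            current = payload.get("Request.Name")
        elif message != "cluster is ready" and current is not None:
            outcomes[current] = message
    return outcomes


# A few lines copied from the log above, with tabs written as \t.
sample = "\n".join([
    '2022-02-23T18:30:22.157Z\tINFO\tcontrollers.ManagedClusterForCGU\tReconciling managedCluster to create clusterGroupUpgrade\t{"Request.Name": "helix21-3"}',
    '2022-02-23T18:30:22.158Z\tINFO\tcontrollers.ManagedClusterForCGU\tcluster is ready\t{"Name": "helix21-3"}',
    '2022-02-23T18:30:22.158Z\tINFO\tcontrollers.ManagedClusterForCGU\tNo policies need to be managed by ClusterGroupUpgrade operator',
    '2022-02-23T18:30:22.158Z\tINFO\tcontrollers.ManagedClusterForCGU\tReconciling managedCluster to create clusterGroupUpgrade\t{"Request.Name": "local-cluster"}',
    '2022-02-23T18:30:22.158Z\tINFO\tcontrollers.ManagedClusterForCGU\tcluster is ready\t{"Name": "local-cluster"}',
    '2022-02-23T18:30:22.158Z\tINFO\tcontrollers.ManagedClusterForCGU\tWARN: No child policies found for cluster\t{"Name": "local-cluster"}',
])

for cluster, outcome in summarize(sample).items():
    print(cluster, "->", outcome)
```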

--- Additional comment from imiller on 2022-03-11 16:49:13 UTC ---

The upgrade procedure was discussed, and changes have been made in the upstream documentation.

Comment 2 yliu1 2022-03-14 16:31:56 UTC
Verified with 4.10 ZTP.

Comment 5 errata-xmlrpc 2022-03-21 12:40:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.5 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0928

