Bug 2057678 - Suggest to change ztp upgrade workflow to deploy TALO at the end
Summary: Suggest to change ztp upgrade workflow to deploy TALO at the end
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Telco Edge
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jim Ramsay
QA Contact: yliu1
URL:
Whiteboard:
Depends On:
Blocks: 2063265
 
Reported: 2022-02-23 20:18 UTC by yliu1
Modified: 2022-08-26 16:43 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
This doc update is already part of https://github.com/openshift/openshift-docs/pull/43890
Clone Of:
Environment:
Last Closed: 2022-08-26 16:43:57 UTC
Target Upstream Version:
Embargoed:




Links
System: Github
ID: openshift-kni cnf-features-deploy pull 1002
Private: 0
Priority: None
Status: Merged
Summary: ztp: Update the ArgoCD upgrade procedure
Last Updated: 2022-03-11 16:49:12 UTC

Description yliu1 2022-02-23 20:18:29 UTC
Description of problem:
In the following doc, TALO is deployed before the ZTP workflow (including the ArgoCD apps and PGT structures) is updated:
https://github.com/openshift-kni/cnf-features-deploy/blob/44aa7ebc675dff2a09f5afbdc784e48fdf51624e/ztp/gitops-subscriptions/argocd/Upgrade.md

The problem is that no CGU is created automatically: when TALO is deployed, the existing policies from the 4.9 deployment are all Compliant and carry no wave numbers. Only after the ZTP workflow is updated do all the policies become NonCompliant. (Also note: because the clusters were deployed via 4.9 ZTP, they are NOT removed from managedclusters even if the gitops apps are deleted, so TALO takes action right away.)

If we deploy (or restart) TALO after the ZTP workflow is updated to 4.10, a CGU is created automatically to apply the 4.10 structural changes such as wave annotations, the installplanapproval strategy, etc.
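The timing issue described above can be illustrated with a toy model. This is only a sketch of the behavior reported in this bug, not TALO's actual code: it assumes that TALO decides whether to create a CGU once, from the compliance state it sees when it starts. The `Policy` class, the function name, and the policy names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str        # hypothetical policy name, for illustration only
    compliant: bool  # compliance state seen when TALO starts


def cgu_created_at_talo_start(policies):
    """Toy model: a CGU is created only if at least one policy is
    NonCompliant at the moment TALO starts processing the cluster."""
    return any(not p.compliant for p in policies)


# State seen when TALO is deployed BEFORE the ZTP workflow update:
# all 4.9 policies are Compliant.
before_update = [Policy("common-config", True), Policy("group-du", True)]

# State seen when TALO is deployed (or restarted) AFTER the update:
# the 4.10 changes make the policies NonCompliant.
after_update = [Policy("common-config", False), Policy("group-du", False)]

talo_first = cgu_created_at_talo_start(before_update)
talo_last = cgu_created_at_talo_start(after_update)
print("TALO deployed first, CGU created:", talo_first)
print("TALO deployed last, CGU created:", talo_last)
```

Under this assumption, deploying TALO first yields no CGU, while deploying it last yields one, which matches the actual and expected results in this report.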

Version-Release number of selected component (if applicable):
4.10

How reproducible:
Always (probably)

Steps to Reproduce:
1. Deploy some 4.9 spoke clusters using the 4.9 ZTP workflow
2. Run the following steps as per the doc at https://github.com/openshift-kni/cnf-features-deploy/blob/44aa7ebc675dff2a09f5afbdc784e48fdf51624e/ztp/gitops-subscriptions/argocd/Upgrade.md
2.1 Delete gitops apps
2.2 Deploy TALO
2.3 Update PGT to 4.10 structure
2.4 Update ZTP apps to 4.10  

Actual results:
All policies become NonCompliant and no CGU is created.

Expected results:
A ztp-install CGU is created automatically and applies the 4.10 ZTP changes, such as wave annotations.

Additional info:

Suggest changing the ZTP update workflow to the following:
1 Update ZTP apps to 4.10 (if we do this first, policies will be inform by default, so we don't need to worry about automatic changes in step 2)
2 Update PGT to 4.10 structure
3 Deploy TALO (or restart TALO)

Or, if deleting the ArgoCD apps is necessary, we can do this:
1 Delete gitops apps
2 Update PGT to 4.10 structure
3 Update ZTP apps to 4.10  
4 Deploy TALO (or restart TALO)
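
Both suggested orderings share one constraint: TALO is deployed (or restarted) only after the ZTP apps and the PGT structure have been updated. A minimal sketch of that check follows. The step identifiers and the helper name are hypothetical labels for the steps listed in this report, not real commands.

```python
def talo_is_last(steps):
    """Return True if 'deploy_talo' appears after both update steps."""
    required = ("update_ztp_apps", "update_pgt")
    if "deploy_talo" not in steps or any(r not in steps for r in required):
        return False
    talo_at = steps.index("deploy_talo")
    return all(steps.index(r) < talo_at for r in required)


# Order from the doc referenced in "Steps to Reproduce".
documented = ["delete_gitops_apps", "deploy_talo", "update_pgt", "update_ztp_apps"]

# The two orderings suggested in this report.
suggested = ["update_ztp_apps", "update_pgt", "deploy_talo"]
suggested_with_delete = [
    "delete_gitops_apps", "update_pgt", "update_ztp_apps", "deploy_talo",
]

for name, order in [
    ("documented", documented),
    ("suggested", suggested),
    ("suggested_with_delete", suggested_with_delete),
]:
    print(name, talo_is_last(order))
```

The check only encodes the ordering constraint; it says nothing about whether each individual step succeeded.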

Comment 1 yliu1 2022-02-24 15:58:10 UTC
If all the old policies were Compliant, no CGU is created when TALO is deployed before ZTP and the PGTs are updated.
If some old policies were NonCompliant, a CGU is created against the old policies, and that CGU will likely fail: the old policies were already in enforce mode, so if they did not become Compliant on their own, TALO won't make a difference either.
The TALO logs are below.

[kni@provisionhost-0-0 ~]$ oc logs -n openshift-cluster-group-upgrades cluster-group-upgrades-controller-manager-75bcc7484d-7sqnd manager
I0223 18:30:18.459515       1 request.go:668] Waited for 1.022655491s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/view.open-cluster-management.io/v1beta1?timeout=32s
2022-02-23T18:30:21.816Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
2022-02-23T18:30:21.825Z	INFO	setup	starting manager
I0223 18:30:21.826051       1 leaderelection.go:243] attempting to acquire leader lease openshift-cluster-group-upgrades/9a2365a3.openshift.io...
2022-02-23T18:30:21.826Z	INFO	controller-runtime.manager	starting metrics server	{"path": "/metrics"}
I0223 18:30:21.853021       1 leaderelection.go:253] successfully acquired lease openshift-cluster-group-upgrades/9a2365a3.openshift.io
2022-02-23T18:30:21.853Z	INFO	controller-runtime.manager.controller.managedclusterForCGU	Starting EventSource	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster", "source": "kind source: /, Kind="}
2022-02-23T18:30:21.853Z	INFO	controller-runtime.manager.controller.managedclusterForCGU	Starting EventSource	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster", "source": "kind source: /, Kind="}
2022-02-23T18:30:21.853Z	INFO	controller-runtime.manager.controller.managedclusterForCGU	Starting Controller	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster"}
2022-02-23T18:30:21.853Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"ConfigMap","namespace":"openshift-cluster-group-upgrades","name":"9a2365a3.openshift.io","uid":"346066c2-ffe2-4389-8030-c212c94fc09d","apiVersion":"v1","resourceVersion":"2490961"}, "reason": "LeaderElection", "message": "cluster-group-upgrades-controller-manager-75bcc7484d-7sqnd_bd2d64c6-c4d8-492e-a337-f6c3e921d657 became leader"}
2022-02-23T18:30:21.854Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"Lease","namespace":"openshift-cluster-group-upgrades","name":"9a2365a3.openshift.io","uid":"2a9546b1-6e25-4f2e-be37-cbb6f2f70543","apiVersion":"coordination.k8s.io/v1","resourceVersion":"2490967"}, "reason": "LeaderElection", "message": "cluster-group-upgrades-controller-manager-75bcc7484d-7sqnd_bd2d64c6-c4d8-492e-a337-f6c3e921d657 became leader"}
2022-02-23T18:30:21.854Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting EventSource	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "source": "kind source: /, Kind="}
2022-02-23T18:30:21.854Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting EventSource	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "source": "kind source: apps.open-cluster-management.io/v1, Kind=PlacementRule"}
2022-02-23T18:30:21.854Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting EventSource	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "source": "kind source: policy.open-cluster-management.io/v1, Kind=PlacementBinding"}
2022-02-23T18:30:21.854Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting EventSource	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "source": "kind source: policy.open-cluster-management.io/v1, Kind=Policy"}
2022-02-23T18:30:21.854Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting Controller	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade"}
2022-02-23T18:30:22.055Z	INFO	controller-runtime.manager.controller.managedclusterForCGU	Starting workers	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster", "worker count": 1}
2022-02-23T18:30:22.055Z	INFO	controllers.ManagedClusterForCGU	Reconciling managedCluster to create clusterGroupUpgrade	{"Request.Name": "helix21-2"}
2022-02-23T18:30:22.056Z	INFO	controllers.ManagedClusterForCGU	cluster is ready	{"Name": "helix21-2"}
2022-02-23T18:30:22.056Z	INFO	controller-runtime.manager.controller.clustergroupupgrade	Starting workers	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "worker count": 1}
2022-02-23T18:30:22.157Z	INFO	controllers.ManagedClusterForCGU	No policies need to be managed by ClusterGroupUpgrade operator
2022-02-23T18:30:22.157Z	INFO	controllers.ManagedClusterForCGU	Reconciling managedCluster to create clusterGroupUpgrade	{"Request.Name": "helix21-3"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	cluster is ready	{"Name": "helix21-3"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	No policies need to be managed by ClusterGroupUpgrade operator
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	Reconciling managedCluster to create clusterGroupUpgrade	{"Request.Name": "local-cluster"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	cluster is ready	{"Name": "local-cluster"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	WARN: No child policies found for cluster	{"Name": "local-cluster"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	Reconciling managedCluster to create clusterGroupUpgrade	{"Request.Name": "helix21-0"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	cluster is ready	{"Name": "helix21-0"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	No policies need to be managed by ClusterGroupUpgrade operator
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	Reconciling managedCluster to create clusterGroupUpgrade	{"Request.Name": "helix21-1"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	cluster is ready	{"Name": "helix21-1"}
2022-02-23T18:30:22.158Z	INFO	controllers.ManagedClusterForCGU	No policies need to be managed by ClusterGroupUpgrade operator
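
To pull the affected clusters out of a log like the one above, a short script can pair each "Reconciling managedCluster" line with the "No policies need to be managed" line that follows it. This is a sketch that assumes the messages for one cluster appear consecutively (the log shows a worker count of 1); the function name is hypothetical, and the sample lines are copied from the log above.

```python
import re

# Matches the cluster name in the trailing JSON fragment of a log line.
REQUEST_NAME = re.compile(r'"Request\.Name": "([^"]+)"')


def clusters_without_managed_policies(log_text):
    """Return the clusters for which TALO logged that no policies need to
    be managed, i.e. the clusters for which no CGU was created."""
    current = None
    result = []
    for line in log_text.splitlines():
        if "controllers.ManagedClusterForCGU" not in line:
            continue
        if "Reconciling managedCluster" in line:
            match = REQUEST_NAME.search(line)
            current = match.group(1) if match else None
        elif "No policies need to be managed" in line and current is not None:
            result.append(current)
    return result


sample = "\n".join([
    '2022-02-23T18:30:22.055Z\tINFO\tcontrollers.ManagedClusterForCGU\tReconciling managedCluster to create clusterGroupUpgrade\t{"Request.Name": "helix21-2"}',
    '2022-02-23T18:30:22.056Z\tINFO\tcontrollers.ManagedClusterForCGU\tcluster is ready\t{"Name": "helix21-2"}',
    '2022-02-23T18:30:22.157Z\tINFO\tcontrollers.ManagedClusterForCGU\tNo policies need to be managed by ClusterGroupUpgrade operator',
    '2022-02-23T18:30:22.158Z\tINFO\tcontrollers.ManagedClusterForCGU\tReconciling managedCluster to create clusterGroupUpgrade\t{"Request.Name": "local-cluster"}',
    '2022-02-23T18:30:22.158Z\tINFO\tcontrollers.ManagedClusterForCGU\tWARN: No child policies found for cluster\t{"Name": "local-cluster"}',
])

print(clusters_without_managed_policies(sample))
```

Clusters such as local-cluster, which log "No child policies found" instead, are deliberately not counted, since that message describes a different condition.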

Comment 2 Ian Miller 2022-03-11 16:49:13 UTC
The upgrade procedure was discussed, and the changes have been made in the upstream documentation.

Comment 4 yliu1 2022-03-11 18:55:14 UTC
Since this is a doc change, verification was done using 4.10 ZTP.

