Bug 2114903
| Summary: | TALM pod crashed after first batch timed out | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | yliu1 |
| Component: | Telco Edge | Assignee: | jun |
| Telco Edge sub component: | TALO | QA Contact: | yliu1 |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | high | ||
| Priority: | unspecified | CC: | ijolliff, jun |
| Version: | 4.11 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-03 13:45:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This was introduced by an incomplete fix for Bug 2087125.

*** This bug has been marked as a duplicate of bug 2087125 ***
Description of problem:
The TALM pod crashed after the first batch timed out.

Version-Release number of selected component (if applicable):
4.11

How reproducible:
100%

Steps to Reproduce:
- Two spoke clusters are managed by the hub.
- Create a CGU that applies a simple configuration to both clusters, with maxConcurrency set to 1 and timeout set to 3.
- Before the original policy becomes compliant on the hub cluster, reboot or power off spoke1.
- Observe that the CGU status indicates a timeout (all behavior is as expected up to this point).

Actual results:
- No enforce policy was ever created under the spoke2 namespace, even after spoke1 recovered.
- The cluster-group-upgrades-controller-manager pod is in CrashLoopBackOff state, with the following error in the pod logs:

2022-07-27T19:07:50.993Z INFO controllers.ClusterGroupUpgrade [getClustersBySelectors] {"clusterNames": ["ocp-edge87"]}
2022-07-27T19:07:50.993Z INFO controllers.ClusterGroupUpgrade Upgrade is completed
2022-07-27T19:07:51.002Z INFO controllers.ManagedClusterForCGU WARN: No child policies found for cluster {"Name": "local-cluster"}
2022-07-27T19:07:51.002Z INFO controllers.ManagedClusterForCGU Reconciling managedCluster to create clusterGroupUpgrade {"Request.Name": "ocp-edge87"}
2022-07-27T19:07:51.002Z INFO controllers.ManagedClusterForCGU ZTP for the cluster has completed. ztp-done label found. {"Name": "ocp-edge87"}
2022-07-27T19:07:51.002Z INFO controllers.ManagedClusterForCGU Reconciling managedCluster to create clusterGroupUpgrade {"Request.Name": "ocp-edge88"}
2022-07-27T19:07:51.002Z INFO controllers.ManagedClusterForCGU ZTP for the cluster has completed. ztp-done label found. {"Name": "ocp-edge88"}
2022-07-27T19:07:51.016Z INFO controllers.ClusterGroupUpgrade Finish reconciling CGU {"name": "ztp-install/ocp-edge87", "result": {"Requeue":false,"RequeueAfter":0}}
2022-07-27T19:07:51.016Z INFO controllers.ClusterGroupUpgrade Start reconciling CGU {"name": "ztp-install/ocp-edge88"}
2022-07-27T19:07:51.117Z INFO controllers.ClusterGroupUpgrade Loaded CGU {"name": "ztp-install/ocp-edge88", "version": "18340784"}
2022-07-27T19:07:51.118Z INFO controllers.ClusterGroupUpgrade [getClusterBySelectors] {"clustersBySelector": []}
2022-07-27T19:07:51.118Z INFO controllers.ClusterGroupUpgrade [getClustersBySelectors] {"clusterNames": ["ocp-edge88"]}
2022-07-27T19:07:51.118Z INFO controllers.ClusterGroupUpgrade Upgrade is completed
2022-07-27T19:07:51.139Z INFO controllers.ClusterGroupUpgrade Finish reconciling CGU {"name": "ztp-install/ocp-edge88", "result": {"Requeue":false,"RequeueAfter":0}}
2022-07-27T19:07:51.139Z INFO controllers.ClusterGroupUpgrade Start reconciling CGU {"name": "default/test"}
2022-07-27T19:07:51.239Z INFO controllers.ClusterGroupUpgrade Loaded CGU {"name": "default/test", "version": "21234640"}
2022-07-27T19:07:51.239Z INFO controllers.ClusterGroupUpgrade [getClusterBySelectors] {"clustersBySelector": ["ocp-edge87", "ocp-edge88"]}
2022-07-27T19:07:51.239Z INFO controllers.ClusterGroupUpgrade [getClustersBySelectors] {"clusterNames": ["ocp-edge87", "ocp-edge88"]}
2022-07-27T19:07:51.240Z INFO controllers.ClusterGroupUpgrade Finish reconciling CGU {"name": "default/test", "result": {"Requeue":false,"RequeueAfter":0}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x13a388d]

goroutine 619 [running]:
github.com/openshift-kni/cluster-group-upgrades-operator/controllers.(*ClusterGroupUpgradeReconciler).isUpgradeComplete(0xc000e13e80, {0x18ca8d8, 0xc0008b1200}, 0xc0006f6000)
	/remote-source/app/controllers/clustergroupupgrade_controller.go:1234 +0xed
github.com/openshift-kni/cluster-group-upgrades-operator/controllers.(*ClusterGroupUpgradeReconciler).Reconcile(0xc000e13e40, {0x18ca8d8, 0xc0008b1200}, {{{0xc0007dac98, 0x7}, {0xc0007dac94, 0x4}}})
	/remote-source/app/controllers/clustergroupupgrade_controller.go:401 +0xbe5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0000c1220, {0x18ca830, 0xc000b9a080}, {0x154ab60, 0xc00085e140})
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298 +0x303
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0000c1220, {0x18ca830, 0xc000b9a080})
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:210 +0x354

Pod status:
openshift-operators cluster-group-upgrades-controller-manager-6b94f4959-z78g8 1/2 CrashLoopBackOff 7 (3m42s ago) 46h

Expected results:
The CGU moves on to batch 2 and the pod does not crash.

Additional info:
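The expected behavior can be illustrated with a small Go sketch, Go being the language of the controller in the stack trace above. Batch 1 times out, the CGU advances to batch 2, and a missing per-cluster status is treated as "not compliant" rather than dereferenced. This is a hypothetical illustration only. It is not TALM's code and not the fix for Bug 2087125. The names `ClusterStatus`, `makeBatches`, `batchComplete`, and `nextBatch` are invented here, and modeling the powered-off spoke as a nil status entry is an assumption about the failure mode, not a confirmed root cause.

```go
package main

import "fmt"

// ClusterStatus is a HYPOTHETICAL per-cluster remediation record (not a TALM type).
// A nil *ClusterStatus models a cluster whose status was never recorded,
// e.g. because it was rebooted or powered off mid-batch (an assumption).
type ClusterStatus struct {
	Compliant bool
}

// makeBatches splits clusters into batches of at most maxConcurrency clusters,
// mirroring the maxConcurrency=1 setting from the reproduction steps.
func makeBatches(clusters []string, maxConcurrency int) [][]string {
	if maxConcurrency < 1 {
		maxConcurrency = 1 // avoid an infinite loop on a non-positive value
	}
	var batches [][]string
	for start := 0; start < len(clusters); start += maxConcurrency {
		end := start + maxConcurrency
		if end > len(clusters) {
			end = len(clusters)
		}
		batches = append(batches, clusters[start:end])
	}
	return batches
}

// batchComplete reports whether every cluster in the batch is compliant.
// A missing (nil) status counts as "not compliant" instead of being
// dereferenced, which is the kind of guard a nil-pointer panic calls for.
func batchComplete(batch []string, status map[string]*ClusterStatus) bool {
	for _, c := range batch {
		s := status[c]
		if s == nil || !s.Compliant {
			return false
		}
	}
	return true
}

// nextBatch returns the index of the batch to work on next: it advances when
// the current batch either completes or times out, so a timed-out batch does
// not block the remaining ones.
func nextBatch(current int, complete, timedOut bool) int {
	if complete || timedOut {
		return current + 1
	}
	return current
}

func main() {
	batches := makeBatches([]string{"spoke1", "spoke2"}, 1)
	fmt.Println(batches) // [[spoke1] [spoke2]]

	// spoke1 was powered off, so no status was ever recorded for it.
	status := map[string]*ClusterStatus{}
	done := batchComplete(batches[0], status)
	fmt.Println(done) // false

	// Batch 1 timed out: move on to batch 2 instead of crashing.
	fmt.Println(nextBatch(0, done, true)) // 1
}
```

In this sketch the timeout decision and the completeness check are kept separate, so an unreachable cluster can only delay its own batch until the timeout fires, never halt the whole rollout.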