Bug 2100826

Summary: [gcp] capg-controller-manager reports panic after deleting Provider
Product: OpenShift Container Platform
Component: Cloud Compute
Sub component: Other Providers
Reporter: sunzhaohua <zhsun>
Assignee: Alexander Demicev <ademicev>
QA Contact: sunzhaohua <zhsun>
Status: CLOSED NEXTRELEASE
Severity: high
Priority: high
CC: miyadav
Version: 4.11
Target Milestone: ---
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-07-20 12:31:36 UTC
Type: Bug

Description sunzhaohua 2022-06-24 11:40:33 UTC
Description of problem:
On GCP, capg-controller-manager reports a panic after the CoreProvider and InfrastructureProvider are deleted. The issue does not reproduce every time; another cluster checked the same way worked fine.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-06-23-092832

How reproducible:
Sometimes

Steps to Reproduce:
1. Enable CAPI by featuregate
2. Delete coreprovider and InfrastructureProvider
$ oc delete coreprovider cluster-api                      
coreprovider.operator.cluster.x-k8s.io "cluster-api" deleted
$ oc delete InfrastructureProvider gcp
infrastructureprovider.operator.cluster.x-k8s.io "gcp" deleted
3. Check pod

Actual results:
The operator recreates the providers, but the capg pod panics and enters CrashLoopBackOff.
$ oc get coreprovider
NAME          INSTALLEDVERSION   READY
cluster-api   v1.1.2             True

$ oc get InfrastructureProvider
NAME   INSTALLEDVERSION   READY
gcp    v1.0.0             True

$ oc get po                                                                                                                                                     
NAME                                               READY   STATUS             RESTARTS        AGE
capg-controller-manager-6fcd6dcc88-spfcl           0/1     CrashLoopBackOff   6 (2m49s ago)   11m
capi-controller-manager-5b948c8fc7-vhlhc           1/1     Running            0               19m
capi-operator-controller-manager-7c9fc4fb5-l2m46   2/2     Running            0               19m
cluster-capi-operator-6f79b7f-8xvt2                1/1     Running            0               3h55m

$ oc logs -f capg-controller-manager-6fcd6dcc88-spfcl
I0624 11:26:22.601111       1 reconcile.go:64] controller/gcpcluster "msg"="Deleting loadbalancer resources" "name"="huliu-gcp18-w98g7" "namespace"="openshift-cluster-api" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="GCPCluster"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x20d434b]

goroutine 532 [running]:
sigs.k8s.io/cluster-api-provider-gcp/cloud/scope.(*ClusterScope).ForwardingRuleSpec(0xc000b28d20)
	/build/cloud/scope/cluster.go:267 +0x2b
sigs.k8s.io/cluster-api-provider-gcp/cloud/services/compute/loadbalancers.(*Service).deleteForwardingRule(0xc00016e620, {0x2c8f528, 0xc000465f80})
	/build/cloud/services/compute/loadbalancers/reconcile.go:293 +0x7e
sigs.k8s.io/cluster-api-provider-gcp/cloud/services/compute/loadbalancers.(*Service).Delete(0x2c969e0?, {0x2c8f528, 0xc000465f80})
	/build/cloud/services/compute/loadbalancers/reconcile.go:65 +0x67
sigs.k8s.io/cluster-api-provider-gcp/controllers.(*GCPClusterReconciler).reconcileDelete(0xc00003e600?, {0x2c8f528, 0xc000465f80}, 0xc000b28d20)
	/build/controllers/gcpcluster_controller.go:236 +0x178
sigs.k8s.io/cluster-api-provider-gcp/controllers.(*GCPClusterReconciler).Reconcile(0xc0006ca2a0, {0x2c8f560?, 0xc000697710?}, {{{0xc00078d9c8?, 0x27b3da0?}, {0xc00078d9b0?, 0x30?}}})
	/build/controllers/gcpcluster_controller.go:153 +0x62a
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc00013c370, {0x2c8f560, 0xc0006976e0}, {{{0xc00078d9c8?, 0x27b3da0?}, {0xc00078d9b0?, 0x4041f4?}}})
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114 +0x27e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00013c370, {0x2c8f4b8, 0xc0004cc140}, {0x24d8900?, 0xc0003f9b00?})
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311 +0x349
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00013c370, {0x2c8f4b8, 0xc0004cc140})
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x31c
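The trace points at ClusterScope.ForwardingRuleSpec (cluster.go:267), dereferenced during load-balancer teardown while some part of the scope's state is apparently still nil. A minimal Go sketch of that failure pattern and the obvious defensive guard; the type and field names below are hypothetical simplifications for illustration, not the provider's actual API:

```go
package main

import "fmt"

// Hypothetical simplification: in the real provider, the cluster scope
// wraps status that may not be populated while a delete is in flight.
type NetworkStatus struct {
	APIServerForwardingRule *string
}

type ClusterScope struct {
	Network *NetworkStatus // may be nil mid-deletion
}

// Dereferencing s.Network (or the pointer field inside it) while nil
// would produce exactly the SIGSEGV seen in the pod log. Guarding with
// a nil check lets the delete path treat "not yet created" as a no-op.
func (s *ClusterScope) ForwardingRuleName() string {
	if s.Network == nil || s.Network.APIServerForwardingRule == nil {
		return "" // nothing to delete
	}
	return *s.Network.APIServerForwardingRule
}

func main() {
	scope := &ClusterScope{} // Network is nil, as during a fresh delete
	fmt.Printf("%q\n", scope.ForwardingRuleName())
}
```

With the guard in place the reconciler's delete path would skip the forwarding-rule cleanup instead of crashing the controller.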

Expected results:
capg-controller-manager should not panic; the capg pod should remain Running after the providers are recreated.

Additional info:
must-gather: https://drive.google.com/file/d/1iKg569yuXrUKRrvf1uGP5gyNP0Y0nWUG/view?usp=sharing

Comment 4 Joel Speed 2022-07-20 12:31:36 UTC
We will track this issue in Jira going forward, https://issues.redhat.com/browse/OCPCLOUD-1626