noobaa-operator reports panic on creating an invalid Backingstore: Provider - PVC

Description of problem (please be detailed as possible and provide log snippets):
-------------------------------------------------------------------------
On an OCP + OCS 4.2.2 cluster on VMware + VMFS + RHCOS, navigated to Installed Operators -> OpenShift Container Storage -> Create Backing Store and tried creating a backingstore (screenshot attached):

Backing Store Name* = neha
Provider = PVC
Storage Class = openshift-storage.noobaa.io

I incorrectly selected SC = openshift-storage.noobaa.io instead of SC = ocs-storagecluster-ceph-rbd. The backingstore never reached the Ready state (as expected), but the noobaa-operator pod reported continuous panics and started restarting with CrashLoopBackOff (CLBO). There were 16 restarts in total. After deleting the faulty backingstore "neha", the pod recovered and no more panics were observed in the logs.

--- snip of oc get csv, oc get pods ---
Thu Apr 2 13:53:04 UTC 2020

======== CSV ========
NAME                                         DISPLAY                       VERSION               REPLACES                                     PHASE
elasticsearch-operator.4.2.26-202003230335   Elasticsearch Operator        4.2.26-202003230335   elasticsearch-operator.4.2.24-202003191518   Succeeded
lib-bucket-provisioner.v1.0.0                lib-bucket-provisioner        1.0.0                                                              Succeeded
ocs-operator.v4.2.2                          OpenShift Container Storage   4.2.2                                                              Installing

oc get pods -o wide -n openshift-storage | grep noobaa-operator
noobaa-operator-7b4cc4fcd6-klj78   0/1   CrashLoopBackOff   3   24h   10.129.0.13   compute-0   <none>   <none>

--- snip of panic (full trace added in Additional info) ---
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x13ebd5b]

Version of all relevant components (if applicable):
-------------------------------------------------------------------------
OCP version = 4.3.8
OCS version = 4.2.2 (LIVE)

[nberry@localhost pods]$ noobaa status
INFO[0000] CLI version: 2.0.10
INFO[0000] 
noobaa-image: noobaa/noobaa-core:5.2.13
INFO[0000] operator-image: noobaa/noobaa-operator:2.0.10

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
-------------------------------------------------------------------------
The noobaa-operator pod was in CLBO throughout the time the faulty backingstore existed in the cluster.

Is there any workaround available to the best of your knowledge?
-------------------------------------------------------------------------
Deleted the backingstore and the operator pod recovered on its own.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
-------------------------------------------------------------------------
3

Is this issue reproducible?
-------------------------------------------------------------------------
Tested once, but it should be reproducible.

Can this issue reproduce from the UI?
-------------------------------------------------------------------------
Yes

If this is a regression, please provide more details to justify this:
-------------------------------------------------------------------------
Not sure

Steps to Reproduce:
-------------------------------------------------------------------------
1. Create an OCP 4.3.8 cluster.
2. Install OCS 4.2.2 from LIVE (via UI).
3. With some FIO and pgsql workloads already in progress, navigate to UI -> Installed Operators -> OpenShift Container Storage -> Create Backing Store.
4. Create a backingstore with Provider = PVC but select an incorrect SC, e.g.:
   Backing Store Name* = neha
   Provider = PVC
   Storage Class = openshift-storage.noobaa.io (instead of the recommended ocs-storagecluster-ceph-rbd)
5. Click Create. The backingstore will not reach the Ready state, as an incorrect SC was selected.
6. Check the status of the noobaa-operator pod. It reports panics and CLBO continuously.
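The panic above is consistent with the reconciler dereferencing data that is nil when the selected storage class cannot back a pv-pool. A minimal Go sketch of that failure mode and of a guarded alternative (all names here are illustrative, not the actual noobaa-operator code):

```go
package main

import "fmt"

// PoolInfo is a hypothetical stand-in for the system info the
// reconciler reads; assume it is nil when the backingstore's
// storage class cannot provide a pv-pool.
type PoolInfo struct {
	Name string
}

// readSystemInfoUnsafe mirrors the buggy pattern: it dereferences
// pool without a nil check, which panics with
// "invalid memory address or nil pointer dereference" when pool is nil.
func readSystemInfoUnsafe(pool *PoolInfo) string {
	return pool.Name // panics when pool == nil
}

// readSystemInfoSafe guards the dereference and returns an error
// instead, letting the reconciler set a Rejected phase rather than
// crash-looping the operator pod.
func readSystemInfoSafe(pool *PoolInfo) (string, error) {
	if pool == nil {
		return "", fmt.Errorf("pool info not found: invalid storage class for pv-pool")
	}
	return pool.Name, nil
}

func main() {
	// The safe variant turns the crash into a reportable error.
	if _, err := readSystemInfoSafe(nil); err != nil {
		fmt.Println("rejected without panic:", err)
	}
}
```

This matches the expected behavior requested below: surface an error on the BackingStore resource instead of killing the operator process.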
Actual results:
-------------------------------------------------------------------------
If we create an incorrect backingstore, the noobaa-operator pod panics.

Expected results:
-------------------------------------------------------------------------
Even if a user creates an incorrect backingstore, the noobaa-operator should not panic. We should get an error message, but the pod should not keep restarting with CLBO.

Additional info:
-------------------------------------------------------------------------
/go/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522
/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:82
/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/signal_unix.go:390
/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:371
/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:229
/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:152
/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:118
/go/src/github.com/noobaa/noobaa-operator/v2/pkg/controller/backingstore/backingstore_controller.go:29
/go/src/sigs.k8s.io/controller-runtime/pkg/reconcile/reconcile.go:92
/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215
/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/asm_amd64.s:1337

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x13ebd5b]

goroutine 224 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x105
panic(0x17a2ea0, 0x2bff080)
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522 +0x1b5
github.com/noobaa/noobaa-operator/v2/pkg/backingstore.(*Reconciler).ReadSystemInfo(0xc000965200, 0x19c6450, 0xa)
	/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:371 +0x29b
github.com/noobaa/noobaa-operator/v2/pkg/backingstore.(*Reconciler).ReconcilePhaseConnecting(0xc000965200, 0x0, 0x0)
	/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:229 +0x7c
github.com/noobaa/noobaa-operator/v2/pkg/backingstore.(*Reconciler).ReconcilePhases(0xc000965200, 0xc0007ad501, 0x19c5301)
	/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:152 +0x4c
github.com/noobaa/noobaa-operator/v2/pkg/backingstore.(*Reconciler).Reconcile(0xc000965200, 0x11, 0xc000470780, 0x4, 0x1d2d4a0)
	/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:118 +0x539
github.com/noobaa/noobaa-operator/v2/pkg/controller/backingstore.Add.func1(0xc000e46220, 0x11, 0xc000470780, 0x4, 0xc000526760, 0xc0008a5d88, 0x18, 0xc0008a5d80)
	/go/src/github.com/noobaa/noobaa-operator/v2/pkg/controller/backingstore/backingstore_controller.go:29 +0x113
sigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile(0xc00079b0c0, 0xc000e46220, 0x11, 0xc000470780, 0x4, 0x2c1b020, 0x3, 0x3, 0x2000000000000)
	/go/src/sigs.k8s.io/controller-runtime/pkg/reconcile/reconcile.go:92 +0x4e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00067e820, 0x0)
	/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215 +0x1cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
	/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158 +0x36
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0008640c0)
	/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0008640c0, 0x3b9aca00, 0x0, 0x1, 0xc00032e0c0)
	/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0008640c0, 0x3b9aca00, 0xc00032e0c0)
	/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
	/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157 +0x311
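The trace shows why one bad custom resource crash-loops the whole operator: controller-runtime calls Reconcile from a worker goroutine, and an unrecovered panic there (HandleCrash re-raises it by default) kills the entire process, which Kubernetes then restarts as CLBO. A simplified Go sketch of recovering a panic at the reconcile boundary (hypothetical wrapper, not controller-runtime's actual behavior or API):

```go
package main

import "fmt"

// reconcileResult is a simplified stand-in for controller-runtime's
// reconcile.Result; the names here are illustrative only.
type reconcileResult struct {
	Requeue bool
}

// safeReconcile wraps a reconcile function so that a panic inside it
// is converted into an ordinary error instead of killing the process.
func safeReconcile(fn func() (reconcileResult, error)) (res reconcileResult, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("reconcile panicked: %v", r)
		}
	}()
	return fn()
}

func main() {
	// Simulate the nil dereference from reconciler.go:371: selecting a
	// field through a nil pointer panics at runtime.
	_, err := safeReconcile(func() (reconcileResult, error) {
		var p *struct{ name string }
		_ = p.name // nil pointer dereference
		return reconcileResult{}, nil
	})
	fmt.Println(err) // the panic surfaces as an error, not a crash
}
```

Whether recovering is desirable is a design choice; the fix that actually shipped avoids the nil dereference itself, which is the cleaner solution.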
Created attachment 1675844 [details]
backingstore as created from UI with incorrect SC name
@Nimrod, I'm moving this back to 4.4. Let's do proper triaging if you think this needs to be fixed in 4.5+.
@raz we have a PR merged upstream; I don't think it's a blocker. If you really want it we can backport the fix, but it really isn't one.
Sure, this wasn't marked as a blocker. Moving to 4.5
Created attachment 1701372 [details]
noobaa-operator-log

Verified that the noobaa-operator pod did not report a panic on selecting an incorrect SC type for a NooBaa PVC backingstore.

Cluster: VMware + RHCOS, cluster channel: stable-4.5
OCP cluster version: 4.5.0-0.nightly-2020-07-14-213353
OCS = ocs-operator.v4.5.0-487.ci

INFO[0000] CLI version: 2.3.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.5.0-rc3
INFO[0000] operator-image: noobaa/noobaa-operator:2.3.0
INFO[0000] Namespace: openshift-storage
INFO[0000]

Final observation: No panic in noobaa-operator on using an incorrect SC name while creating a PVC-backed backingstore.

PS: We did face an issue on running the backingstore command with the other non-PVC SC, i.e. ocs-storagecluster-ceph-rgw. Will be raising a follow-up BZ for the same.

Tested in UI
==================
Navigated to Installed Operators -> OpenShift Container Storage -> Create Backing Store.
1. Tried creating a backingstore:
   Backing Store Name* = neha
   Provider = PVC
   Storage Class = openshift-storage.noobaa.io
2. 
State of resources (as expected) after the fix:

>> oc get pvc
neha-noobaa-pvc-bdb188e3   Pending   openshift-storage.noobaa.io   17m

>> oc get pod
neha-noobaa-pod-bdb188e3   0/1   Pending   0   16m   <none>   <none>   <none>   <none>

>> oc get backingstore -A
NAMESPACE           NAME   TYPE      PHASE      AGE
openshift-storage   neha   pv-pool   Rejected   11m

The noobaa-operator pod did not report a panic.

Verification from CLI
=================================
[nberry@localhost auth]$ /usr/local/bin/nooba-cli backingstore create pv-pool pool2 --num-volumes 1 --pv-size-gb 16 --storage-class openshift-storage.noobaa.io
INFO[0005] ✅ Exists: NooBaa "noobaa"
INFO[0006] ✅ Exists: StorageClass "openshift-storage.noobaa.io"
FATA[0006] ❌ Could not set StorageClass "openshift-storage.noobaa.io" for system in namespace "openshift-storage" - as this class reserved for obc only
[nberry@localhost auth]$

Successful creation on using the correct SC type:
=================================================
>> oc get backingstore -A
openshift-storage   new-bs   pv-pool   Ready   9m13s

>> oc get pvc | grep new
new-bs-noobaa-pvc-4f606e9d   Bound   pvc-22548f2d-7394-45ae-a64f-4b2523b9cf39   50Gi   RWO   ocs-storagecluster-ceph-rbd   54m

$ oc get pod | grep new
new-bs-noobaa-pod-4f606e9d   1/1   Running   0   55m
____________________________________________________________________________________________________
[nberry@localhost auth]$ /usr/local/bin/nooba-cli status
INFO[0000] CLI version: 2.3.0
INFO[0000] noobaa-image: noobaa/noobaa-core:5.5.0-rc3
INFO[0000] operator-image: noobaa/noobaa-operator:2.3.0
INFO[0000] Namespace: openshift-storage
INFO[0000]
INFO[0000] CRD Status:
INFO[0004] ✅ Exists: CustomResourceDefinition "noobaas.noobaa.io"
INFO[0005] ✅ Exists: CustomResourceDefinition "backingstores.noobaa.io"
INFO[0005] ✅ Exists: CustomResourceDefinition "bucketclasses.noobaa.io"
INFO[0006] ✅ Exists: CustomResourceDefinition "objectbucketclaims.objectbucket.io"
INFO[0006] ✅ Exists: CustomResourceDefinition "objectbuckets.objectbucket.io"
INFO[0006]
INFO[0006] Operator Status:
INFO[0007] ✅ Exists: Namespace "openshift-storage"
INFO[0008] ✅ Exists: ServiceAccount "noobaa"
INFO[0009] ✅ Exists: Role "ocs-operator.v4.5.0-487.ci-86797d7d59"
INFO[0010] ✅ Exists: RoleBinding "ocs-operator.v4.5.0-487.ci-86797d7d59-5bbd9475c9"
INFO[0010] ✅ Exists: ClusterRole "ocs-operator.v4.5.0-487.ci-d4c5fc6b6"
INFO[0011] ✅ Exists: ClusterRoleBinding "ocs-operator.v4.5.0-487.ci-d4c5fc6b6-85444fb7cb"
INFO[0011] ✅ Exists: Deployment "noobaa-operator"
INFO[0011]
INFO[0011] System Status:
INFO[0012] ✅ Exists: NooBaa "noobaa"
INFO[0012] ✅ Exists: StatefulSet "noobaa-core"
INFO[0013] ✅ Exists: StatefulSet "noobaa-db"
INFO[0013] ✅ Exists: Service "noobaa-mgmt"
INFO[0014] ✅ Exists: Service "s3"
INFO[0014] ✅ Exists: Service "noobaa-db"
INFO[0015] ✅ Exists: Secret "noobaa-server"
INFO[0015] ✅ Exists: Secret "noobaa-operator"
INFO[0016] ✅ Exists: Secret "noobaa-endpoints"
INFO[0016] ✅ Exists: Secret "noobaa-admin"
INFO[0017] ✅ Exists: StorageClass "openshift-storage.noobaa.io"
INFO[0017] ✅ Exists: BucketClass "noobaa-default-bucket-class"
INFO[0018] ✅ Exists: Deployment "noobaa-endpoint"
INFO[0018] ✅ Exists: HorizontalPodAutoscaler "noobaa-endpoint"
INFO[0019] ✅ (Optional) Exists: BackingStore "noobaa-default-backing-store"
INFO[0019] ⬛ (Optional) Not Found: CredentialsRequest "noobaa-aws-cloud-creds"
INFO[0020] ⬛ (Optional) Not Found: CredentialsRequest "noobaa-azure-cloud-creds"
INFO[0020] ⬛ (Optional) Not Found: Secret "noobaa-azure-container-creds"
INFO[0021] ✅ (Optional) Exists: PrometheusRule "noobaa-prometheus-rules"
INFO[0021] ✅ (Optional) Exists: ServiceMonitor "noobaa-service-monitor"
INFO[0022] ✅ (Optional) Exists: Route "noobaa-mgmt"
INFO[0022] ✅ (Optional) Exists: Route "s3"
INFO[0023] ✅ Exists: PersistentVolumeClaim "db-noobaa-db-0"
INFO[0023] ✅ System Phase is "Ready"
INFO[0023] ✅ Exists: "noobaa-admin"

#------------------#
#- Mgmt Addresses -#
#------------------#
ExternalDNS : [https://noobaa-mgmt-openshift-storage.apps.sagrawal-dc25.qe.rh-ocs.com]
ExternalIP  : []
NodePorts   : [https://10.1.50.27:32256]
InternalDNS : [https://noobaa-mgmt.openshift-storage.svc:443]
InternalIP  : [https://172.30.44.171:443]
PodPorts    : [https://10.129.2.16:8443]

#--------------------#
#- Mgmt Credentials -#
#--------------------#
email    : admin
password : Ae/HbjQjLtiti7W3EuPD/A==

#----------------#
#- S3 Addresses -#
#----------------#
ExternalDNS : [https://s3-openshift-storage.apps.sagrawal-dc25.qe.rh-ocs.com]
ExternalIP  : []
NodePorts   : [https://10.1.50.27:32042 https://10.1.50.24:32042]
InternalDNS : [https://s3.openshift-storage.svc:443]
InternalIP  : [https://172.30.25.249:443]
PodPorts    : [https://10.129.2.21:6443 https://10.131.0.28:6443]

#------------------#
#- S3 Credentials -#
#------------------#
AWS_ACCESS_KEY_ID     : scelO690kcvnbEnat2q2
AWS_SECRET_ACCESS_KEY : S2DjGwc/P5CWO5Mbdl1T6/hBHVuPCAYfZLBrakEt

#------------------#
#- Backing Stores -#
#------------------#
NAME                           TYPE            TARGET-BUCKET                                       PHASE      AGE
neha                           pv-pool                                                             Rejected   59m6s
new-bs                         pv-pool                                                             Ready      55m58s
noobaa-default-backing-store   s3-compatible   nb.1594835529533.apps.sagrawal-dc25.qe.rh-ocs.com   Ready      17h37m21s
pool1                          pv-pool                                                             Rejected   50m17s

#------------------#
#- Bucket Classes -#
#------------------#
NAME                          PLACEMENT                                                              PHASE   AGE
noobaa-default-bucket-class   {Tiers:[{Placement: BackingStores:[noobaa-default-backing-store]}]}   Ready   17h37m21s

#-----------------#
#- Bucket Claims -#
#-----------------#
No OBCs found.
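The FATA message in the CLI verification above suggests the fix validates the requested storage class before creating any pv-pool resources, since "openshift-storage.noobaa.io" is NooBaa's own class reserved for OBCs. A small Go sketch of that kind of pre-flight check (function name and the suffix-based detection are illustrative assumptions, not the actual noobaa-operator implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// validatePvPoolStorageClass rejects NooBaa's own OBC storage class
// when a user tries to use it to back a pv-pool. The ".noobaa.io"
// suffix check is a hypothetical stand-in for however the real CLI
// identifies its reserved class.
func validatePvPoolStorageClass(sc string) error {
	if strings.HasSuffix(sc, ".noobaa.io") {
		return fmt.Errorf(
			"could not set StorageClass %q for system - as this class reserved for obc only", sc)
	}
	return nil
}

func main() {
	for _, sc := range []string{"openshift-storage.noobaa.io", "ocs-storagecluster-ceph-rbd"} {
		if err := validatePvPoolStorageClass(sc); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted:", sc)
		}
	}
}
```

As the next comment notes, this validation only runs in the CLI path; the UI submits the BackingStore CR directly, so the operator still has to handle the invalid class gracefully at reconcile time.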
My bad... it turns out the fix applies only to the CLI. If one uses an incorrect SC in the UI, the PVC + pod + backingstore still get created and stay in Pending/Creating state. But at least the noobaa-operator pod is not reporting any panic.

Moving the BZ back to Assigned state, as the fix is only for the CLI and we still see issues with UI creation of a PVC-backed backingstore (with an incorrect SC specified).

Tested in UI
==================
Navigated to Installed Operators -> OpenShift Container Storage -> Create Backing Store.
1. Tried creating a backingstore:
   Backing Store Name* = neha
   Provider = PVC
   Storage Class = openshift-storage.noobaa.io
2. State of resources (as expected) after the fix:

>> oc get pvc
neha-noobaa-pvc-bdb188e3   Pending   openshift-storage.noobaa.io   17m

>> oc get pod
neha-noobaa-pod-bdb188e3   0/1   Pending   0   16m   <none>   <none>   <none>   <none>

>> oc get backingstore -A
NAMESPACE           NAME   TYPE      PHASE      AGE
openshift-storage   neha   pv-pool   Rejected   11m
This is not the same issue, and it should not have been reopened. NooBaa rejects this when you use the CLI, and it also doesn't panic when you use the UI. If the request here is for a better experience in the UI, that's a different bug (a new one). In any case, there is nothing to be done here on NooBaa's side: we don't fail, but we also can't prevent the user from setting it in the UI.
After further discussions with Jacky and Nimrod, it seems fair to move this BZ to the Verified state, as the original issue of the operator pod panic was never observed, whether we used an invalid SC in the UI or the CLI. I will raise a separate UI issue for accepting the invalid SC name. Thank you Nimrod and Jacky for all the help.

Moving the BZ to Verified state based on Comment #10.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3754