Bug 1820297 - noobaa-operator reports panic on creating an invalid Backingstore: Provider: PVC with incorrect SC name
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: Jacky Albo
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-02 17:24 UTC by Neha Berry
Modified: 2020-09-15 10:16 UTC (History)
4 users

Fixed In Version: ocs-olm-operator:4.5.0-419.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-15 10:16:04 UTC
Embargoed:


Attachments (Terms of Use)
backingstore as created from UI with incorrect SC name (141.00 KB, image/png)
2020-04-02 17:58 UTC, Neha Berry
no flags Details
noobaa-operator-log (11.02 MB, text/plain)
2020-07-16 11:38 UTC, Neha Berry
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github noobaa noobaa-operator pull 279 0 None closed fixing BZ1820297 2020-08-11 03:31:45 UTC
Red Hat Product Errata RHBA-2020:3754 0 None None None 2020-09-15 10:16:32 UTC

Description Neha Berry 2020-04-02 17:24:52 UTC
noobaa-operator reports panic on creating an invalid Backingstore: Provider - PVC

Description of problem (please be as detailed as possible and provide log
snippets):
-------------------------------------------------------------------------
On an OCP + OCS 4.2.2 cluster on VMware + VMFS + RHCOS, navigated to Installed Operators -> OpenShift Container Storage -> Create Backing Store.

Tried creating a backingstore (screenshot attached).

Backing Store Name* = neha
Provider = PVC
Storage Class = openshift-storage.noobaa.io

I incorrectly selected SC = openshift-storage.noobaa.io instead of SC = ocs-storagecluster-ceph-rbd. The backingstore never reached the Ready state (as expected).

However, the noobaa-operator pod reported continuous panics and started restarting with CrashLoopBackOff (CLBO). There were 16 restarts in total.

After deleting the faulty backingstore "neha", the pod recovered and no further panics were observed in the logs.


--- snip of oc get csv, oc get pods ---


Thu Apr  2 13:53:04 UTC 2020
--------------
========CSV ======
NAME                                         DISPLAY                       VERSION               REPLACES                                     PHASE
elasticsearch-operator.4.2.26-202003230335   Elasticsearch Operator        4.2.26-202003230335   elasticsearch-operator.4.2.24-202003191518   Succeeded
lib-bucket-provisioner.v1.0.0                lib-bucket-provisioner        1.0.0                                                              Succeeded
ocs-operator.v4.2.2                          OpenShift Container Storage   4.2.2                                                              Installing

oc get pods -o wide -n openshift-storage|grep noobaa-operator
noobaa-operator-7b4cc4fcd6-klj78                                  0/1     CrashLoopBackOff   3          24h   10.129.0.13    compute-0   <none>           <none>


--- snip of panic (full trace added in Additional info) ---

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x13ebd5b]




Version of all relevant components (if applicable):
-------------------------------------------------------------------------
OCP version = 4.3.8

OCS version = 4.2.2 (LIVE)

[nberry@localhost pods]$ noobaa status
INFO[0000] CLI version: 2.0.10                          
INFO[0000] noobaa-image: noobaa/noobaa-core:5.2.13      
INFO[0000] operator-image: noobaa/noobaa-operator:2.0.10 



Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
-------------------------------------------------------------------------
The noobaa-operator pod was in CLBO throughout the time the faulty backingstore existed in the cluster.

Is there any workaround available to the best of your knowledge?
-------------------------------------------------------------------------
Yes. After deleting the backingstore, the operator pod recovered on its own.


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
-------------------------------------------------------------------------
3

Is this issue reproducible?
-------------------------------------------------------------------------
Tested once, but it should be reproducible.


Can this issue be reproduced from the UI?
-------------------------------------------------------------------------
Yes

If this is a regression, please provide more details to justify this:
-------------------------------------------------------------------------
Not sure

Steps to Reproduce:
-------------------------------------------------------------------------
1. Create an OCP 4.3.8 cluster
2. Install OCS 4.2.2 from LIVE (via UI)
3. With some FIO and pgsql workloads already in progress, navigate to
  UI-> Installed Operators->OpenShift Container Storage --> Create Backing Store

4. Create a backingstore with Provider = PVC but select an incorrect SC
e.g. 
Backing Store Name* = neha
Provider = PVC
Storage Class = openshift-storage.noobaa.io instead of the recommended ocs-storagecluster-ceph-rbd

5. Click Create. The backingstore will not reach the Ready state, as an incorrect SC was selected.

6. Check the status of the noobaa-operator pod. It reports panics and CLBO continuously.
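
The same misconfiguration can also be reproduced without the UI by applying a BackingStore CR directly. This is an illustrative sketch (field names follow the noobaa.io/v1alpha1 pv-pool spec; the size and name are arbitrary):

```yaml
apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  name: neha
  namespace: openshift-storage
spec:
  type: pv-pool
  pvPool:
    numVolumes: 1
    # Deliberately incorrect: this class is reserved for OBCs,
    # not for backing PVCs; ocs-storagecluster-ceph-rbd is the valid choice.
    storageClass: openshift-storage.noobaa.io
    resources:
      requests:
        storage: 16Gi
```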


Actual results:
-------------------------------------------------------------------------
If we create an incorrect Backingstore, the noobaa-operator pod panics.


Expected results:
-------------------------------------------------------------------------

Even if a user creates an invalid backingstore, the noobaa-operator should not panic. The user should get an error message, but the pod should not keep restarting with CLBO.


Additional info:
-------------------------------------------------------------------------

/go/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522
/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:82
/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/signal_unix.go:390
/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:371
/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:229
/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:152
/go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:118
/go/src/github.com/noobaa/noobaa-operator/v2/pkg/controller/backingstore/backingstore_controller.go:29
/go/src/sigs.k8s.io/controller-runtime/pkg/reconcile/reconcile.go:92
/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215
/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/asm_amd64.s:1337
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x13ebd5b]

goroutine 224 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x105
panic(0x17a2ea0, 0x2bff080)
        /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522 +0x1b5
github.com/noobaa/noobaa-operator/v2/pkg/backingstore.(*Reconciler).ReadSystemInfo(0xc000965200, 0x19c6450, 0xa)
        /go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:371 +0x29b
github.com/noobaa/noobaa-operator/v2/pkg/backingstore.(*Reconciler).ReconcilePhaseConnecting(0xc000965200, 0x0, 0x0)
        /go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:229 +0x7c
github.com/noobaa/noobaa-operator/v2/pkg/backingstore.(*Reconciler).ReconcilePhases(0xc000965200, 0xc0007ad501, 0x19c5301)
        /go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:152 +0x4c
github.com/noobaa/noobaa-operator/v2/pkg/backingstore.(*Reconciler).Reconcile(0xc000965200, 0x11, 0xc000470780, 0x4, 0x1d2d4a0)
        /go/src/github.com/noobaa/noobaa-operator/v2/pkg/backingstore/reconciler.go:118 +0x539
github.com/noobaa/noobaa-operator/v2/pkg/controller/backingstore.Add.func1(0xc000e46220, 0x11, 0xc000470780, 0x4, 0xc000526760, 0xc0008a5d88, 0x18, 0xc0008a5d80)
        /go/src/github.com/noobaa/noobaa-operator/v2/pkg/controller/backingstore/backingstore_controller.go:29 +0x113
sigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile(0xc00079b0c0, 0xc000e46220, 0x11, 0xc000470780, 0x4, 0x2c1b020, 0x3, 0x3, 0x2000000000000)
        /go/src/sigs.k8s.io/controller-runtime/pkg/reconcile/reconcile.go:92 +0x4e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00067e820, 0x0)
        /go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215 +0x1cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
        /go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158 +0x36
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0008640c0)
        /go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0008640c0, 0x3b9aca00, 0x0, 0x1, 0xc00032e0c0)
        /go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0008640c0, 0x3b9aca00, 0xc00032e0c0)
        /go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
        /go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157 +0x311
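
The trace shows a nil pointer dereference inside ReadSystemInfo (reconciler.go:371), consistent with the reconciler dereferencing the result of a failed lookup. As a hedged illustration of the guard pattern that avoids this class of crash (the types and function names below are invented for the sketch, not the operator's actual code):

```go
package main

import (
	"errors"
	"fmt"
)

// StorageClass stands in for the Kubernetes object; fields are illustrative.
type StorageClass struct {
	Provisioner string
}

// lookup mimics a client Get that yields nil for an unknown class.
func lookup(name string, classes map[string]*StorageClass) *StorageClass {
	return classes[name]
}

// provisionerFor checks for nil before dereferencing, returning an error
// the reconciler can surface as a Rejected phase instead of a SIGSEGV.
func provisionerFor(name string, classes map[string]*StorageClass) (string, error) {
	sc := lookup(name, classes)
	if sc == nil {
		return "", errors.New("storage class not found: " + name)
	}
	return sc.Provisioner, nil
}

func main() {
	classes := map[string]*StorageClass{
		"ocs-storagecluster-ceph-rbd": {Provisioner: "openshift-storage.rbd.csi.ceph.com"},
	}
	// An invalid class now yields an error instead of a panic.
	if _, err := provisionerFor("openshift-storage.noobaa.io", classes); err != nil {
		fmt.Println("rejected:", err)
	}
}
```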

Comment 3 Neha Berry 2020-04-02 17:58:14 UTC
Created attachment 1675844 [details]
backingstore as created from UI with incorrect SC name

Comment 4 Raz Tamir 2020-04-12 13:35:24 UTC
@Nimrod,

I'm moving back to 4.4.
Let's do proper triaging if you think this needs to be fixed in 4.5+.

Comment 5 Nimrod Becker 2020-04-13 08:49:34 UTC
@raz we have a PR merged upstream; I don't think it's a blocker. If you really want it we can backport the fix, but it really isn't one.

Comment 6 Raz Tamir 2020-04-13 08:59:37 UTC
Sure,

this wasn't marked as a blocker.
Moving to 4.5

Comment 10 Neha Berry 2020-07-16 11:38:14 UTC
Created attachment 1701372 [details]
noobaa-operator-log

Verified that the noobaa operator pod did not report a panic on selecting an incorrect SC type for a noobaa PVC backingstore.


Cluster: VMware + RHCOS

cluster channel: stable-4.5
OCP cluster version: 4.5.0-0.nightly-2020-07-14-213353
OCS = ocs-operator.v4.5.0-487.ci

INFO[0000] CLI version: 2.3.0                           
INFO[0000] noobaa-image: noobaa/noobaa-core:5.5.0-rc3   
INFO[0000] operator-image: noobaa/noobaa-operator:2.3.0 
INFO[0000] Namespace: openshift-storage                 
INFO[0000]                                       



Final observation:

No panic in noobaa-operator on using incorrect SC name while creating a PVC backed backingstore.

PS: We did hit an issue when running the backingstore command with the other non-PVC SC, i.e. ocs-storagecluster-ceph-rgw. A follow-up BZ will be raised for it.


Tested in UI
==================

navigated to Installed Operators->OpenShift Container Storage --> Create Backing Store 

1. Tried creating a backingstore

Backing Store Name* = neha
Provider = PVC
Storage Class = openshift-storage.noobaa.io

2. State of resources (as expected) after the fix:

>>oc get pvc

neha-noobaa-pvc-bdb188e3       Pending                                                                        openshift-storage.noobaa.io   17m

>>oc get pod
neha-noobaa-pod-bdb188e3                                          0/1     Pending     0          16m     <none>        <none>      <none>           <none>

>>oc get backingstore -A
NAMESPACE           NAME                           TYPE            PHASE      AGE
openshift-storage   neha                           pv-pool         Rejected   11m


The noobaa-operator pod did not report a panic.

Verification from CLI
=================================


[nberry@localhost auth]$ /usr/local/bin/nooba-cli backingstore create pv-pool pool2 --num-volumes 1 --pv-size-gb 16 --storage-class openshift-storage.noobaa.io
INFO[0005] ✅ Exists: NooBaa "noobaa"                    
INFO[0006] ✅ Exists: StorageClass "openshift-storage.noobaa.io" 
FATA[0006] ❌ Could not set StorageClass "openshift-storage.noobaa.io" for system in namespace "openshift-storage" - as this class reserved for obc only 
[nberry@localhost auth]$ 


Successful creation on using correct SC type:
=================================================
>>oc get backingstore -A

openshift-storage   new-bs                         pv-pool         Ready      9m13s

>> $ oc get pvc|grep new
new-bs-noobaa-pvc-4f606e9d     Bound     pvc-22548f2d-7394-45ae-a64f-4b2523b9cf39   50Gi       RWO            ocs-storagecluster-ceph-rbd   54m

$ oc get pod|grep new
new-bs-noobaa-pod-4f606e9d                                        1/1     Running     0          55m


____________________________________________________________________________________________________


[nberry@localhost auth]$ /usr/local/bin/nooba-cli status
INFO[0000] CLI version: 2.3.0                           
INFO[0000] noobaa-image: noobaa/noobaa-core:5.5.0-rc3   
INFO[0000] operator-image: noobaa/noobaa-operator:2.3.0 
INFO[0000] Namespace: openshift-storage                 
INFO[0000]                                              
INFO[0000] CRD Status:                                  
INFO[0004] ✅ Exists: CustomResourceDefinition "noobaas.noobaa.io" 
INFO[0005] ✅ Exists: CustomResourceDefinition "backingstores.noobaa.io" 
INFO[0005] ✅ Exists: CustomResourceDefinition "bucketclasses.noobaa.io" 
INFO[0006] ✅ Exists: CustomResourceDefinition "objectbucketclaims.objectbucket.io" 
INFO[0006] ✅ Exists: CustomResourceDefinition "objectbuckets.objectbucket.io" 
INFO[0006]                                              
INFO[0006] Operator Status:                             
INFO[0007] ✅ Exists: Namespace "openshift-storage"      
INFO[0008] ✅ Exists: ServiceAccount "noobaa"            
INFO[0009] ✅ Exists: Role "ocs-operator.v4.5.0-487.ci-86797d7d59" 
INFO[0010] ✅ Exists: RoleBinding "ocs-operator.v4.5.0-487.ci-86797d7d59-5bbd9475c9" 
INFO[0010] ✅ Exists: ClusterRole "ocs-operator.v4.5.0-487.ci-d4c5fc6b6" 
INFO[0011] ✅ Exists: ClusterRoleBinding "ocs-operator.v4.5.0-487.ci-d4c5fc6b6-85444fb7cb" 
INFO[0011] ✅ Exists: Deployment "noobaa-operator"       
INFO[0011]                                              
INFO[0011] System Status:                               
INFO[0012] ✅ Exists: NooBaa "noobaa"                    
INFO[0012] ✅ Exists: StatefulSet "noobaa-core"          
INFO[0013] ✅ Exists: StatefulSet "noobaa-db"            
INFO[0013] ✅ Exists: Service "noobaa-mgmt"              
INFO[0014] ✅ Exists: Service "s3"                       
INFO[0014] ✅ Exists: Service "noobaa-db"                
INFO[0015] ✅ Exists: Secret "noobaa-server"             
INFO[0015] ✅ Exists: Secret "noobaa-operator"           
INFO[0016] ✅ Exists: Secret "noobaa-endpoints"          
INFO[0016] ✅ Exists: Secret "noobaa-admin"              
INFO[0017] ✅ Exists: StorageClass "openshift-storage.noobaa.io" 
INFO[0017] ✅ Exists: BucketClass "noobaa-default-bucket-class" 
INFO[0018] ✅ Exists: Deployment "noobaa-endpoint"       
INFO[0018] ✅ Exists: HorizontalPodAutoscaler "noobaa-endpoint" 
INFO[0019] ✅ (Optional) Exists: BackingStore "noobaa-default-backing-store" 
INFO[0019] ⬛ (Optional) Not Found: CredentialsRequest "noobaa-aws-cloud-creds" 
INFO[0020] ⬛ (Optional) Not Found: CredentialsRequest "noobaa-azure-cloud-creds" 
INFO[0020] ⬛ (Optional) Not Found: Secret "noobaa-azure-container-creds" 
INFO[0021] ✅ (Optional) Exists: PrometheusRule "noobaa-prometheus-rules" 
INFO[0021] ✅ (Optional) Exists: ServiceMonitor "noobaa-service-monitor" 
INFO[0022] ✅ (Optional) Exists: Route "noobaa-mgmt"     
INFO[0022] ✅ (Optional) Exists: Route "s3"              
INFO[0023] ✅ Exists: PersistentVolumeClaim "db-noobaa-db-0" 
INFO[0023] ✅ System Phase is "Ready"                    
INFO[0023] ✅ Exists:  "noobaa-admin"                    

#------------------#
#- Mgmt Addresses -#
#------------------#

ExternalDNS : [https://noobaa-mgmt-openshift-storage.apps.sagrawal-dc25.qe.rh-ocs.com]
ExternalIP  : []
NodePorts   : [https://10.1.50.27:32256]
InternalDNS : [https://noobaa-mgmt.openshift-storage.svc:443]
InternalIP  : [https://172.30.44.171:443]
PodPorts    : [https://10.129.2.16:8443]

#--------------------#
#- Mgmt Credentials -#
#--------------------#

email    : admin
password : Ae/HbjQjLtiti7W3EuPD/A==

#----------------#
#- S3 Addresses -#
#----------------#

ExternalDNS : [https://s3-openshift-storage.apps.sagrawal-dc25.qe.rh-ocs.com]
ExternalIP  : []
NodePorts   : [https://10.1.50.27:32042 https://10.1.50.24:32042]
InternalDNS : [https://s3.openshift-storage.svc:443]
InternalIP  : [https://172.30.25.249:443]
PodPorts    : [https://10.129.2.21:6443 https://10.131.0.28:6443]

#------------------#
#- S3 Credentials -#
#------------------#

AWS_ACCESS_KEY_ID     : scelO690kcvnbEnat2q2
AWS_SECRET_ACCESS_KEY : S2DjGwc/P5CWO5Mbdl1T6/hBHVuPCAYfZLBrakEt

#------------------#
#- Backing Stores -#
#------------------#

NAME                           TYPE            TARGET-BUCKET                                       PHASE      AGE         
neha                           pv-pool                                                             Rejected   59m6s       
new-bs                         pv-pool                                                             Ready      55m58s      
noobaa-default-backing-store   s3-compatible   nb.1594835529533.apps.sagrawal-dc25.qe.rh-ocs.com   Ready      17h37m21s   
pool1                          pv-pool                                                             Rejected   50m17s      

#------------------#
#- Bucket Classes -#
#------------------#

NAME                          PLACEMENT                                                             PHASE   AGE         
noobaa-default-bucket-class   {Tiers:[{Placement: BackingStores:[noobaa-default-backing-store]}]}   Ready   17h37m21s   

#-----------------#
#- Bucket Claims -#
#-----------------#

No OBCs found.

Comment 11 Neha Berry 2020-07-16 15:22:37 UTC
My bad. It appears the fix applies only to the CLI. If one uses an incorrect SC in the UI, the PVC, pod, and backingstore still get created and stay in Pending/Creating state.

But at least the noobaa-operator pod is not reporting any panic.


Moving the BZ to the Assigned state, as the fix is only for the CLI and we still see issues in UI creation of a PVC-backed backingstore (with an incorrect SC specified).



Tested in UI
==================

navigated to Installed Operators->OpenShift Container Storage --> Create Backing Store 

1. Tried creating a backingstore

Backing Store Name* = neha
Provider = PVC
Storage Class = openshift-storage.noobaa.io

2. State of resources (as expected) after the fix:

>>oc get pvc

neha-noobaa-pvc-bdb188e3       Pending                                                                        openshift-storage.noobaa.io   17m

>>oc get pod
neha-noobaa-pod-bdb188e3                                          0/1     Pending     0          16m     <none>        <none>      <none>           <none>

>>oc get backingstore -A
NAMESPACE           NAME                           TYPE            PHASE      AGE
openshift-storage   neha                           pv-pool         Rejected   11m

Comment 13 Nimrod Becker 2020-07-16 16:01:35 UTC
This is not the same issue, and it should not have been reopened.
NooBaa rejects this when you use the CLI and also doesn't panic when you use the UI.

If the request here is for a better experience in the UI, it's a different bug (a new one). In any case, there is nothing to be done on noobaa's side. We don't fail, but we also can't prevent the user from setting it in the UI...

Comment 14 Neha Berry 2020-07-16 16:25:22 UTC
After further discussions with Jacky and Nimrod, it seems fair to move this BZ to the Verified state, as the original issue of the operator pod panic was never observed, whether we used an invalid SC in the UI or the CLI.

I will raise a separate UI issue for accepting the invalid SC name.

Thank you Nimrod and Jacky for all the help.

Moving the BZ to the Verified state based on comment #10.

Comment 17 errata-xmlrpc 2020-09-15 10:16:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

