Bug 2019577

Summary: SystemStore: load failed Error: NON_EXISTING_ROOT_KEY
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: khover
Component: Multi-Cloud Object Gateway
Assignee: Romy Ayalon <rayalon>
Status: CLOSED NOTABUG
QA Contact: Raz Tamir <ratamir>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.7
CC: etamir, mmuench, nbecker, ocs-bugs, odf-bz-bot, pbyregow, rayalon, tdesala
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-16 07:53:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description khover 2021-11-02 20:54:57 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

After a rebuild of NooBaa following this Red Hat KCS article:

https://access.redhat.com/solutions/5948631

the NooBaa endpoint pod never came back up, and this error was logged in the noobaa-core pod logs:

Nov-2 20:15:22.094 [/16] [ERROR] core.server.system_services.system_store:: SystemStore: load failed Error: NON_EXISTING_ROOT_KEY
    at MasterKeysManager.load_root_key (/root/node_modules/noobaa-core/src/server/system_services/master_key_manager.js:64:40)
    at /root/node_modules/noobaa-core/src/server/system_services/system_store.js:427:47
    at Semaphore.surround (/root/node_modules/noobaa-core/src/util/semaphore.js:40:26)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
Nov-2 20:15:22.095 [/16] [ERROR] core.server.system_services.system_store:: SystemStore: load failed Error: NON_EXISTING_ROOT_KEY
    at MasterKeysManager.load_root_key (/root/node_modules/noobaa-core/src/server/system_services/master_key_manager.js:64:40)
    at /root/node_modules/noobaa-core/src/server/system_services/system_store.js:427:47
    at Semaphore.surround (/root/node_modules/noobaa-core/src/util/semaphore.js:40:26)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
Nov-2 20:15:22.095 [/16] [ERROR] UPGRADE:: failed to load system store!! Error: NON_EXISTING_ROOT_KEY
    at MasterKeysManager.load_root_key (/root/node_modules/noobaa-core/src/server/system_services/master_key_manager.js:64:40)
    at /root/node_modules/noobaa-core/src/server/system_services/system_store.js:427:47
    at Semaphore.surround (/root/node_modules/noobaa-core/src/util/semaphore.js:40:26)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
Nov-2 20:15:22.096 [/16] [ERROR] UPGRADE:: failed to init upgrade process!!
upgrade_manager failed with exit code 1
noobaa_init failed with exit code 1. aborting



Version of all relevant components (if applicable):

ocs-operator.v4.7.5


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

MCG is not healthy.

Is there any workaround available to the best of your knowledge?

Unknown

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?



Is this issue reproducible?

Unknown

Can this issue be reproduced from the UI?

Unknown

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:


Expected results:


Additional info:

Comment 2 khover 2021-11-04 19:35:37 UTC
Tried to rebuild NooBaa using

https://access.redhat.com/solutions/5948631

Shell history from the attempt (the `o` prefix appears to be a shell alias for `oc`):

11047* while true; do oc scale deployment noobaa-operator --replicas=0 ; done
11048  c
11049  cd
11050  o get pods | grep noobaa
11051  oc delete deployments.apps noobaa-endpoint
11052  o get deployments
11053  c
11054  o get pods
11055  c
11056  o get pods | grep nooba
11057  o delete pods noobaa-core-0
11058  o delete pods noobaa-db-pg-0
11059  o get pods | grep nooba
11060  o get pods | grep operator
11061  o get pods | grep nooba
11062  oc delete statefulsets.apps noobaa-db noobaa-core
11063  o get statefulsets.apps
11064  oc delete statefulsets.apps noobaa-db-pg-0
11065  oc delete statefulsets.apps noobaa-db-pg
11066  c
11067  o get pvc | grep nooba
11068  o delete pvc db-noobaa-db-pg-0
11069  o get pvc | grep nooba
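A cleaned-up sketch of the rebuild steps implied by the history above (assuming `o` is an alias for `oc`, and with resource names taken from this case). The commands are collected and printed rather than executed, since they delete live resources:

```shell
# Sketch only -- follow https://access.redhat.com/solutions/5948631 for the
# authoritative procedure. Printed, not run, because the steps are destructive.
steps='oc scale deployment noobaa-operator --replicas=0
oc delete deployment noobaa-endpoint
oc delete statefulset noobaa-core noobaa-db-pg
oc delete pvc db-noobaa-db-pg-0
oc scale deployment noobaa-operator --replicas=1'
printf '%s\n' "$steps"
```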


➜  ~ oc delete bucketclasses.noobaa.io --all
No resources found
➜  ~ o get bucketclasses.noobaa.io
No resources found.
➜  ~ oc delete backingstores.noobaa.io --all
No resources found

➜  ~ oc delete secrets noobaa-admin noobaa-endpoints noobaa-operator noobaa-server
secret "noobaa-admin" deleted
secret "noobaa-endpoints" deleted
secret "noobaa-operator" deleted
secret "noobaa-server" deleted



time="2021-11-04T18:33:26Z" level=info msg="❌ Not Found: Service \"noobaa-db\"\n"
time="2021-11-04T18:33:26Z" level=info msg="ReconcileObject: Done - unchanged Service /api/v1/namespaces/openshift-storage/services/noobaa-mgmt" sys=openshift-storage/noobaa
time="2021-11-04T18:33:26Z" level=info msg="ReconcileObject: Done - unchanged StatefulSet /apis/apps/v1/namespaces/openshift-storage/statefulsets/noobaa-core" sys=openshift-storage/noobaa
time="2021-11-04T18:33:26Z" level=info msg="ReconcileObject: Done - unchanged Route " sys=openshift-storage/noobaa
time="2021-11-04T18:33:26Z" level=info msg="ReconcileObject: Done - unchanged Service /api/v1/namespaces/openshift-storage/services/s3" sys=openshift-storage/noobaa
time="2021-11-04T18:33:26Z" level=info msg="ReconcileObject: Done - unchanged Route " sys=openshift-storage/noobaa
time="2021-11-04T18:33:26Z" level=info msg="Reconciling Backing Store Credentials" sys=openshift-storage/noobaa


➜  ~ o get svc | grep nooba
noobaa-db-pg               ClusterIP      100.81.190.102   <none>          5432/TCP                                                   22d
noobaa-mgmt                LoadBalancer   100.81.232.93    35.196.46.67    80:31537/TCP,443:31445/TCP,8445:30583/TCP,8446:31818/TCP   22d
➜  ~


➜  ~ o get pvc
NAME                          STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0             Bound     pvc-bfbb5fbd-d7fb-48f0-becc-ac1924824110   50Gi       RWO            ocs-storagecluster-ceph-rbd   20m


apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: noobaa-db-serving-cert
    service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1628006524
    service.beta.openshift.io/serving-cert-secret-name: noobaa-db-serving-cert
    service.beta.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1628006524
  labels:
    app: noobaa
  name: noobaa-db-pg
  namespace: openshift-storage
spec:
  clusterIP: 100.81.190.102
  clusterIPs:
  - 100.81.190.102
  ports:
  - name: postgres
    port: 5432
  selector:
    noobaa-db: postgres

Comment 4 khover 2021-11-08 16:59:43 UTC
@rayalon

Hi Romy,

This is a fresh OCS install and noobaa has never worked.

The customer did confirm vault is not being used.

I'll grab a new must-gather and notify when it is uploaded.

Will the secret be searchable in the must-gather, or not?

Is there a command the customer will need to run to see whether that secret exists in -n openshift-storage?
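For reference, a check along these lines would answer that question (the secret name is an assumption based on later comments in this case; a sketch, not a verified procedure):

```shell
# Hypothetical check: does the root master key secret exist in
# openshift-storage? Prints a clear yes/no either way.
secret_name="noobaa-root-master-key"
if command -v oc >/dev/null 2>&1 && \
   oc get secret "$secret_name" -n openshift-storage >/dev/null 2>&1; then
  echo "secret $secret_name exists"
else
  echo "secret $secret_name not found (or no cluster access)"
fi
```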

Comment 6 khover 2021-11-08 18:49:50 UTC
@rayalon

Hi Romy,

This is a fresh OCS install and noobaa has never worked -> but if the customer performed a rebuild, there was something there before; what was the reason for the rebuild?


The reason for the rebuild revolves around ocs-operator being 0/1 because of NooBaa, and the following errors, which guided me to our KCS article for the rebuild:

https://access.redhat.com/solutions/5948631



time="2021-10-29T15:32:26Z" level=error msg="⚠️  RPC: system.read_system() Response Error: Code=UNAUTHORIZED Message=account not found 61670facd517a0002e28035f"
time="2021-10-29T15:32:26Z" level=info msg="SetPhase: temporary error during phase \"Connecting\"" backingstore=openshift-storage/noobaa-default-backing-store


./noobaa bucket status noobaabucket8udyk
INFO[0001] ✅ Exists: NooBaa "noobaa"
INFO[0002] ✅ Exists: Service "noobaa-mgmt"
INFO[0002] ✅ Exists: Secret "noobaa-operator"
INFO[0002] ✅ Exists: Secret "noobaa-admin"
INFO[0002] ✈️  RPC: bucket.read_bucket() Request: {Name:noobaabucket8udyk}
WARN[0002] RPC: GetConnection creating connection to wss://localhost:54834/rpc/ 0xc0001bca50
INFO[0002] RPC: Connecting websocket (0xc0001bca50) &{RPC:0xc000185900 Address:wss://localhost:54834/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
INFO[0003] RPC: Connected websocket (0xc0001bca50) &{RPC:0xc000185900 Address:wss://localhost:54834/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
ERRO[0003] ⚠️  RPC: bucket.read_bucket() Response Error: Code=UNAUTHORIZED Message=account not found 61670facd517a0002e28035f
FATA[0003] account not found 61670facd517a0002e28035f

./noobaa bucket list
INFO[0001] ✅ Exists: NooBaa "noobaa"
INFO[0002] ✅ Exists: Service "noobaa-mgmt"
INFO[0002] ✅ Exists: Secret "noobaa-operator"
INFO[0002] ✅ Exists: Secret "noobaa-admin"
INFO[0002] ✈️  RPC: bucket.list_buckets() Request: <nil>
WARN[0002] RPC: GetConnection creating connection to wss://localhost:54930/rpc/ 0xc000c90d70
INFO[0002] RPC: Connecting websocket (0xc000c90d70) &{RPC:0xc0001bd090 Address:wss://localhost:54930/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
INFO[0003] RPC: Connected websocket (0xc000c90d70) &{RPC:0xc0001bd090 Address:wss://localhost:54930/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
ERRO[0003] ⚠️  RPC: bucket.list_buckets() Response Error: Code=UNAUTHORIZED Message=account not found 61670facd517a0002e28035f
2021-10-29 13:38:27.543284 I | account not found 61670facd517a0002e28035f

Comment 8 khover 2021-11-09 13:43:10 UTC
@rayalon

Hi Romy,

The must-gather failed, but the customer did state the following.

Let me know if they need to re-run the must-gather.


We don't use vault with noobaa. The noobaa-root-master-key secret exists in the openshift-storage namespace.

Comment 10 khover 2021-11-09 19:29:57 UTC
@rayalon

Hi Romy,


The customer has provided the requested outputs.

The information is at /cases/03061246


drwxrwxrwx+ 3 yank yank   33 Nov  9 19:11 0050-must-gather.tar.gz



Yes, there is data in the noobaa-root-master-key secret.

data:
  cipher_key_b64: ******
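For reference, the key material could be extracted and decoded along these lines (a sketch; the jsonpath expression is an assumption, and a mock value stands in for the redacted key):

```shell
# Against a live cluster this would be (hypothetical, do not paste blindly):
#   oc get secret noobaa-root-master-key -n openshift-storage \
#     -o jsonpath='{.data.cipher_key_b64}' | base64 -d
# Demonstrated here with a mock value, since the real key is redacted:
mock_b64=$(printf 'example-root-key' | base64)
decoded=$(printf '%s' "$mock_b64" | base64 -d)
echo "$decoded"
```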

Comment 14 khover 2021-11-11 14:53:52 UTC
@rayalon

Hi Romy,

So if I understand correctly, the workaround is to remove the
noobaa-root-master-key secret:

data:
  cipher_key_b64: ******

and then rebuild NooBaa?
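The workaround as understood above could be sketched like this (hypothetical; confirm against the updated KCS article first). A dry-run guard prints the command instead of running it:

```shell
# DRY_RUN=1 (the default) prints each command instead of executing it,
# which is safer for a destructive step like deleting a secret.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}
run oc delete secret noobaa-root-master-key -n openshift-storage
# ...then follow the rebuild procedure in the KCS article.
```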

Comment 19 khover 2021-11-12 15:29:44 UTC
Thank you Romy and Parikshith for your assistance on this case.

We can close

Comment 21 khover 2021-11-15 15:13:49 UTC
Thank you Romy

The rebuild doc has been updated.

We can close

Comment 22 Nimrod Becker 2021-11-16 07:53:39 UTC
Closing per last comment