Description of problem (please be as detailed as possible and provide log snippets):
After a rebuild of NooBaa following this RH doc, https://access.redhat.com/solutions/5948631, the NooBaa endpoint pod never came back up, and this error was logged in the noobaa-core pod logs:

Nov-2 20:15:22.094 [/16] [ERROR] core.server.system_services.system_store:: SystemStore: load failed Error: NON_EXISTING_ROOT_KEY
    at MasterKeysManager.load_root_key (/root/node_modules/noobaa-core/src/server/system_services/master_key_manager.js:64:40)
    at /root/node_modules/noobaa-core/src/server/system_services/system_store.js:427:47
    at Semaphore.surround (/root/node_modules/noobaa-core/src/util/semaphore.js:40:26)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
Nov-2 20:15:22.095 [/16] [ERROR] core.server.system_services.system_store:: SystemStore: load failed Error: NON_EXISTING_ROOT_KEY
    at MasterKeysManager.load_root_key (/root/node_modules/noobaa-core/src/server/system_services/master_key_manager.js:64:40)
    at /root/node_modules/noobaa-core/src/server/system_services/system_store.js:427:47
    at Semaphore.surround (/root/node_modules/noobaa-core/src/util/semaphore.js:40:26)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
Nov-2 20:15:22.095 [/16] [ERROR] UPGRADE:: failed to load system store!! Error: NON_EXISTING_ROOT_KEY
    at MasterKeysManager.load_root_key (/root/node_modules/noobaa-core/src/server/system_services/master_key_manager.js:64:40)
    at /root/node_modules/noobaa-core/src/server/system_services/system_store.js:427:47
    at Semaphore.surround (/root/node_modules/noobaa-core/src/util/semaphore.js:40:26)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
Nov-2 20:15:22.096 [/16] [ERROR] UPGRADE:: failed to init upgrade process!!
upgrade_manager failed with exit code 1
noobaa_init failed with exit code 1. aborting

Version of all relevant components (if applicable):
ocs-operator.v4.7.5

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
MCG is not healthy.

Is there any workaround available to the best of your knowledge?
Unknown

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Unknown

Can this issue be reproduced from the UI?
Unknown

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
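To confirm the NON_EXISTING_ROOT_KEY symptom described above, the noobaa-core logs can be scanned for that error string. The following is a minimal bash sketch. The helper name count_root_key_errors is hypothetical. The sketch also assumes the logs can be fetched with `oc logs noobaa-core-0 -n openshift-storage`, using the pod name that appears in this report. If the pod runs more than one container, pass `-c <container>` to `oc logs`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines that report the missing root key.
# Reads log text on stdin, so it can be fed from `oc logs` or a saved file.
count_root_key_errors() {
  # grep -c prints the number of matching lines. When nothing matches it
  # prints 0 and exits 1, so `|| true` keeps the function's status at 0.
  grep -c 'NON_EXISTING_ROOT_KEY' || true
}

# Example (assumes a logged-in oc session and the noobaa-core-0 pod name):
#   oc logs noobaa-core-0 -n openshift-storage | count_root_key_errors
```

A non-zero count only shows that the error is present. It does not explain why the root key is missing.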
Tried to rebuild noobaa using https://access.redhat.com/solutions/5948631

while true; do oc scale deployment noobaa-operator --replicas=0 ; done
c
cd
o get pods | grep noobaa
oc delete deployments.apps noobaa-endpoint
o get deployments
c
o get pods
c
o get pods | grep nooba
o delete pods noobaa-core-0
o delete pods noobaa-db-pg-0
o get pods | grep nooba
o get pods | grep operator
o get pods | grep nooba
oc delete statefulsets.apps noobaa-db noobaa-core
o get statefulsets.apps
oc delete statefulsets.apps noobaa-db-pg-0
oc delete statefulsets.apps noobaa-db-pg
c
o get pvc | grep nooba
o delete pvc db-noobaa-db-pg-0
o get pvc | grep nooba

➜ ~ oc delete bucketclasses.noobaa.io --all
No resources found
➜ ~ o get bucketclasses.noobaa.io
No resources found.
➜ ~ oc delete backingstores.noobaa.io --all
No resources found
➜ ~ oc delete secrets noobaa-admin noobaa-endpoints noobaa-operator noobaa-server
secret "noobaa-admin" deleted
secret "noobaa-endpoints" deleted
secret "noobaa-operator" deleted
secret "noobaa-server" deleted

time="2021-11-04T18:33:26Z" level=info msg="❌ Not Found: Service \"noobaa-db\"\n"
time="2021-11-04T18:33:26Z" level=info msg="ReconcileObject: Done - unchanged Service /api/v1/namespaces/openshift-storage/services/noobaa-mgmt" sys=openshift-storage/noobaa
time="2021-11-04T18:33:26Z" level=info msg="ReconcileObject: Done - unchanged StatefulSet /apis/apps/v1/namespaces/openshift-storage/statefulsets/noobaa-core" sys=openshift-storage/noobaa
time="2021-11-04T18:33:26Z" level=info msg="ReconcileObject: Done - unchanged Route " sys=openshift-storage/noobaa
time="2021-11-04T18:33:26Z" level=info msg="ReconcileObject: Done - unchanged Service /api/v1/namespaces/openshift-storage/services/s3" sys=openshift-storage/noobaa
time="2021-11-04T18:33:26Z" level=info msg="ReconcileObject: Done - unchanged Route " sys=openshift-storage/noobaa
time="2021-11-04T18:33:26Z" level=info msg="Reconciling Backing Store Credentials" sys=openshift-storage/noobaa

➜ ~ o get svc | grep nooba
noobaa-db-pg   ClusterIP      100.81.190.102   <none>         5432/TCP                                                   22d
noobaa-mgmt    LoadBalancer   100.81.232.93    35.196.46.67   80:31537/TCP,443:31445/TCP,8445:30583/TCP,8446:31818/TCP   22d
➜ ~
➜ ~ o get pvc
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0   Bound    pvc-bfbb5fbd-d7fb-48f0-becc-ac1924824110   50Gi       RWO            ocs-storagecluster-ceph-rbd   20m

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: noobaa-db-serving-cert
    service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1628006524
    service.beta.openshift.io/serving-cert-secret-name: noobaa-db-serving-cert
    service.beta.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1628006524
  labels:
    app: noobaa
  name: noobaa-db-pg
  namespace: openshift-storage
spec:
  clusterIP: 100.81.190.102
  clusterIPs:
  - 100.81.190.102
  ports:
  - name: postgres
    port: 5432
  selector:
    noobaa-db: postgres
@rayalon Hi Romy, this is a fresh OCS install and NooBaa has never worked. The customer confirmed that Vault is not being used. I'll grab a new must-gather and let you know when it is uploaded. Will the secret be searchable in the must-gather, or is there a command the customer needs to run to check whether that secret exists in the openshift-storage namespace?
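One possible answer to the question above is the minimal bash sketch below. It assumes the customer has a logged-in `oc` session with read access to openshift-storage. The helper name check_root_key_secret is hypothetical, and the sketch uses only `oc get secret`. It prints the secret's data key names but not their values, so the output can be pasted into the case without exposing the key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the NooBaa root master key secret exists
# and, if it does, list its data key names without printing the values.
check_root_key_secret() {
  local ns="${1:-openshift-storage}"
  local secret="noobaa-root-master-key"
  if ! oc get secret "$secret" -n "$ns" >/dev/null 2>&1; then
    echo "missing: $secret in namespace $ns"
    return 1
  fi
  echo "present: $secret in namespace $ns"
  # The go-template ranges over .data and emits only the keys, one per line.
  oc get secret "$secret" -n "$ns" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
}

# Example: check_root_key_secret openshift-storage
```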
> @rayalon Hi Romy, This is a fresh OCS install and noobaa has never worked -> but if the customer performed rebuilding there was something before, what was the reason of the rebuilding?

The reason for the rebuild was that ocs-operator showed 0/1 Ready because of NooBaa. The following errors led me to our KCS for the rebuild: https://access.redhat.com/solutions/5948631

time="2021-10-29T15:32:26Z" level=error msg="⚠️ RPC: system.read_system() Response Error: Code=UNAUTHORIZED Message=account not found 61670facd517a0002e28035f"
time="2021-10-29T15:32:26Z" level=info msg="SetPhase: temporary error during phase \"Connecting\"" backingstore=openshift-storage/noobaa-default-backing-store

./noobaa bucket status noobaabucket8udyk
INFO[0001] ✅ Exists: NooBaa "noobaa"
INFO[0002] ✅ Exists: Service "noobaa-mgmt"
INFO[0002] ✅ Exists: Secret "noobaa-operator"
INFO[0002] ✅ Exists: Secret "noobaa-admin"
INFO[0002] ✈️ RPC: bucket.read_bucket() Request: {Name:noobaabucket8udyk}
WARN[0002] RPC: GetConnection creating connection to wss://localhost:54834/rpc/ 0xc0001bca50
INFO[0002] RPC: Connecting websocket (0xc0001bca50) &{RPC:0xc000185900 Address:wss://localhost:54834/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
INFO[0003] RPC: Connected websocket (0xc0001bca50) &{RPC:0xc000185900 Address:wss://localhost:54834/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
ERRO[0003] ⚠️ RPC: bucket.read_bucket() Response Error: Code=UNAUTHORIZED Message=account not found 61670facd517a0002e28035f
FATA[0003] account not found 61670facd517a0002e28035f

./noobaa bucket list
INFO[0001] ✅ Exists: NooBaa "noobaa"
INFO[0002] ✅ Exists: Service "noobaa-mgmt"
INFO[0002] ✅ Exists: Secret "noobaa-operator"
INFO[0002] ✅ Exists: Secret "noobaa-admin"
INFO[0002] ✈️ RPC: bucket.list_buckets() Request: <nil>
WARN[0002] RPC: GetConnection creating connection to wss://localhost:54930/rpc/ 0xc000c90d70
INFO[0002] RPC: Connecting websocket (0xc000c90d70) &{RPC:0xc0001bd090 Address:wss://localhost:54930/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
INFO[0003] RPC: Connected websocket (0xc000c90d70) &{RPC:0xc0001bd090 Address:wss://localhost:54930/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s}
ERRO[0003] ⚠️ RPC: bucket.list_buckets() Response Error: Code=UNAUTHORIZED Message=account not found 61670facd517a0002e28035f
2021-10-29 13:38:27.543284 I | account not found 61670facd517a0002e28035f
@rayalon Hi Romy, the must-gather failed, but the customer stated the following. Let me know if they need to re-run the must-gather.

"We don't use Vault with NooBaa. The noobaa-root-master-key secret exists in the openshift-storage namespace."
@rayalon Hi Romy, the customer has provided the requested outputs. The information is at /cases/03061246:

drwxrwxrwx+ 3 yank yank 33 Nov 9 19:11 0050-must-gather.tar.gz

Yes, the noobaa-root-master-key secret exists and contains:

data:
  cipher_key_b64: ******
@rayalon Hi Romy, so if I understand correctly, the workaround is to remove the noobaa-root-master-key secret (data: cipher_key_b64: ******) and then rebuild NooBaa?
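The workaround proposed above is to remove the noobaa-root-master-key secret and then rebuild. The following bash sketch shows one way to remove the secret. It is an illustration and not the KCS procedure. The helper name remove_root_key_secret and the backup file name are both hypothetical. The sketch assumes that no external KMS holds the root key; the customer confirmed that Vault is not used. It saves a copy of the secret before deleting it so that the previous key material is not lost. That backup file contains the key, so it should be stored securely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up, then delete, the NooBaa root master key secret
# before re-running the rebuild. Assumes no external KMS (e.g. Vault) is used.
remove_root_key_secret() {
  local ns="${1:-openshift-storage}"
  local backup="${2:-noobaa-root-master-key.backup.yaml}"
  local secret="noobaa-root-master-key"
  # Export the current secret first (it contains the key, so keep it private);
  # abort without deleting anything if the export fails.
  if ! oc get secret "$secret" -n "$ns" -o yaml > "$backup"; then
    echo "could not back up $secret; not deleting" >&2
    return 1
  fi
  oc delete secret "$secret" -n "$ns"
}

# Example: remove_root_key_secret openshift-storage ./noobaa-root-master-key.backup.yaml
```

After the secret is deleted, the rebuild steps from https://access.redhat.com/solutions/5948631 would be repeated.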
Thank you, Romy and Parikshith, for your assistance on this case. We can close it.
Thank you, Romy. The rebuild doc has been updated. We can close.
Closing per last comment