Bug 2133547

Summary: [GSS] External OCS - unable to create user rook-ceph-object-user-ocs-external-storagecluster-cephobjectstore-noobaa-ceph-objectstore-user
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Component: Multi-Cloud Object Gateway
Version: 4.8
Status: CLOSED INSUFFICIENT_DATA
Severity: urgent
Priority: urgent
Hardware: All
OS: All
Type: Bug
Reporter: kelwhite
Assignee: Nimrod Becker <nbecker>
QA Contact: krishnaram Karthick <kramdoss>
CC: bhull, bkunal, brgardne, csharpe, etamir, hnallurv, jthottan, lema, lsantann, madam, milverma, mparida, nbecker, nigoyal, ocs-bugs, odf-bz-bot, paarora, thottanjiffin, tnielsen, usrivast
Flags: brgardne: needinfo- (×4), paarora: needinfo-
Last Closed: 2023-01-19 07:24:26 UTC

Comment 11 Nimrod Becker 2022-11-15 08:05:09 UTC
Per last comment, moving to Rook

Comment 48 Parth Arora 2022-12-14 13:03:11 UTC
After adding the secret, I see that the `CephObjectStore` is created, but the `CephObjectStoreUser` still fails to reconcile.

In the Rook logs, I see: `ObjectStore resource not ready in namespace "openshift-storage", retrying in "10s". failed to detect if object store "" is initialized: CephObjectStore "" could not be found`

I think it is because the `CephObjectStoreUser` has no store name in its spec:
`spec:
  displayName: my display name`

Please add it with `oc edit cephobjectstoreuser`, like this:
`spec:
  store: ocs-external-storagecluster-cephobjectstore
  displayName: my display name`
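
As a non-interactive alternative to `oc edit`, the same change could be applied as a merge patch. This is only a minimal sketch, not a verified procedure: `set_store` is a hypothetical helper, the namespace is taken from the Rook log message above, and the user name in the usage comment is inferred from the secret name in the bug summary.

```shell
# Hypothetical helper: set spec.store on a CephObjectStoreUser via a JSON merge patch.
# Assumption: the resource lives in openshift-storage (per the Rook log above).
NS="openshift-storage"

set_store() {
  user="$1"    # CephObjectStoreUser name
  store="$2"   # CephObjectStore name to reference
  oc patch cephobjectstoreuser "$user" -n "$NS" --type merge \
    -p "{\"spec\":{\"store\":\"$store\"}}"
}

# Usage against a live cluster (not run here; user name is an assumption):
#   set_store noobaa-ceph-objectstore-user ocs-external-storagecluster-cephobjectstore
```

The function only wraps `oc patch`, so it changes nothing until it is called with a real user and store name.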

Adding @jiffin to keep me honest.

Comment 51 Parth Arora 2022-12-14 15:26:27 UTC
Making comment #48 public, as it might help with further troubleshooting.

Comment 57 Blaine Gardner 2022-12-15 20:36:41 UTC
The issue appears to be with the configuration of the noobaa-default-backing-store. It is using the old CephObjectStoreUser secret name (rook-ceph-object-user--noobaa-ceph-objectstore-user) from when the `store` wasn't specified. It should be using the new value: rook-ceph-object-user-ocs-external-storagecluster-cephobjectstore-noobaa-ceph-objectstore-user. 
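
The two secret names differ only in the store segment. A minimal sketch of the naming pattern implied by those two names follows; `user_secret_name` is a hypothetical helper, and the pattern is inferred from this comment rather than taken from Rook's source.

```shell
# Hypothetical helper: compose the secret name as
#   rook-ceph-object-user-<store>-<user>
# (pattern inferred from the two names quoted in this comment).
user_secret_name() {
  printf 'rook-ceph-object-user-%s-%s\n' "$1" "$2"
}

# Empty store: yields the old name with the double hyphen.
user_secret_name "" noobaa-ceph-objectstore-user

# Store set: yields the new name the backing store should reference.
user_secret_name ocs-external-storagecluster-cephobjectstore noobaa-ceph-objectstore-user
```

This shows why the secret name changed once `store` was filled in: the empty segment was replaced by the object store name.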

I see several log items like this in the noobaa operator logs:
  2022-12-14T14:43:25.036263434Z time="2022-12-14T14:43:25Z" level=info msg="✅ Exists: BackingStore \"noobaa-default-backing-store\"\n"
  2022-12-14T14:43:25.036263434Z time="2022-12-14T14:43:25Z" level=info msg="Backing store noobaa-default-backing-store already exists. skipping ReconcileCloudCredentials" func=ReconcileDefaultBackingStore sys=openshift-storage/noobaa

As a workaround for the customer, try deleting the noobaa-default-backing-store. The Noobaa operator should create it again once it realizes the backing store no longer exists. If it doesn't, delete the noobaa operator pod, and it should be fixed soon.
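
A minimal sketch of that workaround as commands, assuming the backing store lives in the `openshift-storage` namespace shown in the operator logs. The helper name `recreate_default_backing_store` is hypothetical, and the sketch does not cover restarting the operator pod.

```shell
# Sketch of the workaround. Resource name comes from this comment; the
# namespace is an assumption based on the operator log lines above.
NS="openshift-storage"
BS="noobaa-default-backing-store"

recreate_default_backing_store() {
  # Delete the stale backing store so the operator can reconcile a new one.
  oc delete backingstore "$BS" -n "$NS"
  # Check whether it is back; this may report NotFound until the operator
  # has recreated it, so it may need to be repeated.
  oc get backingstore "$BS" -n "$NS"
}

# Usage against a live cluster (not run here):
#   recreate_default_backing_store
```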

This does expose what I would consider a bug in noobaa: it still creates the default backing store even if the CephObjectStoreUser isn't in Ready state. I will move this BZ to the noobaa team. Since a workaround exists (delete the default backing store), this doesn't seem like a blocker to me.

Comment 59 Blaine Gardner 2022-12-15 22:43:38 UTC
Noobaa is having trouble with the default bucket class, but I can't quite tell why. It is at least looking at the correct secret now, so the issue I was seeing earlier is fixed. Below you can see that the secret exists, but noobaa-default-backing-store-noobaa-noobaa isn't found.

time="2022-12-15T21:17:26Z" level=info msg="✅ Exists: Secret \"rook-ceph-object-user-ocs-external-storagecluster-cephobjectstore-noobaa-ceph-objectstore-user\"\n"
time="2022-12-15T21:17:26Z" level=info msg="❌ Not Found:  \"noobaa-default-backing-store-noobaa-noobaa\"\n"


Collecting another ODF/OCS must-gather will help us figure out if the issue is still with Noobaa or if it's a further issue with Rook. I also copied a relevant chunk of Noobaa logs here if it'll help the Noobaa team debug more quickly.



time="2022-12-15T21:17:24Z" level=info msg="Start BucketClass Reconcile..." bucketclass=openshift-storage/noobaa-default-bucket-class
time="2022-12-15T21:17:24Z" level=info msg="✅ Exists: NooBaa \"noobaa\"\n"
time="2022-12-15T21:17:24Z" level=info msg="✅ Exists: BucketClass \"noobaa-default-bucket-class\"\n"
time="2022-12-15T21:17:24Z" level=info msg="SetPhase: Verifying" bucketclass=openshift-storage/noobaa-default-bucket-class
time="2022-12-15T21:17:24Z" level=info msg="✅ Exists: BackingStore \"noobaa-default-backing-store\"\n"
time="2022-12-15T21:17:24Z" level=info msg="SetPhase: temporary error during phase \"Verifying\"" bucketclass=openshift-storage/noobaa-default-bucket-class
time="2022-12-15T21:17:24Z" level=warning msg="⏳ Temporary Error: NooBaa BackingStore \"noobaa-default-backing-store\" is not yet ready" bucketclass=openshift-storage/noobaa-default-bucket-class
time="2022-12-15T21:17:24Z" level=info msg="UpdateStatus: Done" bucketclass=openshift-storage/noobaa-default-bucket-class
time="2022-12-15T21:17:26Z" level=info msg="Start BackingStore Reconcile ..." backingstore=openshift-storage/noobaa-default-backing-store
time="2022-12-15T21:17:26Z" level=info msg="✅ Exists: NooBaa \"noobaa\"\n"
time="2022-12-15T21:17:26Z" level=info msg="✅ Exists: BackingStore \"noobaa-default-backing-store\"\n"
time="2022-12-15T21:17:26Z" level=info msg="✅ Exists: Secret \"rook-ceph-object-user-ocs-external-storagecluster-cephobjectstore-noobaa-ceph-objectstore-user\"\n"
time="2022-12-15T21:17:26Z" level=info msg="❌ Not Found:  \"noobaa-default-backing-store-noobaa-noobaa\"\n"
time="2022-12-15T21:17:26Z" level=info msg="SetPhase: Verifying" backingstore=openshift-storage/noobaa-default-backing-store
time="2022-12-15T21:17:26Z" level=info msg="SetPhase: Connecting" backingstore=openshift-storage/noobaa-default-backing-store
time="2022-12-15T21:17:26Z" level=info msg="✅ Exists: NooBaa \"noobaa\"\n"
time="2022-12-15T21:17:26Z" level=info msg="✅ Exists: Service \"noobaa-mgmt\"\n"
time="2022-12-15T21:17:26Z" level=info msg="✅ Exists: Secret \"noobaa-operator\"\n"
time="2022-12-15T21:17:26Z" level=info msg="✅ Exists: Secret \"noobaa-admin\"\n"
time="2022-12-15T21:17:26Z" level=info msg="✈️  RPC: system.read_system() Request: <nil>"
time="2022-12-15T21:17:26Z" level=info msg="✅ RPC: system.read_system() Response OK: took 24.9ms"
time="2022-12-15T21:17:26Z" level=warning msg="using existing pool but connection mismatch &{Name:noobaa-default-backing-store EndpointType:S3_COMPATIBLE Endpoint:http://rook-ceph-rgw-ocs-external-storagecluster-cephobjectstore.openshift-storage.svc:8080 Identity:NCMG52UUPB7ZM4TN66M6 Secret:Gg3Dt11M2b0rw124WIKsfmrSyhB6x1LqiXD8gx1X AuthMethod:AWS_V4} pool &{Name:noobaa-default-backing-store ResourceType:CLOUD Mode:OPTIMAL Region: PoolNodeType:BLOCK_STORE_S3 Undeletable:IN_USE CloudInfo:0xc000e3a380 MongoInfo:<nil> HostInfo:<nil> Hosts:<nil>} &{EndpointType:S3_COMPATIBLE Endpoint:http://10.20.55.72:8080 TargetBucket:nb.1642022501736.apps.ocp4.kohlerco.com Identity: NodeName: CreatedBy:operator Host: AuthMethod:AWS_V4}" backingstore=openshift-storage/noobaa-default-backing-store
time="2022-12-15T21:17:26Z" level=info msg="SetPhase: Creating" backingstore=openshift-storage/noobaa-default-backing-store
time="2022-12-15T21:17:26Z" level=info msg="✈️  RPC: account.check_external_connection() Request: {Name:noobaa-default-backing-store EndpointType:S3_COMPATIBLE Endpoint:http://rook-ceph-rgw-ocs-external-storagecluster-cephobjectstore.openshift-storage.svc:8080 Identity:NCMG52UUPB7ZM4TN66M6 Secret:Gg3Dt11M2b0rw124WIKsfmrSyhB6x1LqiXD8gx1X AuthMethod:AWS_V4}"
time="2022-12-15T21:17:26Z" level=error msg="⚠️  RPC: account.check_external_connection() Response Error: Code=CONNECTION_ALREADY_EXIST Message=Connection name already exists: noobaa-default-backing-store"
time="2022-12-15T21:17:26Z" level=info msg="SetPhase: temporary error during phase \"Creating\"" backingstore=openshift-storage/noobaa-default-backing-store
time="2022-12-15T21:17:26Z" level=warning msg="⏳ Temporary Error: Connection name already exists: noobaa-default-backing-store" backingstore=openshift-storage/noobaa-default-backing-store
time="2022-12-15T21:17:26Z" level=info msg="UpdateStatus: Done" backingstore=openshift-storage/noobaa-default-backing-store
time="2022-12-15T21:17:26Z" level=info msg="RPC: Ping (0xc00180a1e0) &{RPC:0xc00009fc20 Address:wss://noobaa-mgmt.openshift-storage.svc.cluster.local:443/rpc/ State:connected WS:0xc000368e00 PendingRequests:map[] NextRequestID:499 Lock:{state:0 sema:0} ReconnectDelay:0s cancelPings:0x63b440}"
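
When scanning a chunk like the one above, it can help to keep only the warning and error lines. A minimal sketch using `grep` on the `level=` field of these log lines follows; the helper name `noobaa_problems` and reading the logs from stdin are illustrative choices, and the sample lines are shortened versions of entries in the excerpt.

```shell
# Hypothetical helper: keep only warning/error lines from operator logs on stdin.
noobaa_problems() {
  grep -E 'level=(warning|error)'
}

# Example with two lines shaped like the excerpt above (messages shortened):
printf '%s\n' \
  'time="2022-12-15T21:17:26Z" level=info msg="SetPhase: Creating"' \
  'time="2022-12-15T21:17:26Z" level=error msg="Connection name already exists"' \
  | noobaa_problems

# On a cluster, the input would come from the operator logs, for example
# (deployment name is an assumption):
#   oc logs deployment/noobaa-operator -n openshift-storage | noobaa_problems
```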

Comment 61 Blaine Gardner 2022-12-16 16:33:17 UTC
I took a look at the must-gather, and everything seems to be working with Rook. Something is not set up correctly with Noobaa, and I'm not sure what else to suggest to get Noobaa into a healthy state. Unfortunately, I don't think I'd be able to help in a troubleshooting session. I was hoping that a Noobaa developer would have been assigned to this BZ and looked at it already, given that most operate in Israel Standard Time.

You can try increasing the Noobaa operator log verbosity level and checking those logs again. That may show whether there is an underlying issue that can be fixed. That is the best next step I can suggest.

Comment 63 kelwhite 2022-12-16 17:57:07 UTC
How do you increase noobaa operator logging verbosity? We don't have this documented anywhere from what I can find.

Comment 65 Blaine Gardner 2022-12-16 18:57:26 UTC
I'm sorry Kelson. I have never worked with Noobaa. Please direct Noobaa-related needinfos to the assignee.