.Multicloud Object Gateway instance fails to finish initialization
Because of a race between the pod code starting and OpenShift loading the Certificate Authority (CA) bundle into the pod, the pod cannot communicate with the cloud storage service. As a result, the default backing store cannot be created.
Workaround: Restart the Multicloud Object Gateway (MCG) operator pod:
----
$ oc delete pod noobaa-operator-<ID>
----
After the operator pod restarts, the backing store is reconciled and works.
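The restart can also be scripted so that the pod ID does not have to be looked up by hand. The following is a minimal sketch, not a supported procedure: the function names `find_operator_pod` and `restart_mcg_operator` are hypothetical, and the `openshift-storage` namespace is an assumption taken from the `sys=openshift-storage/noobaa` field in the operator log.

```shell
# Read `oc get pods -o name` output on stdin and print the name of the
# first pod whose name starts with "noobaa-operator-" (hypothetical helper).
find_operator_pod() {
  grep -m1 '^pod/noobaa-operator-' | sed 's|^pod/||'
}

# Delete the MCG operator pod so that its deployment recreates it.
# The default namespace is an assumption; pass another one as $1 if needed.
restart_mcg_operator() {
  ns="${1:-openshift-storage}"
  pod="$(oc get pods -n "$ns" -o name | find_operator_pod)"
  if [ -z "$pod" ]; then
    echo "no noobaa-operator pod found in namespace $ns" >&2
    return 1
  fi
  oc delete pod -n "$ns" "$pod"
}
```

Calling `restart_mcg_operator` with no arguments targets the assumed namespace. The block only defines the functions and does not run anything until one of them is called.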
Description of problem (please be as detailed as possible and provide log
snippets):
platform: AZURE IPI 3AZ RHCOS 3M 3W Cluster
storagecluster is stuck in Progressing state, waiting for the NooBaa instance to finish initialization
Version of all relevant components (if applicable):
ocs-registry:4.15.0-157
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, not able to install ODF
Is there any workaround available to the best of your knowledge?
No
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1
Is this issue reproducible?
1/1
Can this issue be reproduced from the UI?
Not tried
If this is a regression, please provide more details to justify this:
Yes
Steps to Reproduce:
1. Install ODF using ocs-ci
2. Check the storagecluster status
Actual results:
$ oc get storagecluster ocs-storagecluster
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   34m   Progressing              2024-03-07T12:19:53Z   4.15.0
status:
  conditions:
  - lastHeartbeatTime: "2024-03-07T12:19:53Z"
    lastTransitionTime: "2024-03-07T12:19:53Z"
    message: Version check successful
    reason: VersionMatched
    status: "False"
    type: VersionMismatch
  - lastHeartbeatTime: "2024-03-07T12:57:05Z"
    lastTransitionTime: "2024-03-07T12:23:42Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: ReconcileComplete
  - lastHeartbeatTime: "2024-03-07T12:19:54Z"
    lastTransitionTime: "2024-03-07T12:19:54Z"
    message: Initializing StorageCluster
    reason: Init
    status: "False"
    type: Available
  - lastHeartbeatTime: "2024-03-07T12:57:05Z"
    lastTransitionTime: "2024-03-07T12:19:54Z"
    message: Waiting on Nooba instance to finish initialization
    reason: NoobaaInitializing
    status: "True"
    type: Progressing
  - lastHeartbeatTime: "2024-03-07T12:19:54Z"
    lastTransitionTime: "2024-03-07T12:19:54Z"
    message: Initializing StorageCluster
    reason: Init
    status: "False"
    type: Degraded
  - lastHeartbeatTime: "2024-03-07T12:25:23Z"
    lastTransitionTime: "2024-03-07T12:23:39Z"
    message: 'CephCluster is creating: Processing OSD 2 on PVC "ocs-deviceset-0-data-08n6b4"'
    reason: ClusterStateCreating
    status: "False"
    type: Upgradeable
  currentMonCount: 3
  failureDomain: zone
  failureDomainKey: topology.kubernetes.io/zone
  failureDomainValues:
  - eastus-1
  - eastus-2
  - eastus-3
Expected results:
storagecluster should be in Ready state
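One way to check for the expected state is to poll the PHASE column of `oc get storagecluster` until it reads `Ready`. This is a rough sketch under stated assumptions: the function names `phase_from_table` and `wait_for_ready` are hypothetical, the `openshift-storage` namespace is inferred from the operator log, and the retry count and 10-second interval are arbitrary.

```shell
# Read the table printed by `oc get storagecluster` on stdin and print the
# third column (PHASE) of every data row. Hypothetical helper.
phase_from_table() {
  awk 'NR > 1 { print $3 }'
}

# Poll until ocs-storagecluster reports the Ready phase or the retries run out.
# $1: namespace (assumed default), $2: number of attempts (arbitrary default).
wait_for_ready() {
  ns="${1:-openshift-storage}"
  tries="${2:-30}"
  phase=""
  i=0
  while [ "$i" -lt "$tries" ]; do
    phase="$(oc get storagecluster ocs-storagecluster -n "$ns" | phase_from_table)"
    if [ "$phase" = "Ready" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "storagecluster still in phase: $phase" >&2
  return 1
}
```

Parsing the table keeps the sketch close to the output shown in this report. A JSONPath query on `.status.phase` would be a more robust alternative.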
Additional info:
> noobaa operator log
time="2024-03-07T13:17:16Z" level=info msg="SetPhase: temporary error during phase \"Configuring\"" sys=openshift-storage/noobaa
time="2024-03-07T13:17:16Z" level=warning msg="⏳ Temporary Error: failed to start creating storage account: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/9bef6367-8ff5-4f08-84c9-3da195c53762/resourceGroups/j-274zi3c33-uo-cd5pt-rg/providers/Microsoft.Storage/storageAccounts/noobaaaccountqxbx9?api-version=2019-06-01: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = 'Post \"https://login.microsoftonline.com/9cf78105-e3e9-4321-b88d-b001b66c762b/oauth2/token?api-version=1.0\": tls: failed to verify certificate: x509: certificate signed by unknown authority'" sys=openshift-storage/noobaa
job: https://url.corp.redhat.com/9bb3eb4
must-gather: https://url.corp.redhat.com/af4afd2
latest noobaa operator log: https://url.corp.redhat.com/83c60e2
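To confirm that a stuck installation matches this issue, the operator log can be searched for the x509 error shown in the log excerpt. A minimal sketch follows. The function name `has_ca_error` is hypothetical, and the namespace and deployment name in the usage comment are assumptions based on the `sys=openshift-storage/noobaa` field and the `noobaa-operator-<ID>` pod name.

```shell
# Read log text on stdin and succeed (exit 0) if it contains the
# untrusted-CA TLS error reported in this issue. Hypothetical helper.
has_ca_error() {
  grep -q 'x509: certificate signed by unknown authority'
}

# Example usage (assumed namespace and deployment name):
#   oc logs -n openshift-storage deployment/noobaa-operator | has_ca_error \
#     && echo "operator pod is missing the CA bundle; restart it"
```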