Bug 2275049 - NooBaa service stuck in 'NoobaaInitializing' state due to missing Azure credentials during ODF deployment with cluster-wide encryption using Azure KMS
Summary: NooBaa service stuck in 'NoobaaInitializing' state due to missing Azure creden...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.16.0
Assignee: Vinayak Hariharmath
QA Contact: Tiffany Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-04-15 07:00 UTC by Parag Kamble
Modified: 2024-07-17 13:19 UTC

Fixed In Version: 4.16.0-78
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-07-17 13:19:24 UTC
Embargoed:


Links
System                  ID                                Private  Priority  Status  Summary                                      Last Updated
Github                  noobaa noobaa-operator pull 1343  0        None      Merged  Fixing AzureClientCertPath in kms for azure  2024-04-16 07:49:15 UTC
Red Hat Product Errata  RHSA-2024:4591                    0        None      None    None                                         2024-07-17 13:19:28 UTC

Description Parag Kamble 2024-04-15 07:00:00 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When ODF is deployed with cluster-wide encryption enabled using Azure KMS, the NooBaa service stays in the 'NoobaaInitializing' state indefinitely. The NooBaa resource reports "AZURE_SECRET_ID or AZURE_CLIENT_CERT_PATH not set": neither of the required Azure credentials is set, so NooBaa cannot initialize and the deployment fails.
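
To illustrate the failure mode, here is a minimal Python sketch of the kind of check that would produce this message. It is not the operator's actual code: the function name `missing_azure_credential` is hypothetical, and the dictionary simply copies the KMS connection details shown in the NooBaa resource further down in this report.

```python
from typing import Mapping, Optional


def missing_azure_credential(details: Mapping[str, str]) -> Optional[str]:
    """Return an error message when neither credential key is set.

    Hypothetical helper for illustration only; the real check lives in
    the operator / KMS client code and may differ.
    """
    if details.get("AZURE_SECRET_ID") or details.get("AZURE_CLIENT_CERT_PATH"):
        return None
    return "AZURE_SECRET_ID or AZURE_CLIENT_CERT_PATH not set"


# Connection details as reported in the NooBaa resource in this bug.
reported = {
    "AZURE_CERT_SECRET_NAME": "azure-ocs-xtcg1e53",
    "AZURE_CLIENT_ID": "ec78e481-8052-4ba1-b01d-ce5a47827ab5",
    "AZURE_TENANT_ID": "9cf78105-e3e9-4321-b88d-b001b66c762b",
    "AZURE_VAULT_URL": "https://ocsqe-azure-kv.vault.azure.net/",
    "KMS_PROVIDER": "azure-kv",
    "KMS_SERVICE_NAME": "Azure-kv-connection",
}

print(missing_azure_credential(reported))
```

The reported details name a certificate secret (AZURE_CERT_SECRET_NAME) but carry neither of the two keys the error message asks for, which matches the observed condition.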


Version of all relevant components (if applicable): 4.16


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)? Yes


Is there any workaround available to the best of your knowledge? No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible? Yes


Can this issue reproduce from the UI?


If this is a regression, please provide more details to justify this:


Steps to reproduce:
1. Start deploying an ODF 4.16 cluster on the Azure cloud platform.
2. During StorageCluster setup, enable cluster-wide encryption with the Azure KMS service.
3. Configure the required parameters for the Azure KMS connection.
4. Complete the remaining steps and wait for the StorageCluster to reach the 'Ready' state.


Actual results:

1. The storage cluster is stuck in the 'Progressing' state.
2. The NooBaa service is stuck in the 'NoobaaInitializing' state.

Expected results:
1. Storage cluster should be in the 'Ready' state.

Additional info:

Cluster version info
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> ocs get csv
NAME                                        DISPLAY                            VERSION            REPLACES   PHASE
mcg-operator.v4.16.0-73.stable              NooBaa Operator                    4.16.0-73.stable              Succeeded
ocs-client-operator.v4.16.0-73.stable       OpenShift Data Foundation Client   4.16.0-73.stable              Succeeded
ocs-operator.v4.16.0-73.stable              OpenShift Container Storage        4.16.0-73.stable              Succeeded
odf-csi-addons-operator.v4.16.0-73.stable   CSI Addons                         4.16.0-73.stable              Succeeded
odf-operator.v4.16.0-73.stable              OpenShift Data Foundation          4.16.0-73.stable              Succeeded
odf-prometheus-operator.v4.16.0-73.stable   Prometheus Operator                4.16.0-73.stable              Succeeded
rook-ceph-operator.v4.16.0-73.stable        Rook-Ceph                          4.16.0-73.stable              Succeeded
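
The operators above are at build 4.16.0-73, while the bug header lists 4.16.0-78 as the Fixed In Version. A small Python sketch for checking whether an installed CSV version already contains the fix could look like this. The helper names are hypothetical, and the comparison assumes both versions share the same x.y.z prefix and differ only in the build number after the dash.

```python
import re


def build_number(version: str) -> int:
    """Extract the build number from a version such as '4.16.0-73.stable'."""
    match = re.fullmatch(r"\d+\.\d+\.\d+-(\d+)(?:\..*)?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1))


def has_fix(installed: str, fixed_in: str = "4.16.0-78") -> bool:
    """Compare build numbers only (assumes the same x.y.z release)."""
    return build_number(installed) >= build_number(fixed_in)


print(has_fix("4.16.0-73.stable"))  # False
```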

Storage Cluster state
-=-=-=-=-=-=-=-=-=-=-=-=-=-
> sc
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   3d    Progressing              2024-04-12T06:28:37Z   4.16.0


Storagecluster details
-=-=-=-=-=-=-=-=-=-=-=--=-=

> ocs describe storagecluster ocs-storagecluster
Name:         ocs-storagecluster
Namespace:    openshift-storage
Labels:       <none>
Annotations:  uninstall.ocs.openshift.io/cleanup-policy: delete
              uninstall.ocs.openshift.io/mode: graceful
API Version:  ocs.openshift.io/v1
Kind:         StorageCluster
Metadata:
  Creation Timestamp:  2024-04-12T06:28:37Z
  Finalizers:
    storagecluster.ocs.openshift.io
  Generation:  2
  Owner References:
    API Version:     odf.openshift.io/v1alpha1
    Kind:            StorageSystem
    Name:            ocs-storagecluster-storagesystem
    UID:             57695244-3004-4620-b54a-c96b24fb9a2a
  Resource Version:  4496657
  UID:               f0a09c5a-6893-4f21-ba5c-a39f7b14b840
Spec:
  Arbiter:
  Encryption:
    Cluster Wide:  true
    Enable:        true
    Key Rotation:
      Schedule:  @weekly
    Kms:
      Enable:  true
  External Storage:
  Managed Resources:
    Ceph Block Pools:
    Ceph Cluster:
    Ceph Config:
    Ceph Dashboard:
    Ceph Filesystems:
      Data Pool Spec:
        Application:
        Erasure Coded:
          Coding Chunks:  0
          Data Chunks:    0
        Mirroring:
        Quotas:
        Replicated:
          Size:  0
        Status Check:
          Mirror:
    Ceph Non Resilient Pools:
      Count:  1
      Resources:
      Volume Claim Template:
        Metadata:
        Spec:
          Resources:
        Status:
    Ceph Object Store Users:
    Ceph Object Stores:
    Ceph RBD Mirror:
      Daemon Count:  1
    Ceph Toolbox:
  Mirroring:
  Network:
    Connections:
      Encryption:
    Multi Cluster Service:
  Node Topologies:
  Resource Profile:  balanced
  Storage Device Sets:
    Config:
    Count:  1
    Data PVC Template:
      Metadata:
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:         512Gi
        Storage Class Name:  managed-csi
        Volume Mode:         Block
      Status:
    Name:  ocs-deviceset-managed-csi
    Placement:
    Portable:  true
    Prepare Placement:
    Replica:  3
    Resources:
Status:
  Conditions:
    Last Heartbeat Time:   2024-04-12T06:28:37Z
    Last Transition Time:  2024-04-12T06:28:37Z
    Message:               Version check successful
    Reason:                VersionMatched
    Status:                False
    Type:                  VersionMismatch
    Last Heartbeat Time:   2024-04-15T06:54:42Z
    Last Transition Time:  2024-04-14T21:50:07Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2024-04-12T06:28:37Z
    Last Transition Time:  2024-04-12T06:28:37Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2024-04-15T06:54:42Z
    Last Transition Time:  2024-04-12T06:28:37Z
    Message:               Waiting on Nooba instance to finish initialization
    Reason:                NoobaaInitializing
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2024-04-12T06:28:37Z
    Last Transition Time:  2024-04-12T06:28:37Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2024-04-12T06:34:36Z
    Last Transition Time:  2024-04-12T06:32:34Z
    Message:               CephCluster is creating: Processing OSD 2 on PVC "ocs-deviceset-managed-csi-0-data-0dh498"
    Reason:                ClusterStateCreating
    Status:                False
    Type:                  Upgradeable
  Current Mon Count:       3
  Failure Domain:          zone
  Failure Domain Key:      topology.kubernetes.io/zone
  Failure Domain Values:
    eastus-1
    eastus-2
    eastus-3
  Images:
    Ceph:
      Actual Image:   registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:500a744b3be913216d8164131ab97e1b29e112491709be65b30d8fb2d7f61ca0
      Desired Image:  registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:500a744b3be913216d8164131ab97e1b29e112491709be65b30d8fb2d7f61ca0
    Noobaa Core:
      Actual Image:   registry.redhat.io/odf4/mcg-core-rhel9@sha256:b30a5087373a5b3378fd09807399dd0340973e891a410b1bfa74bac634926621
      Desired Image:  registry.redhat.io/odf4/mcg-core-rhel9@sha256:b30a5087373a5b3378fd09807399dd0340973e891a410b1bfa74bac634926621
    Noobaa DB:
      Actual Image:   registry.redhat.io/rhel9/postgresql-15@sha256:76ff2541e3ff13b7f5feb1662597f33283bf9dc80e110bef2fb39633e8bbac00
      Desired Image:  registry.redhat.io/rhel9/postgresql-15@sha256:76ff2541e3ff13b7f5feb1662597f33283bf9dc80e110bef2fb39633e8bbac00
  Kms Server Connection:
    Kms Server Address:           https://ocsqe-azure-kv.vault.azure.net/
  Last Applied Resource Profile:  balanced
  Node Topologies:
    Labels:
      kubernetes.io/hostname:
        pakamble-az-8k8vp-worker-eastus1-c4j6n
        pakamble-az-8k8vp-worker-eastus2-rn7q9
        pakamble-az-8k8vp-worker-eastus3-cwhsz
      topology.kubernetes.io/region:
        eastus
      topology.kubernetes.io/zone:
        eastus-1
        eastus-2
        eastus-3
  Phase:  Progressing
  Related Objects:
    API Version:       ceph.rook.io/v1
    Kind:              CephCluster
    Name:              ocs-storagecluster-cephcluster
    Namespace:         openshift-storage
    Resource Version:  4496144
    UID:               6b1cc477-d876-4d35-af90-fafc53f1df71
    API Version:       noobaa.io/v1alpha1
    Kind:              NooBaa
    Name:              noobaa
    Namespace:         openshift-storage
    Resource Version:  4496639
    UID:               976f5fa0-2c38-4f07-86ac-58931ba3d738
  Version:             4.16.0
Events:                <none>
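
The condition that keeps the cluster in 'Progressing' can be picked out of the output above programmatically. The following Python sketch assumes the conditions come from the `status.conditions` list in the JSON form of the resource (for example from `oc get storagecluster ocs-storagecluster -o json`), where each entry uses the usual lower-case `type`, `status`, `reason` and `message` keys. The function name is hypothetical.

```python
from typing import Iterable, Mapping, Optional, Tuple


def progressing_reason(
    conditions: Iterable[Mapping[str, str]],
) -> Optional[Tuple[str, str]]:
    """Return (reason, message) of the active Progressing condition, if any."""
    for cond in conditions:
        if cond.get("type") == "Progressing" and cond.get("status") == "True":
            return cond.get("reason", ""), cond.get("message", "")
    return None


# Two of the conditions from the describe output above, in JSON form.
conditions = [
    {"type": "Available", "status": "False", "reason": "Init",
     "message": "Initializing StorageCluster"},
    {"type": "Progressing", "status": "True", "reason": "NoobaaInitializing",
     "message": "Waiting on Nooba instance to finish initialization"},
]

print(progressing_reason(conditions))
```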


NooBaa service details
-=-=-=-=-=-=-=-=-=-=-=-=-

> ocs describe noobaas.noobaa.io
Name:         noobaa
Namespace:    openshift-storage
Labels:       app=noobaa
Annotations:  <none>
API Version:  noobaa.io/v1alpha1
Kind:         NooBaa
Metadata:
  Creation Timestamp:  2024-04-12T06:32:27Z
  Finalizers:
    noobaa.io/graceful_finalizer
  Generation:  1
  Owner References:
    API Version:           ocs.openshift.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  StorageCluster
    Name:                  ocs-storagecluster
    UID:                   f0a09c5a-6893-4f21-ba5c-a39f7b14b840
  Resource Version:        4497145
  UID:                     976f5fa0-2c38-4f07-86ac-58931ba3d738
Spec:
  Affinity:
    Node Affinity:
      Required During Scheduling Ignored During Execution:
        Node Selector Terms:
          Match Expressions:
            Key:       cluster.ocs.openshift.io/openshift-storage
            Operator:  Exists
  Autoscaler:
    Autoscaler Type:       hpav2
    Prometheus Namespace:  openshift-monitoring
  Cleanup Policy:
  Core Resources:
    Limits:
      Cpu:     999m
      Memory:  4Gi
    Requests:
      Cpu:     999m
      Memory:  4Gi
  Db Image:    registry.redhat.io/rhel9/postgresql-15@sha256:76ff2541e3ff13b7f5feb1662597f33283bf9dc80e110bef2fb39633e8bbac00
  Db Resources:
    Limits:
      Cpu:     500m
      Memory:  4Gi
    Requests:
      Cpu:           500m
      Memory:        4Gi
  Db Storage Class:  ocs-storagecluster-ceph-rbd
  Db Type:           postgres
  Db Volume Resources:
    Requests:
      Storage:  50Gi
  Endpoints:
    Max Count:  2
    Min Count:  1
    Resources:
      Limits:
        Cpu:     999m
        Memory:  2Gi
      Requests:
        Cpu:     999m
        Memory:  2Gi
  Image:         registry.redhat.io/odf4/mcg-core-rhel9@sha256:b30a5087373a5b3378fd09807399dd0340973e891a410b1bfa74bac634926621
  Labels:
    Monitoring:
  Load Balancer Source Subnets:
  Pv Pool Default Storage Class:  ocs-storagecluster-ceph-rbd
  Security:
    Kms:
      Connection Details:
        AZURE_CERT_SECRET_NAME:  azure-ocs-xtcg1e53
        AZURE_CLIENT_ID:         ec78e481-8052-4ba1-b01d-ce5a47827ab5
        AZURE_TENANT_ID:         9cf78105-e3e9-4321-b88d-b001b66c762b
        AZURE_VAULT_URL:         https://ocsqe-azure-kv.vault.azure.net/
        KMS_PROVIDER:            azure-kv
        KMS_SERVICE_NAME:        Azure-kv-connection
      Schedule:                  @weekly
  Tolerations:
    Effect:    NoSchedule
    Key:       node.ocs.openshift.io/storage
    Operator:  Equal
    Value:     true
Status:
  Accounts:
    Admin:
      Secret Ref:
  Actual Image:  registry.redhat.io/odf4/mcg-core-rhel9@sha256:b30a5087373a5b3378fd09807399dd0340973e891a410b1bfa74bac634926621
  Conditions:
    Last Heartbeat Time:   2024-04-15T06:55:33Z
    Last Transition Time:  2024-04-12T06:32:27Z
    Message:               AZURE_SECRET_ID or AZURE_CLIENT_CERT_PATH not set
    Reason:                TemporaryError
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2024-04-15T06:55:33Z
    Last Transition Time:  2024-04-12T06:32:27Z
    Message:               AZURE_SECRET_ID or AZURE_CLIENT_CERT_PATH not set
    Reason:                TemporaryError
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2024-04-15T06:55:33Z
    Last Transition Time:  2024-04-12T06:32:27Z
    Message:               AZURE_SECRET_ID or AZURE_CLIENT_CERT_PATH not set
    Reason:                TemporaryError
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2024-04-15T06:55:33Z
    Last Transition Time:  2024-04-12T06:32:27Z
    Message:               AZURE_SECRET_ID or AZURE_CLIENT_CERT_PATH not set
    Reason:                TemporaryError
    Status:                False
    Type:                  Upgradeable
    Last Heartbeat Time:   2024-04-15T06:55:33Z
    Last Transition Time:  2024-04-12T06:32:27Z
    Status:                Invalid
    Type:                  KMS-Status
  Observed Generation:     1
  Phase:                   Creating
  Readme:

  NooBaa operator is still working to reconcile this system.
  Check out the system status.phase, status.conditions, and events with:

    kubectl -n openshift-storage describe noobaa
    kubectl -n openshift-storage get noobaa -o yaml
    kubectl -n openshift-storage get events --sort-by=metadata.creationTimestamp

  You can wait for a specific condition with:

    kubectl -n openshift-storage wait noobaa/noobaa --for condition=available --timeout -1s

  NooBaa Core Version:     master-20240314
  NooBaa Operator Version: 5.17.0

  Services:
    Service Mgmt:
    serviceS3:
    Service Sts:
    Service Syslog:
Events:  <none>

Comment 8 Tiffany Nguyen 2024-04-18 16:16:19 UTC
Verified with ODF 4.16.0-78. Deployment completed without any issues.
$ oc get storagecluster
NAME                 AGE     PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   5h31m   Ready              2024-04-18T10:43:02Z   4.16.0

$ oc get pod
NAME                                                              READY   STATUS      RESTARTS   AGE
console-7f45ffc7d7-zg5cc                                          1/1     Running     0          5h34m
csi-addons-controller-manager-7f46789597-dvnkh                    2/2     Running     0          5h34m
csi-cephfsplugin-7fwwb                                            2/2     Running     0          5h32m
csi-cephfsplugin-provisioner-77dd4b4978-5pqb8                     6/6     Running     0          5h32m
csi-cephfsplugin-provisioner-77dd4b4978-tlgbv                     6/6     Running     0          5h32m
csi-cephfsplugin-swtz2                                            2/2     Running     0          5h32m
csi-cephfsplugin-xptcg                                            2/2     Running     0          5h32m
csi-rbdplugin-76c4k                                               3/3     Running     0          5h32m
csi-rbdplugin-f72tr                                               3/3     Running     0          5h32m
csi-rbdplugin-provisioner-7cb98fd4cf-87lt4                        6/6     Running     0          5h32m
csi-rbdplugin-provisioner-7cb98fd4cf-9c5pk                        6/6     Running     0          5h32m
csi-rbdplugin-w6fcm                                               3/3     Running     0          5h32m
noobaa-core-0                                                     2/2     Running     0          5h28m
noobaa-db-pg-0                                                    1/1     Running     0          5h28m
noobaa-endpoint-7d79f779cb-585nq                                  1/1     Running     0          5h27m
noobaa-operator-7f67cf86fb-pqx6m                                  1/1     Running     0          5h34m
ocs-client-operator-console-7f45ffc7d7-9s6zm                      1/1     Running     0          5h34m
ocs-client-operator-controller-manager-fbd4c858f-fwtpf            2/2     Running     0          5h34m
ocs-metrics-exporter-78c77bdfff-cckq4                             1/1     Running     0          5h28m
ocs-operator-8644dfb4fc-r86qm                                     1/1     Running     0          5h33m
odf-console-69579fbbf9-dnbxf                                      1/1     Running     0          5h34m
odf-operator-controller-manager-d9c7696bc-g2pwb                   2/2     Running     0          5h34m
rook-ceph-crashcollector-pakamble-az-dsqk9-worker-eastus1-6nhnf   1/1     Running     0          5h30m
rook-ceph-crashcollector-pakamble-az-dsqk9-worker-eastus2-8p8dh   1/1     Running     0          5h30m
rook-ceph-crashcollector-pakamble-az-dsqk9-worker-eastus3-ccxz6   1/1     Running     0          5h30m
rook-ceph-exporter-pakamble-az-dsqk9-worker-eastus1-f74tz-h4jbq   1/1     Running     0          5h30m
rook-ceph-exporter-pakamble-az-dsqk9-worker-eastus2-wn4pv-r2wfd   1/1     Running     0          5h30m
rook-ceph-exporter-pakamble-az-dsqk9-worker-eastus3-2bxx4-xhfvg   1/1     Running     0          5h30m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5f7bf45bdzp4d   2/2     Running     0          5h29m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-65b779cb4ct7s   2/2     Running     0          5h29m
rook-ceph-mgr-a-7bff64bfb9-nvhl8                                  3/3     Running     0          5h30m
rook-ceph-mgr-b-fdcc97c7d-k2lkj                                   3/3     Running     0          5h30m
rook-ceph-mon-a-5c8db74579-h5s6r                                  2/2     Running     0          5h31m
rook-ceph-mon-b-584b6678d7-hw646                                  2/2     Running     0          5h31m
rook-ceph-mon-c-7f6556f5d5-nvmtb                                  2/2     Running     0          5h30m
rook-ceph-operator-5b76cf76b7-ckfbj                               1/1     Running     0          5h33m
rook-ceph-osd-0-7c8ddc5564-tmlxm                                  2/2     Running     0          5h27m
rook-ceph-osd-1-796884ff6d-l5hjj                                  2/2     Running     0          5h27m
rook-ceph-osd-2-5957585cc7-g2tg7                                  2/2     Running     0          5h26m
rook-ceph-osd-prepare-70e15aa433d0c7ee511e0b867b50ad1b-66v4z      0/1     Completed   0          5h30m
rook-ceph-osd-prepare-b02b497d667ab52861c8ce6b484242e0-4c7q5      0/1     Completed   0          5h30m
rook-ceph-osd-prepare-d1cbfe39935e78481f6b7657a4ce7948-t7vln      0/1     Completed   0          5h30m
ux-backend-server-78db4d8c5-7g882                                 2/2     Running     0          5h33m
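
A verification like the one above can be scripted. The Python sketch below parses the plain-text output of `oc get pod` and lists pods that are neither Completed nor fully ready and Running. The function name is hypothetical, and the parser assumes the default column layout shown above (NAME, READY, STATUS first).

```python
from typing import List


def unhealthy_pods(table: str) -> List[str]:
    """Return names of pods that are not Completed and not fully Running."""
    bad = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        ready_count, total = (int(n) for n in ready.split("/"))
        if status == "Completed":
            continue  # finished jobs such as rook-ceph-osd-prepare-*
        if status != "Running" or ready_count != total:
            bad.append(name)
    return bad


sample = """\
NAME                                                          READY   STATUS      RESTARTS   AGE
noobaa-core-0                                                 2/2     Running     0          5h28m
noobaa-db-pg-0                                                1/1     Running     0          5h28m
rook-ceph-osd-prepare-70e15aa433d0c7ee511e0b867b50ad1b-66v4z  0/1     Completed   0          5h30m
"""

print(unhealthy_pods(sample))  # []
```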

Comment 10 errata-xmlrpc 2024-07-17 13:19:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591

