Description of problem (please be as detailed as possible and provide log snippets):
-----------------------------------------------------------------------
Debugging of a failed NSFS regression test on OCS-CI reveals that when we create an NSFS Namespacestore, it remains stuck in Rejected with the following error in its YAML:

```
...
...
status:
  conditions:
  - lastHeartbeatTime: "2024-06-03T12:17:14Z"
    lastTransitionTime: "2024-06-03T12:17:14Z"
    message: NamespaceStorePhaseRejected
    reason: 'Namespace store mode: STORAGE_NOT_EXIST'
    status: Unknown
    type: Available
```

However, after pausing the test at this phase, I was still able to create the required MCG account that uses the namespacestore for its default buckets, and then create and write to/from the said bucket. It appears the Namespacestore is still functional despite the error message.

Version of all relevant components (if applicable):
-----------------------------------------------------------------------
OCP: 4.16.0-0.nightly-2024-06-02-000851
ODF: 4.16.0-113
ceph: 18.2.1-188.el9cp (b1ae9c989e2f41dcfec0e680c11d1d9465b1db0e) reef (stable)
rook: v4.16.0-0.a2396a5186cc038b22154e857e0f7865e709d06a
noobaa core: 5.16.0-03db21f
noobaa operator: 5.16.0-705652b55ddaabc6bbdf16cb648c4f9a72345cf1

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
-----------------------------------------------------------------------
It fails the regression test, but otherwise the feature still seems functional.

Is there any workaround available to the best of your knowledge?
-----------------------------------------------------------------------
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
-----------------------------------------------------------------------------
3

Is this issue reproducible?
-----------------------------------------------------------------------------
Yep

Can this issue be reproduced from the UI?
-----------------------------------------------------------------------------
N/A

If this is a regression, please provide more details to justify this:
-----------------------------------------------------------------------------
The OCS-CI tests in https://github.com/red-hat-storage/ocs-ci/blob/master/tests/functional/object/mcg/test_nsfs.py had been passing before 4.16, and are now failing because the Namespacestore appears non-functional.

Steps to Reproduce:
------------------------------------------------------------------------------
1. Create an RWX CephFS PVC in the openshift-storage project:

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
  namespace: openshift-storage
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 25Gi
  storageClassName: ocs-storagecluster-cephfs
```

2. Create a deployment that mounts the PVC at mount path "/nsfs":

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nsfs-interface
  namespace: openshift-storage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nsfs-interface
  template:
    metadata:
      labels:
        app: nsfs-interface
    spec:
      containers:
        - command:
            - /bin/sh
          image: registry.access.redhat.com/ubi8/ubi:8.5-214
          imagePullPolicy: IfNotPresent
          name: ubi8
          stdin: true
          tty: true
          volumeMounts:
            - mountPath: /nsfs
              name: my-pvc
          securityContext:
            runAsUser: 1000620000
      volumes:
        - name: my-pvc
          persistentVolumeClaim:
            claimName: my-pvc
```
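(Optional) Before moving on, it can help to confirm the deployment is actually up - a sketch using standard oc tooling, with the names taken from the manifests above:

```
# Wait until the nsfs-interface deployment reports Available before creating the NSS.
oc wait --for=condition=Available deployment/nsfs-interface \
  -n openshift-storage --timeout=120s
```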
3. Create an NSFS Namespacestore (note the field name is subPath):

```
apiVersion: noobaa.io/v1alpha1
kind: NamespaceStore
metadata:
  name: nsfs-nss
  namespace: openshift-storage
  labels:
    app: noobaa
  finalizers:
    - noobaa.io/finalizer
spec:
  type: nsfs
  nsfs:
    pvcName: my-pvc
    subPath: "nsfs"
```

4. Create a new MCG account that uses the NSFS NSS by default:

```
noobaa account create nsfs-account \
  --allow_bucket_create=True \
  --default_resource nsfs-nss \
  --gid 1234 \
  --new_buckets_path / \
  --nsfs_account_config=True \
  --nsfs_only=False \
  --uid 5678 \
  -n openshift-storage
```

5. Use its credentials to create a new NSFS bucket on the NSS via S3 and run some I/O against it:

```
ACC_NAME=nsfs-account
S3_ENDPOINT=https://$(oc get route s3 -n openshift-storage -o json | jq -r '.status.ingress[0].host')
S3_ACCESS_KEY=$(kubectl get secret noobaa-account-$ACC_NAME -n openshift-storage -o json | jq -r '.data.AWS_ACCESS_KEY_ID|@base64d')
S3_SECRET_KEY=$(kubectl get secret noobaa-account-$ACC_NAME -n openshift-storage -o json | jq -r '.data.AWS_SECRET_ACCESS_KEY|@base64d')
alias my_s3="AWS_ACCESS_KEY_ID=$S3_ACCESS_KEY AWS_SECRET_ACCESS_KEY=$S3_SECRET_KEY aws --endpoint $S3_ENDPOINT --no-verify-ssl s3"

my_s3 mb s3://nsfs-bucket --region=us-east-2
my_s3 sync test_objects/ s3://nsfs-bucket/
my_s3 ls s3://nsfs-bucket/
```

Actual results:
---------------------------------------------------------------
The Namespacestore appears Rejected after its creation, but the bucket creation in the last step and the I/O against it work in spite of the NSS status.

Expected results:
---------------------------------------------------------------
The NSFS Namespacestore should reach the Ready phase shortly after its creation and stay there, and the bucket creation and I/O against it in the last step should work. (A sketch for asserting this is included under Additional info below.)

Additional info:
---------------------------------------------------------------
- A live cluster with the issue should be available here until the 5th of June: https://url.corp.redhat.com/c406c3c
- Due to an automation issue, the OCS-CI test requires the following fix, which hasn't been merged yet, in order to reach the step of the test at which we see the NSS remains Rejected: https://github.com/red-hat-storage/ocs-ci/pull/9892
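- For reference, the expected Ready phase can be asserted directly (a sketch; it requires an oc/kubectl version that supports jsonpath-based waits, with the resource name taken from step 3):

```
# Fails with a timeout if the namespacestore never leaves Rejected.
oc wait namespacestore/nsfs-nss -n openshift-storage \
  --for=jsonpath='{.status.phase}'=Ready --timeout=120s
```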
A small but important correction - in the OCS-CI test, the NSFS NSS is created via the following CLI command:

```
noobaa namespacestore create nsfs nsfs-ns-store-d4ef270ee828427a9f337eced0 --pvc-name nsfs-persistentvolumeclaim-40eef6c340d74 -n openshift-storage
```

which apparently sets a blank subPath in the YAML of the Namespacestore:

```
$ oc get namespacestore nsfs-ns-store-d4ef270ee828427a9f337eced0 -o yaml
apiVersion: noobaa.io/v1alpha1
kind: NamespaceStore
metadata:
  creationTimestamp: "2024-06-03T14:26:25Z"
  finalizers:
  - noobaa.io/finalizer
  generation: 1
  labels:
    app: noobaa
  name: nsfs-ns-store-d4ef270ee828427a9f337eced0
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: noobaa.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: NooBaa
    name: noobaa
    uid: ebb3b038-b3af-4474-b8d0-8265c8bb73e6
  resourceVersion: "1152201"
  uid: c188d11c-b620-4530-b1d0-ee104c4d9644
spec:
  accessMode: ReadWrite
  nsfs:
    pvcName: nsfs-persistentvolumeclaim-40eef6c340d74
    subPath: ""
  type: nsfs
status:
  conditions:
  ...
  ...
  mode:
    modeCode: STORAGE_NOT_EXIST
    timeStamp: 2024-06-03 14:26:55.733037139 +0000 UTC m=+103976.967130090
  phase: Rejected
```

If an empty sub-path is not supposed to be accepted, then the actual bugs are:
1) The noobaa CLI command should have failed, since we didn't specify the sub-path
2) We were still able to use the Namespacestore as the default resource for an MCG account, and then were able to create the bucket in the first place
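For comparison, a sketch of the same CLI call with the sub-path given explicitly, assuming the CLI exposes a flag mirroring spec.nsfs.subPath (the exact flag name may differ):

```
# Same command as the test, plus an explicit sub-path matching the manual scenario.
noobaa namespacestore create nsfs nsfs-ns-store-d4ef270ee828427a9f337eced0 \
  --pvc-name nsfs-persistentvolumeclaim-40eef6c340d74 \
  --sub-path nsfs \
  -n openshift-storage
```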
Hi, I looked into it:

- The STORAGE_NOT_EXIST state is the result of a monitoring report on this namespacestore resource.
- Since this is a relatively simple scenario, I don't think the problem is related to the scenario itself, but rather to the interaction with the underlying FS.
- The only relevant thing in the logs is the start time of the endpoints, which is somehow after the creation of the namespace store. It's weird, but I can't tie it directly to the problem.
- Unfortunately, the logs do not contain the monitoring reports. If possible, please run with a higher log level: `nb system set-debug-level 3`. This will give us more info about the source of the STORAGE_NOT_EXIST state. The relevant log lines should contain "update_issues_report:" (a sketch for collecting them follows below).

Thanks,
Amit
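A possible way to raise the debug level and collect those lines (a sketch; it assumes the default endpoint deployment name noobaa-endpoint, and the reports may also need to be grepped from the core pod's logs):

```
# Raise the system debug level, then grep the endpoint logs for the monitoring reports.
noobaa system set-debug-level 3 -n openshift-storage
# Note: "oc logs deployment/..." picks a single pod; with multiple endpoint
# replicas, repeat per pod or use a label selector.
oc logs -n openshift-storage deployment/noobaa-endpoint --all-containers \
  | grep "update_issues_report:"
```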
Generally, waiting until pods are available before proceeding to the next step is a good practice. Assuming we want to continue with this-

The scenarios in #17 and in the original description have a different order of namespacestore and deployment creation (the original description first creates the deployment, then the namespacestore, while #17 first creates the namespacestore). I'm guessing the original description is more accurate?

If we're going by the original scenario - is it possible to get the state of the namespacestore before and after each step? I think that would help pin down the problematic step (a sketch follows below).

If we're going by #17 - the deployment is for CephFS, right? If so, maybe it issues some S3 commands to noobaa upon creation? As mentioned in #10 and #12, this might be the trigger for the failure that results in a rejected namespacestore status.
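A minimal sketch of such a per-step capture (the namespacestore name below is a placeholder for whatever name the test generated):

```
# Placeholder - substitute the name the test generated.
NSS_NAME=nsfs-ns-store-d4ef270ee828427a9f337eced0

# Print a timestamped phase/mode snapshot; call before and after each step.
snapshot() {
  echo "--- $(date -u +%FT%TZ) $1 ---"
  oc get namespacestore "$NSS_NAME" -n openshift-storage \
    -o jsonpath='{.status.phase}{"  "}{.status.mode.modeCode}{"\n"}'
}

snapshot "before deployment creation"
# <create the deployment here>
snapshot "after deployment creation"
```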