Bug 2124950

Summary: Unable to provision Cephfs storage within single stack IPv6 OCP 4.9 + ODF + Ceph environment.
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Component: csi-driver
Version: 4.9
Hardware: x86_64
OS: Linux
Status: CLOSED WORKSFORME
Severity: medium
Priority: unspecified
Reporter: Travis Morgan <tmorgan>
Assignee: Madhu Rajanna <mrajanna>
QA Contact: krishnaram Karthick <kramdoss>
CC: madam, mrajanna, ocs-bugs, odf-bz-bot, swilson
Type: Bug
Last Closed: 2022-09-09 02:09:05 UTC

Description Travis Morgan 2022-09-07 14:27:28 UTC
Description of problem (please be detailed as possible and provide log
snippets):
Working on an IPv6 single-stack OCP cluster, deploying ODF with an external Ceph cluster (also IPv6 single stack). During testing, adding RBD storage works fine, but adding CephFS storage does not: the CephFS PVC provisions and becomes Bound, but an error is raised when the volume is mounted into the container.

```
[kni@deployer external]$ oc get pods
NAME                         READY   STATUS              RESTARTS   AGE
anodftest-567b9b5f64-5h5f9   0/1     ContainerCreating   0          20h

[kni@deployer external]$ oc get pvc
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
anodftest-cephfs   Bound    pvc-ed0fc436-2e94-4855-84d5-b6097cd6803d   5Gi        RWX            ocs-external-storagecluster-cephfs     4d16h
anodftest-rbd      Bound    pvc-1d1510d3-59af-4f71-b492-190235e00980   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   4d16h


LAST SEEN   TYPE      REASON        OBJECT                           MESSAGE
24m         Warning   FailedMount   pod/anodftest-567b9b5f64-5h5f9   MountVolume.MountDevice failed for volume "pvc-ed0fc436-2e94-4855-84d5-b6097cd6803d" : rpc error: code = Internal desc = rados: ret=-1, Operation not permitted
74m         Warning   FailedMount   pod/anodftest-567b9b5f64-5h5f9   Unable to attach or mount volumes: unmounted volumes=[anodftest-cephfs], unattached volumes=[kube-api-access-rghgg anodftest-rbd anodftest-cephfs]: timed out waiting for the condition
14m         Warning   FailedMount   pod/anodftest-567b9b5f64-5h5f9   Unable to attach or mount volumes: unmounted volumes=[anodftest-cephfs], unattached volumes=[anodftest-cephfs kube-api-access-rghgg anodftest-rbd]: timed out waiting for the condition
4m30s       Warning   FailedMount   pod/anodftest-567b9b5f64-5h5f9   Unable to attach or mount volumes: unmounted volumes=[anodftest-cephfs], unattached volumes=[anodftest-rbd anodftest-cephfs kube-api-access-rghgg]: timed out waiting for the condition
```

Also, possibly unrelated: the operator fails to parse the monitor endpoint IP for external cluster monitoring. The rejected value "[2001" is the first token of the bracketed IPv6 address, which suggests the endpoint string is being split on every ":":

```
[kni@deployer external]$ oc get cephcluster
NAME                                      DATADIRHOSTPATH   MONCOUNT   AGE     PHASE         MESSAGE                                                                                                                                                                                                                                                                                                                                                                                                                                           HEALTH   EXTERNAL
ocs-external-storagecluster-cephcluster                                5d13h   Progressing   failed to configure external cluster monitoring: failed to configure external metrics endpoint: failed to create or update mgr endpoint: failed to create endpoint "rook-ceph-mgr-external". Endpoints "rook-ceph-mgr-external" is invalid: [subsets[0].addresses[0].ip: Invalid value: "[2001": must be a valid IP address, (e.g. 10.9.8.7 or 2001:db8::ffff), subsets[0].addresses[0].ip: Invalid value: "[2001": must be a valid IP address]            true

  {
    "name": "rook-ceph-mon-endpoints",
    "kind": "ConfigMap",
    "data": {
      "data": "<redacted_hostname>=[2001:1900:2200:9349:9640:c9ff:fe83:510]:6789",
      "maxMonId": "0",
      "mapping": "{}"
    }
  }
```
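The invalid value "[2001" in the error above is what a naive split on ":" produces from a bracketed IPv6 endpoint. A minimal Go sketch of the failure mode and the bracket-aware alternative (this is illustrative only, not Rook's actual code; `parseMonEndpoint` is a hypothetical name):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseMonEndpoint splits a mon endpoint such as
// "[2001:db8::1]:6789" or "10.9.8.7:6789" into host and port.
// net.SplitHostPort strips the IPv6 brackets, whereas taking the
// first ":"-delimited token yields "[2001" — the invalid value
// reported in the cephcluster status above.
func parseMonEndpoint(ep string) (host, port string, err error) {
	return net.SplitHostPort(ep)
}

func main() {
	ep := "[2001:db8::1]:6789"

	// Naive approach: breaks on IPv6.
	naive := strings.Split(ep, ":")[0]
	fmt.Println(naive) // "[2001" — not a valid IP address

	// Bracket-aware approach: works for both address families.
	host, port, err := parseMonEndpoint(ep)
	if err != nil {
		panic(err)
	}
	fmt.Println(host, port) // "2001:db8::1 6789"
}
```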

Version of all relevant components (if applicable):
```
[kni@deployer external]$ oc version
Client Version: 4.10.0-202202160843.p0.gf93da17.assembly.stream-f93da17
Server Version: 4.9.28
Kubernetes Version: v1.22.5+a36406b

[kni@deployer external]$ oc describe operators odf-operator.openshift-storage | grep -A1 ClusterServiceVersion
      Kind:                    ClusterServiceVersion
      Name:                    odf-operator.v4.9.10
[kni@deployer external]$ oc describe operators ocs-operator.openshift-storage | grep -A1 ClusterServiceVersion
      Kind:                    ClusterServiceVersion
      Name:                    ocs-operator.v4.9.10

[kni@deployer external]$ ceph version
ceph version 16.2.8-84.el8cp (c2980f2fd700e979d41b4bad2939bb90f0fe435c) pacific (stable)
```

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. The customer requires CephFS storage, and it cannot be provided at this point.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
Expect that this would be an issue on any single-stack IPv6 environment. Access to the environment in question is still available.

Can this issue be reproduced from the UI?
The UI was used to add the storage to the deployment.

If this is a regression, please provide more details to justify this:
IPv6 is listed as "Dev-Preview" in 4.9, 4.10, and 4.11. Telco customers rolling out 5G are requiring this due to the size of their address spaces.

Steps to Reproduce:
1. Deploy single-stack IPv6 OCP and external Ceph clusters.
2. Add the ODF operator with external Ceph storage.
3. Create a deployment.
4. Add a CephFS-backed PVC to the deployment.


Actual results:
CephFS storage provisions, but it cannot be mounted in the containers.

Expected results:
The CephFS volume should mount successfully.

Additional info:
RBD works.

Comment 4 Madhu Rajanna 2022-09-08 05:21:55 UTC
From the csi-cephfsplugin logs, it looks like a permission issue:

```
2022-09-07T18:41:45.180468318Z I0907 18:41:45.180263       1 utils.go:177] ID: 3082 Req-ID: 0001-0011-openshift-storage-0000000000000002-5b1a50cd-2b00-11ed-b0b8-0a58d6b8621b GRPC call: /csi.v1.Node/NodeStageVolume
2022-09-07T18:41:45.180468318Z I0907 18:41:45.180337       1 utils.go:181] ID: 3082 Req-ID: 0001-0011-openshift-storage-0000000000000002-5b1a50cd-2b00-11ed-b0b8-0a58d6b8621b GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-ed0fc436-2e94-4855-84d5-b6097cd6803d/globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"openshift-storage","fsName":"silver-fs","pool":"cephfs.silver-fs.data","storage.kubernetes.io/csiProvisionerIdentity":"1662076724671-8081-openshift-storage.cephfs.csi.ceph.com","subvolumeName":"csi-vol-5b1a50cd-2b00-11ed-b0b8-0a58d6b8621b","subvolumePath":"/volumes/csi/csi-vol-5b1a50cd-2b00-11ed-b0b8-0a58d6b8621b/914b5be7-c22c-4e83-b4bd-44bbd11bbc2e"},"volume_id":"0001-0011-openshift-storage-0000000000000002-5b1a50cd-2b00-11ed-b0b8-0a58d6b8621b"}
2022-09-07T18:41:45.184283693Z E0907 18:41:45.184042       1 utils.go:186] ID: 3082 Req-ID: 0001-0011-openshift-storage-0000000000000002-5b1a50cd-2b00-11ed-b0b8-0a58d6b8621b GRPC error: rpc error: code = Internal desc = rados: ret=-1, Operation not permitted
```



Can you please provide `ceph auth ls` from the ceph cluster? We don't have it in must-gather.

Is this a customer cluster or a cluster deployed in a lab? Is it possible to get access to debug why we are getting the permission error?
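For reference, a hedged sketch of the requested check. The user names and the caps shown below follow the ceph-csi external-cluster documentation and are assumptions; the users actually created on this cluster may be named differently:

```shell
# Inspect the caps of the CephFS CSI users (names assumed per the
# ceph-csi external-cluster docs; adjust to what `ceph auth ls` shows).
ceph auth get client.csi-cephfs-node
ceph auth get client.csi-cephfs-provisioner

# For the node user, ceph-csi expects roughly:
#   caps mon = "allow r"
#   caps mgr = "allow rw"
#   caps osd = "allow rw tag cephfs *=*"
#   caps mds = "allow rw"
# An "Operation not permitted" from rados during NodeStageVolume is
# consistent with the user having narrower caps than these.
```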

Comment 9 Madhu Rajanna 2022-09-09 02:09:05 UTC
Thanks. Closing as WORKSFORME.