Bug 2180921
| Summary: | Deployment with external cluster in ODF 4.13 unable to use cephfs as backing store for image_registry | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Petr Balogh <pbalogh> |
| Component: | ocs-operator | Assignee: | Malay Kumar Parida <mparida> |
| Status: | CLOSED ERRATA | QA Contact: | Elad <ebenahar> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.13 | CC: | mparida, ocs-bugs, odf-bz-bot, paarora, sbalusu, tnielsen |
| Target Milestone: | --- | Keywords: | Automation, Regression |
| Target Release: | ODF 4.13.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-06-21 15:24:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Petr Balogh
2023-03-22 16:13:24 UTC
The csi configmap is getting the msgr2 ports:

```
% oc get cm rook-ceph-csi-config -o yaml
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"openshift-storage","monitors":["10.1.115.103:3300","10.1.115.104:3300","10.1.115.107:3300"],"namespace":"openshift-storage"}]'
```

While the mon endpoints configmap correctly still contains the v1 ports:

```
% oc get cm rook-ceph-mon-endpoints -o yaml
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"openshift-storage","monitors":["10.1.115.104:6789","10.1.115.107:6789","10.1.115.103:6789"],"namespace":""}]'
  data: rhcs-1-node-1=10.1.115.104:6789,rhcs-1-node-2=10.1.115.107:6789,rhcs-1-node-3=10.1.115.103:6789
```

The cause of changing to the msgr2 endpoints is this call [1]:

```
monEndpoints := csi.MonEndpoints(cluster.ClusterInfo.Monitors, cluster.Spec.RequireMsgr2())
```

For external clusters, Rook should really be ignoring the RequireMsgr2 setting; we should just use the same endpoints that were given to connect to the provider cluster. There is no scenario where we need to change these to the msgr2 ports. Madhu, how about we just change this parameter to false?

```
monEndpoints := csi.MonEndpoints(cluster.ClusterInfo.Monitors, false)
```

We may also need a fix in the ocs-operator to not set ms_mode=prefer-crc as a cephfs mount option for external clusters.

[1] https://github.com/rook/rook/blob/master/pkg/operator/ceph/cluster/cluster_external.go#L116

Yes Madhu, the v2 port addresses you see in the rook-ceph-csi-config cm were changed by me. It had the v1 ports (6789) and I changed them manually to 3300. After that the mounts worked and the pod reached Running.

The relevant discussion for this bug is happening here: https://chat.google.com/room/AAAAREGEba8/QSwFEz4GICM

We are mostly investigating why Rook is not setting the csi-configmap to the v2 ports on external clusters the way it does on internal mode clusters. If we can track that down we will fix it; if not, we will turn off RequireMsgr2 for external Ceph clusters and won't pass any kernel_mount_options.

Per discussion with the CSI team, for external clusters it will be less risky to set RequireMsgr2: false, in case the provider cluster still needs to use msgr1. This means ms_mode cannot be set for cephfs on external clusters either.

*** Bug 2183073 has been marked as a duplicate of this bug. ***
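To make the msgr1/msgr2 port handling discussed above concrete, here is a minimal, self-contained Go sketch. It is not Rook's actual implementation; the helper name and port constants are illustrative only. It shows what the proposed change amounts to: for external clusters the monitor endpoints supplied by the provider cluster are passed through unchanged (the requireMsgr2 flag is forced to false), while an internal cluster that requires msgr2 rewrites the default msgr1 port 6789 to the msgr2 port 3300.

```go
package main

import (
	"fmt"
	"net"
)

// monEndpointsForCSI illustrates the behavior under discussion: given the
// monitor endpoints known to the operator, either keep them as-is (msgr1,
// port 6789) or rewrite them to the msgr2 port 3300 when msgr2 is required.
// The proposed fix is to always pass requireMsgr2=false for external
// clusters so the provider-supplied endpoints land in the csi configmap
// unchanged. Hypothetical helper, not Rook code.
func monEndpointsForCSI(endpoints []string, requireMsgr2 bool) []string {
	out := make([]string, 0, len(endpoints))
	for _, ep := range endpoints {
		host, port, err := net.SplitHostPort(ep)
		if err != nil {
			// Keep malformed entries untouched rather than guessing.
			out = append(out, ep)
			continue
		}
		if requireMsgr2 && port == "6789" {
			port = "3300" // default msgr2 port
		}
		out = append(out, net.JoinHostPort(host, port))
	}
	return out
}

func main() {
	mons := []string{"10.1.115.103:6789", "10.1.115.104:6789", "10.1.115.107:6789"}
	// External cluster: leave the provider-supplied endpoints alone.
	fmt.Println(monEndpointsForCSI(mons, false))
	// Internal cluster with msgr2 required: rewrite to port 3300.
	fmt.Println(monEndpointsForCSI(mons, true))
}
```

Run against the endpoints from this bug, the external-cluster path prints the 6789 addresses unchanged, which is what the csi configmap should contain once RequireMsgr2 is forced to false (and, correspondingly, ms_mode is not set as a cephfs mount option).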
Changing the backing store for the image registry to use cephfs works as expected:

```
2023-04-13 22:49:03  19:49:02 - MainThread - ocs_ci.ocs.resources.ocs - INFO - Adding PersistentVolumeClaim with name registry-cephfs-rwx-pvc
2023-04-13 22:49:03  19:49:02 - MainThread - ocs_ci.utility.templating - INFO - apiVersion: v1
2023-04-13 22:49:03  kind: PersistentVolumeClaim
2023-04-13 22:49:03  metadata:
2023-04-13 22:49:03    name: registry-cephfs-rwx-pvc
2023-04-13 22:49:03    namespace: openshift-image-registry
2023-04-13 22:49:03  spec:
2023-04-13 22:49:03    accessModes:
2023-04-13 22:49:03    - ReadWriteMany
2023-04-13 22:49:03    resources:
2023-04-13 22:49:03      requests:
2023-04-13 22:49:03        storage: 100Gi
2023-04-13 22:49:03    storageClassName: ocs-external-storagecluster-cephfs
2023-04-13 22:49:03  19:49:02 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-image-registry create -f /tmp/PersistentVolumeClaim1zygleng -o yaml
2023-04-13 22:49:03  19:49:03 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc patch configs.imageregistry.operator.openshift.io/cluster -p '[{"op": "add", "path": "/spec/storage", "value": {"pvc": {"claim": "registry-cephfs-rwx-pvc"}}}]' --type json
2023-04-13 22:50:15  19:50:15 - MainThread - ocs_ci.ocs.registry - INFO - Verified pvc is mounted on image-registry-868f8dc7c-8gc7j pod
```

=====================

Verified with:
ODF 4.13.0-162
Ceph Version 16.2.8-85.el8cp (0bdc6db9a80af40dd496b05674a938d406a9f6f5) pacific (stable)
Cluster Version 4.13.0-0.nightly-2023-04-13-122023

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742