Description of problem (please be as detailed as possible and provide log snippets):
OCS is hard-coded to use port 9091 today, which clashes with Calico in Red Hat OpenShift Kubernetes Service (ROKS). The port needs to be made configurable so that the OCS installation can use ports that are open in ROKS.
Version of all relevant components (if applicable):
Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
The OCS deployment fails because port 9091 is already in use by Calico.
Is there any workaround available to the best of your knowledge?
I change the port of the Calico component manually so that I can deploy OCS.
Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3
Is this issue reproducible?
Yes
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Deploy OCS on Red Hat OpenShift Kubernetes Service (ROKS)
Actual results:
The Ceph RBD and CephFS daemonsets fail because port 9091 is in use.
Expected results:
Additional info:
We have a GHE issue open for this: https://github.com/openshift/ocs-operator/issues/451
Not a blocker for the release. Moving accordingly
Similarly not a blocker for OCS 4.4, moving to OCS 4.5. Travis, do you know if this is configurable in Rook-Ceph today, and if so what we'd need to do in the ocs-operator to make use of it?
Assigning this to Umanga temporarily so he looks into this a bit more.
Yes, this is configurable in Rook today with the operator setting found here: https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/operator.yaml#L90 Madhu can answer any questions around it as well for the csi settings.
@umanga can we make sure this is going into OCS 4.5?
@Umanga, the rook release-4.5 branch is already updated with the latest upstream release-1.3 changes. Are there any other changes you are waiting for?
(In reply to Travis Nielsen from comment #9)
> @Umanga, The rook release-4.5 branch is already updated with the latest
> release-1.3 upstream changes. Is there another changes you are waiting for?
I am waiting for the OCS Operator dependency to be updated to v1.3. It is currently v1.2.4. Without that update, the PR cannot be merged into OCS Operator.
Tested in OpenShift Container Storage 4.5.0-482.ci
Cluster version is 4.5.0-0.nightly-2020-07-07-210042
AWS platform.
--------------------------------------------------------------------------------
Before updating rook-ceph-operator-config configmap.
1. Output from a csi-cephfsplugin pod which shows metricsport=9091 (default) in csi-cephfsplugin container args
  csi-cephfsplugin:
    Container ID:  cri-o://4ade59a41e1ec9c71655ce6428661fcec9df3fdc90c4c0699ae24755c02a9e51
    Image:         quay.io/rhceph-dev/cephcsi@sha256:d909420cf801be463e7aaaa95217cd90011d0003c021161d2eae1e640935b8b1
    Image ID:      quay.io/rhceph-dev/cephcsi@sha256:17ed0d09bddaed5f0368a9200960b9531aac1edee3c9ac318dda798279459a5f
    Port:          <none>
    Host Port:     <none>
    Args:
      --nodeid=$(NODE_ID)
      --type=cephfs
      --endpoint=$(CSI_ENDPOINT)
      --v=0
      --nodeserver=true
      --drivername=openshift-storage.cephfs.csi.ceph.com
      --metadatastorage=k8s_configmap
      --mountcachedir=/mount-cache-dir
      --pidlimit=-1
      --metricsport=9091
      --forcecephkernelclient=true
      --metricspath=/metrics
      --enablegrpcmetrics=true
2. From worker node.
# lsof -i -P -n | grep cephcsi
cephcsi 36475 root 7u IPv4 219549 0t0 TCP 10.0.137.45:9090 (LISTEN)
cephcsi 36476 root 7u IPv4 218375 0t0 TCP 10.0.137.45:9091 (LISTEN)
cephcsi 36665 root 3u IPv4 214909 0t0 TCP 10.0.137.45:9081 (LISTEN)
cephcsi 36668 root 3u IPv4 223434 0t0 TCP 10.0.137.45:9080 (LISTEN)
--------------------------------------------------------------------------------
Verification steps performed on an existing cluster:
1. Edit configmap rook-ceph-operator-config and add this parameter under 'data'. The parameter CSI_CEPHFS_GRPC_METRICS_PORT itself is not present in rook-ceph-operator-config. The default value will be 9091 (Value --metricsport=9091 from a csi-cephfsplugin pod describe output). So add a different port which is not in use.
data:
  CSI_CEPHFS_GRPC_METRICS_PORT: "9062"
2. Wait for csi-cephfsplugin and csi-cephfsplugin-provisioner pods to re-spin.
3. Do oc describe of csi-cephfsplugin and csi-cephfsplugin-provisioner pods and check the value of metricsport in csi-cephfsplugin container args. The port should be updated to the value 9062 given in step 1. This proves the port is configurable.
  csi-cephfsplugin:
    Container ID:  cri-o://9da599079de3782ffd5e140e9d2dd0d6a735b2b5f74f1097c07e1b6f32c3e723
    Image:         quay.io/rhceph-dev/cephcsi@sha256:d909420cf801be463e7aaaa95217cd90011d0003c021161d2eae1e640935b8b1
    Image ID:      quay.io/rhceph-dev/cephcsi@sha256:17ed0d09bddaed5f0368a9200960b9531aac1edee3c9ac318dda798279459a5f
    Port:          <none>
    Host Port:     <none>
    Args:
      --nodeid=$(NODE_ID)
      --type=cephfs
      --endpoint=$(CSI_ENDPOINT)
      --v=0
      --controllerserver=true
      --drivername=openshift-storage.cephfs.csi.ceph.com
      --metadatastorage=k8s_configmap
      --pidlimit=-1
      --metricsport=9062
      --forcecephkernelclient=true
      --metricspath=/metrics
      --enablegrpcmetrics=true
4. Check the port in worker nodes. Port 9091 is now changed to 9062.
# lsof -i -P -n | grep cephcsi
cephcsi 36475 root 7u IPv4 219549 0t0 TCP 10.0.137.45:9090 (LISTEN)
cephcsi 36668 root 3u IPv4 223434 0t0 TCP 10.0.137.45:9080 (LISTEN)
cephcsi 791774 root 5u IPv4 3629877 0t0 TCP 10.0.137.45:9062 (LISTEN)
cephcsi 791863 root 3u IPv4 3643614 0t0 TCP 10.0.137.45:9081 (LISTEN)
I will test this in a fresh installation and update here.
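The check of the container args in step 3 could also be scripted. The following is only a sketch: `extract_metrics_port` is a hypothetical helper name (it is not part of oc or Rook), and it assumes the pod description is piped in on stdin. If several containers in the pod carry a `--metricsport` argument, each value is printed.

```shell
# extract_metrics_port: hypothetical helper (not part of oc or Rook).
# Reads text such as `oc describe pod` output on stdin and prints the value
# of every --metricsport=<number> argument it finds, one per line.
extract_metrics_port() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^--metricsport=[0-9]+$/) {
        sub(/^--metricsport=/, "", $i)   # keep only the port number
        print $i
      }
    }
  }'
}

# Example (the pod name is a placeholder, so the line is left commented out):
# oc describe pod <csi-cephfsplugin-pod> -n openshift-storage | extract_metrics_port
```

After the change in step 1, the output for the csi-cephfsplugin container should include the new port rather than 9091.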
Hi Umanga, please update the doc text to document the steps to be performed on an existing cluster and in a fresh installation.
GitHub issue for context on how to use it: https://github.com/openshift/ocs-operator/issues/451#issuecomment-638685472
Tested in OpenShift Container Storage 4.5.0-493.ci
Cluster version is 4.5.0-0.nightly-2020-07-17-014709
AWS platform.
Test: Change CSI_CEPHFS_GRPC_METRICS_PORT port before installing OCS storage cluster when it is identified that the port 9091 is in use by another application
--------------------------------------------------------------------------------
Before installing storage cluster
$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.5.0-493.ci   OpenShift Container Storage   4.5.0-493.ci              Succeeded
$ oc get configmap rook-ceph-operator-config -o yaml
apiVersion: v1
data:
  CSI_PLUGIN_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
  CSI_PROVISIONER_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
kind: ConfigMap
metadata:
  creationTimestamp: "2020-07-17T06:55:43Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:CSI_PLUGIN_TOLERATIONS: {}
        f:CSI_PROVISIONER_TOLERATIONS: {}
    manager: ocs-operator
    operation: Update
    time: "2020-07-17T06:55:43Z"
  name: rook-ceph-operator-config
  namespace: openshift-storage
  resourceVersion: "37945"
  selfLink: /api/v1/namespaces/openshift-storage/configmaps/rook-ceph-operator-config
  uid: 58be63bc-4e71-49bb-9b70-56c4353bd861
Step 1: Edit configmap rook-ceph-operator-config and add this parameter under 'data'. The parameter CSI_CEPHFS_GRPC_METRICS_PORT itself is not present in rook-ceph-operator-config.
data:
  CSI_CEPHFS_GRPC_METRICS_PORT: "9061"
Step 2: Verify the value is present in rook-ceph-operator-config yaml.
$ oc get configmap rook-ceph-operator-config -o yaml
apiVersion: v1
data:
  CSI_CEPHFS_GRPC_METRICS_PORT: "9061"
  CSI_PLUGIN_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
  CSI_PROVISIONER_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
kind: ConfigMap
metadata:
  creationTimestamp: "2020-07-17T06:55:43Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:CSI_PLUGIN_TOLERATIONS: {}
        f:CSI_PROVISIONER_TOLERATIONS: {}
    manager: ocs-operator
    operation: Update
    time: "2020-07-17T06:55:43Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:CSI_CEPHFS_GRPC_METRICS_PORT: {}
    manager: oc
    operation: Update
    time: "2020-07-17T07:04:41Z"
  name: rook-ceph-operator-config
  namespace: openshift-storage
  resourceVersion: "41660"
  selfLink: /api/v1/namespaces/openshift-storage/configmaps/rook-ceph-operator-config
  uid: 58be63bc-4e71-49bb-9b70-56c4353bd861
Step 3: Create OCS Storage Cluster
Step 4: Do oc describe of csi-cephfsplugin and csi-cephfsplugin-provisioner pods and check the value of metricsport in csi-cephfsplugin container args. The port should be updated to the value 9061 given in step 1. This proves the port is configurable.
  csi-cephfsplugin:
    Container ID:  cri-o://c5f6fbd04067f400c1475dad850a789861a4e121735cbadf0180587aea65cece
    Image:         quay.io/rhceph-dev/cephcsi@sha256:b4e7caf299762bd78f40f174a166d0d8399eef00593e6afcb9696b241cd3ceb0
    Image ID:      quay.io/rhceph-dev/cephcsi@sha256:241b67c2f2b3fe347a75e745a074d4723f6fead3631ebd560ab85d604a26d321
    Port:          <none>
    Host Port:     <none>
    Args:
      --nodeid=$(NODE_ID)
      --type=cephfs
      --endpoint=$(CSI_ENDPOINT)
      --v=0
      --nodeserver=true
      --drivername=openshift-storage.cephfs.csi.ceph.com
      --metadatastorage=k8s_configmap
      --mountcachedir=/mount-cache-dir
      --pidlimit=-1
      --metricsport=9061
      --forcecephkernelclient=true
      --metricspath=/metrics
      --enablegrpcmetrics=true
    State:         Running
Step 5: Check the port in worker nodes. Port 9061 should be listening
# lsof -i -P -n | grep cephcsi | grep 9061
cephcsi 305192 root 5u IPv4 1341096 0t0 TCP 10.0.131.15:9061 (LISTEN)
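The worker-node check in step 5 can be made stricter than a plain `grep 9061`, which would also match the digits inside a PID or another column. A minimal sketch, assuming lsof output in the column layout shown above; `cephcsi_listen_ports` and `port_is_listening` are hypothetical helper names introduced here for illustration:

```shell
# cephcsi_listen_ports: hypothetical helper (not part of lsof or oc).
# Reads `lsof -i -P -n` output on stdin and prints, in ascending order,
# the TCP ports on which a cephcsi process is listening.
cephcsi_listen_ports() {
  awk '$1 == "cephcsi" && $NF == "(LISTEN)" {
    n = split($(NF - 1), parts, ":")   # address field looks like 10.0.131.15:9061
    print parts[n]
  }' | sort -n
}

# port_is_listening PORT: succeeds only if PORT is one of the ports
# extracted from the lsof output read on stdin (whole-line, literal match).
port_is_listening() {
  cephcsi_listen_ports | grep -qxF -- "$1"
}

# Example, to be run on a worker node (left commented out here):
# lsof -i -P -n | port_is_listening 9061 && echo "cephcsi is listening on 9061"
```

Matching on the port field only avoids false positives from other columns of the lsof output.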
Hi Umanga,
As the fix for this bug makes all the below ports configurable, I think we can mention that in doc. It will be helpful if any other port among the below default metrics ports is in use by another application.
CSI_CEPHFS_GRPC_METRICS_PORT: "9091"
CSI_CEPHFS_LIVENESS_METRICS_PORT: "9081"
CSI_RBD_GRPC_METRICS_PORT: "9090"
CSI_RBD_LIVENESS_METRICS_PORT: "9080"
Tested in 4.5.0-493.
$ oc get configmap rook-ceph-operator-config -o yaml
apiVersion: v1
data:
  CSI_CEPHFS_GRPC_METRICS_PORT: "9061"
  CSI_CEPHFS_LIVENESS_METRICS_PORT: "9050"
  CSI_PLUGIN_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
  CSI_PROVISIONER_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
  CSI_RBD_GRPC_METRICS_PORT: "9041"
  CSI_RBD_LIVENESS_METRICS_PORT: "9030"
kind: ConfigMap
metadata:
  creationTimestamp: "2020-07-17T06:55:43Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:CSI_PLUGIN_TOLERATIONS: {}
        f:CSI_PROVISIONER_TOLERATIONS: {}
    manager: ocs-operator
    operation: Update
    time: "2020-07-17T06:55:43Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:CSI_CEPHFS_GRPC_METRICS_PORT: {}
        f:CSI_CEPHFS_LIVENESS_METRICS_PORT: {}
        f:CSI_RBD_GRPC_METRICS_PORT: {}
        f:CSI_RBD_LIVENESS_METRICS_PORT: {}
    manager: oc
    operation: Update
    time: "2020-07-17T07:09:31Z"
  name: rook-ceph-operator-config
  namespace: openshift-storage
  resourceVersion: "43589"
  selfLink: /api/v1/namespaces/openshift-storage/configmaps/rook-ceph-operator-config
  uid: 58be63bc-4e71-49bb-9b70-56c4353bd861
# lsof -i -P -n | grep cephcsi
cephcsi 305192 root 5u IPv4 1341096 0t0 TCP 10.0.131.15:9061 (LISTEN)
cephcsi 305201 root 6u IPv4 1342019 0t0 TCP 10.0.131.15:9041 (LISTEN)
cephcsi 305402 root 3u IPv4 1352597 0t0 TCP 10.0.131.15:9030 (LISTEN)
cephcsi 305403 root 3u IPv4 1343862 0t0 TCP 10.0.131.15:9050 (LISTEN)
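For documentation purposes, setting all four keys at once could be done non-interactively instead of editing the configmap by hand. This is a sketch under assumptions: `build_ports_patch` is a hypothetical helper that only builds a JSON merge patch, and applying it with `oc patch --type merge` is assumed to be equivalent to the manual edit tested above. The helper rejects non-numeric ports and the same port being assigned to two keys.

```shell
# build_ports_patch KEY=PORT [KEY=PORT ...]
# Hypothetical helper (not part of oc or Rook). Prints a JSON merge patch
# that sets each KEY under .data of a ConfigMap to PORT (as a string).
build_ports_patch() {
  local pair key port seen body
  seen=" "
  body=""
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;
      *) echo "expected KEY=PORT, got '$pair'" >&2; return 1 ;;
    esac
    key="${pair%%=*}"
    port="${pair#*=}"
    # Ports must be plain decimal numbers in the valid TCP range.
    case "$port" in
      ''|*[!0-9]*) echo "invalid port in '$pair'" >&2; return 1 ;;
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
      echo "port out of range in '$pair'" >&2
      return 1
    fi
    # Two metrics endpoints cannot share one host port.
    case "$seen" in
      *" $port "*) echo "port $port is assigned twice" >&2; return 1 ;;
    esac
    seen="$seen$port "
    body="${body:+$body,}\"$key\":\"$port\""
  done
  printf '{"data":{%s}}\n' "$body"
}

# Example with the values tested above (left commented out; needs a cluster):
# oc patch configmap rook-ceph-operator-config -n openshift-storage --type merge \
#   -p "$(build_ports_patch CSI_CEPHFS_GRPC_METRICS_PORT=9061 \
#         CSI_CEPHFS_LIVENESS_METRICS_PORT=9050 \
#         CSI_RBD_GRPC_METRICS_PORT=9041 \
#         CSI_RBD_LIVENESS_METRICS_PORT=9030)"
```

The helper does not check whether a port is free on the worker nodes; that still has to be confirmed separately, for example with lsof as shown above.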
Based on #comment16 and #comment18 , moving this bug to verified state.
(In reply to Jilju Joy from comment #20)
> Based on #comment16 and #comment18 , moving this bug to verified state.
Correction: Based on #comment15 and #comment18, moving this bug to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3754