Bug 2076457 - After node replacement [provider], connection issue between consumer and provider if the provider node referenced by the MON-endpoint configmap (on consumer) is lost
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ODF 4.11.0
Assignee: Madhu Rajanna
QA Contact: Oded
URL:
Whiteboard:
Depends On:
Blocks: 2078713 2083637
 
Reported: 2022-04-19 06:07 UTC by Oded
Modified: 2023-08-09 17:03 UTC
CC List: 14 users

Fixed In Version: 4.11.0-66
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2078713
Environment:
Last Closed: 2022-08-24 13:51:03 UTC
Embargoed:




Links:
- Github red-hat-storage/rook pull 372 (Merged): Resync from upstream release-1.9 to downstream 4.11 (last updated 2022-04-22 20:15:06 UTC)
- Github rook/rook pull 10135 (open): csi: add/remove mon IP from csi config (last updated 2022-04-22 16:26:36 UTC)
- Red Hat Product Errata RHSA-2022:6156 (last updated 2022-08-24 13:51:52 UTC)

Internal Links: 2083637 2086485

Description Oded 2022-04-19 06:07:19 UTC
Description of problem:
After the node replacement procedure [on the provider cluster], a new mon IP was assigned and the old mon was removed.
On the consumer, the mon endpoints still use the IP of the deleted node.
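A quick way to see the mismatch (a sketch; assumes the default openshift-storage namespace and that the rook-ceph-tools toolbox deployment exists on the provider):

# Provider: current mon IPs according to the monmap
$ oc -n openshift-storage rsh deploy/rook-ceph-tools ceph mon dump

# Consumer: the endpoints cached in the two configmaps
$ oc -n openshift-storage get cm rook-ceph-mon-endpoints -o jsonpath='{.data.data}'
$ oc -n openshift-storage get cm rook-ceph-csi-config -o jsonpath='{.data.csi-cluster-config-json}'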

Version-Release number of selected component (if applicable):
Provider:
OCP Version: 4.10.8
ODF Version: 4.10.0-221
Deployer: 
$ oc describe csv ocs-osd-deployer.v2.0.1|grep -i image
    Mediatype:   image/svg+xml
                Image:  quay.io/openshift/origin-kube-rbac-proxy:4.10.0
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.1-1
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.1-1
Consumer:
OCP Version: 4.10.8
ODF Version: 4.10.0-221
$ oc describe csv ocs-osd-deployer.v2.0.1|grep -i image
    Mediatype:   image/svg+xml
                Image:  quay.io/openshift/origin-kube-rbac-proxy:4.10.0
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.1-1
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.1-1

How reproducible:


Steps to Reproduce:
1.Get pods running on ip-10-0-221-86.ec2.internal
$ oc get pods -o wide | grep ip-10-0-221-86.ec2.internal
rook-ceph-mgr-a-667fffcd68-szdhn                                  2/2     Running     0          4h54m   10.0.221.86    ip-10-0-221-86.ec2.internal    <none>           <none>
rook-ceph-mon-a-6dd655b489-nlw9k                                  2/2     Running     0          4h54m   10.0.221.86    ip-10-0-221-86.ec2.internal    <none>           <none>
…                                                                 2/2     Running     0          5h1m    10.0.221.86    ip-10-0-221-86.ec2.internal

2.Mark the node as unschedulable using the following command:
$ oc adm cordon ip-10-0-221-86.ec2.internal
node/ip-10-0-221-86.ec2.internal cordoned
[odedviner@localhost auth]$ oc get node ip-10-0-221-86.ec2.internal
NAME                          STATUS                     ROLES    AGE   VERSION
ip-10-0-221-86.ec2.internal   Ready,SchedulingDisabled   worker   11h   v1.23.5+1f952b3

3.Drain the node using the following command:
$ oc adm drain ip-10-0-221-86.ec2.internal --force --delete-local-data --ignore-daemonsets
Flag --delete-local-data has been deprecated, This option is deprecated and will be deleted. Use --delete-emptydir-data.
node/ip-10-0-221-86.ec2.internal already cordoned
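The deprecation warning above has a direct replacement; the current equivalent would be:
$ oc adm drain ip-10-0-221-86.ec2.internal --force --delete-emptydir-data --ignore-daemonsets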

4. Click Compute → Machines. Search for the required machine. Beside the required machine, click the Action menu (⋮) → Delete Machine.

5.Check node status via cli:
$ oc get nodes ip-10-0-221-86.ec2.internal
Error from server (NotFound): nodes "ip-10-0-221-86.ec2.internal" not found

6. Click Compute → Nodes, and confirm that the new node is in Ready state.
$ oc get nodes ip-10-0-207-181.ec2.internal
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-207-181.ec2.internal   Ready    worker   93s   v1.23.5+1f952b3

7. Skipping the step "Apply the OpenShift Data Foundation label to the new node using any one of the following:
Add cluster.ocs.openshift.io/openshift-storage" [not relevant for Managed Service]
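For reference, the skipped labeling step is the standard ODF command (node name from step 6):
$ oc label node ip-10-0-207-181.ec2.internal cluster.ocs.openshift.io/openshift-storage=""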

8.Verify that new OSD pods are running on the replacement node.
$ oc get pods -o wide | grep ip-10-0-207-181.ec2.internal

9. Check ceph status:
sh-4.4$ ceph status
    health: HEALTH_OK

10. Create PVC on consumer cluster 1: PVC stuck in Pending state
Events:
  Type    Reason                Age               From                                                                                                                Message
  ----    ------                ----              ----                                                                                                                -------
  Normal  Provisioning          73s               openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6455fd4867-s2mnd_a9acc72e-8645-4021-b094-4b0b40a11585  External provisioner is provisioning volume for claim "io-pods1/simple-pvc"
  Normal  ExternalProvisioning  1s (x6 over 73s)  persistentvolume-controller                                                                                         waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator

11. Check mons on the provider:
ceph mon dump
epoch 5
fsid 73a3682f-5652-4488-a647-5430149f0952
last_changed 2022-04-18T19:19:08.145721+0000
created 2022-04-18T08:03:59.335342+0000
min_mon_release 16 (pacific)
election_strategy: 1
0: [v2:10.0.187.174:3300/0,v1:10.0.187.174:6789/0] mon.b
1: [v2:10.0.157.87:3300/0,v1:10.0.157.87:6789/0] mon.c
2: [v2:10.0.207.181:3300/0,v1:10.0.207.181:6789/0] mon.d
dumped monmap epoch 5
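A sketch to pull just the v1 IP:port endpoints out of the dump, in the same form the consumer configmaps use (run from the toolbox; 2>/dev/null drops the "dumped monmap" status line):
$ ceph mon dump 2>/dev/null | grep -oE '[0-9]+(\.[0-9]+){3}:6789' | sort -u
10.0.157.87:6789
10.0.187.174:6789
10.0.207.181:6789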

12. On the consumer, the mon endpoints still use the IP of the deleted node:
oc get cm rook-ceph-csi-config -oyaml
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"openshift-storage","monitors":["10.0.221.86:6789"]},{"clusterID":"6f5b8d799f0aa56a706ac347cace5e2c","monitors":["10.0.221.86:6789"],"cephFS":{"subvolumeGroup":"cephfilesystemsubvolumegroup-storageconsumer-458e6608-e7b7-4924-bf9f-92b14422bc09"}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2022-04-18T08:59:52Z"
  name: rook-ceph-csi-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: true
    kind: Deployment
    name: rook-ceph-operator
    uid: ecb8b476-1a61-4d73-a6b9-e524650499e2
  resourceVersion: "626365"
  uid: b9519161-c54e-44cb-9cb2-b7cc0e3e9855
oc get cm rook-ceph-mon-endpoints -oyaml
apiVersion: v1
data:
  data: a=10.0.221.86:6789
  mapping: '{}'
  maxMonId: "0"
kind: ConfigMap
metadata:
  creationTimestamp: "2022-04-18T08:59:50Z"
  finalizers:
  - ceph.rook.io/disaster-protection
  name: rook-ceph-mon-endpoints
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: dae6a140-a42e-4bd6-96f0-5d364f78e91b
  resourceVersion: "626270"
  uid: ae42a228-0f7f-47ea-8109-481505f6e2d9
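
A sketch to list the endpoints the consumer CSI driver will actually dial (assumes jq is available); run against the configmap above, it prints only the stale 10.0.221.86:6789:
$ oc -n openshift-storage get cm rook-ceph-csi-config -o jsonpath='{.data.csi-cluster-config-json}' | jq -r '.[].monitors[]' | sort -u
10.0.221.86:6789
Comparing this with the provider's ceph mon dump output from the previous step makes the stale entry obvious.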
  

Actual results:
On the consumer, the mon endpoints still use the IP of the deleted node.


Expected results:
On the consumer, the mon endpoints use the IP of the new node.

Additional info:
https://docs.google.com/document/d/1XFhV13neKOFuF8_Z1rg2WU9XAq9R2mgcnOirgk4rT8U/edit

Comment 2 Ohad 2022-04-19 09:36:59 UTC
The behavior to update the mon list on the CephCluster exists and is handled inside the product (ocs-operator).
I don't see a point in keeping this bug on the odf-managed-service component.
Can we move it, please?

Comment 6 Oded 2022-04-19 15:17:23 UTC
The bug does not reproduce on another cluster [same version].
Moving severity to high.

Comment 7 Neha Berry 2022-04-20 06:23:27 UTC
(In reply to Oded from comment #6)
> The bug does not reproduce on another cluster. [same version]
> move severity to high

thanks Oded.

Comment 8 Oded 2022-04-20 13:06:40 UTC
This issue is reproduced again:
SetUp:
ODF

Test Process:
1.Get rook-ceph-mon-endpoints on C1 and C2
C1:
$ oc get cm rook-ceph-mon-endpoints -oyaml -n openshift-storage
apiVersion: v1
data:
  data: b=10.0.141.80:6789

C2:
$ oc get cm rook-ceph-mon-endpoints -oyaml -n openshift-storage
apiVersion: v1
data:
  data: b=10.0.141.80:6789


2.Replace node 10.0.141.80 on provider cluster [ip-10-0-141-80.ec2.internal]
$ oc get pods -o wide | grep mon-b
rook-ceph-mon-b-5b5944464b-fmmcs                                  2/2     Running     0              175m   10.0.141.80    ip-10-0-141-80.ec2.internal    <none>           <none>

3. Create PVC [FS+RBD]
PVC_FS - moved to Bound state
PVC_RBD - Stuck on Pending state

$ oc get pvc -n oded
NAME         STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
simple-pvc   Pending                                                                        ocs-storagecluster-ceph-rbd   32m
test2        Bound     pvc-694766e8-ec4e-4e83-92a7-571a177962a8   2Gi        RWO            ocs-storagecluster-cephfs     4m54s
test3        Pending                                                                        ocs-storagecluster-ceph-rbd   4m8s

Events:
  Type     Reason                Age                     From                                                                                                                Message
  ----     ------                ----                    ----                                                                                                                -------
  Warning  ProvisioningFailed    7m9s                    openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6455fd4867-ccpx5_bcd9ac5d-a6ed-43ae-a908-88e3178a6ff6  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   ExternalProvisioning  3m26s (x26 over 9m39s)  persistentvolume-controller                                                                                         waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          113s (x11 over 9m39s)   openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6455fd4867-ccpx5_bcd9ac5d-a6ed-43ae-a908-88e3178a6ff6  External provisioner is provisioning volume for claim "oded/simple-pvc"
  Warning  ProvisioningFailed    113s (x10 over 7m8s)    openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6455fd4867-ccpx5_bcd9ac5d-a6ed-43ae-a908-88e3178a6ff6  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-cf937385-0e7e-447b-b7cb-504588e568a7 already exists


4.Check rook-ceph-mon-endpoints on C1 and C2:

C1:
$ oc get cm rook-ceph-mon-endpoints -oyaml -n openshift-storage
apiVersion: v1
data:
  data: b=10.0.141.80:6789

C2:
$ oc get cm rook-ceph-mon-endpoints -oyaml -n openshift-storage
apiVersion: v1
data:
  data: b=10.0.141.80:6789

5. Run the WA (workaround) on C1 (commands sketched below):
a. Respin ocs-operator.
Observation: the rook-ceph-mon-endpoints configmap got updated with the new IP, but the rook-ceph-csi-config configmap stayed stale.

b. Then respin rook-ceph-operator; the csi configmap also got updated with the new IPs.
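
A minimal sketch of those two respins in command form (assuming the default deployment names of an ODF install; the deployments recreate the pods):
$ oc -n openshift-storage rollout restart deploy/ocs-operator
$ oc -n openshift-storage rollout restart deploy/rook-ceph-operator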

6. Check rook-ceph-mon-endpoints on C1 and C2:

C1 (after the workaround, new IP):
$ oc get cm rook-ceph-mon-endpoints -oyaml -n openshift-storage
apiVersion: v1
data:
  data: a=10.0.212.251:6789

C2 (untouched, still the old IP):
$ oc get cm rook-ceph-mon-endpoints -oyaml -n openshift-storage
apiVersion: v1
data:
  data: b=10.0.141.80:6789


7. PVCs on C1 moved to Bound state [RBD+FS]
$ oc get pvc -n oded
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
test55        Bound    pvc-bb290310-1cee-44ae-8a6d-88812020c383   2Gi        RWO            ocs-storagecluster-cephfs     101m
test6         Bound    pvc-118e3689-dc84-4d85-9023-81507a13d578   2Gi        RWO            ocs-storagecluster-ceph-rbd   93m
test9         Bound    pvc-561b2021-54ce-416a-8bda-a3143946d49e   1Gi        RWX            ocs-storagecluster-ceph-rbd   92m


Logs:
Logs from all the 3 clusters - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/PVC-issue-mon-endpoint-2076457/reproduce-node-repl/

zipped:
http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/PVC-issue-mon-endpoint-2076457/reproduce-node-repl.zip

Logs from C1 after the Workaround:
http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/PVC-issue-mon-endpoint-2076457/reproduce-node-repl/C1-WA-applied-few-logs/

Comment 10 Nitin Goyal 2022-04-21 12:51:29 UTC
We had a call yesterday between me, Seb, Ohad, and Parth, and we decided to provide a custom build of the API server that returns all mon endpoints instead of just one. We verified that the custom build of the API server is working and provides all endpoints. But we are still not able to create file PVCs on the consumer, even though we have all 3 correct endpoints in the `rook-ceph-mon-endpoints` configmap. The Ceph cluster state is also healthy.


Chat thread for more info:
https://chat.google.com/room/AAAASHA9vWs/TJcGGyOuBJU


@shan Can you pls take a look at the setup again?


We also realized that the `rook-ceph-csi-config` configmap still has one old mon endpoint, `10.0.141.80`, which is not available anymore.


$ oc get cm rook-ceph-csi-config rook-ceph-mon-endpoints -o yaml
apiVersion: v1
items:
- apiVersion: v1
  data:
    csi-cluster-config-json: '[{"clusterID":"b7658893757ca59147f3d74393e36215","monitors":["10.0.141.80:6789"],"cephFS":{"subvolumeGroup":"cephfilesystemsubvolumegroup-storageconsumer-d63b8440-f498-4d72-ae05-1306d6e6cc83"}},{"clusterID":"openshift-storage","monitors":["10.0.144.250:6789","10.0.171.89:6789","10.0.197.112:6789"]}]'
  kind: ConfigMap
  metadata:
    creationTimestamp: "2022-04-20T08:07:59Z"
    name: rook-ceph-csi-config
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: false
      controller: true
      kind: Deployment
      name: rook-ceph-operator
      uid: e3268a48-8f99-45d8-aaee-abc0e268d363
    resourceVersion: "1420666"
    uid: ba502d8e-a5c8-4810-905c-33a1d85adf68
- apiVersion: v1
  data:
    data: c=10.0.171.89:6789,e=10.0.197.112:6789,d=10.0.144.250:6789
    mapping: '{}'
    maxMonId: "0"
  kind: ConfigMap
  metadata:
    creationTimestamp: "2022-04-20T08:07:57Z"
    finalizers:
    - ceph.rook.io/disaster-protection
    name: rook-ceph-mon-endpoints
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ceph.rook.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      uid: c1c61bd0-1da1-48b5-b174-96323a6274d1
    resourceVersion: "1404875"
    uid: 66f3c208-45d2-49bf-919f-7685b43f43fe
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
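
A bash sketch (needs jq) that prints any monitor present in rook-ceph-csi-config but absent from rook-ceph-mon-endpoints; on the state above it flags exactly the stale endpoint:
$ comm -23 \
    <(oc -n openshift-storage get cm rook-ceph-csi-config -o jsonpath='{.data.csi-cluster-config-json}' | jq -r '.[].monitors[]' | sort -u) \
    <(oc -n openshift-storage get cm rook-ceph-mon-endpoints -o jsonpath='{.data.data}' | tr ',' '\n' | cut -d= -f2 | sort -u)
10.0.141.80:6789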


@paarora Pls add if I am missing something.

Comment 11 Travis Nielsen 2022-04-21 16:46:58 UTC
Discussed with Madhu and we need an update to rook and the csi driver to handle the csi configmap differently. The update is tracked here: https://github.com/rook/rook/issues/10126

Comment 13 Travis Nielsen 2022-04-22 16:26:36 UTC
PR is in review now for updating the mon endpoints when there is a mon failover.

Comment 14 Travis Nielsen 2022-04-22 20:15:06 UTC
Fix is merged downstream now with https://github.com/red-hat-storage/rook/pull/372

Comment 15 Sahina Bose 2022-04-25 06:55:17 UTC
Do we have a bug tracking for backport to 4.10?

Comment 16 Madhu Rajanna 2022-04-26 05:29:08 UTC
Created the clone BZ; removing the needinfo on me.

Comment 19 Parth Arora 2022-05-12 14:35:32 UTC
For all mon IPs not being updated in the rook-ceph-mon-endpoints cm, we need a change in ocs-operator to override it correctly while reconciling:
https://github.com/red-hat-storage/ocs-operator/pull/1673

Comment 21 Oded 2022-06-28 20:33:15 UTC
Bug fixed

SetUp Provider:
OCP Version: 4.10.18
ODF Version: 4.11.0-104

SetUp Consumer:
OCP Version: 4.10.18
ODF Version: 4.11.0-105

for more info https://docs.google.com/document/d/11s__tQ_mxSGRl-I4Z6Y4L-lwjPcf1aGv5CtfkQvZXGE/edit

Comment 25 errata-xmlrpc 2022-08-24 13:51:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156

