Bug 2043349 - mon c is taking a long time to come up
Summary: mon c is taking a long time to come up
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Travis Nielsen
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-21 05:00 UTC by Vijay Avuthu
Modified: 2023-08-09 17:03 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-24 20:05:21 UTC
Embargoed:



Description Vijay Avuthu 2022-01-21 05:00:19 UTC
Description of problem (please be as detailed as possible and provide log snippets):

mon c is not coming up within the prescribed time (10 minutes), which leads to deployment failures.


Version of all relevant components (if applicable):

openshift installer (4.10.0-0.nightly-2022-01-18-044014)
ODF: 4.10.0-79



Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Production jobs are failing due to the timeout.

Is there any workaround available to the best of your knowledge?
Yes

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Reproduced in 2/2 attempts.

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install ODF using ocs-ci.


Actual results:

Deployment fails because mon c does not come up within the time limit (10 minutes).


Expected results:

All mons should come up within the time limit (10 minutes).


Additional info:

2022-01-20 00:13:54  18:43:54 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod  -n openshift-storage --selector=app=rook-ceph-mon -o yaml
2022-01-20 00:13:54  18:43:54 - MainThread - ocs_ci.ocs.ocp - INFO - status of  at column STATUS - item(s) were [], but we were waiting for all 3 of them to be Running
2022-01-20 00:13:54  18:43:54 - MainThread - ocs_ci.utility.utils - INFO - Going to sleep for 3 seconds before next iteration
.
.
.
2022-01-20 00:23:56  18:53:56 - MainThread - ocs_ci.ocs.ocp - INFO - status of  at column STATUS - item(s) were ['Running', 'Running', 'Init:0/2'], but we were waiting for all 3 of them to be Running
2022-01-20 00:23:56  18:53:56 - MainThread - ocs_ci.ocs.ocp - ERROR - timeout expired: Timed out after 600s running get("", True, "app=rook-ceph-mon")
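
The check that times out here is a simple poll-until-timeout loop around `oc get pod`. A minimal sketch of that pattern follows; this is not the actual ocs-ci code, and the function name, defaults, and use of the pod phase (rather than the STATUS column shown in the log) are illustrative only:

```python
# Minimal sketch of the wait loop implied by the log above (assumed, not the
# real ocs-ci implementation): poll the rook-ceph-mon pods every few seconds
# and give up after a fixed timeout.
import json
import subprocess
import time


def wait_for_mons_running(namespace="openshift-storage", count=3,
                          timeout=600, poll_interval=3):
    """Return once `count` rook-ceph-mon pods report phase Running."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.run(
            ["oc", "-n", namespace, "get", "pod",
             "--selector=app=rook-ceph-mon", "-o", "json"],
            capture_output=True, text=True, check=True,
        ).stdout
        phases = [item["status"]["phase"] for item in json.loads(out)["items"]]
        if len(phases) == count and all(p == "Running" for p in phases):
            return
        time.sleep(poll_interval)
    raise TimeoutError(
        f"{count} rook-ceph-mon pods not Running after {timeout}s")
```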

must gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-147ai3c33-a/j-147ai3c33-a_20220119T181148/logs/failed_testcase_ocs_logs_1642616149/test_deployment_ocs_logs/


Job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/2995//consoleFull

Comment 4 Subham Rai 2022-01-21 05:34:20 UTC
I looked at the `pod -o wide` listing for all three jobs. The mons are running:

```
rook-ceph-mon-a-65f96fd76-z5jd2                                   2/2     Running                0          19m     10.128.2.32    ip-10-0-204-241.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-b-657d487cf6-p4dqx                                  2/2     Running                0          14m     10.129.2.23    ip-10-0-171-88.us-east-2.compute.internal    <none>           <none>
rook-ceph-mon-c-5d7d874fc6-dxpgn                                  2/2     Running                0          11m     10.131.0.22    ip-10-0-150-93.us-east-2.compute.internal    <none>           <none>
```

```
rook-ceph-mon-a-74679bb5f9-8ntzp                                  2/2     Running     0          16m     10.128.4.6     ip-10-0-154-239.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-b-57fccc84c7-2qghs                                  2/2     Running     0          12m     10.131.2.7     ip-10-0-165-28.us-east-2.compute.internal    <none>           <none>
rook-ceph-mon-c-84bcc64fb7-ml7xf                                  2/2     Running     0          9m49s   10.130.2.7     ip-10-0-211-40.us-east-2.compute.internal    <none>           <none>

```

```
rook-ceph-mon-a-5f886d99dd-xhdgd                                  2/2     Running                0          23m     10.131.0.27    ip-10-0-154-201.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-b-5899cb546d-dfwlv                                  2/2     Running                0          18m     10.129.2.27    ip-10-0-180-236.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-c-d4965988-lvpts                                    2/2     Running                0          18m     10.128.2.26    ip-10-0-193-74.us-east-2.compute.internal    <none>           <none>
```

So it is basically hitting the timeout limit.

Comment 5 Vijay Avuthu 2022-01-21 05:56:14 UTC
(In reply to Subham Rai from comment #4)
> I looked at the `pod -o wide` listing for all three jobs. The mons are running:
> [...]
> So it is basically hitting the timeout limit.


Eventually all the mons come up, but not within the ocs-ci timeout (10 minutes). We did not hit this issue in previous runs.

Comment 6 Subham Rai 2022-01-21 06:03:19 UTC
```
2022-01-19T18:45:59.778073100Z 2022-01-19 18:45:59.778047 E | ceph-spec: failed to update cluster condition to {Type:Progressing Status:True Reason:ClusterProgressing Message:Configuring the Ceph cluster LastHeartbeatTime:2022-01-19 18:45:59.764508577 +0000 UTC m=+51.268847474 LastTransitionTime:2022-01-19 18:45:59.764508483 +0000 UTC m=+51.268847404}. failed to update object "openshift-storage/ocs-storagecluster-cephcluster" status: Operation cannot be fulfilled on cephclusters.ceph.rook.io "ocs-storagecluster-cephcluster": the object has been modified; please apply your changes to the latest version and try again
2022-01-19T18:45:59.822985205Z 2022-01-19 18:45:59.822956 I | op-mon: start running mons
2022-01-19T18:45:59.878277636Z 2022-01-19 18:45:59.878242 I | op-mon: creating mon secrets for a new cluster
2022-01-19T18:45:59.896235963Z 2022-01-19 18:45:59.896212 I | op-mon: existing maxMonID not found or failed to load. configmaps "rook-ceph-mon-endpoints" not found
2022-01-19T18:45:59.905490139Z 2022-01-19 18:45:59.905460 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"openshift-storage","monitors":[]}] data: mapping:{"node":{}} maxMonId:-1]
2022-01-19T18:46:00.295065789Z 2022-01-19 18:46:00.295026 I | cephclient: writing config file /var/lib/rook/openshift-storage/openshift-storage.config
2022-01-19T18:46:00.295160022Z 2022-01-19 18:46:00.295142 I | cephclient: generated admin config in /var/lib/rook/openshift-storage
2022-01-19T18:46:01.268762642Z 2022-01-19 18:46:01.268714 E | clusterdisruption-controller: failed to check cluster health: failed to get status. . unable to get monitor info from DNS SRV with service name: ceph-mon
2022-01-19T18:46:01.268762642Z 2022-01-19T18:46:01.266+0000 7f3717608700 -1 failed for service _ceph-mon._tcp
2022-01-19T18:46:01.268762642Z 2022-01-19T18:46:01.266+0000 7f3717608700 -1 monclient: get_monmap_and_config cannot identify monitors to contact
2022-01-19T18:46:01.268762642Z [errno 2] RADOS object not found (error connecting to the cluster): exit status 1
2022-01-19T18:46:01.378317042Z 2022-01-19 18:46:01.378277 E | clusterdisruption-controller: failed to check cluster health: failed to get status. . unable to get monitor info from DNS SRV with service name: ceph-mon
2022-01-19T18:46:01.378317042Z 2022-01-19T18:46:01.376+0000 7fbd066af700 -1 failed for service _ceph-mon._tcp
2022-01-19T18:46:01.378317042Z 2022-01-19T18:46:01.376+0000 7fbd066af700 -1 monclient: get_monmap_and_config cannot identify monitors to contact
2022-01-19T18:46:01.378317042Z [errno 2] RADOS object not found (error connecting to the cluster): exit status 1
2022-01-19T18:46:01.489660405Z 2022-01-19 18:46:01.489608 E | clusterdisruption-controller: failed to check cluster health: failed to get status. . unable to get monitor info from DNS SRV with service name: ceph-mon
2022-01-19T18:46:01.489660405Z 2022-01-19T18:46:01.487+0000 7f17a9ed4700 -1 failed for service _ceph-mon._tcp
2022-01-19T18:46:01.489660405Z 2022-01-19T18:46:01.487+0000 7f17a9ed4700 -1 monclient: get_monmap_and_config cannot identify monitors to contact
2022-01-19T18:46:01.489660405Z [errno 2] RADOS object not found (error connecting to the cluster): exit status 1
2022-01-19T18:46:01.498866557Z 2022-01-19 18:46:01.498833 I | op-mon: targeting the mon count 3
2022-01-19T18:46:01.610831354Z 2022-01-19 18:46:01.610788 E | clusterdisruption-controller: failed to check cluster health: failed to get status. . unable to get monitor info from DNS SRV with service name: ceph-mon
2022-01-19T18:46:01.610831354Z 2022-01-19T18:46:01.608+0000 7f9a8f414700 -1 failed for service _ceph-mon._tcp
2022-01-19T18:46:01.610831354Z 2022-01-19T18:46:01.608+0000 7f9a8f414700 -1 monclient: get_monmap_and_config cannot identify monitors to contact
2022-01-19T18:46:01.610831354Z [errno 2] RADOS object not found (error connecting to the cluster): exit status 1
2022-01-19T18:46:01.705903884Z 2022-01-19 18:46:01.705870 I | op-mon: created canary monitor rook-ceph-mon-a-canary pvc rook-ceph-mon-a
2022-01-19T18:46:01.717935407Z 2022-01-19 18:46:01.717902 I | op-mon: created canary deployment rook-ceph-mon-a-canary
2022-01-19T18:46:01.754452637Z 2022-01-19 18:46:01.754409 E | clusterdisruption-controller: failed to check cluster health: failed to get status. . unable to get monitor info from DNS SRV with service name: ceph-mon
2022-01-19T18:46:01.754452637Z 2022-01-19T18:46:01.752+0000 7f6ccabd5700 -1 failed for service _ceph-mon._tcp
2022-01-19T18:46:01.754452637Z 2022-01-19T18:46:01.752+0000 7f6ccabd5700 -1 monclient: get_monmap_and_config cannot identify monitors to contact
2022-01-19T18:46:01.754452637Z [errno 2] RADOS object not found (error connecting to the cluster): exit status 1
2022-01-19T18:46:01.942231333Z 2022-01-19 18:46:01.942184 E | clusterdisruption-controller: failed to check cluster health: failed to get status. . unable to get monitor info from DNS SRV with service name: ceph-mon
2022-01-19T18:46:01.942231333Z 2022-01-19T18:46:01.939+0000 7f902f1e9700 -1 failed for service _ceph-mon._tcp
2022-01-19T18:46:01.942231333Z 2022-01-19T18:46:01.939+0000 7f902f1e9700 -1 monclient: get_monmap_and_config cannot identify monitors to contact
2022-01-19T18:46:01.942231333Z [errno 2] RADOS object not found (error connecting to the cluster): exit status 1
2022-01-19T18:46:02.101964896Z 2022-01-19 18:46:02.101930 I | op-mon: created canary monitor rook-ceph-mon-b-canary pvc rook-ceph-mon-b
2022-01-19T18:46:02.114259672Z 2022-01-19 18:46:02.114222 I | op-mon: created canary deployment rook-ceph-mon-b-canary
2022-01-19T18:46:02.207186987Z 2022-01-19 18:46:02.207138 E | clusterdisruption-controller: failed to check cluster health: failed to get status. . unable to get monitor info from DNS SRV with service name: ceph-mon
2022-01-19T18:46:02.207186987Z 2022-01-19T18:46:02.205+0000 7f64541bc700 -1 failed for service _ceph-mon._tcp
2022-01-19T18:46:02.207186987Z 2022-01-19T18:46:02.205+0000 7f64541bc700 -1 monclient: get_monmap_and_config cannot identify monitors to contact
2022-01-19T18:46:02.207186987Z [errno 2] RADOS object not found (error connecting to the cluster): exit status 1
2022-01-19T18:46:02.500750916Z 2022-01-19 18:46:02.500711 I | op-mon: created canary monitor rook-ceph-mon-c-canary pvc rook-ceph-mon-c
2022-01-19T18:46:02.520347381Z 2022-01-19 18:46:02.520318 I | op-mon: created canary deployment rook-ceph-mon-c-canary
2022-01-19T18:46:02.632430320Z 2022-01-19 18:46:02.632390 E | clusterdisruption-controller: failed to check cluster health: failed to get status. . unable to get monitor info from DNS SRV with service name: ceph-mon
2022-01-19T18:46:02.632430320Z 2022-01-19T18:46:02.630+0000 7fef45a9a700 -1 failed for service _ceph-mon._tcp
2022-01-19T18:46:02.632430320Z 2022-01-19T18:46:02.630+0000 7fef45a9a700 -1 monclient: get_monmap_and_config cannot identify monitors to contact
2022-01-19T18:46:02.632430320Z [errno 2] RADOS object not found (error connecting to the cluster): exit status 1
```

This is a common error across all the rook-ceph-operator logs.
Question: are these fresh clusters?
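
For reference, the repeated `_ceph-mon._tcp` / "cannot identify monitors to contact" errors above are emitted while the operator is still creating the very first mons (`maxMonId:-1`), so there is nothing for the health check to contact yet. The small helper below is an assumed illustration, not part of rook or ocs-ci; it just reports whether the mon discovery objects referenced in this log exist yet:

```python
# Illustrative helper (assumption: not taken from any existing tool). It checks
# whether the app=rook-ceph-mon Services and the rook-ceph-mon-endpoints
# configmap referenced in the operator log above exist yet.
import subprocess


def mon_discovery_state(namespace="openshift-storage"):
    services = subprocess.run(
        ["oc", "-n", namespace, "get", "svc", "-l", "app=rook-ceph-mon",
         "-o", "name"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    endpoints = subprocess.run(
        ["oc", "-n", namespace, "get", "configmap", "rook-ceph-mon-endpoints",
         "-o", "jsonpath={.data.data}"],
        capture_output=True, text=True,
    )
    return {
        "mon_services": services,
        "mon_endpoints": endpoints.stdout if endpoints.returncode == 0 else None,
    }
```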

Comment 7 Vijay Avuthu 2022-01-21 06:36:37 UTC
(In reply to Subham Rai from comment #6)
> [...]
> This is a common error across all the rook-ceph-operator logs.
> Question: are these fresh clusters?

All are fresh clusters.

Comment 8 Travis Nielsen 2022-01-24 20:05:21 UTC
Vijay, if you look at `oc describe pod <mon>`, you'll see what is taking so long for each mon to start. There is no guarantee that all the mons will be up within any given time limit such as 10 minutes. If it's a slow environment, it will simply take longer. Either increase the timeout in the CI, or troubleshoot what is taking so long in that environment.

Closing, since this is a CI issue rather than a product issue.
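
For anyone hitting the same timeout, a rough sketch of the triage Travis describes is below. It is not part of ocs-ci or rook; it simply shells out to `oc get pod` and `oc describe pod` so the slow step (scheduling, PVC attach, image pull, init containers) shows up in each mon pod's events:

```python
# Rough triage sketch (assumption: not part of any existing tool): print
# `oc describe pod` output for every rook-ceph-mon pod so the Events section
# reveals which step is slow.
import subprocess


def describe_mon_pods(namespace="openshift-storage"):
    pods = subprocess.run(
        ["oc", "-n", namespace, "get", "pod", "-l", "app=rook-ceph-mon",
         "-o", "jsonpath={.items[*].metadata.name}"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    for pod in pods:
        print(f"===== {pod} =====")
        print(subprocess.run(
            ["oc", "-n", namespace, "describe", "pod", pod],
            capture_output=True, text=True, check=True,
        ).stdout)


if __name__ == "__main__":
    describe_mon_pods()
```

On the CI side, the follow-up is then to raise (or make configurable) the 600 s wait in ocs-ci rather than changing anything in the product.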

