I ran deployment + tier1 with build quay.io/rhceph-dev/ocs-registry:4.6.4-311.ci. Execution: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/1564/ Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/pbalogh-ibmcloud/pbalogh-ibmcloud_20210324T154933 Based on that execution, I was able to deploy the cluster and did not see any mon pods stuck. I shared the kubeconfig for the cluster (http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/pbalogh-ibmcloud/pbalog[…]ud_20210324T154933/openshift-cluster-dir/auth/kubeconfig) with the IBM guys / Akash so they can confirm that.
Rohan, can you confirm if the build has the fix?
We confirmed that the build with the fix is not working.
Moving this out of 4.6.4, as we can't delay 4.6.4 for this fix.
Looks like the build we tested with didn't have the patch: https://storage-jenkins-csb-ceph.cloud.paas.psi.redhat.com/job/ocs-ci/311/ -> https://storage-jenkins-csb-ceph.cloud.paas.psi.redhat.com/job/OCS%20Build%20Pipeline%204.6/174/artifact/ocs_operator_tag.txt -> ocs-operator tag 4.6-83.d9600491.release_4.6 When we tested with the patched version, the timeout was correctly set to 15 minutes. We made a mistake earlier when verifying whether the patch was in the build.
@muagarwa can we move this back to 4.6.4?
Providing the dev_ack; let's wait for QA.
I see the patch in the latest build: https://storage-jenkins-csb-ceph.cloud.paas.psi.redhat.com/job/ocs-ci/322/
Deployed a new cluster with the RC2 build of 4.6.4. Here is the kubeconfig, which I provided to Akash so he can take a look at the cluster: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/pbaloghibmcloud/pbaloghibmcloud_20210330T101320/openshift-cluster-dir/auth/kubeconfig Deployed here: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/1674/
I have verified the mon timeout on the cluster provided by @pbalogh, and it was set to 15 minutes. The OCS version on the cluster is: 4.7.0-330.ci
Hey Shrisha, yesterday at about 3-4pm Brno time I upgraded the cluster: I got confirmation from Akash that you were done with testing on it, so I used it for upgrade testing. When you worked on the cluster yesterday, it was at v4.6.4-323.ci, so I will mark this as verified. Thanks
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.6.4 container bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:1134