Bug 1937837

Summary: [ROKS] OCS deployment stuck at mon pod in pending state
Product: [Red Hat Storage] Red Hat OpenShift Container Storage Reporter: Mudit Agarwal <muagarwa>
Component: ocs-operator Assignee: Rohan Gupta <rohgupta>
Status: CLOSED ERRATA QA Contact: Petr Balogh <pbalogh>
Severity: high Docs Contact:
Priority: high    
Version: 4.6 CC: akgunjal, assingh, bniver, jdurgin, kvellalo, madam, muagarwa, ocs-bugs, pbalogh, ratamir, rcyriac, rohgupta, rojoseph, sabose, shrao, sostapov, tnielsen
Target Milestone: --- Keywords: AutomationBackLog, ZStream
Target Release: OCS 4.6.4   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1922421 Environment:
Last Closed: 2021-04-08 10:29:00 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1922421    
Bug Blocks: 1931424    

Comment 6 Petr Balogh 2021-03-25 08:45:03 UTC
I ran deployment + tier1 with build:
quay.io/rhceph-dev/ocs-registry:4.6.4-311.ci

The execution I ran:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/1564/
Logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/pbalogh-ibmcloud/pbalogh-ibmcloud_20210324T154933

I was able to deploy the cluster and did not see any mon pods stuck.
I shared the kubeconfig for the cluster (http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/pbalogh-ibmcloud/pbalog[…]ud_20210324T154933/openshift-cluster-dir/auth/kubeconfig) with the IBM guys / Akash so they can confirm that.

Comment 7 Sahina Bose 2021-03-26 08:30:30 UTC
Rohan, can you confirm if the build has the fix?

Comment 8 Rohan CJ 2021-03-26 08:51:38 UTC
We confirmed that the build with the fix is not working.

Comment 10 Mudit Agarwal 2021-03-26 17:05:10 UTC
Moving this out of 4.6.4 as we can't delay 4.6. for this fix.

Comment 11 Rohan CJ 2021-03-29 07:32:57 UTC
Looks like the build we tested with didn't have the patch:

https://storage-jenkins-csb-ceph.cloud.paas.psi.redhat.com/job/ocs-ci/311/ -> https://storage-jenkins-csb-ceph.cloud.paas.psi.redhat.com/job/OCS%20Build%20Pipeline%204.6/174/artifact/ocs_operator_tag.txt -> ocs-operator tag 4.6-83.d9600491.release_4.6


When we tested with the patched version, the timeout was correctly set to 15 minutes.

We made a mistake earlier when verifying whether the patch was in the build.

Comment 12 Rohan CJ 2021-03-29 07:36:59 UTC
@muagarwa can we move this back to 4.6.4?

Comment 13 Mudit Agarwal 2021-03-29 10:48:01 UTC
Providing the dev_ack; let's wait for QA.

Comment 14 Rohan CJ 2021-03-29 13:20:31 UTC
I see the patch in the latest build: https://storage-jenkins-csb-ceph.cloud.paas.psi.redhat.com/job/ocs-ci/322/

Comment 15 Petr Balogh 2021-03-30 12:07:32 UTC
Deployed a new cluster with the RC2 build of 4.6.4. Here is the kubeconfig, which I provided to Akash so he can take a look at the cluster:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/pbaloghibmcloud/pbaloghibmcloud_20210330T101320/openshift-cluster-dir/auth/kubeconfig

Deployed here:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/1674/

Comment 18 Shirisha S Rao 2021-04-01 03:31:11 UTC
I have verified the mon timeout on the cluster provided by @pbalogh and it was set to 15 minutes.
The OCS version on the cluster is: 4.7.0-330.ci

Comment 19 Petr Balogh 2021-04-01 11:14:28 UTC
Hey Shirisha,

Yesterday at about 3-4pm Brno time I upgraded the cluster. Akash had confirmed that you were done testing on it, so I used it for upgrade testing.

So when you worked on the cluster yesterday, it was:
v4.6.4-323.ci

So I will mark it as verified.

Thanks

Comment 23 errata-xmlrpc 2021-04-08 10:29:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.6.4 container bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1134