Description of problem:

During UPI deployment on VMware we were not able to finish the command:

/home/jenkins/bin/openshift-install wait-for install-complete --dir /home/jenkins/current-cluster-dir/openshift-cluster-dir --log-level INFO

Our Jenkins job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/4860/console

We run this command with a 30 minute timeout and it was interrupted by our run_cmd function because it did not finish within that window. I think 30 minutes is a reasonable timeout, as IIRC the same value is used by the IPI installer. Should we wait longer, or is 30 minutes enough? If you can see more details in the logs, please help us understand what is wrong so we can stabilize our CI deployments on VMware. Thanks.

Installer logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-vu1cs33-t1/jnk-vu1cs33-t1_20200222T074822/logs/openshift_install_create_cluster_1582362132.log

Must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-vu1cs33-t1/jnk-vu1cs33-t1_20200222T074822/logs/failed_testcase_ocs_logs_1582358295/deployment_ocs_logs/ocp_must_gather/quay-io-openshift-origin-must-gather-sha256-ee4eae4c297a6f0c80de95d12266c61f7348349a3e72d909a294644e8371e3aa/

Version-Release number of selected component (if applicable):
4.3.1 GA version

How reproducible:

Steps to Reproduce:
1. Install OCP via UPI on VMware
2. Then wait for install complete with the command above

Actual results:
The cluster was not ready within 30 minutes.

Expected results:
Installation completes successfully.
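For context, the wait step in our CI is essentially a subprocess call with a hard timeout. Below is a minimal sketch of that pattern; it is only an illustrative stand-in, not the actual ocs-ci run_cmd implementation, and it assumes the installer and cluster-dir paths from the command above.

    # Illustrative sketch only -- not the actual ocs-ci run_cmd implementation.
    # Assumes the installer binary and cluster dir paths from the report above.
    import subprocess

    INSTALLER = "/home/jenkins/bin/openshift-install"
    CLUSTER_DIR = "/home/jenkins/current-cluster-dir/openshift-cluster-dir"

    def wait_for_install_complete(timeout_seconds=30 * 60):
        """Run 'openshift-install wait-for install-complete' with a hard timeout."""
        cmd = [
            INSTALLER,
            "wait-for", "install-complete",
            "--dir", CLUSTER_DIR,
            "--log-level", "INFO",
        ]
        try:
            # subprocess.run raises TimeoutExpired if the command does not
            # finish within timeout_seconds, which is how our CI run hit the
            # 30 minute limit described above.
            return subprocess.run(cmd, timeout=timeout_seconds, check=True).returncode
        except subprocess.TimeoutExpired:
            # The installer was still waiting on the cluster when the timeout
            # fired; surface that to the caller instead of hanging.
            raise RuntimeError(
                f"install-complete did not finish within {timeout_seconds} s"
            )

    if __name__ == "__main__":
        wait_for_install_complete()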
Moving this to Auth, since the authentication operator is still progressing. As for the timeout, the current expectation is that installation should finish in under 30 minutes.
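If this recurs, a quick way to see what is holding up install-complete is to list the cluster operators that are not yet Available or are still Progressing. A rough sketch is below (assumptions: oc is on PATH, KUBECONFIG points at the partially installed cluster; this is just an illustration, not an official tool).

    # Sketch: list cluster operators that are not yet Available, still
    # Progressing, or Degraded after a timed-out install.
    import json
    import subprocess

    def unfinished_cluster_operators():
        out = subprocess.run(
            ["oc", "get", "clusteroperators", "-o", "json"],
            capture_output=True, check=True, text=True,
        ).stdout
        pending = []
        for co in json.loads(out)["items"]:
            conditions = {c["type"]: c["status"] for c in co["status"]["conditions"]}
            # An operator holds up install-complete if it is not Available,
            # is still Progressing, or is Degraded.
            if (conditions.get("Available") != "True"
                    or conditions.get("Progressing") == "True"
                    or conditions.get("Degraded") == "True"):
                pending.append(co["metadata"]["name"])
        return pending

    if __name__ == "__main__":
        print("Operators still settling:", unfinished_cluster_operators())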
Hi Petr, did this used to work with 4.2? I know some changes went into cri-o 1.16 (OCP 4.3) that led to permission-denied issues.
Hello Urvashi, I do not remember this issue in 4.2, and so far I have seen it only once. If we reproduce it I will update here for sure, but I hope the logs I attached from the first occurrence are enough. Petr
I looked through the attached logs and there is not much information in them. Since this happened only once, and only on 1 of the 3 nodes, it is very likely that this was a flake. Petr, please re-open if you run into this again.