Description of problem (please be as detailed as possible and provide log snippets):

During OCS installation we attempt to validate that the PVC is mounted on the registry pod with the following command, which is timing out. This appears to happen only when installing on VMware.

E ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n openshift-image-registry --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig rsh image-registry-5f49c458bf-76zgh mount.
E Error is Error from server: etcdserver: request timed out

A portion of must-gather also failed with the same error (the rest of must-gather is in Additional info):

21:41:36 - MainThread - ocs_ci.ocs.utils - ERROR - Failed during must gather logs! Error: Error during execution of command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig adm must-gather --image=quay.io/openshift/origin-must-gather --dest-dir=/home/jenkins/current-cluster-dir/logs/failed_testcase_ocs_logs_1581540953/deployment_ocs_logs/ocp_must_gather.
Error is error: gather did not start for pod must-gather-p58g7: etcdserver: request timed out

Version of all relevant components (if applicable):
quay.io/rhceph-dev/ocs-olm-operator:4.2.2-rc4
OCP 4.3

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Installation fails because of this, which stops us from further testing.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Untested

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install OCS cluster in VMware (jenkins: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/4543//console)
2. Validate the PVC is mounted on the registry pod

Actual results:
rsh command is timing out

Expected results:
Verification of PVC mounted on registry pod succeeds

Additional info:
Link to must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-vu1cs33-t4a/jnk-vu1cs33-t4a_20200212T204539/logs/failed_testcase_ocs_logs_1581540953/deployment_ocs_logs/
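
For readers unfamiliar with the check, the validation step in the description can be sketched roughly as follows. This is an illustrative sketch only, not the actual ocs-ci implementation: the function names, the default mount point `/registry`, and the 60-second timeout are assumptions made for the example. It runs the same `oc ... rsh <pod> mount` command shown in the traceback and looks for the expected mount point in the output.

```python
import subprocess


def parse_mount_points(mount_output):
    """Return the set of mount points listed in `mount` output.

    Each line is expected to look like:
        <device> on <mount point> type <fstype> (<options>)
    """
    points = set()
    for line in mount_output.splitlines():
        _, sep, rest = line.partition(" on ")
        if not sep:
            continue  # not a mount line
        mount_point, sep, _ = rest.rpartition(" type ")
        if sep:
            points.add(mount_point)
    return points


def registry_pvc_mounted(pod, mount_point="/registry", kubeconfig=None, timeout=60):
    """Hypothetical helper: check that `mount_point` is mounted in the registry pod.

    `mount_point` and `timeout` are assumed values for illustration.
    Raises subprocess.CalledProcessError if `oc` fails (for example with
    "etcdserver: request timed out") and subprocess.TimeoutExpired if the
    command does not return within `timeout` seconds.
    """
    cmd = ["oc", "-n", "openshift-image-registry"]
    if kubeconfig:
        cmd += ["--kubeconfig", kubeconfig]
    cmd += ["rsh", pod, "mount"]
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return mount_point in parse_mount_points(result.stdout)
```

In this report the failure occurs before any parsing happens: the `oc rsh` call itself returns an error from the API server, so a helper like this would raise rather than return `False`.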
- Are you sure your VMware environment is reasonably performing? Looks like etcd is timing out?
- Severity?
(In reply to Yaniv Kaul from comment #2)
> - Are you sure your VMware environment is reasonably performing? Looks like
> etcd is timing out?

Not really sure how to answer this. Are there specific requirements that I can verify our environment is meeting with regards to etcd timing out?
This cluster has 256 GB of memory and 8.99 TB of storage (vSAN). It is connected over a 1 Gbit/s network.
(In reply to Vijay Avuthu from comment #4)
> This cluster has 256 GB of memory and storage ( VSAN ) of 8.99 TB, Its
> connected to 1 Gbits/s.

1Gb is VERY VERY slow.
Has there been any further insight on this? Anyway it's not 4.4 material...
Closing. If you can reproduce on a reasonable* platform, please re-open.

* reasonable:
- well performing (10G network, for a start)
- not overloaded (by other workloads)

It's not always easy to understand what's going on with the underlying platform, but it's a must in these cases. There is not much we can do if the setup is either under-performing or overloaded.