Description of problem (please be as detailed as possible and provide log snippets):
OCS 4.6 must-gather fails to complete within 600 seconds.

Version of all relevant components (if applicable):
OCS 4.6

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Collect must-gather:
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.6
2. Collection fails to complete within 600 seconds:
image=quay.io/rhceph-dev/ocs-must-gather:latest-4.6 --dest-dir=/home/jenkins/current-cluster-dir/logs/deployment_1603639721/ocs_must_gather
17:07:50 - MainThread - ocs_ci.ocs.utils - ERROR - Timeout 600s for must-gather reached, command exited with error: Command '['oc', '--kubeconfig', '/home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig', 'adm', 'must-gather', '--image=quay.io/rhceph-dev/ocs-must-gather:latest-4.6', '--dest-dir=/home/jenkins/current-cluster-dir/logs/deployment_1603639721/ocs_must_gather']' timed out after 600 seconds

Actual results:
Collection takes longer on OCS 4.6 than on OCS 4.5.

Expected results:
Collection time on OCS 4.6 is the same as on OCS 4.5.

Additional info:
Proposing as a blocker because of the significant difference in log collection time between 4.5 and 4.6.
AFAIK, must-gather has many changes in 4.6 which add more time to the collection. I don't think it is mandatory for must-gather to finish collection in 600 seconds. Do we publish that somewhere? If not I don't think that this is a blocker. Pulkit, please correct me if I am wrong
(In reply to Mudit Agarwal from comment #3)
> AFAIK, must-gather has many changes in 4.6 which add more time to the
> collection. I don't think it is mandatory for must-gather to finish
> collection in 600 seconds.
> Do we publish that somewhere? If not I don't think that this is a blocker.
>
> Pulkit, please correct me if I am wrong

Yes, it is not a blocker. It is not at all mandatory for must-gather to finish within 600 seconds; collection time can differ for each setup. If must-gather fails with the message `timed out waiting for condition`, the --timeout flag should be used to increase the collection time. Nowhere is it stated that must-gather should finish within 10 minutes.
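For reference, a minimal sketch of passing a longer timeout to the collection. The variable names MG_IMAGE and MG_TIMEOUT and the helper build_must_gather_cmd are hypothetical, and the value format is an assumption: the timeout is treated as plain seconds here, while other oc releases may expect a duration such as 20m, so check `oc adm must-gather --help` on your client first. The helper only prints the command so it can be reviewed before running.

```shell
#!/bin/bash
# Hypothetical variable names, used only for illustration.
MG_IMAGE="${MG_IMAGE:-quay.io/rhceph-dev/ocs-must-gather:latest-4.6}"
# Assumption: value is in seconds; some oc releases may expect e.g. "20m".
MG_TIMEOUT="${MG_TIMEOUT:-1200}"

# Print the must-gather command with an explicit --timeout.
# Nothing is executed against the cluster here.
build_must_gather_cmd() {
    local image="$1" timeout_value="$2"
    printf 'oc adm must-gather --image=%s --timeout=%s\n' "$image" "$timeout_value"
}

build_must_gather_cmd "$MG_IMAGE" "$MG_TIMEOUT"
```

Once the printed command looks right for your oc version, it can be run directly from the shell.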
Thanks Pulkit, this is not even a bug then. Will close it if QE doesn't have something to add.
I would also like to add that this affects our automation runs: we collect OCS and OCP must-gather on each test failure. Since must-gather started taking longer to complete, we have been failing to collect those logs. If we adjust our automation to the new required timeout, our automation runs will take significantly longer.
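One way to see how much a larger limit would actually cost per failure is to time-box the collection and record how long it took. The sketch below is an illustration only, not how ocs-ci implements its timeout: run_timed is a hypothetical helper, and it assumes GNU coreutils `timeout` is available, which exits with status 124 when the limit is reached.

```shell
#!/bin/bash
# run_timed LIMIT_SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD under GNU coreutils `timeout`, reports the
# elapsed time on stderr, and returns the command's exit status
# (124 means the limit was reached).
run_timed() {
    local limit="$1"
    shift
    local start=$SECONDS rc=0
    timeout "$limit" "$@" || rc=$?
    local elapsed=$(( SECONDS - start ))
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s" >&2
    else
        echo "finished in ${elapsed}s with exit status ${rc}" >&2
    fi
    return "$rc"
}
```

For example, `run_timed 1200 oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.6` would allow up to 20 minutes while still logging the real duration, so the limit can be tuned from observed collection times rather than guessed.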
Providing dev_ack to fix the extra sleep time that was added as part of crash info collection. Please note that this will still not guarantee that must-gather completes within 10 minutes, which, as stated earlier, is not a valid requirement anyway.
Backport PR: https://github.com/openshift/ocs-operator/pull/890
Must-gather collection takes 3 minutes and 35 seconds.

Setup:
Provider: VMware
OCP Version: 4.6.0-0.nightly-2020-11-07-035509
OCS Version: ocs-operator.v4.6.0-156.ci

Test Process:
1. Run Bash script:
#!/bin/bash
SECONDS=0
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.6
duration=$SECONDS
echo "$(($duration / 60)) minutes and $(($duration % 60)) seconds elapsed."

Output:
3 minutes and 35 seconds elapsed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5605