Description of problem:
Running `oc adm must-gather` to gather all CNV info: after a while, no output directory is generated and the gather never finishes: "timed out waiting for the condition".
Version-Release number of selected component (if applicable):
oc version:
Client Version: openshift-clients-4.3.0-201910250623-70-g0ed83003
Server Version: 4.3.0-0.nightly-2019-11-28-103851
Kubernetes Version: v1.16.2
CNV 2.2
How reproducible:
100% in PSI
Steps to Reproduce:
1. Deploy OCP 4.3 and CNV 2.2 successfully.
2. $ oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.2.0-6 --dest-dir=/tmp/pytest-of-cnv-qe-jenkins/pytest-1/must_gather0 # see attachment: output_mustgather.txt
3. $ oc get pods -A -w # see attachment: ocgetpods.txt
   $ oc describe pod must-gather-fbwz2 -n openshift-must-gather-jbzp9 # see attachment: ocdescribpod.txt
Actual results:
Step 2: checking /tmp/pytest-of-cnv-qe-jenkins/pytest-1/must_gather0 shows that no output directory was generated, and the gather never finished: "timed out waiting for the condition".
Expected results:
The output directory is generated.
Additional info:
1. $ oc adm must-gather --image=quay.io/kubevirt/must-gather does NOT work.
2. The specific issue is followed up in Bug 1781038 - [must gather] openshift-must-gather has been DEPRECATED. Use `oc adm inspect` instead.
Created attachment 1643190: screen_messages_output_mustgather
Created attachment 1643191: ocgetpods
Created attachment 1643204: ocdescribepod
Maciej, I remember you wanted to investigate this one. We agreed that no matter what happens some gathered logs should be collected.
It doesn't look like a regression. In my opinion it never worked. Let's wait for Maciej to reply, but I think he or someone else from the platform should fix it.
Piotr, what is "it" that never worked? I thought that Ying was attempting a very basic use case which was tested before. What am I missing?
Dan, this issue was reported before as BZ #1755714. Maciej closed it as "works on my machine" and promised to investigate, which seems never to have happened.
Created attachment 1643655: mustgather_withoutimage_successful
It seems to be failing specifically because of the 10-minute timeout built into `oc adm must-gather`. When I used the `--keep` flag (which keeps the pod and namespace after execution instead of deleting them), the pod finished after 13 minutes.
The problem seems to be specifically in the gathering of the packagemanifests. That section has been taking close to 10 minutes. It takes around 3 seconds to execute `oc get packagemanifest $name -n $NS -o yaml >> ${NAMESPACE_PATH}/${NS}/packagemanifests` and on a test cluster there were 185 packagemanifests.
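As a rough check on the figures above, the arithmetic can be written out. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the 3-second value is the approximate per-call time quoted above, not a measured constant.

```shell
# Rough estimate of the packagemanifest gathering time, using the figures
# reported above: 185 manifests at about 3 seconds per `oc get` call.
manifests=185
seconds_per_call=3
timeout_seconds=$((10 * 60))   # the 10-minute must-gather limit

total=$((manifests * seconds_per_call))
echo "packagemanifests alone: ${total}s of a ${timeout_seconds}s budget"
echo "left for everything else: $((timeout_seconds - total))s"
```

With these figures the packagemanifest section alone uses 555 of the 600 seconds, leaving about 45 seconds for everything else, which is consistent with the gather timing out.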
There's a pending pull request that should fix this in upstream: https://github.com/kubevirt/must-gather/pull/60
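One possible way to avoid the per-manifest cost is to request all packagemanifests in a namespace with a single `oc get` call instead of one call per name. The following is a minimal sketch of that idea under stated assumptions; it is not taken from the pull request, whose contents are not quoted in this thread. The function name `gather_packagemanifests`, its arguments, and the `OC` override variable are all hypothetical.

```shell
# Hypothetical helper (not from the pull request): fetch every
# packagemanifest in a namespace with one API round trip instead of
# one `oc get packagemanifest $name` call per manifest.
# OC is an assumed override so a different client binary can be used.
gather_packagemanifests() {
    ns="$1"        # namespace to query
    out_dir="$2"   # base output directory
    mkdir -p "${out_dir}/${ns}"
    "${OC:-oc}" get packagemanifests -n "${ns}" -o yaml \
        > "${out_dir}/${ns}/packagemanifests"
}
```

It could be called as `gather_packagemanifests "$NS" "$NAMESPACE_PATH"`. Note that the result is a single YAML list rather than one appended document per manifest, so anything that parses the file would need to account for that.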
(In reply to Avram Levitter from comment #18)
> There's a pending pull request that should fix this in upstream:
> https://github.com/kubevirt/must-gather/pull/60
That's exactly the reason to move a bz to the POST state.
VERIFIED this bug on cnv-must-gather-container-v2.2.0-7.
Test Steps:
$ oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.2.0-7 --dest-dir=/tmp
The output directory was generated; the issue is fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2020:0307