Bug 1781044 - [must gather] oc adm must-gather failed to generate the directory, gather never finished: timed out waiting for the condition.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Providers
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 2.2.0
Assignee: Avram Levitter
QA Contact: Ying Cui
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-09 07:30 UTC by Ying Cui
Modified: 2020-01-30 16:27 UTC
CC List: 7 users

Fixed In Version: cnv-must-gather-container-v2.2.0-7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-30 16:27:33 UTC
Target Upstream Version:
Embargoed:
maszulik: needinfo-


Attachments
screen_messages_output_mustgather (6.56 KB, text/plain)
2019-12-09 07:32 UTC, Ying Cui
no flags
ocgetpods (36.89 KB, text/plain)
2019-12-09 07:33 UTC, Ying Cui
no flags
ocdescribepod (3.29 KB, text/plain)
2019-12-09 07:34 UTC, Ying Cui
no flags
mustgather_withoutimage_successful (378.47 KB, text/plain)
2019-12-10 14:37 UTC, Ying Cui
no flags


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHEA-2020:0307  0        None      None    None     2020-01-30 16:27:48 UTC

Description Ying Cui 2019-12-09 07:30:35 UTC
Description of problem:
Running `oc adm must-gather` to gather all CNV information produces no output directory; after a while the gather fails with: gather never finished: timed out waiting for the condition

Version-Release number of selected component (if applicable):
oc version: Client Version: openshift-clients-4.3.0-201910250623-70-g0ed83003
Server Version: 4.3.0-0.nightly-2019-11-28-103851
Kubernetes Version: v1.16.2
CNV 2.2

How reproducible:
100% in PSI


Steps to Reproduce:
1. Deploy OCP 4.3 and CNV 2.2 successfully.

2. $ oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.2.0-6 --dest-dir=/tmp/pytest-of-cnv-qe-jenkins/pytest-1/must_gather0
# see attachment: screen_messages_output_mustgather

3. Inspect the must-gather pod while it runs:
$ oc get pods -A -w  # see attachment: ocgetpods
$ oc describe pod must-gather-fbwz2 -n openshift-must-gather-jbzp9  # see attachment: ocdescribepod

Actual results:
Step 2: /tmp/pytest-of-cnv-qe-jenkins/pytest-1/must_gather0 contains no output directory, and the gather fails with: gather never finished: timed out waiting for the condition


Expected results:
The output directory is generated.
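A minimal sketch for checking the expected result from a test script. The helper name `check_gather_output` is hypothetical (it is not part of `oc`), and it assumes only that a successful run leaves at least one subdirectory under the `--dest-dir` path; it does not validate what was gathered.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): succeeds only if DIR exists and
# contains at least one subdirectory, i.e. some gather output was written.
check_gather_output() {
  local dir="$1"
  [ -d "$dir" ] || return 1
  # -print -quit stops at the first subdirectory found, if any
  [ -n "$(find "$dir" -mindepth 1 -maxdepth 1 -type d -print -quit)" ]
}

# Example usage (requires a logged-in oc session; path taken from step 2):
#   DEST=/tmp/pytest-of-cnv-qe-jenkins/pytest-1/must_gather0
#   oc adm must-gather --image=<cnv-must-gather image> --dest-dir="$DEST"
#   if check_gather_output "$DEST"; then echo "output generated"; else echo "no output directory"; fi
```

A plain file in the destination does not count as output, so a run that only left stray files still reads as a failure.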

Additional info:
1. $ oc adm must-gather --image=quay.io/kubevirt/must-gather does NOT work.
2. This issue is a follow-up to Bug 1781038 - [must gather] openshift-must-gather has been DEPRECATED. Use `oc adm inspect` instead.

Comment 1 Ying Cui 2019-12-09 07:32:07 UTC
Created attachment 1643190 [details]
screen_messages_output_mustgather

Comment 2 Ying Cui 2019-12-09 07:33:40 UTC
Created attachment 1643191 [details]
ocgetpods

Comment 3 Ying Cui 2019-12-09 07:34:53 UTC
Created attachment 1643204 [details]
ocdescribepod

Comment 4 Piotr Kliczewski 2019-12-09 08:25:08 UTC
Maciej, I remember you wanted to investigate this one. We agreed that, no matter what happens, at least some of the gathered logs should be collected.

Comment 5 Piotr Kliczewski 2019-12-09 09:20:58 UTC
It doesn't look like a regression. In my opinion it never worked. Let's wait for Maciej to reply, but I think he or someone else from the platform team should fix it.

Comment 6 Dan Kenigsberg 2019-12-09 17:19:33 UTC
Piotr, what is the "it" that never worked? I thought Ying was attempting a very basic use case that had been tested before. What am I missing?

Comment 7 Piotr Kliczewski 2019-12-09 17:54:08 UTC
Dan, this issue was reported before as BZ #1755714. Maciej closed it as "works on my machine" and promised to investigate, which apparently never happened.

Comment 12 Ying Cui 2019-12-10 14:37:51 UTC
Created attachment 1643655 [details]
mustgather_withoutimage_successful

Comment 16 Avram Levitter 2019-12-11 06:55:07 UTC
It seems to fail specifically because of the 10-minute timeout built into `oc adm must-gather`. When I used the `--keep` flag (which keeps the pod and namespace instead of deleting them after execution), the pod finished after 13 minutes.
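One rough way to locate the slow part of a gather is to time each step on its own. This is a sketch under assumptions: `time_step` and `BUDGET_S` are hypothetical names, and the 600 s default only mirrors the 10-minute limit described above; it is not read from `oc` itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, report its wall-clock duration on stderr,
# and warn when that single step exceeds a budget (default 600 s = 10 minutes).
time_step() {
  local budget="${BUDGET_S:-600}" start end rc elapsed
  start=$(date +%s)
  "$@"
  rc=$?
  end=$(date +%s)
  elapsed=$(( end - start ))
  echo "step took ${elapsed}s: $*" >&2
  if (( elapsed > budget )); then
    echo "WARNING: step exceeded ${budget}s budget: $*" >&2
  fi
  return "$rc"   # preserve the wrapped command's exit status
}

# Example (hypothetical step; NS as in the must-gather script):
#   time_step oc get packagemanifest -n "$NS" -o yaml > /dev/null
```

The wrapper passes the command's exit status through, so it can be dropped into an existing gather script without changing its error handling.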

Comment 17 Avram Levitter 2019-12-11 10:26:39 UTC
The problem seems to be specifically in the gathering of the packagemanifests. That section alone has been taking close to 10 minutes: each `oc get packagemanifest $name -n $NS -o yaml >> ${NAMESPACE_PATH}/${NS}/packagemanifests` call takes around 3 seconds, and on a test cluster there were 185 packagemanifests, so this step takes about 185 × 3 s = 555 s (roughly 9 minutes) of the 10-minute budget.
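The cost comes from making one `oc get` round trip per manifest. The sketch below contrasts that per-item pattern with fetching all packagemanifests in a namespace in a single call. It is an illustration under assumptions only: the function names are hypothetical, and this is not necessarily the change made in kubevirt/must-gather pull request 60.

```shell
#!/usr/bin/env bash
# Per-item pattern (as in comment 17): one call to list names, then one
# `oc get ... -o yaml` call per manifest (185 calls x ~3 s on the test cluster).
gather_per_item() {
  local ns="$1" out="$2" name
  # -o name prints one "<resource>/<name>" entry per line
  for name in $(oc get packagemanifest -n "$ns" -o name); do
    oc get "$name" -n "$ns" -o yaml >> "$out"
  done
}

# Batched alternative (hypothetical): a single call returns every manifest in
# the namespace as one YAML List, so the number of API round trips no longer
# grows with the number of manifests.
gather_batched() {
  local ns="$1" out="$2"
  oc get packagemanifest -n "$ns" -o yaml >> "$out"
}

# Back-of-the-envelope cost of the per-item loop, using the figures above.
per_call_s=3
manifests=185
echo "per-item loop: ~$(( per_call_s * manifests ))s of a 600s budget"
```

With the figures from this comment, the last line prints `per-item loop: ~555s of a 600s budget`, which leaves almost no headroom for the rest of the gather.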

Comment 18 Avram Levitter 2019-12-12 13:57:50 UTC
There's a pending pull request that should fix this in upstream: https://github.com/kubevirt/must-gather/pull/60

Comment 19 Dan Kenigsberg 2019-12-12 20:07:03 UTC
(In reply to Avram Levitter from comment #18)
> There's a pending pull request that should fix this in upstream:
> https://github.com/kubevirt/must-gather/pull/60

That's exactly the reason to move a bz to the POST state.

Comment 21 Ying Cui 2019-12-24 07:29:28 UTC
VERIFIED this bug on cnv-must-gather-container-v2.2.0-7

Test Steps:

$ oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.2.0-7 --dest-dir=/tmp

The output directory is generated; the issue is fixed.

Comment 23 errata-xmlrpc 2020-01-30 16:27:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0307

