1781044 – [must gather] oc adm must-gather failed to generate the directory, gather never finished: timed out waiting for the condition.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Providers
Sub Component:
Version:	2.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	2.2.0
Assignee:	Avram Levitter
QA Contact:	Ying Cui
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-12-09 07:30 UTC by Ying Cui
Modified:	2020-01-30 16:27 UTC
CC List:	7 users
Fixed In Version:	cnv-must-gather-container-v2.2.0-7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-01-30 16:27:33 UTC
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	maszulik: needinfo-

Attachments
screen_messages_output_mustgather (6.56 KB, text/plain) 2019-12-09 07:32 UTC, Ying Cui	no flags
ocgetpods (36.89 KB, text/plain) 2019-12-09 07:33 UTC, Ying Cui	no flags
ocdescribepod (3.29 KB, text/plain) 2019-12-09 07:34 UTC, Ying Cui	no flags
mustgather_withoutimage_successful (378.47 KB, text/plain) 2019-12-10 14:37 UTC, Ying Cui	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2020:0307	0	None	None	None	2020-01-30 16:27:48 UTC

Description Ying Cui 2019-12-09 07:30:35 UTC

Description of problem:
Running oc adm must-gather to gather all CNV info does not generate an output directory. After a while the command fails with: gather never finished: timed out waiting for the condition

Version-Release number of selected component (if applicable):
oc version: Client Version: openshift-clients-4.3.0-201910250623-70-g0ed83003
Server Version: 4.3.0-0.nightly-2019-11-28-103851
Kubernetes Version: v1.16.2
CNV 2.2

How reproducible:
100% in PSI


Steps to Reproduce:
1. Deploy OCP 4.3 and CNV 2.2 successfully.

2. $ oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.2.0-6 --dest-dir=/tmp/pytest-of-cnv-qe-jenkins/pytest-1/must_gather0
# see attachment: screen_messages_output_mustgather

3. Watch the pods and describe the must-gather pod:
$ oc get pods -A -w  # see attachment: ocgetpods
$ oc describe pod must-gather-fbwz2 -n openshift-must-gather-jbzp9  # see attachment: ocdescribepod

Actual results:
After step 2, no output directory is generated under /tmp/pytest-of-cnv-qe-jenkins/pytest-1/must_gather0, and the command fails with: gather never finished: timed out waiting for the condition


Expected results:
The output directory is generated.

Additional info:
1. $ oc adm must-gather --image=quay.io/kubevirt/must-gather does NOT work.
2. A specific issue is followed up separately in Bug 1781038 - [must gather] openshift-must-gather has been DEPRECATED. Use `oc adm inspect` instead.

Comment 1 Ying Cui 2019-12-09 07:32:07 UTC

Created attachment 1643190
screen_messages_output_mustgather

Comment 2 Ying Cui 2019-12-09 07:33:40 UTC

Created attachment 1643191
ocgetpods

Comment 3 Ying Cui 2019-12-09 07:34:53 UTC

Created attachment 1643204
ocdescribepod

Comment 4 Piotr Kliczewski 2019-12-09 08:25:08 UTC

Maciej, I remember you wanted to investigate this one. We agreed that no matter what happens some gathered logs should be collected.

Comment 5 Piotr Kliczewski 2019-12-09 09:20:58 UTC

It doesn't look like a regression. In my opinion it never worked. Let's wait for Maciej to reply, but I think he or someone else from the platform team should fix it.

Comment 6 Dan Kenigsberg 2019-12-09 17:19:33 UTC

Piotr, what is "it" that never worked? I thought that Ying was attempting a very basic use case which was tested before. What am I missing?

Comment 7 Piotr Kliczewski 2019-12-09 17:54:08 UTC

Dan, this issue was reported before as BZ #1755714. Maciej closed it as "works on my machine" and promised to investigate, which apparently never happened.

Comment 12 Ying Cui 2019-12-10 14:37:51 UTC

Created attachment 1643655
mustgather_withoutimage_successful

Comment 16 Avram Levitter 2019-12-11 06:55:07 UTC

It seems that it's failing specifically because of the 10-minute timeout built into `oc adm must-gather`. When I used the `--keep` flag (which keeps the pod and namespace instead of deleting them after execution), the pod finished after 13 minutes.
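
The failure mode described here, a fixed deadline that is shorter than the job it waits for, can be illustrated with a generic polling loop. This is a minimal sketch in POSIX shell and not the code that `oc` uses; the function name `wait_until`, its arguments, and the marker file in the usage comment are hypothetical. Only the error text is taken from this report.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND every INTERVAL_SECONDS until it
# succeeds, and gives up once TIMEOUT_SECONDS have elapsed.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            # Same wording as the message quoted in this bug.
            echo "timed out waiting for the condition" >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$(( elapsed + interval ))
    done
    return 0
}

# Usage (hypothetical marker file): wait at most 600 s, polling every 5 s.
#   wait_until 600 5 test -e /tmp/gather.done
```

With a 600-second budget, a gather that needs about 13 minutes (780 seconds) is still running when such a loop gives up, which matches the behaviour observed with and without `--keep`.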

Comment 17 Avram Levitter 2019-12-11 10:26:39 UTC

The problem seems to be specifically in the gathering of the packagemanifests. That section has been taking close to 10 minutes. It takes around 3 seconds to execute `oc get packagemanifest $name -n $NS -o yaml >> ${NAMESPACE_PATH}/${NS}/packagemanifests` and on a test cluster there were 185 packagemanifests, so the loop alone needs roughly 185 * 3 s = 555 s (about 9 minutes).
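
The arithmetic behind this observation, and one possible way to avoid the per-item cost, can be sketched as follows. This is an illustration under stated assumptions, not the change made in the upstream pull request: the function names `estimate_loop_seconds` and `gather_packagemanifests_batched` are hypothetical, and the batched variant assumes that a single list call per namespace is acceptable for the gathered output.

```shell
#!/bin/sh
# Hypothetical helper: estimate the time spent in a loop that runs one
# command per item. Arguments: item count, seconds per item (integers).
estimate_loop_seconds() {
    count=$1
    per_item=$2
    echo $(( count * per_item ))
}

# Figures from this report: 185 packagemanifests at about 3 s each,
# compared with the 10-minute (600 s) timeout.
total=$(estimate_loop_seconds 185 3)
echo "estimated loop time: ${total}s of a 600s budget"

# Hypothetical alternative (an assumption, not the upstream fix):
# fetch every packagemanifest in a namespace with one list call
# instead of one `oc get` per manifest name.
# Arguments: namespace, base output directory.
gather_packagemanifests_batched() {
    ns=$1
    out_dir=$2
    mkdir -p "${out_dir}/${ns}"
    oc get packagemanifest -n "${ns}" -o yaml > "${out_dir}/${ns}/packagemanifests"
}
```

On a cluster the batched variant would be called as `gather_packagemanifests_batched "$NS" "$NAMESPACE_PATH"`. Note that a list call writes a single List object rather than a sequence of appended per-manifest documents, so anything that parses the file would need to expect that shape.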

Comment 18 Avram Levitter 2019-12-12 13:57:50 UTC

There's a pending pull request that should fix this in upstream: https://github.com/kubevirt/must-gather/pull/60

Comment 19 Dan Kenigsberg 2019-12-12 20:07:03 UTC

(In reply to Avram Levitter from comment #18)
> There's a pending pull request that should fix this in upstream:
> https://github.com/kubevirt/must-gather/pull/60

That's exactly the reason to move a bz to the POST state.

Comment 21 Ying Cui 2019-12-24 07:29:28 UTC

VERIFIED this bug on cnv-must-gather-container-v2.2.0-7

Test Steps:

$ oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.2.0-7 --dest-dir=/tmp

The output directory is generated; the issue is fixed.

Comment 23 errata-xmlrpc 2020-01-30 16:27:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0307
