Bug 1866399
Summary: | must-gather pod retry interval (10 minutes) doesn't make sense | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Yossi Segev <ysegev> |
Component: | oc | Assignee: | Sally <somalley> |
Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> |
Severity: | low | Docs Contact: | |
Priority: | low | | |
Version: | 4.5 | CC: | aos-bugs, danken, jokerman, knarra, maszulik, mfojtik, somalley, tnozicka |
Target Milestone: | --- | Keywords: | Reopened, TestBlockerForLayeredProduct |
Target Release: | 4.6.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-10-27 16:24:53 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1859972 | | |
Description
Yossi Segev
2020-08-05 13:49:47 UTC
The interval there defaults to 10 minutes, but a user can change it at will through the --timeout flag.

It's true that this parameter is configurable, but I think it's configurable by mistake. The help text of the timeout parameter says:

    --timeout=600: The length of time to gather data, in seconds. Defaults to 10 minutes.

But what actually happens is that configuring this parameter doesn't affect the timeout for gathering data, only the must-gather pod's *init interval*. Clearly this is a bug: the user doesn't care about the interval of the pod's init retries, but does care about how long they are willing to wait for data to be gathered.

So there are actually 3 issues here that should be fixed, in order of severity (the first is the most severe):
1. The user cannot set the total timeout for gathering data.
2. The default interval of the pod's initialization is way too long (10 minutes), and should be set to something more reasonable.
3. The pod init interval shouldn't be configurable.

What do you think?
> 1. The user cannot set the total timeout for gathering data.
> 2. The default interval of the pod's initialization is way too long (10
> minutes), and should be set to something more reasonable.
> 3. The pod init interval shouldn't be configurable.
The init pod is where the actual gathering happens. I can agree about the overall timeout, though.
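To make the distinction in this thread concrete (how often the pod is polled versus how long the caller is willing to wait overall), here is a minimal sketch. It is not oc's actual implementation: `FakeClock`, `poll_until`, and `pod_ready_after` are hypothetical names, the clock is simulated so the example runs instantly, and the 60s and 10s intervals and the 5s readiness time are illustrative values only.

```python
from typing import Callable


class FakeClock:
    """Simulated clock (hypothetical helper) so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def poll_until(condition: Callable[[float], bool], interval: float,
               timeout: float, clock: FakeClock) -> float:
    """Check `condition` every `interval` seconds until it holds.

    Returns the simulated time at which the condition was first observed.
    Raises TimeoutError if the next check would fall after `timeout` seconds.
    """
    deadline = clock.now + timeout
    while True:
        if condition(clock.now):
            return clock.now
        if clock.now + interval > deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        clock.sleep(interval)


def pod_ready_after(ready_at: float) -> Callable[[float], bool]:
    """Pretend the pod becomes ready `ready_at` seconds after the start."""
    return lambda now: now >= ready_at


# Same overall timeout (600s), different retry intervals.
slow = poll_until(pod_ready_after(5.0), interval=60.0, timeout=600.0, clock=FakeClock())
fast = poll_until(pod_ready_after(5.0), interval=10.0, timeout=600.0, clock=FakeClock())
print(slow, fast)
```

In this sketch the interval only controls how quickly a change is noticed, while the timeout bounds the total wait. A pod that is ready after 5 seconds is noticed at the first retry, so a longer interval directly lengthens the run even though the timeout is unchanged.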
The timeout is configurable, so leaving that alone. I notice the retry intervals in must-gather of 1 min each add ~2 min to the must-gather, though. I've added a PR to shorten the retry intervals.

Tried to verify the bug, and I see that it now takes 20 to 25 seconds, with retries happening every 3 seconds, about 8-9 times. Waiting for a query to be resolved by dev; once done, I will move the bug to the appropriate state.

QA, now that I've bumped the retry interval to 10s, please verify that the must-gather command does not pause for 60s in these 2 places, thanks:
1) after this output line: "OUT pod for plug-in image"
2) after this output line: "OUT waiting for gather to complete"

The retry interval is shortened from the original 60s to 10s in 2 places, so the overall command running time is decreased by ~80-120s. The overall timeout is configurable with the --timeout flag (defaults to 600s).

Still see retries every 3 seconds; will verify this again tomorrow.

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-01-070508]$ ./oc version
Client Version: 4.6.0-0.nightly-2020-09-01-070508
Server Version: 4.6.0-0.nightly-2020-09-01-070508
Kubernetes Version: v1.19.0-rc.2+b5dc585-dirty

Verified with the payload below; observations follow.
[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-03-191144]$ ./oc version -o yaml
clientVersion:
  buildDate: "2020-09-03T06:56:57Z"
  compiler: gc
  gitCommit: 162e03f9f11dd645b0cc331b115a1afda8e8a2eb
  gitTreeState: clean
  gitVersion: openshift-clients-4.6.0-202006250705.p0-106-g162e03f9f
  goVersion: go1.14.4
  major: ""
  minor: ""
  platform: linux/amd64
openshiftVersion: 4.6.0-0.nightly-2020-09-03-191144
releaseClientVersion: 4.6.0-0.nightly-2020-09-03-191144
serverVersion:
  buildDate: "2020-06-26T23:02:45Z"
  compiler: gc
  gitCommit: 6e15712811e6e7955ec70854fc72153c8b417cba
  gitTreeState: dirty
  gitVersion: v4.6.0-202009031349.p0-dirty
  goVersion: go1.14.4
  major: "4"
  minor: 6+
  platform: linux/amd64

Steps followed to verify the bug:
===================================
1) Install the latest oc 4.6 server & client.
2) Run the oc adm must-gather command.
3) See that the retry interval was 10 seconds before proceeding at "OUT pod for plug-in image" and less than 10 seconds at "OUT waiting for gather to complete".

[must-gather ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:868b6c6113db3c0750d17dfc21193687bb8689137263578f4bc1ce0b71506af5 created
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I0904 15:55:18.411934 22800 round_trippers.go:420] GET https://api.knarra0904.qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-must-gather-vnzl7/pods/must-gather-ts6ds
I0904 15:55:29.014294 22800 round_trippers.go:420] GET https://api.knarra0904.qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-must-gather-vnzl7/pods/must-gather-ts6ds
[must-gather-ts6ds] OUT waiting for gather to complete
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[must-gather-gg9v6] OUT waiting for gather to complete
I0904 17:47:57.117725 25520 round_trippers.go:420] GET https://api.knarra0904.qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-must-gather-c2d9t/pods/must-gather-gg9v6
I0904 17:47:57.117774 25520 round_trippers.go:427] Request Headers:
I0904 17:48:08.758233 25520 round_trippers.go:420] GET https://api.knarra0904.qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-must-gather-c2d9t/pods/must-gather-gg9v6

4) Repeated the same steps with the oc 4.5 client, and I see that the retry interval was 60 seconds at "OUT pod for plug-in image" and at "OUT waiting for gather to complete".

[must-gather ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:868b6c6113db3c0750d17dfc21193687bb8689137263578f4bc1ce0b71506af5 created
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I0904 15:42:41.827131 22303 round_trippers.go:420] GET https://api.knarra0904.qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-must-gather-79q2g/pods/must-gather-kflsm
I0904 15:43:42.377047 22303 round_trippers.go:420] GET https://api.knarra0904.qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-must-gather-79q2g/pods/must-gather-kflsm
[must-gather-xnqx8] OUT waiting for gather to complete
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I0904 17:34:46.826636 25131 round_trippers.go:420] GET https://api.knarra0904.qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-must-gather-7n9xn/pods/must-gather-xnqx8
I0904 17:35:47.159863 25131 round_trippers.go:420] GET https://api.knarra0904.qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-must-gather-7n9xn/pods/must-gather-xnqx8

Based on the above, moving the bug to verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:4196
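The verification in this bug reads the retry interval off the timestamps of consecutive GET lines in the client's verbose log. That check can be sketched in a few lines of Python. This is only an illustration, not part of oc: `get_intervals` and the variable names are hypothetical, the sample lines are copied from the verification comment, and the year is an assumption because the log header carries only month and day.

```python
import re
from datetime import datetime

# klog header layout: severity letter, mmdd, hh:mm:ss.uuuuuu, thread id, file:line]
KLOG_LINE = re.compile(
    r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] (.*)$"
)


def get_intervals(lines, year=2020):
    """Return the seconds between consecutive GET requests in klog lines.

    The log header has no year, so `year` is an assumption supplied by the caller.
    """
    stamps = []
    for line in lines:
        match = KLOG_LINE.match(line)
        if match and match.group(3).startswith("GET "):
            text = f"{year} {match.group(1)} {match.group(2)}"
            stamps.append(datetime.strptime(text, "%Y %m%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


API = "https://api.knarra0904.qe.devcluster.openshift.com:6443/api/v1/namespaces"
POD_46 = f"{API}/openshift-must-gather-vnzl7/pods/must-gather-ts6ds"
POD_45 = f"{API}/openshift-must-gather-79q2g/pods/must-gather-kflsm"

# Lines taken from the 4.6 and 4.5 client runs quoted in this bug.
client_46 = [
    f"I0904 15:55:18.411934 22800 round_trippers.go:420] GET {POD_46}",
    f"I0904 15:55:29.014294 22800 round_trippers.go:420] GET {POD_46}",
]
client_45 = [
    f"I0904 15:42:41.827131 22303 round_trippers.go:420] GET {POD_45}",
    f"I0904 15:43:42.377047 22303 round_trippers.go:420] GET {POD_45}",
]

print([round(s, 1) for s in get_intervals(client_46)])
print([round(s, 1) for s in get_intervals(client_45)])
```

For the quoted lines this gives about 10.6 seconds between GETs for the 4.6 client and about 60.5 seconds for the 4.5 client, in line with the 10s and 60s retry intervals discussed in the comments.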