Bug 2103283 - In CI 4.10 HAProxy must-gather takes longer than 10 minutes
Summary: In CI 4.10 HAProxy must-gather takes longer than 10 minutes
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.12.0
Assignee: Grant Spence
QA Contact: Shudi Li
Depends On:
Blocks: 2104701
Reported: 2022-07-01 23:09 UTC by Candace Holman
Modified: 2023-01-17 19:51 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, router pods left in the terminating state delayed the `oc cp` command, which in turn delayed the must-gather until the terminating pod was terminated. With this update, a timeout is set for each `oc cp` command, so must-gathers are no longer delayed by terminating pods. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2103283[*BZ#2103283*])
Clone Of:
Last Closed: 2023-01-17 19:51:19 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift must-gather pull 317 | 0 | None | open | Bug 2103283: Add timeout to oc cp command to fix must-gather delays when routers are terminating | 2022-07-06 21:28:37 UTC
Red Hat Product Errata | RHSA-2022:7399 | 0 | None | None | None | 2023-01-17 19:51:37 UTC
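
The fix described in the Doc Text and in the linked pull request title puts a time limit on each `oc cp` call. The following is only a minimal sketch of that idea, not the code from the pull request: the function names, the 20s limit, the HAProxy config path, and the destination layout are all assumptions made for illustration. It relies on the GNU coreutils `timeout` command, which exits with status 124 when the limit is reached.

```shell
#!/usr/bin/env bash
# Sketch only: names, paths, and the 20s limit are illustrative assumptions.

# Run a command with a time limit and warn (rather than hang) on timeout.
# GNU coreutils `timeout` exits with status 124 when the limit is hit.
run_with_timeout() {
  local limit="$1"
  shift
  local rc=0
  timeout "${limit}" "$@" || rc=$?
  if [ "${rc}" -eq 124 ]; then
    echo "WARNING: timed out after ${limit}: $*" >&2
  fi
  return "${rc}"
}

# Copy the HAProxy config from every router pod, moving on if a pod hangs.
# The config path and the destination directory layout are assumptions.
gather_router_configs() {
  local dest="$1" pod
  for pod in $(oc -n openshift-ingress get pods -o jsonpath='{.items[*].metadata.name}'); do
    mkdir -p "${dest}/${pod}"
    run_with_timeout 20s \
      oc cp "openshift-ingress/${pod}:/var/lib/haproxy/conf/haproxy.config" \
      "${dest}/${pod}/haproxy.config" || true
  done
}
```

With a wrapper of this kind, a copy from a pod stuck in Terminating costs at most the chosen limit instead of blocking the whole gather until the pod is gone.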

Comment 5 Shudi Li 2022-07-08 10:00:57 UTC
Verified with 4.12.0-0.nightly-2022-07-07-144231 on an AWS cluster. The total must-gather time was about 20 minutes, which is less than 40 minutes.

1. Check the cluster version
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-07-07-144231   True        False         132m    Cluster version is 4.12.0-0.nightly-2022-07-07-144231

2. Patch the router deployment's probe timeouts so that the router pods are replaced, then check the pods
% oc -n openshift-ingress patch deploy/router-default --type=strategic --patch='{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":15},"readinessProbe":{"timeoutSeconds":15}}]}}}}'
deployment.apps/router-default patched

% oc -n openshift-ingress get pods
NAME                              READY   STATUS        RESTARTS   AGE
router-default-6f67d6db6f-h4mqf   1/1     Terminating   0          2m40s
router-default-6f67d6db6f-t8p4j   0/1     Terminating   0          2m40s
router-default-86696fc96c-94cfp   1/1     Running       0          40s
router-default-86696fc96c-dc4pn   0/1     Pending       0          40s
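
To confirm that terminating router pods are present before starting the must-gather, the STATUS column of the listing above can be counted. This is a small sketch; `count_pods_with_status` is a hypothetical helper name, and it assumes the default `oc get pods` column layout shown above (STATUS in the third column, one header line).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count pods whose STATUS column equals the given value.
# Reads `oc get pods` output on stdin; assumes the default column layout
# (NAME READY STATUS RESTARTS AGE) with a single header line.
count_pods_with_status() {
  awk -v want="$1" 'NR > 1 && $3 == want { n++ } END { print n + 0 }'
}

# Example invocation (needs a logged-in cluster session):
#   oc -n openshift-ingress get pods | count_pods_with_status Terminating
```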

3. Run oc adm must-gather; it took about 20 minutes
% oc adm must-gather
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f2a076583e9b0646014c1135a52bb7af45c141c96f162e2ae8c0ad5bdedbefec
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: ae2bfb96-bb30-475a-99a3-a61f297ccaf7
ClusterVersion: Stable at "4.12.0-0.nightly-2022-07-07-144231"
	All healthy and stable

4. During the must-gather, repeatedly ran oc -n openshift-ingress patch deploy/router-default with different readinessProbe values, so the router pods kept being terminated and recreated.
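
Step 4 could be scripted roughly as follows. This is a sketch under assumptions, not what was actually run during verification: the helper names, the alternating 15/10 second values, and the 60 second pause are all hypothetical. The patch body mirrors the one used earlier in this comment, limited to the readinessProbe.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, timeout values, and the pause are assumptions.

# Build a strategic-merge patch that sets the router readinessProbe timeout.
probe_patch() {
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"router","readinessProbe":{"timeoutSeconds":%d}}]}}}}' "$1"
}

# Re-patch the deployment a number of times so that router pods keep being
# terminated and recreated while a must-gather runs in another terminal.
churn_router_pods() {
  local rounds="$1" i secs
  for i in $(seq 1 "${rounds}"); do
    secs=$(( 10 + i % 2 * 5 ))   # alternates 15, 10, 15, ...
    oc -n openshift-ingress patch deploy/router-default \
      --type=strategic --patch="$(probe_patch "${secs}")"
    sleep 60
  done
}
```

Each patch changes the pod template, which triggers a new rollout and leaves the previous pods in Terminating for a while.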

5. After the must-gather was done, check the router pods
% oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-667876b84d-n48t5   1/1     Running   0          13m
router-default-667876b84d-xgdct   1/1     Running   0          13m

6. Check the log
% pwd
shudi@Shudis-MacBook-Pro must-gather.local.1946140740011803351 % cat timestamp
2022-07-08 17:13:06.722968 +0800 CST m=+20.127377290
2022-07-08 17:35:31.548306 +0800 CST m=+1365.052625387
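
The two lines in the timestamp file are in Go's time string format, where the trailing `m=+<seconds>` field is the monotonic clock reading. Subtracting the first reading from the last gives the elapsed time of the run. The sketch below does this with awk; `elapsed_seconds` is a hypothetical helper name, and it assumes every line of the file carries an `m=+` field as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the seconds elapsed between the first and last
# line of a must-gather timestamp file, using the monotonic "m=+<seconds>" field.
elapsed_seconds() {
  awk '
    match($0, /m=\+[0-9.]+/) {
      # Skip the 3-character "m=+" prefix and keep the numeric part.
      v = substr($0, RSTART + 3, RLENGTH - 3) + 0
      if (!seen) { first = v; seen = 1 }
      last = v
    }
    END { if (seen) printf "%.0f\n", last - first }
  ' "$1"
}

# Example invocation from inside the must-gather.local.* directory:
#   elapsed_seconds timestamp
```

For the two readings above (20.127377290 and 1365.052625387) this prints 1345, about 22 minutes, which matches the "about 20 minutes" reported in step 3.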

% ls -lht
total 0
drwxr-xr-x  3 shudi  staff    96B Jul  8 17:47 router-default-86696fc96c-dc4pn
drwxr-xr-x  3 shudi  staff    96B Jul  8 17:17 router-default-86696fc96c-94cfp
% pwd

Comment 8 errata-xmlrpc 2023-01-17 19:51:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

