Bug 1875551 - [4.5] KubePodNotReady should not be raised for must-gather pod
Summary: [4.5] KubePodNotReady should not be raised for must-gather pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.5.z
Assignee: Jan Chaloupka
QA Contact: Mike Fiedler
URL:
Whiteboard: LifecycleReset
Depends On: 1805891
Blocks:
Reported: 2020-09-03 18:22 UTC by Jan Chaloupka
Modified: 2020-10-19 14:54 UTC
CC List: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1805891
Environment:
Last Closed: 2020-10-19 14:54:24 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift oc pull 554 | 0 | None | closed | bug 1875551: must-gather: move gather init container under containers | 2021-01-15 11:15:28 UTC
Red Hat Product Errata | RHBA-2020:4228 | 0 | None | None | None | 2020-10-19 14:54:40 UTC

Comment 1 Jan Chaloupka 2020-09-10 14:22:30 UTC
Adding UpcomingSprint just in case the 4.5 backport PR does not get merged by EOW.

Comment 4 Mike Fiedler 2020-09-14 21:47:10 UTC
I was able to reproduce the issue (or a variation of it) on 4.5.0-0.nightly-2020-09-14-124053, which should have the fix.

In a 50-worker AWS cluster, the following alert fired 1 minute after running `oc adm must-gather`:

KubePodNotReady
Pod openshift-must-gather-gbggn/must-gather-sd5n7 has been in a non-ready state for longer than 15 minutes.

It had not been 15 minutes since I started the must-gather. See the pod info below, taken at the time the alert fired.


```
# oc get pods --all-namespaces | grep must-gather
openshift-must-gather-gbggn                        must-gather-sd5n7                                                     0/1     Init:0/1    0          86s
```
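The alert text says the pod has been non-ready for longer than 15 minutes, yet the pod above is only 86s old. The rule definition is not part of this bug, so the minimal Python sketch below assumes a `for: 15m`-style rule that fires only after the pod phase has stayed `Pending` or `Unknown` continuously for the whole window (a pod whose init container is still running reports phase `Pending`). Under that assumption an 86-second-old pod should not trigger the alert yet, which is why the timing looked wrong. The phase set, the window, and the start time are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Assumption: phases the alert rule treats as "not ready" (not confirmed by this bug).
NOT_READY_PHASES = {"Pending", "Unknown"}
# Assumption: hold duration taken from the alert text ("longer than 15 minutes").
FOR_WINDOW = timedelta(minutes=15)


def would_fire(samples, now, window=FOR_WINDOW):
    """Return True if the pod has been continuously not ready for at least `window`.

    `samples` is a time-ordered list of (timestamp, phase) observations up to `now`.
    """
    streak_start = None
    for ts, phase in samples:
        if phase in NOT_READY_PHASES:
            if streak_start is None:
                streak_start = ts  # the condition became true at this observation
        else:
            streak_start = None  # any other phase resets the hold timer
    return streak_start is not None and now - streak_start >= window


if __name__ == "__main__":
    start = datetime(2020, 9, 14, 21, 0, 0)  # hypothetical must-gather start time
    samples = [(start, "Pending")]  # init container still running, as in Init:0/1
    print(would_fire(samples, start + timedelta(seconds=86)))  # False
    print(would_fire(samples, start + timedelta(minutes=16)))  # True
```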

Comment 6 Jan Chaloupka 2020-09-15 07:39:22 UTC
> openshift-must-gather-gbggn                        must-gather-sd5n7                                                     0/1     Init:0/1    0          86s

With the fix applied, the must-gather pod no longer has an init container. Can you run `oc get -o yaml` on the must-gather pod?
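Comment 6 asks for the pod definition to see whether the gather step still runs as an init container. A minimal Python sketch of that check, plus the reshaping named in the PR title ("move gather init container under containers"), applied to a pod object parsed from `oc get pod <name> -o json` (JSON instead of YAML so only the standard library is needed). The container names `gather` and `copy` are hypothetical placeholders; the exact spec produced by the PR is not shown in this bug.

```python
import copy
import json


def container_names(pod: dict):
    """Return (init container names, regular container names) from a pod object."""
    spec = pod.get("spec", {})
    init = [c["name"] for c in spec.get("initContainers", [])]
    regular = [c["name"] for c in spec.get("containers", [])]
    return init, regular


def has_init_containers(pod: dict) -> bool:
    """True for the pre-fix shape, where the pod stays Pending until the init container exits."""
    return bool(pod.get("spec", {}).get("initContainers"))


def move_init_containers_under_containers(pod: dict) -> dict:
    """Return a copy of `pod` whose init containers are listed under spec.containers.

    Regular containers do not wait for one another to finish, so any ordering the
    init container used to guarantee must be handled inside the containers
    themselves (how the actual PR does this is not shown here).
    """
    new_pod = copy.deepcopy(pod)  # leave the caller's object untouched
    spec = new_pod.setdefault("spec", {})
    moved = spec.pop("initContainers", [])
    spec["containers"] = moved + spec.get("containers", [])
    return new_pod


if __name__ == "__main__":
    # Hypothetical pre-fix pod, standing in for `oc get pod <name> -o json` output.
    old_pod = json.loads(
        '{"spec": {"initContainers": [{"name": "gather"}], "containers": [{"name": "copy"}]}}'
    )
    new_pod = move_init_containers_under_containers(old_pod)
    print(container_names(old_pod), has_init_containers(old_pod))  # (['gather'], ['copy']) True
    print(container_names(new_pod), has_init_containers(new_pod))  # ([], ['gather', 'copy']) False
```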

Comment 7 Mike Fiedler 2020-09-15 18:00:07 UTC
Moving back to ON_QA to try a later build.

Comment 8 Mike Fiedler 2020-09-16 12:07:50 UTC
4.5.0-0.nightly-2020-09-14-124053 is the latest 4.5 nightly, and it does not have the fix. Not sure why automation moved it to ON_QA. Moving it back to MODIFIED while awaiting a nightly with the PR.

Comment 14 Mike Fiedler 2020-09-24 17:17:16 UTC
@Jan In 4.5.12 I still see the init container. The diff for 4.5.12 (https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4-stable/release/4.5.12?from=4.5.9) shows that the merged PR is included:

CLI, CLI-ARTIFACTS, DEPLOYER, TOOLS
Bug 1876238: fix typo in oc adm upgrade help #557
Bug 1875551: must-gather: move gather init container under containers #554
Full changelog

Does the presence of the init container mean the fix is not correct? Ref: your comment 6

From `oc get pods --all-namespaces -w`:

```
openshift-must-gather-bdmnq                        must-gather-9972t                                                     0/1     Pending             0          0s
openshift-must-gather-bdmnq                        must-gather-9972t                                                     0/1     Pending             0          0s
openshift-must-gather-bdmnq                        must-gather-9972t                                                     0/1     Init:0/1            0          0s
openshift-must-gather-bdmnq                        must-gather-9972t                                                     0/1     Init:0/1            0          2s
openshift-must-gather-bdmnq                        must-gather-9972t                                                     0/1     Init:0/1            0          4s
```


The alert was raised:

KubePodNotReady
Pod openshift-must-gather-bdmnq/must-gather-9972t has been in a non-ready state for longer than 15 minutes.

Since the release diff indicates the fix should be in 4.5.12, I am moving this back to ASSIGNED for investigation.

Comment 16 Jan Chaloupka 2020-09-29 12:10:12 UTC
That is strange. How do you download the oc binary for 4.5.12?

Comment 17 W. Trevor King 2020-09-29 22:48:22 UTC
> How do you download the oc binary for 4.5.12?

https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.5.12/ links to openshift-client-linux-4.5.12.tar.gz, etc. and a signed checksum file.
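A minimal Python sketch of checking a downloaded client tarball against a checksum file in the `sha256sum` format (`<hex digest>  <file name>` per line). It assumes the mirror's checksum file uses that format and takes the file's contents as text, since its exact name on the mirror is not given here. It does not verify the signature on the checksum file, which is a separate step.

```python
import hashlib
from pathlib import Path


def sha256_of(path) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks so large tarballs are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(checksum_text: str, file_name: str) -> str:
    """Look up `file_name` in sha256sum-style text ("<digest>  <name>", optional '*' before name)."""
    for line in checksum_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == file_name:
            return parts[0].lower()
    raise KeyError(f"{file_name} is not listed in the checksum file")


def verify_download(path, checksum_text: str) -> bool:
    """True if the file at `path` matches its entry in `checksum_text`."""
    return sha256_of(path) == expected_digest(checksum_text, Path(path).name)
```

For example, `verify_download("openshift-client-linux-4.5.12.tar.gz", Path("<checksum file>").read_text())`, where `<checksum file>` is a placeholder for the checksum file downloaded from the same directory.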

Comment 18 Jan Chaloupka 2020-09-30 09:57:31 UTC
Checking with https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.5.12/openshift-client-linux-4.5.12.tar.gz, I don't see any init container in the pod's spec. What command do you use to run the must-gather? Can you run `oc version` as well?

Comment 19 Jan Chaloupka 2020-09-30 10:03:13 UTC
In my case:

```
$ ./oc version
Client Version: 4.5.12
Server Version: 4.6.0-0.nightly-2020-08-10-233406
Kubernetes Version: v1.19.0-rc.2+5241b27-dirty
```

```
$ ./oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a84ccbdd140bb7151a774df04b6ba8a310ab4bbf025405a9af18b5e63847912
[must-gather      ] OUT namespace/openshift-must-gather-bjbvl created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-72xwb created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a84ccbdd140bb7151a774df04b6ba8a310ab4bbf025405a9af18b5e63847912 created
[must-gather-mrp5v] POD Wrote inspect data to must-gather.
...
```

```
$ ./oc get pods --all-namespaces -w
...
openshift-must-gather-bjbvl                        must-gather-mrp5v                                         0/2     Pending     0          0s
openshift-must-gather-bjbvl                        must-gather-mrp5v                                         0/2     Pending     0          0s
openshift-must-gather-bjbvl                        must-gather-mrp5v                                         0/2     ContainerCreating   0          0s
openshift-must-gather-bjbvl                        must-gather-mrp5v                                         0/2     ContainerCreating   0          2s
openshift-must-gather-bjbvl                        must-gather-mrp5v                                         2/2     Running             0          4s
```
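The two watch outputs (0/1 with `Init:0/1` in comment 14, 0/2 then 2/2 here) follow from the pod spec: `oc get pods` counts only regular containers in READY and reports unfinished init containers in STATUS as `Init:N/M`. A simplified Python sketch of those two columns, fed with hypothetical pod objects shaped like the pre-fix and post-fix pods. The real printer also handles restarts, waiting reasons, and other states that are omitted here.

```python
def ready_column(pod: dict) -> str:
    """READY as `oc get pods` shows it: ready regular containers / regular containers."""
    total = len(pod["spec"].get("containers", []))
    statuses = pod.get("status", {}).get("containerStatuses", [])
    ready = sum(1 for s in statuses if s.get("ready"))
    return f"{ready}/{total}"


def init_progress(pod: dict):
    """Simplified Init:N/M value, or None when the pod has no init containers."""
    init = pod["spec"].get("initContainers", [])
    if not init:
        return None
    statuses = pod.get("status", {}).get("initContainerStatuses", [])
    # An init container counts as done once it has terminated with exit code 0.
    done = sum(
        1 for s in statuses
        if s.get("state", {}).get("terminated", {}).get("exitCode") == 0
    )
    return f"Init:{done}/{len(init)}"


if __name__ == "__main__":
    # Hypothetical pre-fix pod: gather still running as an init container.
    old_pod = {
        "spec": {"initContainers": [{"name": "gather"}], "containers": [{"name": "copy"}]},
        "status": {
            "initContainerStatuses": [{"state": {"running": {}}}],
            "containerStatuses": [{"ready": False}],
        },
    }
    # Hypothetical post-fix pod: both containers are regular containers and ready.
    new_pod = {
        "spec": {"containers": [{"name": "gather"}, {"name": "copy"}]},
        "status": {"containerStatuses": [{"ready": True}, {"ready": True}]},
    }
    print(ready_column(old_pod), init_progress(old_pod))  # 0/1 Init:0/1
    print(ready_column(new_pod), init_progress(new_pod))  # 2/2 None
```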

Comment 20 Mike Fiedler 2020-09-30 18:43:28 UTC
Client issue on my side. Re-testing.

Comment 21 Mike Fiedler 2020-09-30 23:27:13 UTC
Verified with:

```
$ oc version
Client Version: 4.5.14
Server Version: 4.5.14
Kubernetes Version: v1.18.3+5302882
```


No init container, no alerts. The problem was an old client on my system.
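Because the root cause was an older `oc` client, a quick check of `oc version` output before running must-gather can rule that out. A minimal Python sketch, assuming 4.5.12 as the threshold: it is the release whose diff from 4.5.9 lists PR #554, so the fix may actually have landed in an earlier 4.5.z between the two. It only answers for 4.5.z release builds and returns None for anything else, such as nightly builds or clients from other minor versions.

```python
import re

# Assumption: first 4.5.z client known (from the 4.5.9 -> 4.5.12 release diff) to include PR #554.
FIRST_4_5_WITH_FIX = (4, 5, 12)

# Matches only a plain release version on the "Client Version:" line.
_CLIENT_RE = re.compile(r"^Client Version:\s*(\d+)\.(\d+)\.(\d+)\s*$", re.MULTILINE)


def client_version(oc_version_output: str):
    """Parse 'Client Version: X.Y.Z' into a tuple, or None for non-release strings."""
    m = _CLIENT_RE.search(oc_version_output)
    return tuple(int(g) for g in m.groups()) if m else None


def client_has_must_gather_fix(oc_version_output: str):
    """True/False for 4.5.z release clients; None when the answer is outside this bug's scope."""
    version = client_version(oc_version_output)
    if version is None or version[:2] != (4, 5):
        return None
    return version >= FIRST_4_5_WITH_FIX


if __name__ == "__main__":
    verified = "Client Version: 4.5.14\nServer Version: 4.5.14\nKubernetes Version: v1.18.3+5302882\n"
    print(client_has_must_gather_fix(verified))  # True
```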

Comment 24 errata-xmlrpc 2020-10-19 14:54:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.15 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4228

