Bug 1875551

Summary: [4.5] KubePodNotReady should not be raised for must-gather pod
Product: OpenShift Container Platform
Reporter: Jan Chaloupka <jchaloup>
Component: oc
Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED ERRATA
QA Contact: Mike Fiedler <mifiedle>
Severity: medium
Docs Contact:
Priority: low
Version: 4.3.0
CC: aos-bugs, jokerman, mfojtik, mifiedle, nmalik, obulatov, wking, wzheng
Target Milestone: ---
Keywords: ServiceDeliveryImpact
Target Release: 4.5.z
Hardware: Unspecified
OS: Unspecified
Whiteboard: LifecycleReset
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1805891
Environment:
Last Closed: 2020-10-19 14:54:24 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1805891
Bug Blocks:

Comment 1 Jan Chaloupka 2020-09-10 14:22:30 UTC
Adding the UpcomingSprint keyword in case the 4.5 backport PR does not get merged by the end of the week.

Comment 4 Mike Fiedler 2020-09-14 21:47:10 UTC
I was able to reproduce the issue (or a variation of it) on 4.5.0-0.nightly-2020-09-14-124053, which should have the fix.

In a 50 worker AWS cluster, 1 minute after running oc adm must-gather, the following alert fired:

KubePodNotReady
Pod openshift-must-gather-gbggn/must-gather-sd5n7 has been in a non-ready state for longer than 15 minutes.

It had not been 15 minutes since I started the must-gather.  See pod info below at the time the alert fired.


# oc get pods --all-namespaces | grep must-gather
openshift-must-gather-gbggn                        must-gather-sd5n7                                                     0/1     Init:0/1    0          86s

Comment 6 Jan Chaloupka 2020-09-15 07:39:22 UTC
> openshift-must-gather-gbggn                        must-gather-sd5n7                                                     0/1     Init:0/1    0          86s

After applying the fix, the must-gather pod no longer has an init container. Can you run `oc get -o yaml` on the must-gather pod?
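
For anyone repeating this check, the question above comes down to whether the pod's `.spec.initContainers` field is populated. The following is a minimal Python sketch, not part of the fix itself; it assumes the manifest was saved as JSON (for example with `-o json` instead of `-o yaml`), and the trimmed manifests and container names in it are illustrative placeholders rather than output captured from a cluster.

```python
import json
from typing import List


def init_container_names(pod_json: str) -> List[str]:
    """Return the names of the init containers declared in a pod manifest."""
    pod = json.loads(pod_json)
    # .spec.initContainers is absent (or null/empty) when the pod has none.
    init_containers = pod.get("spec", {}).get("initContainers") or []
    return [c.get("name", "") for c in init_containers]


# Hypothetical, heavily trimmed manifests. The container names are
# assumptions for illustration only.
POD_WITH_INIT = json.dumps({
    "spec": {
        "initContainers": [{"name": "gather"}],
        "containers": [{"name": "copy"}],
    }
})
POD_WITHOUT_INIT = json.dumps({
    "spec": {"containers": [{"name": "gather"}, {"name": "copy"}]}
})

if __name__ == "__main__":
    print(init_container_names(POD_WITH_INIT))
    print(init_container_names(POD_WITHOUT_INIT))
```

A non-empty result means the pod was created by a client that still places the gather step in an init container.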

Comment 7 Mike Fiedler 2020-09-15 18:00:07 UTC
Moving back to ON_QA to try a later build.

Comment 8 Mike Fiedler 2020-09-16 12:07:50 UTC
4.5.0-0.nightly-2020-09-14-124053 is the latest 4.5 nightly and it does not have the fix. Not sure why automation moved this bug to ON_QA. Moving it back to MODIFIED while awaiting a nightly that includes the PR.

Comment 14 Mike Fiedler 2020-09-24 17:17:16 UTC
@Jan In 4.5.12 I still see the Init container.   The diff for 4.5.12 (https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4-stable/release/4.5.12?from=4.5.9) shows that the merged PR is included:

CLI, CLI-ARTIFACTS, DEPLOYER, TOOLS
Bug 1876238: fix typo in oc adm upgrade help #557
Bug 1875551: must-gather: move gather init container under containers #554

Does the presence of the Init container mean the fix is not correct?   Ref:  your comment 6

From `oc get pods --all-namespaces -w`:

openshift-must-gather-bdmnq                        must-gather-9972t                                                     0/1     Pending             0          0s
openshift-must-gather-bdmnq                        must-gather-9972t                                                     0/1     Pending             0          0s
openshift-must-gather-bdmnq                        must-gather-9972t                                                     0/1     Init:0/1            0          0s
openshift-must-gather-bdmnq                        must-gather-9972t                                                     0/1     Init:0/1            0          2s
openshift-must-gather-bdmnq                        must-gather-9972t                                                     0/1     Init:0/1            0          4s


The alert was raised:  

KubePodNotReady
Pod openshift-must-gather-bdmnq/must-gather-9972t has been in a non-ready state for longer than 15 minutes.

Since the release diff indicates the fix should be in 4.5.12, I am moving this back to ASSIGNED for investigation.

Comment 16 Jan Chaloupka 2020-09-29 12:10:12 UTC
That is strange. How do you download the oc binary for 4.5.12?

Comment 17 W. Trevor King 2020-09-29 22:48:22 UTC
> How do you download the oc binary for 4.5.12?

https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.5.12/ links to openshift-client-linux-4.5.12.tar.gz, etc. and a signed checksum file.

Comment 18 Jan Chaloupka 2020-09-30 09:57:31 UTC
Checking with https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.5.12/openshift-client-linux-4.5.12.tar.gz I don't see any init container in the pod's spec. What command do you use to run the must-gather? Can you run `oc version` as well?

Comment 19 Jan Chaloupka 2020-09-30 10:03:13 UTC
In my case:

```
$ ./oc version
Client Version: 4.5.12
Server Version: 4.6.0-0.nightly-2020-08-10-233406
Kubernetes Version: v1.19.0-rc.2+5241b27-dirty
```

```
$ ./oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a84ccbdd140bb7151a774df04b6ba8a310ab4bbf025405a9af18b5e63847912
[must-gather      ] OUT namespace/openshift-must-gather-bjbvl created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-72xwb created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a84ccbdd140bb7151a774df04b6ba8a310ab4bbf025405a9af18b5e63847912 created
[must-gather-mrp5v] POD Wrote inspect data to must-gather.
...
```

```
$ ./oc get pods --all-namespaces -w
...
openshift-must-gather-bjbvl                        must-gather-mrp5v                                         0/2     Pending     0          0s
openshift-must-gather-bjbvl                        must-gather-mrp5v                                         0/2     Pending     0          0s
openshift-must-gather-bjbvl                        must-gather-mrp5v                                         0/2     ContainerCreating   0          0s
openshift-must-gather-bjbvl                        must-gather-mrp5v                                         0/2     ContainerCreating   0          2s
openshift-must-gather-bjbvl                        must-gather-mrp5v                                         2/2     Running             0          4s
```
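
The watch output above differs from the one in comment 14 in the STATUS column: the older client's pod sits in `Init:0/1`, while this one goes straight to `ContainerCreating` and `Running`. As a rough sketch only, the Python below scans a saved `oc get pods --all-namespaces` listing for must-gather pods whose status starts with `Init:`. It assumes the default six-column layout (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE) and the `openshift-must-gather-` namespace prefix seen in this thread; the function name is made up for illustration.

```python
from typing import List, Tuple


def must_gather_pods_in_init(listing: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, name, status) for must-gather pods in an Init state."""
    hits = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, "..." elisions, and the header row.
        if len(fields) < 6 or fields[0] == "NAMESPACE":
            continue
        namespace, name, _ready, status = fields[:4]
        if namespace.startswith("openshift-must-gather-") and status.startswith("Init:"):
            hits.append((namespace, name, status))
    return hits


# One line from each of the two watch outputs quoted in this bug.
OLD_CLIENT = "openshift-must-gather-bdmnq   must-gather-9972t   0/1   Init:0/1   0   4s"
NEW_CLIENT = "openshift-must-gather-bjbvl   must-gather-mrp5v   2/2   Running    0   4s"

if __name__ == "__main__":
    print(must_gather_pods_in_init(OLD_CLIENT))
    print(must_gather_pods_in_init(NEW_CLIENT))
```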

Comment 20 Mike Fiedler 2020-09-30 18:43:28 UTC
Client issue on my side. Retesting.

Comment 21 Mike Fiedler 2020-09-30 23:27:13 UTC
Verified with:

$ oc version
Client Version: 4.5.14
Server Version: 4.5.14
Kubernetes Version: v1.18.3+5302882


No init container, no alerts. The problem was an old client on my system.
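
Because the root cause here was a stale `oc` binary, checking the client version before retesting would have saved a round trip. The Python sketch below parses the `Client Version:` line of `oc version` output and compares it against 4.5.12, the earliest 4.5.z client this thread confirms as creating the pod without an init container. Treating 4.5.12 as the threshold is an assumption drawn from comments 14 and 18, the function names are hypothetical, and the comparison is only meaningful for 4.5.z clients.

```python
import re
from typing import Tuple


def client_version(oc_version_output: str) -> Tuple[int, ...]:
    """Extract the numeric client version, e.g. (4, 5, 14), from `oc version` output."""
    match = re.search(
        r"^Client Version:\s*(\d+(?:\.\d+)*)", oc_version_output, re.MULTILINE
    )
    if match is None:
        # Some clients may print the version in another form; fail loudly.
        raise ValueError("no numeric 'Client Version:' line found")
    return tuple(int(part) for part in match.group(1).split("."))


def client_has_fix(
    oc_version_output: str, first_fixed: Tuple[int, ...] = (4, 5, 12)
) -> bool:
    """Return True if the client is at least `first_fixed`.

    The default threshold is an assumption based on this bug's comments,
    and it applies only within the 4.5.z stream.
    """
    return client_version(oc_version_output) >= first_fixed


# Output quoted in the verification comment above.
VERIFIED_OUTPUT = """Client Version: 4.5.14
Server Version: 4.5.14
Kubernetes Version: v1.18.3+5302882
"""

if __name__ == "__main__":
    print(client_version(VERIFIED_OUTPUT), client_has_fix(VERIFIED_OUTPUT))
```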

Comment 24 errata-xmlrpc 2020-10-19 14:54:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.15 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4228