Bug 1737604 - Revision pruner pods end in Error state during fresh install
Summary: Revision pruner pods end in Error state during fresh install
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-apiserver
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.5.0
Assignee: Stefan Schimanski
QA Contact: Xingxing Xia
Depends On:
Reported: 2019-08-05 20:17 UTC by brad.williams
Modified: 2020-05-18 14:58 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-05-14 11:21:43 UTC
Target Upstream Version:
brad.williams: needinfo+

Attachments (Terms of Use)
must-gather data (3.42 MB, application/gzip)
2019-08-05 20:17 UTC, brad.williams
no flags Details
Crio inspect output (1.62 KB, text/plain)
2019-08-05 20:19 UTC, brad.williams
no flags Details
openshift-kube-controller-manager pod describe (3.64 KB, text/plain)
2019-08-05 20:20 UTC, brad.williams
no flags Details
openshift-kube-scheduler pod describe (3.51 KB, text/plain)
2019-08-05 20:21 UTC, brad.williams
no flags Details
openshift-kube-apiserver pod describe (4.02 KB, text/plain)
2019-08-05 20:22 UTC, brad.williams
no flags Details

Description brad.williams 2019-08-05 20:17:07 UTC
Created attachment 1600743 [details]
must-gather data

Description of problem:
While standing up an OpenShift 4.1.8 cluster in our starter environment, the installation succeeded and our configuration was applied successfully, but verification of the environment failed, reporting that 3 critical pods were in an unexpected state:

1. ERROR [/var/lib/jenkins-agent/jenkins-agent/workspace/rhcos/cluster-standup_master/git/continuous-release-jobs/config/imperative/verifications/generic.py:173 critical_pod_check] - Pod openshift-kube-apiserver:pod/revision-pruner-10-ip-10-0-142-118.us-east-2.compute.internal is not in expected state; presently: Failed

Further investigation showed that 3 revision-pruner pods were in an Error state. All of the pods were on the same node, and all had exited with code 255. Their logs showed no errors.
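For context, the verification error above comes from a critical_pod_check in generic.py, which is not attached to this bug. The sketch below is an assumption about what such a check might look like: it flags pods whose phase is outside Running/Succeeded and surfaces non-zero container exit codes, operating on the pod structure returned by `oc get pods -o json`. The function name and field selection are illustrative, not the actual Jenkins verification code.

```python
# Hypothetical sketch of a critical-pod check in the spirit of the
# critical_pod_check referenced in the verification error above.
# Input mirrors one item of `oc get pods -o json` (.items[]).

ACCEPTABLE_PHASES = {"Running", "Succeeded"}

def critical_pod_check(pod):
    """Return a list of problem strings for one pod dict (empty if healthy)."""
    problems = []
    name = pod["metadata"]["name"]
    phase = pod["status"].get("phase", "Unknown")
    if phase not in ACCEPTABLE_PHASES:
        problems.append(f"pod/{name} is not in expected state; presently: {phase}")
    # Surface non-zero exit codes from terminated containers, e.g. the
    # exit code 255 seen on the revision-pruner pods in this report.
    for cs in pod["status"].get("containerStatuses", []):
        term = cs.get("state", {}).get("terminated")
        if term and term.get("exitCode", 0) != 0:
            problems.append(
                f"pod/{name} container {cs['name']} exited with code {term['exitCode']}"
            )
    return problems

if __name__ == "__main__":
    # Shape of a failed revision-pruner pod, reduced to the fields used above.
    pruner = {
        "metadata": {"name": "revision-pruner-10-ip-10-0-142-118.us-east-2.compute.internal"},
        "status": {
            "phase": "Failed",
            "containerStatuses": [
                {"name": "pruner", "state": {"terminated": {"exitCode": 255}}}
            ],
        },
    }
    for problem in critical_pod_check(pruner):
        print("ERROR", problem)
```

Run against the failed pruner pod above, this reports both the Failed phase and the 255 exit code; a Running pod with clean container states produces no output.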

Version-Release number of selected component (if applicable):
version   4.1.8     True        False         4h13m   Cluster version is 4.1.8

How reproducible:

Steps to Reproduce:
1. Create OpenShift 4.1.8 cluster
2. Apply starter configuration
3. Run cluster verification

Actual results:
Revision pruner pods in an Error state

Expected results:
The cluster should install, apply configuration, and pass verification successfully.

Additional info:

Comment 1 brad.williams 2019-08-05 20:19:26 UTC
Created attachment 1600744 [details]
Crio inspect output

Comment 2 brad.williams 2019-08-05 20:20:31 UTC
Created attachment 1600746 [details]
openshift-kube-controller-manager pod describe

Comment 3 brad.williams 2019-08-05 20:21:21 UTC
Created attachment 1600748 [details]
openshift-kube-scheduler pod describe

Comment 4 brad.williams 2019-08-05 20:22:10 UTC
Created attachment 1600750 [details]
openshift-kube-apiserver pod describe

Comment 6 Michal Fojtik 2020-05-12 10:53:28 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

As such, we're marking this bug as "LifecycleStale".

If you have further information on the current state of the bug, please update it, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.
