Bug 1993971 - OCP installation is reported as successful but some pods remain in an Error state
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Emilien Macchi
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks: 1999564 2009308 2010037 2035311
Reported: 2021-08-16 13:15 UTC by Itay Matza
Modified: 2022-02-09 07:08 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1999564 2009308 2010037 2035311
Environment:
Last Closed: 2021-12-23 14:44:59 UTC
Target Upstream Version:
Embargoed:


Attachments
Pods remain in an Error state (5.76 KB, text/plain)
2021-08-16 13:15 UTC, Itay Matza

Description Itay Matza 2021-08-16 13:15:45 UTC
Created attachment 1814432
Pods remain in an Error state

Version:
OSP: RHOS-16.1-RHEL-8-20210604.n.0
OCP: OCP 4.9.0-0.nightly-2021-08-07-175228.


Platform:
OpenShift on OpenStack, with all of the network types (OpenShiftSDN, Kuryr, and OVN-Kubernetes).


Installation type:
IPI + UPI


Description:
The OCP installation is reported as successful, but some pods end up in the Error state and stay there.
The issue reproduces in roughly one out of three attempts, and not always with the same pods.
Examples of pods in the Error state after the OCP installation:
>openshift-operator-lifecycle-manager               collect-profiles-27147450-287fp
>openshift-kube-apiserver                           installer-7-ostest-77pc4-master-0
>openshift-image-registry                           image-pruner-1628899200-w4jx5
>openshift-kube-controller-manager                  revision-pruner-4-ostest-zjjz9-master-2

Attached info and logs.


Expectation:
OCP installation will finish without any pod in an error state.
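
A minimal sketch of how this expectation could be checked after an installation. It assumes that a pod shown with STATUS "Error" has `status.phase` set to `Failed`, which may not hold for every pod. The helper names `failed_pods` and `fetch_all_pods` are hypothetical and used only for illustration.

```python
import json
import subprocess


def failed_pods(pod_list):
    """Return (namespace, name) for every pod whose phase is Failed.

    Assumption: pods listed with STATUS "Error" report phase "Failed".
    """
    result = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase")
        if phase == "Failed":
            meta = pod.get("metadata", {})
            result.append((meta.get("namespace"), meta.get("name")))
    return result


def fetch_all_pods():
    """Run `oc get pods --all-namespaces -o json` and parse its output."""
    out = subprocess.run(
        ["oc", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

Calling `failed_pods(fetch_all_pods())` against a logged-in cluster should return an empty list if the expectation holds.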


Reproduce:
The issue reproduces in roughly one out of three OCP installation attempts.


Additional information:
The issue was also reproduced on OCP versions 4.6 and 4.8.

Comment 14 ShiftStack Bugwatcher 2021-11-25 16:12:07 UTC
Removing the Triaged keyword because:
* the target release value is missing

* the QE automation assessment (flag qe_test_coverage) is missing

