Bug 1993971

Summary: OCP installation is reported as successful but some pods remain in an Error state
Product: OpenShift Container Platform Reporter: Itay Matza <imatza>
Component: Installer Assignee: Emilien Macchi <emacchi>
Installer sub component: OpenShift on OpenStack QA Contact: Jon Uriarte <juriarte>
Status: CLOSED WORKSFORME Docs Contact:
Severity: urgent    
Priority: low CC: amalykhi, emacchi, itbrown, tsze, ushkalim
Version: 4.9 Keywords: AutomationBlocker, Reopened
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1999564 2009308 2010037 2035311 Environment:
Last Closed: 2021-12-23 14:44:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1999564, 2009308, 2010037, 2035311    
Attachments:
Description Flags
Pods remain in an Error state none

Description Itay Matza 2021-08-16 13:15:45 UTC
Created attachment 1814432
Pods remain in an Error state

Version:
OSP: RHOS-16.1-RHEL-8-20210604.n.0
OCP: 4.9.0-0.nightly-2021-08-07-175228


Platform:
OpenShift on OpenStack, with each of the network types (OpenShiftSDN, Kuryr, and OVN-Kubernetes).


Installation type:
IPI + UPI


Description:
The OCP installation is reported as successful, but some pods end up in the Error state and stay there.
The issue reproduces in roughly one out of three attempts, and not always with the same pods.
Examples of pods in an Error state after OCP installation:
>openshift-operator-lifecycle-manager               collect-profiles-27147450-287fp
>openshift-kube-apiserver                           installer-7-ostest-77pc4-master-0
>openshift-image-registry                           image-pruner-1628899200-w4jx5
>openshift-kube-controller-manager                  revision-pruner-4-ostest-zjjz9-master-2

Attached info and logs.


Expectation:
The OCP installation finishes with no pods in an Error state.
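
A post-installation check for this expectation could look roughly like the sketch below. It is only an illustration, not the check used by the reporter's automation: it assumes the usual column layout of `oc get pods --all-namespaces --no-headers` (NAMESPACE NAME READY STATUS RESTARTS AGE), the helper names `find_error_pods` and `list_all_pods` are hypothetical, and the sample listing mixes one pod name from this report with a made-up Running pod and made-up READY/RESTARTS/AGE values.

```python
import subprocess
from typing import List, Tuple


def find_error_pods(listing: str) -> List[Tuple[str, str]]:
    """Return (namespace, name) for every pod whose STATUS column is Error."""
    failed = []
    for line in listing.splitlines():
        fields = line.split()
        # Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE
        if len(fields) >= 4 and fields[3] == "Error":
            failed.append((fields[0], fields[1]))
    return failed


def list_all_pods() -> str:
    """Fetch the pod listing from the cluster (needs a logged-in `oc` client)."""
    result = subprocess.run(
        ["oc", "get", "pods", "--all-namespaces", "--no-headers"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Illustrative input only: the second pod and all READY/RESTARTS/AGE
# values are invented for the example.
SAMPLE = """\
openshift-image-registry  image-pruner-1628899200-w4jx5  0/1  Error    0  3h
openshift-dns             dns-default-abcde              2/2  Running  0  3h
"""

sample_failures = find_error_pods(SAMPLE)
```

Against a live cluster, `find_error_pods(list_all_pods())` returning an empty list would correspond to the expectation above. Since the failure is intermittent, the check would have to be run after every installation attempt.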


Reproduce:
The issue reproduces in roughly one out of three OCP installation attempts.


Additional information:
The issue was also reproduced on OCP versions 4.6 and 4.8.

Comment 14 ShiftStack Bugwatcher 2021-11-25 16:12:07 UTC
Removing the Triaged keyword because:
* the target release value is missing
* the QE automation assessment (flag qe_test_coverage) is missing