Bug 1623108

Summary: [free-stg] scale group ami pull keys expire for operations registry
Product: OpenShift Online
Reporter: Justin Pierce <jupierce>
Component: Unknown
Assignee: Justin Pierce <jupierce>
Status: CLOSED CURRENTRELEASE
QA Contact: DeShuai Ma <dma>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.x
CC: aos-bugs, jokerman, mmccomas, sdodson, yufchang
Target Milestone: ---
Keywords: OnlineStarter
Target Release: 3.x
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-24 13:16:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Justin Pierce 2018-08-28 13:31:38 UTC
Description of problem:
268 projects are currently stuck in the Terminating state. Cluster upgrades are blocked.

Version-Release number of selected component (if applicable):
v3.11.0-0.21.0

How reproducible:
Unknown - current state of the cluster

Comment 4 Xingxing Xia 2018-08-29 02:45:39 UTC
Your attachment shows the error message "unable to retrieve the complete list of server APIs".
Just FYI, this message appears in many bugs; here is a search list: https://url.corp.redhat.com/unable-to-retrieve-the-complete-list-of-server-APIs . In short, most of them seem to have a server-side cause and some a client-side cause (like bug 1623195).

Comment 5 Michal Fojtik 2018-08-29 08:58:35 UTC
From the logs, it looks like the metrics API server is stuck in ContainerCreating. We cannot finalize the namespace until we can reach that server and ensure that all resources it created have been removed.

If we ignore this error, the danger is that when the aggregated API server (metrics) comes back and the deleted namespace is recreated, you might gain access to resources that were part of the deleted namespace.

The easiest fix is to figure out why the metrics API server is stuck in ContainerCreating and get it up and running, so the namespace finalizer can function properly.
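
To see which aggregated API is holding up namespace finalization, one option is to look at the Available condition on each APIService. The following is a minimal sketch, not the procedure used on this cluster: it assumes input shaped like the output of `oc get apiservices -o json`, and the sample data (including the service name `v1beta1.metrics.k8s.io` and the condition message) is hypothetical.

```python
import json


def unavailable_apiservices(apiservice_list):
    """Return (name, reason, message) for every APIService whose
    Available condition is missing or not "True"."""
    problems = []
    for item in apiservice_list.get("items", []):
        name = item["metadata"]["name"]
        conditions = item.get("status", {}).get("conditions", [])
        available = next(
            (c for c in conditions if c.get("type") == "Available"), None
        )
        if available is None:
            # No Available condition reported at all.
            problems.append((name, "NoAvailableCondition", ""))
        elif available.get("status") != "True":
            problems.append(
                (name, available.get("reason", ""), available.get("message", ""))
            )
    return problems


# Hypothetical sample, shaped like `oc get apiservices -o json` output.
sample = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "v1.apps"},
      "status": {"conditions": [
        {"type": "Available", "status": "True", "reason": "Local"}
      ]}
    },
    {
      "metadata": {"name": "v1beta1.metrics.k8s.io"},
      "status": {"conditions": [
        {"type": "Available", "status": "False",
         "reason": "MissingEndpoints",
         "message": "hypothetical message"}
      ]}
    }
  ]
}
""")

for name, reason, message in unavailable_apiservices(sample):
    print(f"{name}: {reason} {message}")
```

Any service reported here is a candidate for the one the namespace controller cannot reach; the backing pod can then be inspected to find out why it never starts.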

Comment 6 Michal Fojtik 2018-08-29 09:01:53 UTC
Alternatively, you can disable the metrics API service (backup && oc delete?), which will unstick the namespace controller; however, the metrics API server might be needed in the next step.
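
The "backup && oc delete" idea could be sketched roughly as follows. This is an illustration under assumptions, not a tested procedure: the APIService name `v1beta1.metrics.k8s.io` and the backup path are hypothetical, and the helper defaults to a dry run that only returns the commands it would execute.

```python
import subprocess


def backup_and_delete_apiservice(name, backup_path, dry_run=True):
    """Save an APIService manifest to backup_path, then delete the APIService.

    With dry_run=True (the default) nothing is executed; the two `oc`
    command lines are only returned for review.
    """
    get_cmd = ["oc", "get", "apiservice", name, "-o", "yaml"]
    delete_cmd = ["oc", "delete", "apiservice", name]
    if not dry_run:
        # Capture the manifest first; check=True aborts before the delete
        # if the backup step fails.
        result = subprocess.run(
            get_cmd, check=True, capture_output=True, text=True
        )
        with open(backup_path, "w") as fh:
            fh.write(result.stdout)
        subprocess.run(delete_cmd, check=True)
    return [get_cmd, delete_cmd]


# Hypothetical name and path, dry run only.
for cmd in backup_and_delete_apiservice(
    "v1beta1.metrics.k8s.io", "metrics-apiservice-backup.yaml"
):
    print(" ".join(cmd))
```

The saved manifest can serve as the basis for recreating the service later; server-populated fields such as resourceVersion may need to be removed from it first.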

Comment 8 Justin Pierce 2018-08-29 13:57:04 UTC
Moving to online component as this is presently specific to the online environment and the means by which docker pull secrets are managed.