Bug 1889003

Summary: few secrets got deleted while upgrading cluster from 4.5.6 to 4.5.13
Product: OpenShift Container Platform Reporter: Sudarshan Chaudhari <suchaudh>
Component: Etcd Assignee: Suresh Kolichala <skolicha>
Status: CLOSED NOTABUG QA Contact: ge liu <geliu>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.5 CC: aabhishe, aos-bugs, dahernan, dmace, sbatsche, skolicha, sttts, wlewis
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-04-15 01:48:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sudarshan Chaudhari 2020-10-16 18:33:43 UTC
Description of problem:
While upgrading the cluster from 4.5.6 to 4.5.13, the following secrets were removed and were not recreated, leaving the cluster unstable with multiple cluster operators in the Degraded state:
- etcd-client
- etcd-metric-client
- etcd-metric-signer
- etcd-signer
- pull-secret
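
To confirm which of the secrets listed above are actually absent, the check can be scripted. The following is a minimal sketch, not an official tool: it assumes the secrets live in the openshift-config project (as in the reproduction step) and that the secret list has been exported as JSON, for example with `oc get secrets -n openshift-config -o json`. The function names are hypothetical.

```python
import json

# Secret names reported as deleted in this bug (taken from the list above).
EXPECTED_SECRETS = [
    "etcd-client",
    "etcd-metric-client",
    "etcd-metric-signer",
    "etcd-signer",
    "pull-secret",
]


def secret_names(secret_list_json):
    """Return the set of secret names found in a Kubernetes SecretList JSON document."""
    doc = json.loads(secret_list_json)
    return {item["metadata"]["name"] for item in doc.get("items", [])}


def missing_secrets(secret_list_json, expected=EXPECTED_SECRETS):
    """Return the expected secret names that are absent, in the order given."""
    present = secret_names(secret_list_json)
    return [name for name in expected if name not in present]
```

Running `missing_secrets` on the exported JSON from both the affected cluster and a working cluster gives a direct comparison of what was lost.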


After the upgrade succeeded, the following errors were observed in the operators:


kube-apiserver:
~~~
message: 'RevisionControllerDegraded: secrets "etcd-client" not found'
~~~

image-registry:
~~~
message: 'Progressing: Unable to apply resources: unable to apply objects: failed to update object *v1.Secret, Namespace=openshift-image-registry, Name=installation-pull-secrets: Secret "installation-pull-secrets" is invalid: data[.dockerconfigjson]: Required
~~~

machine-config:
~~~
message: 'Failed to resync 4.5.13 because: timed out waiting for the condition during waitForControllerConfigToBeCompleted: controllerconfig is not completed: ControllerConfig has not completed: completed(false) running(false) failing(true)'
~~~
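
The operator messages above were collected by hand. As a rough sketch, the same information could be pulled from the JSON of a ClusterOperator list (for example the output of `oc get clusteroperators -o json`). This assumes the standard `status.conditions` layout with `type`, `status`, and `message` fields; the function name is hypothetical.

```python
import json


def operators_with_condition(clusteroperator_list_json, condition_type="Degraded"):
    """Map operator name -> message for operators whose condition of the
    given type currently has status "True"."""
    doc = json.loads(clusteroperator_list_json)
    result = {}
    for item in doc.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        for cond in conditions:
            if cond.get("type") == condition_type and cond.get("status") == "True":
                result[item["metadata"]["name"]] = cond.get("message", "")
    return result
```

Passing `condition_type="Progressing"` would surface messages such as the image-registry one, which is reported under a different condition than Degraded.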

We checked the logs but could not determine why the secrets were deleted.

Compared with a working cluster, it appears that most of the deleted secrets are managed by "cluster-bootstrap".

What we would like to know:
- Why were the secrets deleted?
- As they are managed by cluster-bootstrap, what is the process to recreate all the lost secrets?
- Is there any operator that should be responsible for managing and creating these secrets?


How reproducible:
Always

Steps to Reproduce:
1. Delete the pull-secret secret, or any other secret, in the openshift-config project.

Actual results:
The secrets were lost during the upgrade and were not recreated automatically.

Expected results:
The secrets should be recreated automatically.

Additional info:
- Attaching the must-gather.


Additional info:
I created the

Comment 2 Sudarshan Chaudhari 2020-11-06 01:17:37 UTC
Hello @Suresh

Is there any update on this?

Is there any way to re-create the secrets originally created by openshift-installer by extracting the required information from the cluster in its current state?

Let me know if any additional information is needed from Support or from the customer's cluster, and I will get it for the investigation.

Comment 16 Suresh Kolichala 2021-04-15 01:48:13 UTC
Closing this BZ. Created an RFE for providing a way to recover when the signers are missing.

https://issues.redhat.com/browse/RFE-1790