Bug 1785498

Summary: `regenerate-certificates` command blocked by error `illegal base64 data at input byte 3`
Product: OpenShift Container Platform
Component: kube-apiserver
Version: 4.3.0
Reporter: zhou ying <yinzhou>
Assignee: Lukasz Szaszkiewicz <lszaszki>
QA Contact: Xingxing Xia <xxia>
CC: aos-bugs, lszaszki, mfojtik
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Release: 4.4.0   
Doc Type: No Doc Update
Clones: 1789655, 1802161 (view as bug list)
Last Closed: 2020-05-04 11:20:55 UTC
Type: Bug
Bug Blocks: 1789655, 1802161    
Attachments: inspect result (flags: none)

Description zhou ying 2019-12-20 03:15:11 UTC
Description of problem:
Running the regenerate-certificates command on a master fails with the error:
E1220 02:19:38.188639       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.ConfigMap: illegal base64 data at input byte 3
E1220 02:19:38.390169       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.Secret: illegal base64 data at input byte 3


Version-Release number of selected component (if applicable):
Payload: 4.3.0-0.nightly-2019-12-13-180405

How reproducible:
Sometimes

Steps to Reproduce:
1. Follow the doc https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-3-expired-certs.html to perform certificate recovery (the doc's entry point is sketched below).
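For reference, a minimal sketch of the doc's entry point on the master, assuming the 4.2 flow (RELEASE_IMAGE is a placeholder for the cluster's release payload image; exact steps may differ by version):

  RELEASE_IMAGE=<release_image>   # placeholder for the cluster's release payload image
  KAO_IMAGE=$( oc adm release info --registry-config='/var/lib/kubelet/config.json' "${RELEASE_IMAGE}" --image-for=cluster-kube-apiserver-operator )
  podman pull --authfile=/var/lib/kubelet/config.json "${KAO_IMAGE}"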


Actual results:
1. Running the regenerate-certificates command on a master fails:
[root@control-plane-0 ~]# podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" regenerate-certificates
I1220 02:11:21.185177       1 certrotationcontroller.go:492] Waiting for CertRotation
E1220 02:11:21.210381       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.Secret: illegal base64 data at input byte 3
E1220 02:11:21.210392       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.ConfigMap: illegal base64 data at input byte 3
...many repetitions of the above E1220 errors, without stopping...

Expected results:
1. Should succeed.


Additional info:

Comment 1 zhou ying 2019-12-20 05:32:13 UTC
[root@control-plane-0 ~]# oc adm must-gather
[must-gather      ] OUT the server is currently unable to handle the request (get imagestreams.image.openshift.io must-gather)
[must-gather      ] OUT 
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest
[must-gather      ] OUT namespace/openshift-must-gather-56qwg created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-6qml5 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-6qml5 deleted
[must-gather      ] OUT namespace/openshift-must-gather-56qwg deleted
Error from server (Forbidden): pods "must-gather-" is forbidden: error looking up service account openshift-must-gather-56qwg/default: serviceaccount "default" not found
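
Note: oc adm must-gather needs a functional control plane (it creates a namespace, service account, and pod), which isn't available mid-recovery. A read-only alternative, presumably how the "inspect result" attachment in comment 2 below was collected, is oc adm inspect, e.g. (the resource choice here is an assumption):

  oc adm inspect clusteroperator/kube-apiserver --dest-dir=./inspect.local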

Comment 2 zhou ying 2019-12-20 05:33:05 UTC
Created attachment 1646739 [details]
inspect result

Comment 3 Lukasz Szaszkiewicz 2020-02-12 08:04:16 UTC
The root cause of the issue was that the recovery API didn't know how to decrypt the encrypted content from the DB (etcd).

Please validate the fix on an encrypted cluster.
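
For context, one plausible mechanism behind this exact message (an illustration, not taken from the fix): values encrypted at rest are stored with a "k8s:enc:aescbc:v1:<key-name>:" prefix, and byte 3 of that prefix is ':', which is outside the base64 alphabet, so base64-decoding a still-encrypted value fails at input byte 3:

  printf 'k8s:enc:aescbc:v1:key1:...' | base64 -d   # fails: the ':' at offset 3 is not a valid base64 character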

Comment 5 Xingxing Xia 2020-03-13 02:20:22 UTC
The 4.4 certs disaster recovery bug 1771410 is verified with successful auto recovery, and that process covers this bug's issue. Thus moving to VERIFIED directly.

Comment 6 Xingxing Xia 2020-03-13 03:39:00 UTC
(In reply to Lukasz Szaszkiewicz from comment #3)
> please validate the fix on an encrypted cluster.
Ah, I didn't notice this. Will try on an etcd-encrypted cluster later.

Comment 7 Xingxing Xia 2020-03-19 11:15:44 UTC
Installed 4.4.0-0.nightly-2020-03-18-102708 as an IPI-on-AWS env and enabled etcd encryption.
Then broke the cluster per the Google doc from bug 1771410, waited for the certs to expire, and restarted the masters. The cluster came back well: the control plane certs recovered automatically; oc get po/co/no and other basic oc operations (new-project, new-app, rsh, etc.) had no problems; and the logs of the 4 kube-apiserver (kas) containers showed no abnormality. In short, the bug's issue is not seen now.
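
For anyone repeating the verification: etcd encryption is enabled through the cluster APIServer resource (a sketch per the OpenShift etcd-encryption docs; the rollout takes a while and status checks are omitted):

  oc patch apiserver cluster --type merge -p '{"spec":{"encryption":{"type":"aescbc"}}}'
  oc get apiserver cluster -o jsonpath='{.spec.encryption.type}{"\n"}'   # expect: aescbc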

Comment 9 errata-xmlrpc 2020-05-04 11:20:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581