Bug 1785498 - `regenerate-certificates` command blocked by error `illegal base64 data at input byte 3`
Summary: `regenerate-certificates` command blocked by error `illegal base64 data at input byte 3`
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Lukasz Szaszkiewicz
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks: 1789655 1802161
 
Reported: 2019-12-20 03:15 UTC by zhou ying
Modified: 2020-05-04 11:21 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1789655 1802161
Environment:
Last Closed: 2020-05-04 11:20:55 UTC
Target Upstream Version:
Embargoed:


Attachments
inspect result (826.21 KB, application/gzip)
2019-12-20 05:33 UTC, zhou ying


Links
Github openshift cluster-kube-apiserver-operator pull 761 (closed): Bug 1785498: regenerate-certificates command blocked by error illegal base64 data at input byte 3 (last updated 2020-12-15 14:27:10 UTC)
Red Hat Product Errata RHBA-2020:0581 (last updated 2020-05-04 11:21:35 UTC)

Description zhou ying 2019-12-20 03:15:11 UTC
Description of problem:
Running the regenerate-certificates command on a master node fails with the following errors:
E1220 02:19:38.188639       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.ConfigMap: illegal base64 data at input byte 3
E1220 02:19:38.390169       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.Secret: illegal base64 data at input byte 3
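
For context, "illegal base64 data at input byte N" is the error string of Go's base64.CorruptInputError from the encoding/base64 package. Below is a minimal sketch of how the offset 3 in particular can appear, under the assumption that the informers are handed a raw etcd value that still carries Kubernetes' at-rest encryption prefix (k8s:enc:...) and try to base64-decode it; the ':' at index 3 is the first byte outside the base64 alphabet. The value shown is illustrative, not taken from the cluster:

package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	// Illustrative raw value as it would sit in etcd on a cluster with
	// encryption at rest enabled; real ciphertext follows the last colon.
	raw := "k8s:enc:aescbc:v1:key1:<ciphertext>"

	// 'k', '8' and 's' are valid base64 characters, but the ':' at index 3
	// is not, so decoding stops exactly there.
	_, err := base64.StdEncoding.DecodeString(raw)
	fmt.Println(err) // illegal base64 data at input byte 3
}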


Version-Release number of selected component (if applicable):
Payload: 4.3.0-0.nightly-2019-12-13-180405

How reproducible:
Sometimes

Steps to Reproduce:
1. Follow the doc https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-3-expired-certs.html to perform certificate recovery.


Actual results:
1. The regenerate-certificates command fails when run on a master node:
[root@control-plane-0 ~]# podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" regenerate-certificates
I1220 02:11:21.185177       1 certrotationcontroller.go:492] Waiting for CertRotation
E1220 02:11:21.210381       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.Secret: illegal base64 data at input byte 3
E1220 02:11:21.210392       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.ConfigMap: illegal base64 data at input byte 3
...the above E1220 errors repeat continuously without stopping...

Expected results:
1. The regenerate-certificates command should succeed.


Additional info:

Comment 1 zhou ying 2019-12-20 05:32:13 UTC
[root@control-plane-0 ~]# oc adm must-gather
[must-gather      ] OUT the server is currently unable to handle the request (get imagestreams.image.openshift.io must-gather)
[must-gather      ] OUT 
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest
[must-gather      ] OUT namespace/openshift-must-gather-56qwg created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-6qml5 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-6qml5 deleted
[must-gather      ] OUT namespace/openshift-must-gather-56qwg deleted
Error from server (Forbidden): pods "must-gather-" is forbidden: error looking up service account openshift-must-gather-56qwg/default: serviceaccount "default" not found

Comment 2 zhou ying 2019-12-20 05:33:05 UTC
Created attachment 1646739 [details]
inspect result

Comment 3 Lukasz Szaszkiewicz 2020-02-12 08:04:16 UTC
The root cause of the issue was that the recovery API didn't know how to decrypt the encrypted content from the DB (etcd).

please validate the fix on an encrypted cluster.
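
For reference, here is a minimal conceptual sketch of the missing step, assuming the standard Kubernetes at-rest encryption format (a k8s:enc:aescbc:v1:<key-name>: prefix followed by ciphertext); the function and parameter names are illustrative, not the operator's actual code:

package recovery

import (
	"bytes"
	"fmt"
)

// decodeStoredValue sketches the dispatch a recovery path needs on an
// encrypted cluster: strip the at-rest encryption prefix and key name and
// decrypt before decoding, instead of treating every stored value as plain.
func decodeStoredValue(raw []byte, decryptAESCBC func(keyName string, ciphertext []byte) ([]byte, error)) ([]byte, error) {
	const prefix = "k8s:enc:aescbc:v1:"
	if !bytes.HasPrefix(raw, []byte(prefix)) {
		// Unencrypted cluster (identity provider): use the value as-is.
		return raw, nil
	}
	rest := raw[len(prefix):]
	i := bytes.IndexByte(rest, ':')
	if i < 0 {
		return nil, fmt.Errorf("malformed encrypted value: missing key name")
	}
	return decryptAESCBC(string(rest[:i]), rest[i+1:])
}

Presumably the linked PR (cluster-kube-apiserver-operator pull 761) addresses this by making the regenerate-certificates path aware of the cluster's encryption configuration so the stored content can be decrypted.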

Comment 5 Xingxing Xia 2020-03-13 02:20:22 UTC
The 4.4 certs disaster recovery bug 1771410 is verified with successful auto recovery, and that process covers this bug's issue. Thus moving to VERIFIED directly.

Comment 6 Xingxing Xia 2020-03-13 03:39:00 UTC
(In reply to Lukasz Szaszkiewicz from comment #3)
> please validate the fix on an encrypted cluster.
Ah, I didn't notice this. Will try on an etcd-encrypted cluster later.

Comment 7 Xingxing Xia 2020-03-19 11:15:44 UTC
Installed 4.4.0-0.nightly-2020-03-18-102708 IPI on AWS and enabled etcd encryption.
Then broke the cluster per the Google document of bug 1771410, waited for the certs to expire, and restarted the masters. The cluster came back well: the control plane certs recovered automatically, oc get po/co/no and other basic oc operations (new-project, new-app, rsh, etc.) have no problem, and the kube-apiserver's 4 container logs show no abnormality. In short, the bug's issue is no longer seen.

Comment 9 errata-xmlrpc 2020-05-04 11:20:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

