Bug 1785498

Summary: `regenerate-certificates` command blocked by error `illegal base64 data at input byte 3`
Product: OpenShift Container Platform
Component: kube-apiserver
Version: 4.3.0
Reporter: zhou ying <yinzhou>
Assignee: Lukasz Szaszkiewicz <lszaszki>
QA Contact: Xingxing Xia <xxia>
CC: aos-bugs, lszaszki, mfojtik
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Release: 4.4.0   
Doc Type: No Doc Update
Clones: 1789655, 1802161 (view as bug list)
Last Closed: 2020-05-04 11:20:55 UTC
Type: Bug
Bug Blocks: 1789655, 1802161    
Attachments: inspect result (flags: none)

Description zhou ying 2019-12-20 03:15:11 UTC
Description of problem:
Running the regenerate-certificates command on a master fails with the error:
E1220 02:19:38.188639       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.ConfigMap: illegal base64 data at input byte 3
E1220 02:19:38.390169       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.Secret: illegal base64 data at input byte 3


Version-Release number of selected component (if applicable):
Payload: 4.3.0-0.nightly-2019-12-13-180405

How reproducible:
Sometimes

Steps to Reproduce:
1. Follow the doc https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-3-expired-certs.html to perform certificate recovery (the doc's entry point is sketched below).
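For reference, a minimal sketch of the doc's entry point on the master, assuming the 4.2 flow (RELEASE_IMAGE is a placeholder for the cluster's release payload image; exact steps may differ by version):

  RELEASE_IMAGE=<release_image>   # placeholder for the cluster's release payload image
  KAO_IMAGE=$( oc adm release info --registry-config='/var/lib/kubelet/config.json' "${RELEASE_IMAGE}" --image-for=cluster-kube-apiserver-operator )
  podman pull --authfile=/var/lib/kubelet/config.json "${KAO_IMAGE}"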


Actual results:
1. Running the regenerate-certificates command on a master fails:
[root@control-plane-0 ~]# podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" regenerate-certificates
I1220 02:11:21.185177       1 certrotationcontroller.go:492] Waiting for CertRotation
E1220 02:11:21.210381       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.Secret: illegal base64 data at input byte 3
E1220 02:11:21.210392       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1.ConfigMap: illegal base64 data at input byte 3
...many repetitions of the above E1220 errors, without stopping...

Expected results:
1. Should succeed.


Additional info:

Comment 1 zhou ying 2019-12-20 05:32:13 UTC
[root@control-plane-0 ~]# oc adm must-gather
[must-gather      ] OUT the server is currently unable to handle the request (get imagestreams.image.openshift.io must-gather)
[must-gather      ] OUT 
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest
[must-gather      ] OUT namespace/openshift-must-gather-56qwg created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-6qml5 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-6qml5 deleted
[must-gather      ] OUT namespace/openshift-must-gather-56qwg deleted
Error from server (Forbidden): pods "must-gather-" is forbidden: error looking up service account openshift-must-gather-56qwg/default: serviceaccount "default" not found
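
Note: oc adm must-gather needs a functional control plane (it creates a namespace, service account, and pod), which isn't available mid-recovery. A read-only alternative, presumably how the "inspect result" attachment in comment 2 below was collected, is oc adm inspect, e.g. (the resource choice here is an assumption):

  oc adm inspect clusteroperator/kube-apiserver --dest-dir=./inspect.local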

Comment 2 zhou ying 2019-12-20 05:33:05 UTC
Created attachment 1646739 [details]
inspect result

Comment 3 Lukasz Szaszkiewicz 2020-02-12 08:04:16 UTC
The root cause of the issue was that the recovery API didn't know how to decrypt the encrypted content from the DB (etcd).

Please validate the fix on an encrypted cluster.
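
For context, one plausible mechanism behind this exact message (an illustration, not taken from the fix): values encrypted at rest are stored with a "k8s:enc:aescbc:v1:<key-name>:" prefix, and byte 3 of that prefix is ':', which is outside the base64 alphabet, so base64-decoding a still-encrypted value fails at input byte 3:

  printf 'k8s:enc:aescbc:v1:key1:...' | base64 -d   # fails: the ':' at offset 3 is not a valid base64 character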

Comment 5 Xingxing Xia 2020-03-13 02:20:22 UTC
The 4.4 certs disaster recovery bug 1771410 is verified with successful auto recovery, and that process covers this bug's issue. Thus moving to VERIFIED directly.

Comment 6 Xingxing Xia 2020-03-13 03:39:00 UTC
(In reply to Lukasz Szaszkiewicz from comment #3)
> please validate the fix on an encrypted cluster.
Ah, I didn't notice this. Will try on an etcd-encrypted cluster later.

Comment 7 Xingxing Xia 2020-03-19 11:15:44 UTC
Installed 4.4.0-0.nightly-2020-03-18-102708 as an IPI-on-AWS env and enabled etcd encryption.
Then broke the cluster per the Google doc from bug 1771410, waited for the certs to expire, and restarted the masters. The cluster came back well: the control plane certs recovered automatically; oc get po/co/no and other basic oc operations (new-project, new-app, rsh, etc.) had no problems; and the logs of the 4 kube-apiserver (kas) containers showed no abnormality. In short, the bug's issue is not seen now.
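
For anyone repeating the verification: etcd encryption is enabled through the cluster APIServer resource (a sketch per the OpenShift etcd-encryption docs; the rollout takes a while and status checks are omitted):

  oc patch apiserver cluster --type merge -p '{"spec":{"encryption":{"type":"aescbc"}}}'
  oc get apiserver cluster -o jsonpath='{.spec.encryption.type}{"\n"}'   # expect: aescbc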

Comment 9 errata-xmlrpc 2020-05-04 11:20:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581