A loop for maintaining the certificates can get wedged if someone manually messes up the secrets we write to. ``` "conditions": [ { "type": "Degraded", "status": "True", "lastTransitionTime": "2019-12-14T11:01:33Z", "reason": "CertRotation_AggregatorProxyClientCert__RotationError", "message": "CertRotation_AggregatorProxyClientCert_Degraded: Secret \"aggregator-client-signer\" is invalid: type: Invalid value: \"kubernetes.io/tls\": field is immutable" }, ``` I suspect this is caused by user naughtiness, but we should still stomp.
Hi David, I am working on this bug, bug I got less from the bug and openshift library-go pull 649. Could you give me some hints, How to verify for this change? Thanks.
From the https://github.com/openshift/library-go/pull/649 file changes, to verify the code change, need to apply secret, right? I suppose just like this, here are my verification steps, oc new-project myprj oc new-app ruby~https://github.com/sclorg/ruby-ex.git oc get secret NAME TYPE DATA AGE builder-dockercfg-46l87 kubernetes.io/dockercfg 1 5h56m builder-token-8jwkx kubernetes.io/service-account-token 4 5h56m builder-token-pxh4n kubernetes.io/service-account-token 4 5h56m default-dockercfg-z9k85 kubernetes.io/dockercfg 1 5h56m default-token-p9m6z kubernetes.io/service-account-token 4 5h56m default-token-whjws kubernetes.io/service-account-token 4 5h56m deployer-dockercfg-9skpg kubernetes.io/dockercfg 1 5h56m deployer-token-5dbqm kubernetes.io/service-account-token 4 5h56m deployer-token-rx7ms kubernetes.io/service-account-token 4 5h56m Select one secret and export it to a yaml file. oc get secrets default-token-whjws -o yaml > default-token-whjws.yaml Manually messed up the secrets file, add a useless item test: "xxxx', see below, ... kind: Secret metadata: annotations: kubernetes.io/service-account.name: default kubernetes.io/service-account.uid: fe373075-955b-4224-83b1-21f616fa77c1 creationTimestamp: "2020-01-15T03:09:06Z" name: default-token-whjws namespace: myprj test: "xxxx" resourceVersion: "38142" selfLink: /api/v1/namespaces/myprj/secrets/default-token-whjws uid: ec5454f2-b12d-4103-a3b8-9aae37e579ee type: kubernetes.io/service-account-token Tried to apply the change to secret, $ oc apply -f default-token-whjws.yaml Warning: oc apply should be used on resource created by either oc create --save-config or oc apply secret/default-token-whjws configured Checked the changed secret, found that the change was deleted. oc get secrets default-token-whjws -o yaml kind: Secret ... creationTimestamp: "2020-01-15T03:09:06Z" name: default-token-whjws namespace: myprj resourceVersion: "38142" selfLink: /api/v1/namespaces/myprj/secrets/default-token-whjws uid: ec5454f2-b12d-4103-a3b8-9aae37e579ee type: kubernetes.io/service-account-token I am not sure if the behavior is as expected as bug fix, and I don't know how to check DeleteEvent for this behavior. Could you give me some advise about this? Thanks.
Ke, I admit this bug indeed is not easy to verify if we are not familiar with code like Dev. I tried to read the code https://github.com/openshift/library-go/pull/649/files#diff-f8cab7977ffc2a35816f633d9f8f166dL210 under " func ApplySecret ": existing, err := client.Secrets(required.Namespace).Get(required.Name, metav1.GetOptions{}) ## get secret ... existingCopy := existing.DeepCopy() ## make an object copy ... existingCopy.Type = required.Type ## copy Type ... actual, err := client.Secrets(required.Namespace).Update(existingCopy) ## update secret with the copied object ... if !strings.Contains(err.Error(), "field is immutable") { ## if the copied object changed immutable field ... deleteErr := client.Secrets(required.Namespace).Delete(existingCopy.Name, nil) ## delete it reportDeleteEvent(recorder, existingCopy, deleteErr) ## report the delete ... After knowing above logic, in latest 4.4 env, let's try: oc get secret aggregator-client-signer -n openshift-kube-apiserver-operator -o yaml > secret.yaml sed -i "s/type: SecretTypeTLS/type: Opaque/" secret.yaml Since this secret is cluster-auto-managed/created secret, we need to try below several times until " secret/aggregator-client-signer created " oc delete secret aggregator-client-signer -n openshift-kube-apiserver-operator; oc create -f secret.yaml -n openshift-kube-apiserver-operator secret "aggregator-client-signer" deleted Error from server (AlreadyExists): error when creating "secret.yaml": secrets "aggregator-client-signer" already exists [xxia 2020-01-17 16:10:58 my]$ oc delete secret aggregator-client-signer -n openshift-kube-apiserver-operator; oc create -f secret.yaml -n openshift-kube-apiserver-operator secret "aggregator-client-signer" deleted secret/aggregator-client-signer created oc logs deployment/kube-apiserver-operator -n openshift-kube-apiserver-operator > kas-o.log vi kas-o.log ## you will see " field is immutable ", SecretDeleted, SecretCreated I0117 08:11:08.328368 1 event.go:281] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"2433149b-30c2-4470-a74a-a08bf63812d3", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'SecretUpdateFailed' Failed to update Secret/aggregator-client-signer -n openshift-kube-apiserver-operator: Secret "aggregator-client-signer" is invalid: type: Invalid value: "kubernetes.io/tls": field is immutable I0117 08:11:10.336023 1 event.go:281] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"2433149b-30c2-4470-a74a-a08bf63812d3", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretDeleted' Deleted Secret/aggregator-client-signer -n openshift-kube-apiserver-operator ... I0117 08:11:12.131759 1 event.go:281] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"2433149b-30c2-4470-a74a-a08bf63812d3", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretCreated' Created Secret/aggregator-client-signer -n openshift-kube-apiserver-operator because it was missing oc get secret aggregator-client-signer -n openshift-kube-apiserver-operator NAME TYPE DATA AGE aggregator-client-signer kubernetes.io/tls 2 4s ## SecretCreated, Type is not above "Opaque"
Now that David had test code https://github.com/openshift/cluster-kube-apiserver-operator/pull/703/files , you could also verify this bug via the test code if you agree the test code can cover the fix: $ git clone <the cluster-kube-apiserver-operator repo> # or git pull if already cloned $ git checkout <the-branch-that-includes-the-pr> $ cd <the PR test file's directory> $ export KUBECONFIG=<kubeconfig path> go test -v -run TestCertRotationStompOnBadType === RUN TestCertRotationStompOnBadType Found configuration for host https://...snipped...:6443. --- PASS: TestCertRotationStompOnBadType (3.30s) PASS ok github.com/openshift/cluster-kube-apiserver-operator/test/e2e 3.313s
Verified with OCP 4.4.0-0.nightly-2020-01-19-223035. Tried the xxia's first way, $ oc get secret aggregator-client-signer -n openshift-kube-apiserver-operator -o yaml > secret.yaml $ sed -i "s/type: SecretTypeTLS/type: Opaque/" secret.yaml Tried below CLI several times until " secret/aggregator-client-signer created " $ oc delete secret aggregator-client-signer -n openshift-kube-apiserver-operator;oc create -f secret.yaml -n openshift-kube-apiserver-operator secret "aggregator-client-signer" deleted secret/aggregator-client-signer created Export the log to file and check the related events, $ oc logs deployment/kube-apiserver-operator -n openshift-kube-apiserver-operator > kad-o.log We can see the events -> Secret "aggregator-client-signer" has a invalid field which is immutable, we messed up it by creating from secret.yaml -> Deleted Secret/aggregator-client-signer -> Created Secret/aggregator-client-signer, just like the workflow in PR openshift library-go pull 649. 687 I0120 06:49:24.385134 1 event.go:281] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"1d7746e1-edfa- 4c9c-97ba-4ddb87caf8ad", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'SecretUpdateFailed' Failed to update Secret/aggregator-client-signer -n opensh ift-kube-apiserver-operator: Secret "aggregator-client-signer" is invalid: type: Invalid value: "kubernetes.io/tls": field is immutable 688 I0120 06:49:24.394424 1 event.go:281] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"1d7746e1-edfa- 4c9c-97ba-4ddb87caf8ad", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretDeleted' Deleted Secret/aggregator-client-signer -n openshift-kube-apiser ver-operator 689 I0120 06:49:24.406008 1 event.go:281] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"1d7746e1-edfa- 4c9c-97ba-4ddb87caf8ad", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CABundleUpdateRequired' "kube-apiserver-aggregator-client-ca" in "openshift-con fig-managed" requires a new cert 690 I0120 06:49:24.406031 1 event.go:281] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"1d7746e1-edfa- 4c9c-97ba-4ddb87caf8ad", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretCreated' Created Secret/aggregator-client-signer -n openshift-kube-apiser ver-operator because it was missing Another way verify from related test code: Prerequisites: One openshift ENV is ready. Install oc client on your test-box. $ git clone <the cluster-kube-apiserver-operator repo> # or git pull if already cloned $ cd <the PR test file's directory> $ ls certrotation_test.go encryption_test.go featuregate_upgradeable_test.go operator_test.go trace.out user_certs_test.go user_client_ca_test.go user_cors_test.go $ export KUBECONFIG=<kubeconfig path> First run, it takes long time for compiling go packages, be patience. $ go test -v -run TestCertRotationStompOnBadType === RUN TestCertRotationStompOnBadType Found configuration for host https://api...openshift.com:6443. --- PASS: TestCertRotationStompOnBadType (1.40s) PASS ok github.com/openshift/cluster-kube-apiserver-operator/test/e2e 1.412s
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581