Bug 1900635 - [oVirt] installation fails bootstrap phase because of kube apiserver pod crashlooping
Keywords:
Status: CLOSED DUPLICATE of bug 1900446
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Stefan Schimanski
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-11-23 13:09 UTC by Gal Zaidman
Modified: 2020-11-23 23:53 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-23 23:53:37 UTC
Target Upstream Version:
Embargoed:




Links
System: GitHub openshift/cluster-kube-apiserver-operator pull 1011
Status: open
Summary: Bug 1900635: Revert "Merge pull request #1006 from abhinavdahiya/user-provided-sa-signing-key"
Last Updated: 2020-11-23 13:16:21 UTC

Description Gal Zaidman 2020-11-23 13:09:09 UTC
Description of problem:
On oVirt CI we see jobs failing in the bootstrap phase because the kube-apiserver pods are crash-looping; see the examples in [1].

When we look at the nodes [2] of one of the failing jobs, we see that they all carry NoSchedule taints that the operator pods do not tolerate, so the operators never get installed.

When we look at the pods [3], we see that kube-apiserver is in "CrashLoopBackOff" (a sketch for inspecting both artifacts follows the links below).

[1] Examples:
- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.7/1330781822644129792
- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.7/1330615017845821440

[2] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.7/1330781822644129792/artifacts/e2e-ovirt/nodes.json

[3] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.7/1330781822644129792/artifacts/e2e-ovirt/pods.json
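
For anyone re-checking these artifacts locally, here is a minimal Go sketch (my own, not part of any CI tooling; the local file names nodes.json and pods.json are assumed to be the artifacts downloaded from [2] and [3]) that lists NoSchedule taints and CrashLoopBackOff containers:

  package main

  import (
      "encoding/json"
      "fmt"
      "os"
  )

  // Only the fields we inspect, following the Kubernetes NodeList/PodList JSON shapes.
  type nodeList struct {
      Items []struct {
          Metadata struct {
              Name string `json:"name"`
          } `json:"metadata"`
          Spec struct {
              Taints []struct {
                  Key    string `json:"key"`
                  Effect string `json:"effect"`
              } `json:"taints"`
          } `json:"spec"`
      } `json:"items"`
  }

  type podList struct {
      Items []struct {
          Metadata struct {
              Name      string `json:"name"`
              Namespace string `json:"namespace"`
          } `json:"metadata"`
          Status struct {
              ContainerStatuses []struct {
                  Name  string `json:"name"`
                  State struct {
                      Waiting *struct {
                          Reason string `json:"reason"`
                      } `json:"waiting"`
                  } `json:"state"`
              } `json:"containerStatuses"`
          } `json:"status"`
      } `json:"items"`
  }

  func mustDecode(path string, v interface{}) {
      data, err := os.ReadFile(path)
      if err != nil {
          panic(err)
      }
      if err := json.Unmarshal(data, v); err != nil {
          panic(err)
      }
  }

  func main() {
      var nodes nodeList
      mustDecode("nodes.json", &nodes) // artifact from [2]
      for _, n := range nodes.Items {
          for _, t := range n.Spec.Taints {
              if t.Effect == "NoSchedule" {
                  fmt.Printf("node %s tainted: %s\n", n.Metadata.Name, t.Key)
              }
          }
      }

      var pods podList
      mustDecode("pods.json", &pods) // artifact from [3]
      for _, p := range pods.Items {
          for _, c := range p.Status.ContainerStatuses {
              if c.State.Waiting != nil && c.State.Waiting.Reason == "CrashLoopBackOff" {
                  fmt.Printf("pod %s/%s container %s is in CrashLoopBackOff\n",
                      p.Metadata.Namespace, p.Metadata.Name, c.Name)
              }
          }
      }
  }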

In the operator logs I see:

  W1123 08:27:04.284811       1 staticpod.go:37] revision 6 is unexpectedly already the latest available revision. This is a possible race!
  E1123 08:27:04.293782       1 base_controller.go:250] "RevisionController" controller failed to sync "key", err: conflicting latestAvailableRevision 6

In the pod logs I see a lot of "connection refused" errors.

Additional info:
In previous jobs I saw the same issue, but only two nodes (the masters) joined the cluster; examples:
- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.7/1330112463709933568
- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.7/1329706492684668928

Comment 1 Stefan Schimanski 2020-11-23 13:12:33 UTC
What is in the logs of the crashlooping kube-apiserver?

Comment 2 Stefan Schimanski 2020-11-23 13:17:31 UTC
Kube-apiserver reports this:

  Error: error reading public key file /etc/kubernetes/static-pod-resources/configmaps/bound-sa-token-signing-certs/service-account-001.pub: data does not contain any valid RSA or ECDSA public keys

This is very probably due to https://github.com/openshift/cluster-kube-apiserver-operator/pull/1006, to be reverted in https://github.com/openshift/cluster-kube-apiserver-operator/pull/1011.
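
For reference, a minimal standalone sketch (my own, not the operator's code) that performs the same parse against the file named in the error, using the k8s.io/client-go keyutil helper that, as far as I can tell, is where this error string originates:

  package main

  import (
      "fmt"
      "os"

      "k8s.io/client-go/util/keyutil"
  )

  func main() {
      // Path copied verbatim from the kube-apiserver error above.
      const path = "/etc/kubernetes/static-pod-resources/configmaps/bound-sa-token-signing-certs/service-account-001.pub"

      keys, err := keyutil.PublicKeysFromFile(path)
      if err != nil {
          // An empty or malformed PEM file fails here with the same
          // "data does not contain any valid RSA or ECDSA public keys" message.
          fmt.Fprintln(os.Stderr, err)
          os.Exit(1)
      }
      fmt.Printf("parsed %d valid public key(s)\n", len(keys))
  }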

Comment 3 Gal Zaidman 2020-11-23 13:26:58 UTC
(In reply to Stefan Schimanski from comment #2)
> Kube-apiserver reports this:
> 
>   Error: error reading public key file
> /etc/kubernetes/static-pod-resources/configmaps/bound-sa-token-signing-certs/
> service-account-001.pub: data does not contain any valid RSA or ECDSA public
> keys
> 
> This is very probably due to
> https://github.com/openshift/cluster-kube-apiserver-operator/pull/1006, to
> be reverted in
> https://github.com/openshift/cluster-kube-apiserver-operator/pull/1011.

Thanks for the fastest reply ever!
I see that #1011 is about to be merged, so I will track the upcoming jobs and update this bug if the issue is resolved.

Comment 5 Xingxing Xia 2020-11-23 23:53:37 UTC

*** This bug has been marked as a duplicate of bug 1900446 ***

