1848956 – KMP requires downtime for CA stabilization during certificate rotation

Summary: KMP requires downtime for CA stabilization during certificate rotation

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	2.4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	2.6.0
Assignee:	Petr Horáček
QA Contact:	Ofir Nash
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-06-19 11:33 UTC by Geetika Kapoor
Modified:	2021-03-10 11:17 UTC
CC List:	3 users
Fixed In Version:	cluster-network-addons-operator-container-v2.5.0-8
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-03-10 11:16:12 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
vm-fedora (911 bytes, text/plain) 2020-12-27 10:04 UTC, Ofir Nash
kmp-namespace (120 bytes, text/plain) 2020-12-27 10:06 UTC, Ofir Nash

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2021:0799	0	None	None	None	2021-03-10 11:17:36 UTC

Description Geetika Kapoor 2020-06-19 11:33:10 UTC

Description of problem:

A downtime of 2-3 minutes is seen during CA rotation, during which no new VMs can be created. The length of the downtime depends on the kubelet configuration, the certificate key size, and the TLS handshake process.
https://kubernetes.io/docs/concepts/configuration/secret/#mounted-secrets-are-updated-automatically
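
For a rough feel of the kubelet-dependent part of the downtime: the Kubernetes documentation linked above describes the worst-case delay for a mounted Secret update as the kubelet sync period plus the cache propagation delay. A minimal sketch of that bound follows. The function name is hypothetical and the values in the usage line are illustrative assumptions, not measurements from this cluster. Key size and TLS handshake effects are not modelled.

```python
def max_secret_propagation_delay_s(kubelet_sync_period_s: float,
                                   cache_propagation_delay_s: float) -> float:
    """Upper bound (in seconds) on how long a pod may keep seeing the old
    contents of a mounted Secret after the Secret object was updated."""
    if kubelet_sync_period_s < 0 or cache_propagation_delay_s < 0:
        raise ValueError("delays must be non-negative")
    # Bound taken from the Kubernetes docs: sync period + cache propagation delay.
    return kubelet_sync_period_s + cache_propagation_delay_s


# Illustrative values only (assumed, not read from any kubelet config).
estimate = max_secret_propagation_delay_s(60.0, 60.0)
```

With both inputs in the one-minute range the bound falls in the same order of magnitude as the 2-3 minutes reported here.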

During this time the CA is unstable, so TLS connections break and VM creation fails if the VM is created in a namespace that has the label mutatevirtualmachines.kubemacpool.io=allocate

Common failures seen during this time are:

Error from server (InternalError): error when creating "vm_create.yaml": Internal error occurred: failed calling webhook "mutatevirtualmachines.kubemacpool.io": Post https://kubemacpool-service.openshift-cnv.svc:443/mutate-virtualmachines?timeout=30s: x509: certificate signed by unknown authority 

Error from server (InternalError): error when creating "vm_create.yaml": Internal error occurred: failed calling webhook "mutatevirtualmachines.kubemacpool.io": Post https://kubemacpool-service.openshift-cnv.svc:443/mutate-virtualmachines?timeout=30s: dial tcp 10.128.2.32:8000: connect: connection refused
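
Both failures above are transient during the rotation window, so a client can tell them apart from real errors and retry. The following is only a sketch of such a client-side workaround, not a fix in KMP itself. `create` stands for any hypothetical callable that raises `RuntimeError` with the server message on failure (for example a wrapper around `oc create -f vm_create.yaml`); the retry count and delay are assumed values.

```python
import time
from typing import Callable

WEBHOOK_MARKER = 'failed calling webhook "mutatevirtualmachines.kubemacpool.io"'

# Substrings taken from the two error messages quoted in this report.
TRANSIENT_MARKERS = (
    "x509: certificate signed by unknown authority",
    "connect: connection refused",
)


def is_transient_webhook_error(message: str) -> bool:
    """True if the message matches one of the rotation-time KMP webhook failures."""
    return WEBHOOK_MARKER in message and any(m in message for m in TRANSIENT_MARKERS)


def create_with_retry(create: Callable[[], None],
                      attempts: int = 10,
                      delay_s: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call create() until it succeeds; return the number of attempts used.

    Non-transient errors, and the error from the last attempt, are re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            create()
            return attempt
        except RuntimeError as err:
            if attempt == attempts or not is_transient_webhook_error(str(err)):
                raise
            sleep(delay_s)
    raise AssertionError("unreachable")
```

Ten attempts at 30 seconds cover roughly five minutes, which is longer than the 2-3 minutes of downtime observed here.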



Version-Release number of selected component (if applicable):


$ oc get csv -n openshift-cnv | awk ' { print $4 } ' | tail -n1
2.4.0

How reproducible:

always 

Steps to Reproduce:
1. Create a certificate and apply it to the caBundle of the MutatingWebhookConfiguration.
2. Scripts used: https://github.com/k8snetworkplumbingwg/kubemacpool/pull/193
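
For readers without the scripts at hand, step 1 can be approximated as follows. This is a sketch and not the script from the linked pull request: it only builds a JSON patch that replaces `webhooks[N].clientConfig.caBundle` with the base64-encoded PEM of a CA certificate. The function name and the webhook index 0 are assumptions.

```python
import base64
import json


def ca_bundle_patch(ca_pem: bytes, webhook_index: int = 0) -> str:
    """Build a JSON patch (RFC 6902) that replaces the caBundle of one webhook.

    caBundle holds the PEM-encoded CA certificate, base64-encoded.
    webhook_index is an assumption; check the actual webhook order first.
    """
    encoded = base64.b64encode(ca_pem).decode("ascii")
    patch = [{
        "op": "replace",
        "path": f"/webhooks/{webhook_index}/clientConfig/caBundle",
        "value": encoded,
    }]
    return json.dumps(patch)
```

The resulting string can be passed to `oc patch mutatingwebhookconfiguration <name> --type=json -p '<patch>'`, where `<name>` is the name of the KMP webhook configuration on the cluster (not stated in this report).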

Actual results:

KMP becomes unstable and is unable to process requests while the CA bundle is unstable.


Expected results:

Ideally, the customer/admin should be able to configure the certificate rotation policy based on their downtime, needs, and availability requirements. The downtime while the system is in an unstable state should also be reduced, or handled by some rescheduling policy.

Additional info:

Comment 1 Petr Horáček 2020-06-19 11:36:24 UTC

Thanks for opening this.

Since the rotation interval is quite long and the downtime happens only on opted-in namespaces, I suggest we handle this in 2.5 (and not as a 2.4 blocker).

Comment 3 Petr Horáček 2020-09-03 12:07:19 UTC

We need HCO to expose rotation parameters on its API. That will happen only in 2.6.

Comment 4 Ofir Nash 2020-12-27 10:04:36 UTC

Created attachment 1742246 [details]
vm-fedora

VM Fedora with namespace: kmp-ns-bug

Comment 5 Ofir Nash 2020-12-27 10:06:49 UTC

Created attachment 1742247 [details]
kmp-namespace

Comment 6 Ofir Nash 2020-12-27 10:07:52 UTC

Verified.

Steps verified:
1. Create a certificate with the given scripts.
2. Create a namespace with the label "mutatevirtualmachines.kubemacpool.io: allocate" and apply it (oc apply -f namespace.yaml). The namespace.yaml is attached.
3. Create a VM under that namespace. The vm-fedora.yaml is attached.
4. Check that the VM is created successfully and running, and that the KMP pods are running.
5. Deleting the VM works without latency or downtime.
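
For step 2, an opted-in namespace can be generated along these lines. This is a reconstruction from the name and label mentioned in this bug, not the content of the attachment. The function name is hypothetical; the output is JSON, which `oc apply -f -` accepts on stdin.

```python
import json


def kmp_namespace_manifest(name: str) -> dict:
    """Namespace manifest carrying the label that opts VMs into KMP MAC allocation."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {"mutatevirtualmachines.kubemacpool.io": "allocate"},
        },
    }


if __name__ == "__main__":
    # kmp-ns-bug is the namespace name used during verification.
    print(json.dumps(kmp_namespace_manifest("kmp-ns-bug"), indent=2))
```

Saving the script under any file name and piping its output into `oc apply -f -` creates the namespace.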

Comment 7 Ofir Nash 2020-12-27 10:09:27 UTC

Comment on attachment 1742247 [details]
kmp-namespace

KMP Namespace example - kmp-ns-bug.
Has label: "mutatevirtualmachines.kubemacpool.io: allocate"

Comment 10 errata-xmlrpc 2021-03-10 11:16:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799
