Bug 2000653 - Add hypershift namespace to exclude namespaces list in descheduler configmap
Summary: Add hypershift namespace to exclude namespaces list in descheduler configmap
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jan Chaloupka
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-02 15:56 UTC by RamaKasturi
Modified: 2022-08-10 10:37 UTC
CC: 5 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Exclude the hypershift namespace from eviction.
Reason: The hypershift namespace, alongside all openshift- prefixed namespaces, is protected as well.
Result: Pods are no longer evicted from the hypershift namespace.
Clone Of:
Environment:
Last Closed: 2022-08-10 10:37:25 UTC
Target Upstream Version:
Embargoed:


Links
GitHub: openshift/cluster-kube-descheduler-operator pull 244 (open), "Bug 2000653: Customize threshold priority params strategies wide, exclude hyperkube namespace", last updated 2022-03-09 15:57:08 UTC
Red Hat Product Errata: RHSA-2022:5069, last updated 2022-08-10 10:37:45 UTC

Description RamaKasturi 2021-09-02 15:56:24 UTC
Description of problem:
When the descheduler is installed on a cluster that has hypershift, it will evict the hypershift operator from the cluster. To prevent this, we should add the hypershift namespace to the list of excluded namespaces.

Version-Release number of selected components (if applicable):
 4.9.0-202108261630

How reproducible:
Always

Steps to Reproduce:
1. Install a 4.9 cluster
2. Install the hypershift operator
3. Configure the descheduler to evict pods whose lifetime exceeds 15 minutes (a minimal CR sketch is shown below)
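A minimal KubeDescheduler CR for step 3 could look roughly like the following. This is only a sketch: the field layout follows the CR pasted in comment 14, and the 15-minute podLifetime value plus the single profile are assumptions for illustration; availability of profileCustomizations may differ on 4.9.

apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  deschedulingIntervalSeconds: 3600
  managementState: Managed
  profiles:
  - LifecycleAndUtilization       # enables the PodLifeTime strategy, among others
  profileCustomizations:
    podLifetime: 15m0s            # pods older than this become eviction candidates (illustrative value)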

Actual Results:
     Pods in the hypershift namespace get evicted.

Expected Results:
     Operator pods in the hypershift namespace should not get evicted.

Additional Info:
    It looks like there is another namespace, along with hypershift, that will be needed in OCP; we might have to ignore that one as well at a later point in time.

Comment 2 RamaKasturi 2021-09-07 15:14:49 UTC
Adding info for reference:
============================
Currently the namespace where the hypershift operator is installed is fixed ('hypershift'). There is another namespace where some more components are installed ('clusters-jz-test' in this test), but its name is not fixed yet.

[knarra@knarra cucushift]$ oc get pods -n hypershift
NAME                        READY   STATUS    RESTARTS   AGE
operator-5f8d444fd4-mjd9z   1/1     Running   0          13m

@heliu could you please help confirm whether the above is true? Also, are there any other namespaces where hypershift components will be installed?

Comment 3 Mike Dame 2021-09-07 15:39:47 UTC
We cannot currently prevent the descheduler operator from evicting in non-static namespaces (i.e., we can exclude "openshift-*", "hypershift*", etc., but not "randomNS1234").

@heli what we need to know is whether we are seeing any issues specifically resulting from the descheduler evicting pods in namespaces besides "hypershift". Rama shared a Slack thread[1] where you showed some pods in "clusters-jz-test" were in CrashLoopBackOff due to a missing etcd pod. Was this due to the etcd pod being evicted by the descheduler?

If not, then we can just exclude the "hypershift" namespace.

[1] https://coreos.slack.com/archives/CH76YSYSC/p1630485091436500
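For reference, namespace exclusion in the descheduler policy configmap generated by the operator takes roughly this shape. This is a sketch in the upstream v1alpha1 policy format; the exact strategies, the full exclude list the operator emits (for example every openshift-* namespace listed explicitly), and the lifetime value are assumptions for illustration.

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
    enabled: true
    params:
      podLifeTime:
        maxPodLifeTimeSeconds: 900
      namespaces:
        exclude:
        - "kube-system"
        - "hypershift"          # the namespace this bug asks to add
        - "openshift-etcd"      # openshift-* namespaces listed explicitly (illustrative)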

Comment 4 He Liu 2021-09-13 04:15:55 UTC
@mdame

Comment 5 He Liu 2021-09-13 04:27:05 UTC
@knarra @mdame In an OCP Hypershift environment, the pod in the hypershift namespace is the Hypershift operator. It can be recovered automatically. In another namespace called "clusters-{clustername}", the pods cannot be recovered automatically because they are the control plane of a guest OCP cluster. Currently the etcd pod in that namespace is not in HA mode; once the etcd pod crashes, the whole control plane crashes too.

Here is the existing Jira bug link: https://issues.redhat.com/browse/HOSTEDCP-181. Developers will fix that by supporting etcd HA mode.

So please ignore the crashed pods in the "clusters-{clustername}" namespace. During the test triggered by RamaKasturi, I recovered those pods manually to support her testing at that time.

Comment 7 Mike Dame 2021-09-14 13:03:07 UTC
Okay, so since the developers are fixing the etcd bug, is it sufficient for us to just exclude the "hypershift" namespace?

I'm confused, since you said the "clusters-{clustername}" pods can't be recovered automatically. If that's the case, then the descheduler will evict them (if they do not have a system critical priority or other factors disqualifying them from eviction).
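For context, a pod with a system critical priority class is skipped by the descheduler by default. A hypothetical pod spec carrying such a class might look like this (names and image are illustrative only):

apiVersion: v1
kind: Pod
metadata:
  name: example-control-plane-pod      # hypothetical pod, for illustration only
  namespace: clusters-example
spec:
  priorityClassName: system-cluster-critical   # at or above the descheduler's default eviction threshold
  containers:
  - name: app
    image: registry.example.com/app:latest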

Comment 9 Maciej Szulik 2022-01-28 12:26:17 UTC
Mike had a PR open at https://github.com/openshift/cluster-kube-descheduler-operator/pull/218

Comment 11 He Liu 2022-01-29 06:59:34 UTC
@knarra @mdame etcd HA has been completed: https://issues.redhat.com/browse/HOSTEDCP-252. Now hypershift can create an HA guest cluster.

Comment 12 Jan Chaloupka 2022-02-18 11:23:15 UTC
Due to higher priority tasks I have not been able to resolve this issue in time. Moving to the next sprint.

Comment 14 RamaKasturi 2022-05-10 09:02:07 UTC
Hello Jan,

   I tried to verify the bug here and I see that pods in the hypershift namespace do not get evicted. But besides that, there will be control plane namespaces named clusters-{guest-cluster-name}, each representing a guest cluster. They all should be protected somehow from being evicted.

   Currently I tried enabling LifecycleAndUtilization with podLifetime set to 5m, plus EvictPodsWithPVC and EvictPodsWithLocalStorage, and I see that all pods in clusters-hypershift-ci-20310 get evicted, which should not happen. Is there a way we can prevent this?

Descheduler config:
========================
[knarra@knarra verification-tests]$ oc get kubedescheduler cluster -o yaml -n openshift-kube-descheduler-operator
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  creationTimestamp: "2022-05-10T07:15:17Z"
  generation: 2
  name: cluster
  namespace: openshift-kube-descheduler-operator
  resourceVersion: "100001"
  uid: aad9b447-f4dc-4667-a063-1ce60d14549a
spec:
  deschedulingIntervalSeconds: 3600
  logLevel: TraceAll
  managementState: Managed
  mode: Predictive
  observedConfig:
    servingInfo:
      cipherSuites:
      - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
      - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
      - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
      - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
      - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
      - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
      minTLSVersion: VersionTLS12
  operatorLogLevel: Normal
  profileCustomizations:
    podLifetime: 5m0s
  profiles:
  - EvictPodsWithPVC
  - LifecycleAndUtilization
  - EvictPodsWithLocalStorage
  unsupportedConfigOverrides: null
status:
  conditions:
  - lastTransitionTime: "2022-05-10T07:15:17Z"
    status: "False"
    type: ResourceSyncControllerDegraded
  - lastTransitionTime: "2022-05-10T07:15:19Z"
    status: "False"
    type: ConfigObservationDegraded
  - lastTransitionTime: "2022-05-10T07:15:21Z"
    status: "False"
    type: TargetConfigControllerDegraded
  generations:
  - group: apps
    hash: ""
    lastGeneration: 9
    name: descheduler
    namespace: openshift-kube-descheduler-operator
    resource: deployments
  readyReplicas: 0

Logs from Descheduler:
===============================
[knarra@knarra verification-tests]$ oc logs descheduler-69c9564569-dfz4r -n openshift-kube-descheduler-operator | grep "clusters-hypershift-ci-20310" | grep "dry run mode"
I0510 07:15:33.087625       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/redhat-operators-catalog-c6dc64c94-n8jdw" reason="PodLifeTime"
I0510 07:15:33.087687       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/cluster-api-5c54bb49fd-2ff5c" reason="PodLifeTime"
I0510 07:15:33.087753       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/ingress-operator-758bc7f9b-hm8sg" reason="PodLifeTime"
I0510 07:15:33.087801       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/redhat-marketplace-catalog-6657dfd8f9-dhwbj" reason="PodLifeTime"
I0510 07:15:33.087849       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/konnectivity-server-574f947985-j94mt" reason="PodLifeTime"
I0510 07:15:33.087930       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/kube-apiserver-c8b7d4954-f878z" reason="PodLifeTime"
I0510 07:15:33.087987       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/olm-operator-5675794986-7cdht" reason="PodLifeTime"
I0510 07:15:33.088044       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/capi-provider-7dc7c985c-czjsc" reason="PodLifeTime"
I0510 07:15:33.088095       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/cluster-policy-controller-76756b6d9-crmrd" reason="PodLifeTime"
I0510 07:15:33.088148       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/etcd-0" reason="PodLifeTime"
I0510 07:15:33.088201       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/hosted-cluster-config-operator-5898786db5-wggwc" reason="PodLifeTime"
I0510 07:15:33.088249       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/ignition-server-675f877d7f-6d576" reason="PodLifeTime"
I0510 07:15:33.088304       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/oauth-openshift-79dcf8d678-qfkk8" reason="PodLifeTime"
I0510 07:15:33.088368       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/packageserver-67ccd68698-lcnnn" reason="PodLifeTime"
I0510 07:15:33.088439       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/cluster-version-operator-7f76dbd47c-jjj55" reason="PodLifeTime"
I0510 07:15:33.088487       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/konnectivity-agent-6d85b46746-vkpfd" reason="PodLifeTime"
I0510 07:15:33.088540       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/machine-approver-7d999d966b-jdqsz" reason="PodLifeTime"
I0510 07:15:33.088605       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/openshift-apiserver-8588b66bdb-2kz5k" reason="PodLifeTime"
I0510 07:15:33.088650       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/openshift-controller-manager-69f94fcc6b-5bx6d" reason="PodLifeTime"
I0510 07:15:33.088705       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/catalog-operator-55988d9c7c-xdt7c" reason="PodLifeTime"
I0510 07:15:33.088753       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/cluster-autoscaler-8554bb9899-h6ctt" reason="PodLifeTime"
I0510 07:15:33.088802       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/community-operators-catalog-6565cd8b97-zh2p5" reason="PodLifeTime"
I0510 07:15:33.088865       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/kube-controller-manager-65c6ff566c-k8csx" reason="PodLifeTime"
I0510 07:15:33.088920       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/certified-operators-catalog-645c845c85-plsg6" reason="PodLifeTime"
I0510 07:15:33.088981       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/cluster-network-operator-7c7d9586c7-dlvdg" reason="PodLifeTime"
I0510 07:15:33.089035       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/control-plane-operator-66b74dcbdc-hvwlf" reason="PodLifeTime"
I0510 07:15:33.089083       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/kube-scheduler-55f669598c-dpdfq" reason="PodLifeTime"
I0510 07:15:33.089157       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/openshift-oauth-apiserver-6d679995f-bdddm" reason="PodLifeTime"
I0510 07:15:33.090565       1 evictions.go:158] "Evicted pod in dry run mode" pod="clusters-hypershift-ci-20310/ovnkube-master-0" reason="PodLifeTime"

Comment 15 Jan Chaloupka 2022-05-17 11:20:08 UTC
Hello Rama,

> there will be control plane namespaces named clusters-{guest-cluster-name}, each representing a guest cluster. They all should be protected somehow from being evicted.

> Currently I tried enabling LifecycleAndUtilization with podLifetime set to 5m, plus EvictPodsWithPVC and EvictPodsWithLocalStorage, and I see that all pods in clusters-hypershift-ci-20310 get evicted, which should not happen. Is there a way we can prevent this?

That's currently the expected behaviour. All such namespace names are expected to be prefixed with the `clusters-` string. Unfortunately, in non-hypershift clusters it is also acceptable to have a namespace name prefixed with `clusters-`. Thus, in order to avoid limiting those clusters, we do not hardcode exclusion of `clusters-` prefixed namespaces.

The solution agreed with the hypershift team was to utilize priority class thresholds. All pods in `clusters-` prefixed hypershift namespaces are expected to have the hypershift-etcd, hypershift-api-critical, or hypershift-control-plane priority class assigned [1]. Whichever of those three priority classes is the lowest is the one to be set through the new thresholdPriorityClassName field (or its numeric value through the thresholdPriority field).

[1] https://coreos.slack.com/archives/C01C8502FMM/p1646659511356289
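A sketch of what that customization could look like in the KubeDescheduler CR, assuming the new thresholdPriorityClassName field lands under profileCustomizations alongside podLifetime (as in the CR pasted in comment 14); the class chosen here is an assumption, and whichever of the three classes actually has the lowest value should be used:

spec:
  profiles:
  - LifecycleAndUtilization
  profileCustomizations:
    podLifetime: 5m0s
    # Assumption: hypershift-control-plane is the lowest-valued of the three
    # hypershift priority classes; substitute whichever is actually lowest.
    thresholdPriorityClassName: hypershift-control-plane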

Comment 16 RamaKasturi 2022-05-17 12:56:03 UTC
Based on comment 15, moving the bug to verified state, as I see that pods in the hypershift namespace do not get evicted now that it has been added to the excluded namespaces in the descheduler config.

namespaces:
  exclude:
    - kube-system
    - hypershift

Also, customers who have hypershift installed should set whichever of hypershift-api-critical, hypershift-control-plane, or hypershift-etcd is lowest in the descheduler's thresholdPriorityClassName so that pods do not get evicted (see the command sketch below for comparing their values).
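To find which of the three classes has the lowest value, a plain priority class listing should be enough (the class names come from comment 15; output omitted here):

$ oc get priorityclass hypershift-etcd hypershift-api-critical hypershift-control-plane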

Comment 18 errata-xmlrpc 2022-08-10 10:37:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

