Bug 2034984 - openshift-controller-manager should use a Deployment
Summary: openshift-controller-manager should use a Deployment
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-controller-manager
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Otávio Fernandes
QA Contact: wewang
URL:
Whiteboard:
Depends On:
Blocks: 2042587
 
Reported: 2021-12-22 16:32 UTC by Adam Kaplan
Modified: 2022-09-02 21:03 UTC (History)
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2042587
Environment:
Last Closed: 2022-05-18 20:19:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 2026488 1 unspecified CLOSED openshift-controller-manager - delete event is repeating pathologically 2022-03-10 16:30:48 UTC

Description Adam Kaplan 2021-12-22 16:32:41 UTC
Description of problem:

openshift-controller-manager is currently installed using a DaemonSet to ensure that an instance runs on every control plane node.
This has a negative performance impact on installation and upgrade due to the semantics of how DaemonSets are managed and rolled out.
A Deployment with appropriate anti-affinity rules should be used instead to improve installation and upgrade performance.
The operator should also ensure that the correct number of replicas is deployed for single-node topologies.
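For illustration only, a minimal sketch (in Go, using the upstream k8s.io/api types) of what the replacement could look like: a Deployment with required pod anti-affinity across control plane nodes, scaled down to one replica on single-node topologies. The names, labels, and image below are placeholders, not the operator's actual manifest.

package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ocmDeployment sketches the DaemonSet -> Deployment replacement: three
// replicas spread across control plane nodes, or one replica on single-node
// topologies, with required anti-affinity so replicas never co-locate.
func ocmDeployment(singleNode bool) *appsv1.Deployment {
	replicas := int32(3)
	if singleNode {
		replicas = 1
	}
	labels := map[string]string{"app": "openshift-controller-manager"}
	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "controller-manager",
			Namespace: "openshift-controller-manager",
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					NodeSelector: map[string]string{"node-role.kubernetes.io/master": ""},
					Affinity: &corev1.Affinity{
						PodAntiAffinity: &corev1.PodAntiAffinity{
							RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
								LabelSelector: &metav1.LabelSelector{MatchLabels: labels},
								TopologyKey:   "kubernetes.io/hostname",
							}},
						},
					},
					Containers: []corev1.Container{{
						Name:  "controller-manager",
						Image: "placeholder-image", // placeholder, not the real payload image
					}},
				},
			},
		},
	}
}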


Version-Release number of selected component (if applicable): 4.10


Additional info:

Before this change merges, a backport to 4.9 must be opened that removes the Deployment and restores the DaemonSet.
This is necessary to ensure that we don't break the OpenShift control plane when downgrading a cluster.

Comment 1 Adam Kaplan 2021-12-22 18:45:27 UTC
This issue is related to Bug #2026488. Because OCM is deployed with a DaemonSet, each rollout results in a significant number of SuccessfulDelete events that get aggregated together. A Deployment's semantics are better in this regard, as each rollout is managed by a ReplicaSet.

Comment 7 Filip Krepinsky 2022-09-02 21:03:18 UTC
These events are caused by the tests, not by the DaemonSet vs. Deployment difference.

The problem is that the builds tests change the OpenShiftControllerManagerConfig often, and this triggers a change in the final config ConfigMap, which in turn force-applies the DaemonSet and triggers a new rollout. Tests responsible for this: https://github.com/openshift/origin/blob/master/test/extended/builds/cluster_config.go

We are using/sharing the same ConfigMap in route-controller-manager, so this is also triggered by the build config changes, even though those changes are not used by route-controller-manager.
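For context on why a ConfigMap edit rolls the workloads: a common pattern (shown here only as an illustration, not necessarily the exact mechanism this operator uses) is to hash the config ConfigMap's data and stamp the hash onto the pod template, so any content change produces a new template and a new rollout. Because both controller-manager and route-controller-manager consume the same config ConfigMap, a build-config change rolls both.

package sketch

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// configHash illustrates why every OpenShiftControllerManagerConfig change made
// by the build tests causes a new rollout: the final config ConfigMap content
// changes, and workloads that embed (a hash of) that content in their pod
// template get re-applied. This is a sketch of the general pattern, not the
// operator's code.
func configHash(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic ordering keeps the hash stable

	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s\n", k, data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

Two different config.yaml payloads yield two different hashes, which is consistent with the repeated ScalingReplicaSet events below whenever the build tests flip the config back and forth.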


Events we are getting for route-controller-manager:

event happened 31 times, something is wrong: ns/openshift-route-controller-manager deployment/route-controller-manager - reason/ScalingReplicaSet (combined from similar events): Scaled down replica set route-controller-manager-6c979bfbd to 2
event happened 43 times, something is wrong: ns/openshift-route-controller-manager deployment/route-controller-manager - reason/ScalingReplicaSet (combined from similar events): Scaled down replica set route-controller-manager-588c4b7974 to 2
event happened 49 times, something is wrong: ns/openshift-route-controller-manager deployment/route-controller-manager - reason/ScalingReplicaSet (combined from similar events): Scaled down replica set route-controller-manager-57f867cc5c to 2
event happened 25 times, something is wrong: ns/openshift-route-controller-manager deployment/route-controller-manager - reason/ScalingReplicaSet (combined from similar events): Scaled down replica set route-controller-manager-55d87f87dd to 2

