Bug 2034984

Summary: openshift-controller-manager should use a Deployment
Product: OpenShift Container Platform
Reporter: Adam Kaplan <adam.kaplan>
Component: openshift-controller-manager
Assignee: Otávio Fernandes <olemefer>
Sub component: controller-manager
QA Contact: wewang <wewang>
Status: CLOSED NOTABUG
Docs Contact:
Severity: low
Priority: medium
CC: cdaley, fkrepins
Version: 4.10
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2042587 (view as bug list)
Environment:
Last Closed: 2022-05-18 20:19:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2042587

Description Adam Kaplan 2021-12-22 16:32:41 UTC
Description of problem:

openshift-controller-manager is currently installed using a DaemonSet to ensure that there is an instance on every control plane node.
This has a negative performance impact on installation and upgrade due to the semantics of how DaemonSets are managed.
A Deployment with appropriate anti-affinity rules should be used instead to improve installation and upgrade performance.
The operator should also ensure that the correct number of replicas is deployed for single-node topologies.
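
For illustration only, a minimal client-go sketch of what such a Deployment could look like: hard pod anti-affinity on kubernetes.io/hostname to keep replicas on distinct control plane nodes, and a replica count that drops to 1 for single-node topologies. All names, labels, images, and counts below are assumptions, not the operator's actual manifest.

// Sketch only: names, labels, and replica counts are assumptions.
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func controllerManagerDeployment(singleNode bool) *appsv1.Deployment {
	replicas := int32(3)
	if singleNode {
		// Single-node topology: only one control plane node exists.
		replicas = 1
	}
	labels := map[string]string{"app": "openshift-controller-manager"}
	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "controller-manager",
			Namespace: "openshift-controller-manager",
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					NodeSelector: map[string]string{"node-role.kubernetes.io/master": ""},
					Affinity: &corev1.Affinity{
						PodAntiAffinity: &corev1.PodAntiAffinity{
							// One replica per node, mirroring the DaemonSet's
							// one-pod-per-control-plane-node behavior.
							RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
								LabelSelector: &metav1.LabelSelector{MatchLabels: labels},
								TopologyKey:   "kubernetes.io/hostname",
							}},
						},
					},
					Containers: []corev1.Container{{
						Name:  "controller-manager",
						Image: "controller-manager-image", // placeholder; resolved by the operator in practice
					}},
				},
			},
		},
	}
}

func main() {
	d := controllerManagerDeployment(false)
	fmt.Println(d.Name, "replicas:", *d.Spec.Replicas)
}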


Version-Release number of selected component (if applicable): 4.10


Additional info:

Before this change merges, a backport to 4.9 must be opened that removes the Deployment and restores the DaemonSet.
This is necessary to ensure that we don't break the OpenShift control plane when downgrading a cluster.

Comment 1 Adam Kaplan 2021-12-22 18:45:27 UTC
This issue is related to Bug #2026488. Because OCM is deployed with a DaemonSet, the rollout results in a significant number of SuccessfulDelete events that get aggregated together. A Deployment's semantics are better in this regard, as each rollout is managed by a ReplicaSet.
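
Not part of the original comment, but for anyone reproducing this, a small client-go sketch that lists the SuccessfulDelete events in the openshift-controller-manager namespace along with their aggregation count. The kubeconfig handling and output format are illustrative assumptions.

// Sketch: count aggregated SuccessfulDelete events; details are illustrative.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	events, err := client.CoreV1().Events("openshift-controller-manager").List(context.TODO(),
		metav1.ListOptions{FieldSelector: "reason=SuccessfulDelete"})
	if err != nil {
		panic(err)
	}
	for _, e := range events.Items {
		// Count reflects how many times the same event was aggregated.
		fmt.Printf("%s\t%s\tcount=%d\n", e.InvolvedObject.Kind, e.InvolvedObject.Name, e.Count)
	}
}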

Comment 7 Filip Krepinsky 2022-09-02 21:03:18 UTC
These events are caused by the tests, not by the DaemonSet vs. Deployment difference.

The problem is that the builds tests change the OpenShiftControllerManagerConfig frequently, and this changes the final config ConfigMap, which in turn force-applies the DaemonSet and triggers a new rollout. The tests responsible for this are in https://github.com/openshift/origin/blob/master/test/extended/builds/cluster_config.go .

We share the same ConfigMap with route-controller-manager, so its rollouts are triggered by the build config changes as well, even though those settings are not used by route-controller-manager.
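
For context, the usual pattern that makes a config change roll a workload looks roughly like the sketch below: a digest of the rendered ConfigMap is stamped onto the pod template, so any change to the shared config (including build-only fields) alters the pod template and forces a new rollout. This is an assumption-laden illustration of the mechanism being described, not the operator's actual code.

// Sketch of the "config hash -> rollout" pattern; not the operator's code.
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// stampConfigHash writes a deterministic digest of the ConfigMap data into a
// pod template annotation; when the shared config changes, the digest changes,
// the pod template changes, and the controller rolls out new pods.
func stampConfigHash(d *appsv1.Deployment, cm *corev1.ConfigMap) {
	keys := make([]string, 0, len(cm.Data))
	for k := range cm.Data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order; map iteration is randomized
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s\n", k, cm.Data[k])
	}
	if d.Spec.Template.Annotations == nil {
		d.Spec.Template.Annotations = map[string]string{}
	}
	d.Spec.Template.Annotations["config-hash"] = fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	d := &appsv1.Deployment{}
	cm := &corev1.ConfigMap{Data: map[string]string{"config.yaml": "build: ..."}}
	stampConfigHash(d, cm)
	fmt.Println("config-hash:", d.Spec.Template.Annotations["config-hash"])
}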


Events we are getting for route-controller-manager:

event happened 31 times, something is wrong: ns/openshift-route-controller-manager deployment/route-controller-manager - reason/ScalingReplicaSet (combined from similar events): Scaled down replica set route-controller-manager-6c979bfbd to 2
event happened 43 times, something is wrong: ns/openshift-route-controller-manager deployment/route-controller-manager - reason/ScalingReplicaSet (combined from similar events): Scaled down replica set route-controller-manager-588c4b7974 to 2
event happened 49 times, something is wrong: ns/openshift-route-controller-manager deployment/route-controller-manager - reason/ScalingReplicaSet (combined from similar events): Scaled down replica set route-controller-manager-57f867cc5c to 2
event happened 25 times, something is wrong: ns/openshift-route-controller-manager deployment/route-controller-manager - reason/ScalingReplicaSet (combined from similar events): Scaled down replica set route-controller-manager-55d87f87dd to 2