If you update the OCM pods, they don't release their lease on shutdown. The kube election library has been updated to make this possible (ReleaseOnCancel, see k8s.io/client-go/examples/leader-election/main.go), but it requires changes to your controllers so that they shut down gracefully before the lock is released. Releasing the lease minimizes the time during which no controller is running. If this is easy to fix (by ensuring the client is shut down), we should implement it, because it shortens the time to recover after a failure. If it is complex or requires rewiring the controller, our current logic is fine.
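The ordering described above (stop the controllers first, release the lock last) can be sketched with the standard library alone. This is a minimal model, not the client-go API: the `lease` type, `runLeader`, and `simulateShutdown` are hypothetical names invented for illustration, and context cancellation stands in for the pod receiving a termination signal.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// lease is a hypothetical in-memory stand-in for the leader-election lock.
// It is NOT the client-go resourcelock API.
type lease struct {
	mu     sync.Mutex
	holder string
	events []string
}

// acquire records id as the current holder.
func (l *lease) acquire(id string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.holder = id
	l.events = append(l.events, "acquired")
}

// release clears the holder so a standby could take over immediately.
func (l *lease) release() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.holder = ""
	l.events = append(l.events, "released")
}

// note appends an event so the shutdown order can be inspected afterwards.
func (l *lease) note(event string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.events = append(l.events, event)
}

// runLeader holds the lease while the controllers run. When ctx is cancelled
// it waits for every controller goroutine to return and only then releases
// the lease, so no controller is still working after the lock is given up.
func runLeader(ctx context.Context, l *lease, id string, controllers int) {
	l.acquire(id)

	var wg sync.WaitGroup
	for i := 0; i < controllers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-ctx.Done() // a real controller would stop its workers here
		}()
	}

	<-ctx.Done()
	wg.Wait()
	l.note("controllers stopped")
	l.release()
}

// simulateShutdown runs one leader, cancels it after delayMillis
// (standing in for pod termination), and returns the lease for inspection.
func simulateShutdown(delayMillis int) *lease {
	l := &lease{}
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		runLeader(ctx, l, "controller-manager-srqzr", 3)
		close(done)
	}()
	time.Sleep(time.Duration(delayMillis) * time.Millisecond)
	cancel()
	<-done
	return l
}
```

Calling `simulateShutdown(10)` leaves the lease with no holder, and the recorded events show the controllers stopping before the release. In a real controller manager the release itself would be left to the election library (ReleaseOnCancel); the part the controllers have to get right is returning promptly once the context is cancelled.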
@gabe, is it OK to verify the bug with the following steps? It took about 51 seconds for the new pod to be running.

Steps:

[wewang@wangwen work]$ oc get configmap openshift-master-controllers -oyaml -n openshift-controller-manager
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"controller-manager-srqzr","leaseDurationSeconds":60,"acquireTime":"2020-05-26T07:58:09Z","renewTime":"2020-05-26T08:03:09Z","leaderTransitions":3}'
  creationTimestamp: "2020-05-26T06:37:10Z"
  name: openshift-master-controllers
  namespace: openshift-controller-manager
  resourceVersion: "56364"
  selfLink: /api/v1/namespaces/openshift-controller-manager/configmaps/openshift-master-controllers
  uid: 34654c77-4b7d-4004-a61c-84bc584d0024

[wewang@wangwen work]$ date ; oc delete pod controller-manager-srqzr -n openshift-controller-manager ; date; oc get pods -n openshift-controller-manager
Tue May 26 16:08:50 CST 2020
pod "controller-manager-srqzr" deleted
Tue May 26 16:09:41 CST 2020
NAME                       READY   STATUS    RESTARTS   AGE
controller-manager-k5ckk   1/1     Running   0          91m
controller-manager-lxsj8   1/1     Running   0          91m
controller-manager-xgt6h   1/1     Running   0          7s
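To read the leader annotation in the steps above, it can help to compute when the recorded lease would run out if the holder simply vanished without releasing it. The sketch below parses the annotation JSON and returns renewTime plus leaseDurationSeconds. The `leaderRecord` struct and `leaseExpiry` function are names made up for this example, and the result is only an approximation: as far as I know, client-go standbys measure the lease duration from their own local observation of the last renewal rather than from the timestamp in the record.

```go
package main

import (
	"encoding/json"
	"time"
)

// sampleAnnotation is the leader annotation value copied from the steps above.
const sampleAnnotation = `{"holderIdentity":"controller-manager-srqzr","leaseDurationSeconds":60,"acquireTime":"2020-05-26T07:58:09Z","renewTime":"2020-05-26T08:03:09Z","leaderTransitions":3}`

// leaderRecord mirrors the fields visible in the annotation.
// The struct name is hypothetical; only the JSON keys come from the output.
type leaderRecord struct {
	HolderIdentity       string    `json:"holderIdentity"`
	LeaseDurationSeconds int       `json:"leaseDurationSeconds"`
	AcquireTime          time.Time `json:"acquireTime"`
	RenewTime            time.Time `json:"renewTime"`
	LeaderTransitions    int       `json:"leaderTransitions"`
}

// leaseExpiry parses the annotation and returns the record together with
// renewTime + leaseDuration, i.e. roughly the earliest moment a standby
// could take over if the holder never renews or releases the lease.
func leaseExpiry(annotation string) (leaderRecord, time.Time, error) {
	var rec leaderRecord
	if err := json.Unmarshal([]byte(annotation), &rec); err != nil {
		return rec, time.Time{}, err
	}
	expiry := rec.RenewTime.Add(time.Duration(rec.LeaseDurationSeconds) * time.Second)
	return rec, expiry, nil
}
```

For the annotation shown, `leaseExpiry(sampleAnnotation)` gives 2020-05-26T08:04:09Z, one lease duration (60 seconds) after the last renewal. That window is the wait a released lease is meant to avoid.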
Perfect, @Wen. Looks good, marking verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409