Bug 1866828

Summary: Operator catalog Pods created by CatalogSource aren't evicted to other nodes even after their node is down
Product: OpenShift Container Platform Reporter: Ben Luddy <bluddy>
Component: OLM Assignee: Ben Luddy <bluddy>
OLM sub component: OLM QA Contact: Bruno Andrade <bandrade>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: bandrade, ecordell
Version: 4.5   
Target Milestone: ---   
Target Release: 4.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1862340 Environment:
Last Closed: 2020-09-08 10:54:46 UTC Type: ---
Bug Depends On: 1866829    
Bug Blocks:    

Description Ben Luddy 2020-08-06 13:53:53 UTC
This bug was initially created as a copy of Bug #1862340, which is private:

Operator catalog Pods created by a CatalogSource aren't evicted to other nodes,
even after the node that the pods are running on goes down.

Comment 4 Bruno Andrade 2020-08-21 14:36:30 UTC
Marking as VERIFIED; the catalog source operator pod was evicted as expected.

Cluster Version: 4.6.0-0.nightly-2020-08-18-165040
OLM version: 0.16.0
git commit: 1fdd347ab723bf6aec30c79dfb217bcbf21a13e9

oc get pods -n openshift-operator-lifecycle-manager -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
catalog-operator-6944b55486-q8gkq   1/1     Running   0          46h   10.130.0.29    ip-10-0-192-165.us-east-2.compute.internal   <none>           <none>
olm-operator-587669c4dc-jmqfl       1/1     Running   0          46h   10.130.0.27    ip-10-0-192-165.us-east-2.compute.internal   <none>           <none>
packageserver-59c9f4dd68-22wds      1/1     Running   0          11h   10.129.3.187   ip-10-0-155-133.us-east-2.compute.internal   <none>           <none>
packageserver-59c9f4dd68-ml5n5      1/1     Running   0          11h   10.130.0.55    ip-10-0-192-165.us-east-2.compute.internal   <none>           <none>

oc adm cordon ip-10-0-192-165.us-east-2.compute.internal   
node/ip-10-0-192-165.us-east-2.compute.internal cordoned

oc adm drain ip-10-0-192-165.us-east-2.compute.internal --delete-local-data --ignore-daemonsets
node/ip-10-0-192-165.us-east-2.compute.internal already cordoned
WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-csi-drivers/aws-ebs-csi-driver-node-fpwf8, openshift-cluster-node-tuning-operator/tuned-lhsl5, openshift-controller-manager/controller-manager-2pmvr, openshift-dns/dns-default-t7nlh, openshift-image-registry/node-ca-bdx72, openshift-machine-config-operator/machine-config-daemon-r75bn, openshift-machine-config-operator/machine-config-server-4t22s, openshift-monitoring/node-exporter-nrfgq, openshift-multus/multus-admission-controller-4sjbf, openshift-multus/multus-mnlfk, openshift-multus/network-metrics-daemon-l968d, openshift-sdn/ovs-xhtf6, openshift-sdn/sdn-controller-p5t9m, openshift-sdn/sdn-v88bk
evicting pod openshift-cloud-credential-operator/cloud-credential-operator-78bbb5fddf-m9pn7
evicting pod openshift-cloud-credential-operator/pod-identity-webhook-596ff668d-qqdrc
evicting pod openshift-apiserver/apiserver-6846795c64-75xdc
evicting pod openshift-kube-apiserver/revision-pruner-14-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-cluster-samples-operator/cluster-samples-operator-7d4f777bd7-vx49z
evicting pod openshift-console-operator/console-operator-b77b78d55-scrss
evicting pod openshift-kube-apiserver/installer-12-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-authentication/oauth-openshift-68896fb778-tzhv5
evicting pod openshift-kube-apiserver-operator/kube-apiserver-operator-bd8cb9589-5lvsk
evicting pod openshift-machine-config-operator/machine-config-controller-67f577b45f-kghrc
evicting pod openshift-etcd/revision-pruner-3-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-cluster-node-tuning-operator/cluster-node-tuning-operator-5b9d49dcf8-dwnrr
evicting pod openshift-kube-apiserver/installer-13-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-controller-manager-operator/openshift-controller-manager-operator-9c66885fc-bf9zx
evicting pod openshift-kube-scheduler/revision-pruner-7-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-kube-apiserver/revision-pruner-15-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-etcd-operator/etcd-operator-7c9854d6d4-w8gwd
evicting pod openshift-image-registry/cluster-image-registry-operator-89755bc47-mnzx6
evicting pod openshift-kube-apiserver/installer-14-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-kube-apiserver/revision-pruner-16-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-kube-apiserver/installer-15-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-network-operator/network-operator-578ffc867f-vbv9n
evicting pod openshift-etcd/etcd-quorum-guard-665574fdb6-7rk7n
evicting pod openshift-marketplace/marketplace-operator-746f875dbd-hjrqj
evicting pod openshift-insights/insights-operator-6ccf4845b-r2sr7
evicting pod openshift-machine-api/cluster-autoscaler-operator-7f98c76ff9-4blvh
evicting pod openshift-machine-api/machine-api-operator-5b4685cb69-kqr7g
evicting pod openshift-console/console-7668cbf5d8-vgct9
evicting pod openshift-kube-controller-manager/revision-pruner-9-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-monitoring/cluster-monitoring-operator-848448b8f4-nt8l5
evicting pod openshift-kube-storage-version-migrator-operator/kube-storage-version-migrator-operator-5889fcfbdd-lxr89
evicting pod openshift-monitoring/prometheus-operator-b99dbf4bb-55nbn
evicting pod openshift-kube-apiserver/revision-pruner-12-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-kube-apiserver/revision-pruner-13-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-kube-apiserver/installer-16-ip-10-0-192-165.us-east-2.compute.internal
evicting pod openshift-oauth-apiserver/apiserver-558756467-bs5tp
evicting pod openshift-operator-lifecycle-manager/packageserver-59c9f4dd68-ml5n5
evicting pod openshift-operator-lifecycle-manager/olm-operator-587669c4dc-jmqfl
evicting pod openshift-operator-lifecycle-manager/catalog-operator-6944b55486-q8gkq
evicting pod openshift-ingress-operator/ingress-operator-77d9dc9d84-97qd8
pod/revision-pruner-12-ip-10-0-192-165.us-east-2.compute.internal evicted
pod/revision-pruner-14-ip-10-0-192-165.us-east-2.compute.internal evicted
I0821 11:27:53.341277  154405 request.go:645] Throttling request took 1.000855838s, request: POST:https://api.bandrade2231321.qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-kube-scheduler/pods/revision-pruner-7-ip-10-0-192-165.us-east-2.compute.internal/eviction
pod/revision-pruner-3-ip-10-0-192-165.us-east-2.compute.internal evicted
pod/installer-13-ip-10-0-192-165.us-east-2.compute.internal evicted
pod/revision-pruner-7-ip-10-0-192-165.us-east-2.compute.internal evicted
pod/installer-12-ip-10-0-192-165.us-east-2.compute.internal evicted
pod/installer-14-ip-10-0-192-165.us-east-2.compute.internal evicted
pod/revision-pruner-15-ip-10-0-192-165.us-east-2.compute.internal evicted
pod/revision-pruner-16-ip-10-0-192-165.us-east-2.compute.internal evicted
pod/revision-pruner-9-ip-10-0-192-165.us-east-2.compute.internal evicted
pod/revision-pruner-13-ip-10-0-192-165.us-east-2.compute.internal evicted
pod/installer-15-ip-10-0-192-165.us-east-2.compute.internal evicted
pod/installer-16-ip-10-0-192-165.us-east-2.compute.internal evicted
I0821 11:28:03.526183  154405 request.go:645] Throttling request took 5.380261238s, request: GET:https://api.bandrade2231321.qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-kube-storage-version-migrator-operator/pods/kube-storage-version-migrator-operator-5889fcfbdd-lxr89
pod/cloud-credential-operator-78bbb5fddf-m9pn7 evicted
pod/cluster-node-tuning-operator-5b9d49dcf8-dwnrr evicted
pod/apiserver-6846795c64-75xdc evicted
pod/marketplace-operator-746f875dbd-hjrqj evicted
pod/cluster-image-registry-operator-89755bc47-mnzx6 evicted
pod/apiserver-558756467-bs5tp evicted
pod/machine-api-operator-5b4685cb69-kqr7g evicted
pod/insights-operator-6ccf4845b-r2sr7 evicted
pod/openshift-controller-manager-operator-9c66885fc-bf9zx evicted
pod/machine-config-controller-67f577b45f-kghrc evicted
pod/packageserver-59c9f4dd68-ml5n5 evicted
pod/cluster-samples-operator-7d4f777bd7-vx49z evicted
pod/etcd-quorum-guard-665574fdb6-7rk7n evicted
pod/kube-apiserver-operator-bd8cb9589-5lvsk evicted
pod/cluster-monitoring-operator-848448b8f4-nt8l5 evicted
pod/etcd-operator-7c9854d6d4-w8gwd evicted
pod/kube-storage-version-migrator-operator-5889fcfbdd-lxr89 evicted
pod/ingress-operator-77d9dc9d84-97qd8 evicted
pod/cluster-autoscaler-operator-7f98c76ff9-4blvh evicted
pod/network-operator-578ffc867f-vbv9n evicted
pod/prometheus-operator-b99dbf4bb-55nbn evicted
pod/catalog-operator-6944b55486-q8gkq evicted
pod/olm-operator-587669c4dc-jmqfl evicted
pod/console-operator-b77b78d55-scrss evicted
pod/oauth-openshift-68896fb778-tzhv5 evicted
pod/pod-identity-webhook-596ff668d-qqdrc evicted
pod/console-7668cbf5d8-vgct9 evicted
node/ip-10-0-192-165.us-east-2.compute.internal evicted

oc get pods -n openshift-operator-lifecycle-manager -o wide                                    
NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
catalog-operator-6944b55486-jt7kj   1/1     Running   0          54s   10.129.0.44    ip-10-0-156-115.us-east-2.compute.internal   <none>           <none>
olm-operator-587669c4dc-4f6xk       1/1     Running   0          57s   10.129.0.43    ip-10-0-156-115.us-east-2.compute.internal   <none>           <none>
packageserver-59c9f4dd68-22wds      1/1     Running   0          11h   10.129.3.187   ip-10-0-155-133.us-east-2.compute.internal   <none>           <none>
packageserver-59c9f4dd68-kcn9v      1/1     Running   0          57s   10.128.0.54    ip-10-0-163-251.us-east-2.compute.internal   <none>           <none>
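
The check performed in this comment (list the OLM pods, drain the node, list them again) can be scripted. The following is a minimal sketch, not part of the original verification: `pods_on_node` is a hypothetical helper name, and it assumes the `-o wide` column layout shown above, where NODE is the seventh whitespace-separated field.

```shell
#!/bin/sh
# pods_on_node NODE
# Hypothetical helper: reads `oc get pods -o wide` text on stdin and prints
# the names of the pods scheduled on NODE, one per line.
# Assumption: NODE is the 7th whitespace-separated column, as in the
# listings above (NAME READY STATUS RESTARTS AGE IP NODE ...).
pods_on_node() {
    # NR > 1 skips the header row; $1 is the pod name.
    awk -v node="$1" 'NR > 1 && $7 == node { print $1 }'
}
```

For example, piping `oc get pods -n openshift-operator-lifecycle-manager -o wide` into `pods_on_node ip-10-0-192-165.us-east-2.compute.internal` should print nothing once the drain has completed and the pods have been rescheduled.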

Comment 6 errata-xmlrpc 2020-09-08 10:54:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.8 bug fix
update) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3510