Bug 1866829

Summary: Operator catalog Pods created by CatalogSource aren't evicted to other nodes even after their node is down
Product: OpenShift Container Platform
Component: OLM
OLM sub component: OLM
Version: 4.5
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Reporter: Ben Luddy <bluddy>
Assignee: Ben Luddy <bluddy>
QA Contact: Bruno Andrade <bandrade>
CC: ecordell
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Last Closed: 2020-10-27 16:25:25 UTC
Bug Blocks: 1862340, 1866828

Description Ben Luddy 2020-08-06 13:58:42 UTC
This bug was initially created as a copy of Bug #1866828

Operator catalog pods created by a CatalogSource aren't rescheduled to other nodes,
even after the node that the pods are running on goes down.

This issue was fixed before the 4.6 feature freeze, so this BZ exists to satisfy the automation required for a 4.5.z backport.
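For context, a CatalogSource like the sketch below (the name, namespace, and index image are illustrative, not taken from this report) causes the catalog operator to create a registry pod for it; that registry pod is what was left stranded on the failed node.

# Minimal sketch of an affected resource; names and image are hypothetical.
cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: example-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/example/example-index:latest
  displayName: Example Catalog
EOF

# The catalog operator creates a registry pod labeled with the source name:
oc get pods -n openshift-marketplace -l olm.catalogSource=example-catalog -o wide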

Comment 1 Ben Luddy 2020-08-06 14:19:24 UTC
This should have been addressed by https://github.com/operator-framework/operator-lifecycle-manager/pull/1680. The BZ that is motivating a backport to 4.5 is here: https://bugzilla.redhat.com/show_bug.cgi?id=1862340.
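The linked PR describes the actual fix. As a general diagnostic (this mechanism is an assumption, not stated in this report), a pod can ride out a node outage without ever being evicted when it tolerates the node.kubernetes.io/unreachable and node.kubernetes.io/not-ready taints with no tolerationSeconds limit. One way to inspect that on the catalog registry pods, reusing the hypothetical source name from the sketch above:

# Show the tolerations on the catalog registry pod(s). A toleration for
# node.kubernetes.io/unreachable without tolerationSeconds keeps a pod bound
# to an unreachable node indefinitely.
oc get pods -n openshift-marketplace -l olm.catalogSource=example-catalog \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.tolerations}{"\n"}{end}'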

Comment 2 Evan Cordell 2020-08-07 14:45:38 UTC
*** Bug 1867166 has been marked as a duplicate of this bug. ***

Comment 3 Bruno Andrade 2020-08-07 17:12:49 UTC
Marking as VERIFIED; the catalog operator pod was evicted and rescheduled to another node as expected.

Cluster Version: 4.6.0-0.nightly-2020-08-06-062308
OLM version: 0.16.0
git commit: 163608d60f37cc3496736bfc4ec72ca01dc7083a
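One way these values are typically collected (assuming the olm-operator image ships an olm binary that accepts --version, as recent OLM releases do):

# Cluster version:
oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}'

# OLM version and git commit (assumption: the olm binary supports --version):
oc -n openshift-operator-lifecycle-manager exec deploy/olm-operator -- olm --version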


oc get pods -o wide -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
catalog-operator-7f657cc44f-z645v   1/1     Running   0          53m   10.130.0.14   ip-10-0-150-70.us-east-2.compute.internal    <none>           <none>
olm-operator-7c8ff4698b-mcg7n       1/1     Running   0          53m   10.129.0.32   ip-10-0-212-236.us-east-2.compute.internal   <none>           <none>
packageserver-568cc46cf-98wpp       1/1     Running   0          53m   10.129.0.40   ip-10-0-212-236.us-east-2.compute.internal   <none>           <none>
packageserver-568cc46cf-q9v4z       1/1     Running   0          52m   10.130.0.20   ip-10-0-150-70.us-east-2.compute.internal    <none>           <none>

oc adm cordon ip-10-0-150-70.us-east-2.compute.internal
node/ip-10-0-150-70.us-east-2.compute.internal cordoned

oc adm drain ip-10-0-150-70.us-east-2.compute.internal 
node/ip-10-0-150-70.us-east-2.compute.internal already cordoned
evicting pod "oauth-openshift-6b74f5fcb7-8t22j"
evicting pod "revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal"
evicting pod "console-989795df-hg742"
evicting pod "openshift-controller-manager-operator-859fd974d9-v2kgf"
evicting pod "etcd-quorum-guard-6788dcf7d-8pdmf"
evicting pod "revision-pruner-3-ip-10-0-150-70.us-east-2.compute.internal"
evicting pod "kube-apiserver-operator-778756cbd9-wgmvj"
evicting pod "revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal"
evicting pod "apiserver-766c75f7bb-9pr64"
evicting pod "cluster-autoscaler-operator-768d4f89fb-2dkzc"
evicting pod "openshift-kube-scheduler-operator-dd467875f-vxqqq"
evicting pod "revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal"
evicting pod "kube-storage-version-migrator-operator-7d9859bb97-ls2bs"
evicting pod "network-operator-57d5f54c59-qjthf"
evicting pod "machine-config-operator-849dbf6bd7-jcdgn"
evicting pod "apiserver-565bd5f986-2p5l9"
evicting pod "catalog-operator-7f657cc44f-z645v"
pod/revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal evicted
pod/revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal evicted
pod/revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal evicted
pod/revision-pruner-3-ip-10-0-150-70.us-east-2.compute.internal evicted
pod/openshift-controller-manager-operator-859fd974d9-v2kgf evicted
pod/network-operator-57d5f54c59-qjthf evicted
pod/kube-apiserver-operator-778756cbd9-wgmvj evicted
pod/machine-config-operator-849dbf6bd7-jcdgn evicted
pod/apiserver-565bd5f986-2p5l9 evicted
pod/apiserver-766c75f7bb-9pr64 evicted
pod/cluster-autoscaler-operator-768d4f89fb-2dkzc evicted
pod/etcd-quorum-guard-6788dcf7d-8pdmf evicted
pod/openshift-kube-scheduler-operator-dd467875f-vxqqq evicted
pod/catalog-operator-7f657cc44f-z645v evicted
pod/kube-storage-version-migrator-operator-7d9859bb97-ls2bs evicted
pod/oauth-openshift-6b74f5fcb7-8t22j evicted
pod/console-989795df-hg742 evicted
node/ip-10-0-150-70.us-east-2.compute.internal evicted

oc get pods -o wide -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
catalog-operator-7f657cc44f-f72c5   1/1     Running   0          61s   10.128.0.18   ip-10-0-177-115.us-east-2.compute.internal   <none>           <none>
olm-operator-7c8ff4698b-mcg7n       1/1     Running   0          57m   10.129.0.32   ip-10-0-212-236.us-east-2.compute.internal   <none>           <none>
packageserver-568cc46cf-98wpp       1/1     Running   0          56m   10.129.0.40   ip-10-0-212-236.us-east-2.compute.internal   <none>           <none>
packageserver-568cc46cf-q9v4z       1/1     Running   0          56m   10.130.0.20   ip-10-0-150-70.us-east-2.compute.internal    <none>           <none>
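Not shown above, but a typical follow-up after a drain-based check is to return the node to service; assuming the same node name as in the verification:

# Follow-up (assumed, not part of the recorded verification): uncordon the
# drained node and confirm it is schedulable again.
oc adm uncordon ip-10-0-150-70.us-east-2.compute.internal
oc get node ip-10-0-150-70.us-east-2.compute.internal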

Comment 6 errata-xmlrpc 2020-10-27 16:25:25 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196