Bug 1866829 - Operator catalog Pods created by CatalogSource aren't evicted to other nodes even after their node is down
Summary: Operator catalog Pods created by CatalogSource aren't evicted to other nodes ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Ben Luddy
QA Contact: Bruno Andrade
URL:
Whiteboard:
Duplicates: 1867166
Depends On:
Blocks: 1862340 1866828
 
Reported: 2020-08-06 13:58 UTC by Ben Luddy
Modified: 2020-10-27 16:25 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:25:25 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
Red Hat Product Errata RHBA-2020:4196 (last updated 2020-10-27 16:25:42 UTC)

Description Ben Luddy 2020-08-06 13:58:42 UTC
This bug was initially created as a copy of Bug #1866828

Operator catalog Pods created by CatalogSource aren't evicted to other nodes,
even after the node that the pods are running on goes down.

This issue was fixed before 4.6 feature freeze, so this BZ exists to satisfy automation required for a 4.5.z backport.
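
For context, the pods in question are the registry pods that the catalog operator creates for a grpc-type CatalogSource. A minimal sketch of such a CatalogSource follows; the name and image are placeholders, not taken from this report:

# Hypothetical minimal grpc CatalogSource; name and image are placeholders.
oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: example-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/example/example-catalog:latest
  displayName: Example Catalog
  publisher: Example
EOF

# The catalog operator creates a registry pod for this source in the same
# namespace; it is this kind of pod that stayed on a dead node before the fix.
oc get pods -n openshift-marketplace -o wide | grep example-catalog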

Comment 1 Ben Luddy 2020-08-06 14:19:24 UTC
This should have been addressed by https://github.com/operator-framework/operator-lifecycle-manager/pull/1680. The BZ that is motivating a backport to 4.5 is here: https://bugzilla.redhat.com/show_bug.cgi?id=1862340.

Comment 2 Evan Cordell 2020-08-07 14:45:38 UTC
*** Bug 1867166 has been marked as a duplicate of this bug. ***

Comment 3 Bruno Andrade 2020-08-07 17:12:49 UTC
Marking as VERIFIED; the catalog-operator pod was evicted and rescheduled onto another node as expected. (A sketch of the analogous check against a CatalogSource registry pod follows the output below.)

Cluster Version: 4.6.0-0.nightly-2020-08-06-062308
OLM version: 0.16.0
git commit: 163608d60f37cc3496736bfc4ec72ca01dc7083a


oc get pods -o wide -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
catalog-operator-7f657cc44f-z645v   1/1     Running   0          53m   10.130.0.14   ip-10-0-150-70.us-east-2.compute.internal    <none>           <none>
olm-operator-7c8ff4698b-mcg7n       1/1     Running   0          53m   10.129.0.32   ip-10-0-212-236.us-east-2.compute.internal   <none>           <none>
packageserver-568cc46cf-98wpp       1/1     Running   0          53m   10.129.0.40   ip-10-0-212-236.us-east-2.compute.internal   <none>           <none>
packageserver-568cc46cf-q9v4z       1/1     Running   0          52m   10.130.0.20   ip-10-0-150-70.us-east-2.compute.internal    <none>           <none>

oc adm cordon ip-10-0-150-70.us-east-2.compute.internal
node/ip-10-0-150-70.us-east-2.compute.internal cordoned

oc adm drain ip-10-0-150-70.us-east-2.compute.internal 
node/ip-10-0-150-70.us-east-2.compute.internal already cordoned
evicting pod "oauth-openshift-6b74f5fcb7-8t22j"
evicting pod "revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal"
evicting pod "console-989795df-hg742"
evicting pod "openshift-controller-manager-operator-859fd974d9-v2kgf"
evicting pod "etcd-quorum-guard-6788dcf7d-8pdmf"
evicting pod "revision-pruner-3-ip-10-0-150-70.us-east-2.compute.internal"
evicting pod "kube-apiserver-operator-778756cbd9-wgmvj"
evicting pod "revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal"
evicting pod "apiserver-766c75f7bb-9pr64"
evicting pod "cluster-autoscaler-operator-768d4f89fb-2dkzc"
evicting pod "openshift-kube-scheduler-operator-dd467875f-vxqqq"
evicting pod "revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal"
evicting pod "kube-storage-version-migrator-operator-7d9859bb97-ls2bs"
evicting pod "network-operator-57d5f54c59-qjthf"
evicting pod "machine-config-operator-849dbf6bd7-jcdgn"
evicting pod "apiserver-565bd5f986-2p5l9"
evicting pod "catalog-operator-7f657cc44f-z645v"
pod/revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal evicted
pod/revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal evicted
pod/revision-pruner-7-ip-10-0-150-70.us-east-2.compute.internal evicted
pod/revision-pruner-3-ip-10-0-150-70.us-east-2.compute.internal evicted
pod/openshift-controller-manager-operator-859fd974d9-v2kgf evicted
pod/network-operator-57d5f54c59-qjthf evicted
pod/kube-apiserver-operator-778756cbd9-wgmvj evicted
pod/machine-config-operator-849dbf6bd7-jcdgn evicted
pod/apiserver-565bd5f986-2p5l9 evicted
pod/apiserver-766c75f7bb-9pr64 evicted
pod/cluster-autoscaler-operator-768d4f89fb-2dkzc evicted
pod/etcd-quorum-guard-6788dcf7d-8pdmf evicted
pod/openshift-kube-scheduler-operator-dd467875f-vxqqq evicted
pod/catalog-operator-7f657cc44f-z645v evicted
pod/kube-storage-version-migrator-operator-7d9859bb97-ls2bs evicted
pod/oauth-openshift-6b74f5fcb7-8t22j evicted
pod/console-989795df-hg742 evicted
node/ip-10-0-150-70.us-east-2.compute.internal evicted

oc get pods -o wide -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
catalog-operator-7f657cc44f-f72c5   1/1     Running   0          61s   10.128.0.18   ip-10-0-177-115.us-east-2.compute.internal   <none>           <none>
olm-operator-7c8ff4698b-mcg7n       1/1     Running   0          57m   10.129.0.32   ip-10-0-212-236.us-east-2.compute.internal   <none>           <none>
packageserver-568cc46cf-98wpp       1/1     Running   0          56m   10.129.0.40   ip-10-0-212-236.us-east-2.compute.internal   <none>           <none>
packageserver-568cc46cf-q9v4z       1/1     Running   0          56m   10.130.0.20   ip-10-0-150-70.us-east-2.compute.internal    <none>           <none>
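
For completeness, a sketch of the analogous check against a CatalogSource registry pod itself (the drain above exercised the OLM catalog-operator pod); the node name below is a placeholder:

# Note which node hosts the registry pod of the catalog source under test.
oc get pods -n openshift-marketplace -o wide

# Cordon and drain that node (placeholder node name).
oc adm cordon ip-10-0-xxx-xxx.us-east-2.compute.internal
oc adm drain ip-10-0-xxx-xxx.us-east-2.compute.internal --ignore-daemonsets

# The registry pod should be recreated on a different, schedulable node.
oc get pods -n openshift-marketplace -o wide

# Restore the node afterwards.
oc adm uncordon ip-10-0-xxx-xxx.us-east-2.compute.internal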

Comment 6 errata-xmlrpc 2020-10-27 16:25:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

