+++ This bug was initially created as a clone of Bug #2053343 +++

--- Additional comment from Michael McCune on 2022-02-23 15:00:55 UTC ---

i got a chance to investigate deeper into the must-gather and i think that Matt correctly identified the root cause here.

from the original text, we can see that we expected node "ip-10-244-54-105.ec2.internal" to be scaled down by the autoscaler. when i look at the pods in the openshift-marketplace namespace i see this:

NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE
redhat-operators-hqstb   1/1     Running   0          1d    10.130.10.18   ip-10-244-54-105.ec2.internal

we clearly have a pod running on the node. when looking deeper into the pod manifest, Matt again is spot on, we see this (i have clipped just the relevant portion):

  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: CatalogSource
    name: redhat-operators

and indeed, looking through the autoscaler drain code it will not be able to remove this entry.

---

The machine-config operator has a similar need to drain pods. It vendors the upstream drain library to do so [1], while the autoscaler has its own local code looking for well-known kinds [2]. The upstream drain library has [3]:

  controllerRef := metav1.GetControllerOf(&pod)
  if controllerRef != nil {
    return MakePodDeleteStatusOkay()
  }

GetControllerOf wraps GetControllerOfNoCopy [4]. GetControllerOfNoCopy returns the first ownerReferences entry where 'controller' is true [5]. The controller godocs are [6]:

  // If true, this reference points to the managing controller.
  // +optional
  Controller *bool `json:"controller,omitempty" protobuf:"varint,6,opt,name=controller"`

Poking at a 4.10 -> 4.11 CI run [7], and excluding pods I suspect are static (and therefore ignored by the drain library):

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680/artifacts/e2e-aws-upgrade/gather-extra/artifacts/pods.json | jq -r '.items[].metadata | select([(.ownerReferences // [])[] | select(.controller)] | length == 0) | .namespace + " " + .name + " " + (.ownerReferences | tostring)' | grep -v '^\(openshift-etcd\|openshift-kube-apiserver\|openshift-kube-controller-manager\|openshift-kube-scheduler\) '
openshift-marketplace certified-operators-k58w7 [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"certified-operators","uid":"39e79841-8f28-444f-94f6-9e6230b8bd18"}]
openshift-marketplace community-operators-s7xps [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"community-operators","uid":"5fcddec5-175d-488a-be07-162447cb54b8"}]
openshift-marketplace redhat-marketplace-h8jp6 [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"redhat-marketplace","uid":"dc130a38-a995-4118-9901-a8159fdf9632"}]
openshift-marketplace redhat-operators-6chfq [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"redhat-operators","uid":"034d00e2-0f3f-45a2-9267-93ee21d58149"}]

Poking at one of those:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680/artifacts/e2e-aws-upgrade/gather-extra/artifacts/pods.json | jq -r '.items[].metadata | select(.name == "certified-operators-k58w7")'
{
  "annotations": {
    "k8s.v1.cni.cncf.io/network-status": "[{\n \"name\": \"openshift-sdn\",\n \"interface\": \"eth0\",\n \"ips\": [\n \"10.128.0.45\"\n ],\n \"default\": true,\n \"dns\": {}\n}]",
    "k8s.v1.cni.cncf.io/networks-status": "[{\n \"name\": \"openshift-sdn\",\n \"interface\": \"eth0\",\n \"ips\": [\n \"10.128.0.45\"\n ],\n \"default\": true,\n \"dns\": {}\n}]",
    "openshift.io/scc": "anyuid",
    "operatorframework.io/managed-by": "marketplace-operator"
  },
  "creationTimestamp": "2022-02-23T16:07:31Z",
  "generateName": "certified-operators-",
  "labels": {
    "olm.catalogSource": "certified-operators",
    "olm.pod-spec-hash": "857db9f844"
  },
  "name": "certified-operators-k58w7",
  "namespace": "openshift-marketplace",
  "ownerReferences": [
    {
      "apiVersion": "operators.coreos.com/v1alpha1",
      "blockOwnerDeletion": false,
      "controller": false,
      "kind": "CatalogSource",
      "name": "certified-operators",
      "uid": "39e79841-8f28-444f-94f6-9e6230b8bd18"
    }
  ],
  "resourceVersion": "65842",
  "uid": "b1077bcb-08a7-402b-b700-ab2b6367813f"
}

The 'operatorframework.io/managed-by: marketplace-operator' annotation suggests there is a controller. Dropping into audit logs in search of a creator:

$ zgrep -h '"verb":"create".*"resource":"pods".*"namespace":"openshift-marketplace"' kube-apiserver/* | grep -v subresource | jq -r '.stageTimestamp + " " + .user.username' | grep '2022-02-23T16:07:31'
2022-02-23T16:07:31.851000Z system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount
2022-02-23T16:07:31.896892Z system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount

Even more suggestive that an OLM-y operator is creating these pods. Checking events to see how these pods interacted with the drain:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680/artifacts/e2e-aws-upgrade/gather-extra/artifacts/events.json | jq -r '[.items[] | select((.involvedObject.name // "" | startswith("certified-operators-")) and (.reason | . == "Scheduled" or . == "Killing"))] | sort_by(.metadata.creationTimestamp)[] | .metadata.creationTimestamp + " " + .involvedObject.name + " " + .reason + ": " + .message' | grep -1 16:07:
2022-02-23T16:00:31Z certified-operators-bnxrw Killing: Stopping container registry-server
2022-02-23T16:07:15Z certified-operators-zbb6r Killing: Stopping container registry-server
2022-02-23T16:07:31Z certified-operators-k58w7 Scheduled: Successfully assigned openshift-marketplace/certified-operators-k58w7 to ip-10-0-184-130.us-west-1.compute.internal
2022-02-23T16:17:03Z certified-operators-wbkc4 Scheduled: Successfully assigned openshift-marketplace/certified-operators-wbkc4 to ip-10-0-184-130.us-west-1.compute.internal

Which suggests certified-operators-zbb6r was the previous pod that certified-operators-k58w7 replaced.
Back to audit logs, I don't see any deletes or evictions:

$ zgrep -h 'certified-operators-k58w7' kube-apiserver/* | jq -r .verb | sort | uniq -c
      2 create
     33 get
      3 patch
      1 update

Dropping into Loki, machine-config-daemon-zk9tj logs have:

E0223 16:07:08.199572 195651 daemon.go:340] WARNING: ignoring DaemonSet-managed Pods: ..., openshift-marketplace/certified-operators-zbb6r, openshift-marketplace/community-operators-qpvff, openshift-marketplace/redhat-marketplace-dxpbn, openshift-marketplace/redhat-operators-mhlf5
...
I0223 16:07:08.201839 195651 daemon.go:340] evicting pod openshift-marketplace/certified-operators-zbb6r
...
I0223 16:07:19.831014 195651 daemon.go:325] Evicted pod openshift-marketplace/certified-operators-zbb6r

That's... not entirely clear to me. Certainly doesn't look like a DaemonSet pod to me. But whatever, it seems like the MCO is able to drain this pod without the 'controller: true' setting.

So there is no pressing functional need, but I'm creating this bug because I think the semantics of 'controller' mean we want whichever OLM component creates these pods to set 'controller: true'. That makes the relationship nice and clear to any tooling that is jumpy about removing uncontrolled pods, even if the MCO for some reason is still draining in this case.

[1]: https://github.com/openshift/machine-config-operator/blob/b7f7bb950e1d1ee66c90ed6761a162d402b74664/vendor/k8s.io/kubectl/pkg/drain/filters.go#L237-L239
[2]: https://github.com/kubernetes/autoscaler/blob/a255161a38b64be2f12e012cc1b76aec36c88d50/cluster-autoscaler/utils/drain/drain.go#L124
[3]: https://github.com/kubernetes/kubernetes/blame/77eb1a03df5ed29db8b093bc1b89778388593c2f/staging/src/k8s.io/kubectl/pkg/drain/filters.go#L237
[4]: https://github.com/kubernetes/kubernetes/blame/77eb1a03df5ed29db8b093bc1b89778388593c2f/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/controller_ref.go#L33-L34
[5]: https://github.com/kubernetes/kubernetes/blame/77eb1a03df5ed29db8b093bc1b89778388593c2f/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/controller_ref.go#L42-L47
[6]: https://github.com/kubernetes/kubernetes/blame/77eb1a03df5ed29db8b093bc1b89778388593c2f/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L307-L309
[7]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680
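As an aside, the drain-side check boils down to the following. This is a minimal standalone sketch, not autoscaler or MCO code; it just exercises the same metav1.GetControllerOf call the upstream drain filter uses, assuming the k8s.io/api and k8s.io/apimachinery modules, to show why a pod whose only ownerReferences entry has 'controller: false' counts as unmanaged:

package main

import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
    isController := false

    // Pod shaped like today's openshift-marketplace catalog pods: a single
    // ownerReferences entry pointing at the CatalogSource, controller: false.
    pod := &corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "redhat-operators-hqstb",
            Namespace: "openshift-marketplace",
            OwnerReferences: []metav1.OwnerReference{{
                APIVersion: "operators.coreos.com/v1alpha1",
                Kind:       "CatalogSource",
                Name:       "redhat-operators",
                Controller: &isController,
            }},
        },
    }

    // Same check as the upstream drain filter: no entry has controller: true,
    // so GetControllerOf returns nil, the pod is treated as unmanaged, and a
    // drain refuses it unless forced.
    if metav1.GetControllerOf(pod) == nil {
        fmt.Println("no controller ownerReference; drain will not evict without force")
    }
}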
(In reply to W. Trevor King from comment #0)
> Dropping into Loki, machine-config-daemon-zk9tj logs have:
>
> E0223 16:07:08.199572 195651 daemon.go:340] WARNING: ignoring
> DaemonSet-managed Pods: ...,
> openshift-marketplace/certified-operators-zbb6r,
> openshift-marketplace/community-operators-qpvff,
> openshift-marketplace/redhat-marketplace-dxpbn,
> openshift-marketplace/redhat-operators-mhlf5
> ...
> I0223 16:07:08.201839 195651 daemon.go:340] evicting pod
> openshift-marketplace/certified-operators-zbb6r
> ...
> I0223 16:07:19.831014 195651 daemon.go:325] Evicted pod
> openshift-marketplace/certified-operators-zbb6r
>
> That's... not entirely clear to me. Certainly doesn't look like a DaemonSet
> pod to me. But whatever, seems like MCO is able to drain this pod without
> the 'controller: true' setting.

Aha, this is because the MCO is forcing the drain [1]. So when we fix this bug and declare 'controller: true' on an ownerReferences entry, folks will no longer need to force when using the upstream drain library to drain these openshift-marketplace pods.

[1]: https://github.com/openshift/machine-config-operator/blob/b7f7bb950e1d1ee66c90ed6761a162d402b74664/pkg/daemon/daemon.go#L315
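For completeness, roughly what that forced path looks like against the upstream library. This is a hedged sketch of the k8s.io/kubectl/pkg/drain API (field names such as DeleteEmptyDirData differ slightly across vendored versions), not the MCO's actual wiring, and the node name is just an example:

package main

import (
    "context"
    "os"
    "time"

    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/kubectl/pkg/drain"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
    if err != nil {
        panic(err)
    }

    helper := &drain.Helper{
        Ctx:                 context.Background(),
        Client:              kubernetes.NewForConfigOrDie(cfg),
        IgnoreAllDaemonSets: true,
        DeleteEmptyDirData:  true,
        GracePeriodSeconds:  -1,
        Timeout:             5 * time.Minute,
        Out:                 os.Stdout,
        ErrOut:              os.Stderr,
        // Force is what lets a drain proceed past pods whose ownerReferences
        // carry no controller: true entry.  With a real controller reference
        // on the marketplace pods, forcing would no longer be needed for them.
        Force: true,
    }

    // Cordoning is handled separately (RunCordonOrUncordon); this only evicts
    // or deletes the pods on the named node.
    if err := drain.RunNodeDrain(helper, "ip-10-0-184-130.us-west-1.compute.internal"); err != nil {
        panic(err)
    }
}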
(In reply to W. Trevor King from comment #0)
> E0223 16:07:08.199572 195651 daemon.go:340] WARNING: ignoring
> DaemonSet-managed Pods: ...,
> openshift-marketplace/certified-operators-zbb6r,
> ...

Better ellipsis for this log line:

E0223 16:07:08.199572 195651 daemon.go:340] WARNING: ignoring DaemonSet-managed Pods: ...; deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: openshift-kube-apiserver/kube-apiserver-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-kube-controller-manager/kube-controller-manager-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-kube-scheduler/openshift-kube-scheduler-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-marketplace/certified-operators-zbb6r, openshift-marketplace/community-operators-qpvff, openshift-marketplace/redhat-marketplace-dxpbn, openshift-marketplace/redhat-operators-mhlf5

I've filed [1] to clean up the messaging a bit. And looks like I need to follow up with whoever creates those guard-ip pods too...

[1]: https://github.com/kubernetes/kubernetes/pull/108314
I've spun off bug 2057756 to track those guard pods.
The issue is that the ownerref on the marketplace pods is the CatalogSource that is fronting the pod (so that when a user deletes the CatalogSource, the pod itself gets removed) rather than the underlying controller. This has been the behavior since the API was introduced, so users expect deletion to behave that way. It may make sense to update the ownerref to refer to the marketplace operator explicitly, which would help for draining purposes, but then the CatalogSource would no longer own its own pod, and users who rely on the current behavior (deleting the CatalogSource deletes the underlying pod) could be impacted. If we could have multiple ownerrefs on the pod (one for the catalog and one for the controller), that might resolve the issue. Since the metadata.ownerReferences field is plural, this might be OK, and it would be a backwards-compatible change.
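Roughly the shape that would mean, as a sketch only; the second entry's kind and name here are placeholders for whichever component actually manages these pods, not necessarily what OLM would choose:

package main

import (
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
    yes := true
    no := false

    // First entry keeps today's behavior: deleting the CatalogSource still
    // garbage-collects the pod.  Second entry is new and marks the managing
    // controller, which is what drain tooling looks for.
    ownerRefs := []metav1.OwnerReference{
        {
            APIVersion:         "operators.coreos.com/v1alpha1",
            Kind:               "CatalogSource",
            Name:               "redhat-operators",
            Controller:         &no,
            BlockOwnerDeletion: &no,
        },
        {
            // Hypothetical controller owner, for illustration only.
            APIVersion: "apps/v1",
            Kind:       "Deployment",
            Name:       "marketplace-operator",
            Controller: &yes,
        },
    }

    meta := &metav1.PartialObjectMetadata{ObjectMeta: metav1.ObjectMeta{OwnerReferences: ownerRefs}}
    if ref := metav1.GetControllerOfNoCopy(meta); ref != nil {
        fmt.Println("controller owner:", ref.Kind+"/"+ref.Name) // Deployment/marketplace-operator
    }
}

Note that the API server only accepts one ownerReferences entry with controller: true, so keeping the CatalogSource at controller: false and adding a single controller entry stays within that constraint.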
(In reply to Daniel Sover from comment #4)
> If we could have multiple ownerrefs on the pod...

That sounds plausible to me. But looking for precedent in a Kube-core controller (ReplicaSet -> Pod):

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680/artifacts/e2e-aws-upgrade/gather-extra/artifacts/pods.json | jq -r '.items[].metadata | select(.name | startswith("cluster-version-operator-")).ownerReferences[]'
{
  "apiVersion": "apps/v1",
  "blockOwnerDeletion": true,
  "controller": true,
  "kind": "ReplicaSet",
  "name": "cluster-version-operator-66bcf76d6b",
  "uid": "49a02058-a7c8-4edd-8b61-cfe6aba21d85"
}

So at least the ReplicaSet controller (which consumes the ReplicaSet and spits out Pods) doesn't actually point at itself, and instead claims the ReplicaSet itself as the controller. And I see no pods today that declare multiple owners:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680/artifacts/e2e-aws-upgrade/gather-extra/artifacts/pods.json | jq -r '.items[].metadata | select(.ownerReferences | length > 1) | .namespace + " " + .name'
...no hits...

But yeah, I'd personally have expected things like:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680/artifacts/e2e-aws-upgrade/gather-extra/artifacts/pods.json | jq -r '.items[].metadata | select(.name | startswith("cluster-version-operator-")).ownerReferences[]'
{
  "apiVersion": "apps/v1",
  "blockOwnerDeletion": true,
  "controller": false,
  "kind": "ReplicaSet",
  "name": "cluster-version-operator-66bcf76d6b",
  "uid": "49a02058-a7c8-4edd-8b61-cfe6aba21d85"
}
{
  "apiVersion": "apps/v1",
  "blockOwnerDeletion": true,
  "controller": true,
  "kind": "Deployment",
  "namespace": "openshift-kube-controller-manager",
  "name": "openshift-controller-manager-operator",
  "uid": "..."
}

for "created by the Deployment for openshift-controller-manager-operator, for ReplicaSet cluster-version-operator-66bcf76d6b". I'm not actually sure that's where the ReplicaSet controller lives, but whatever, that's the idea. However, that's not going to work, because [1]:

> Cross-namespace owner references are disallowed by design. Namespaced dependents can specify cluster-scoped or namespaced owners. A namespaced owner must exist in the same namespace as the dependent. If it does not, the owner reference is treated as absent, and the dependent is subject to deletion once all owners are verified absent.
>
> Cluster-scoped dependents can only specify cluster-scoped owners.

So in situations where we can't actually point at the controller, but where we can point at the triggering resource, claiming that the triggering resource is the controller seems like a reasonable hack. The alternative would be to find some other resource upstream of the controller (like a ClusterOperator or some such), and glossing over the fact that the actual controller is a subsystem beneath that more-central resource seems like a reasonable hack too.

[1]: https://kubernetes.io/docs/concepts/architecture/garbage-collection/#owners-dependents
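If we go the "claim the triggering resource as the controller" route, the change could be as small as building the reference with metav1.NewControllerRef rather than hand-rolling the entry. A sketch, with the caveat that NewControllerRef also sets blockOwnerDeletion: true, which is a behavior change OLM would need to be comfortable with; the PartialObjectMetadata stand-in here is just for illustration:

package main

import (
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
    // Stand-in for the CatalogSource object; only its metadata matters here.
    catalog := &metav1.PartialObjectMetadata{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "redhat-operators",
            Namespace: "openshift-marketplace",
            UID:       "034d00e2-0f3f-45a2-9267-93ee21d58149",
        },
    }

    gvk := schema.GroupVersionKind{
        Group:   "operators.coreos.com",
        Version: "v1alpha1",
        Kind:    "CatalogSource",
    }

    // NewControllerRef sets controller: true (and blockOwnerDeletion: true),
    // which is exactly what GetControllerOf, and therefore the upstream drain
    // filter, looks for.
    ref := metav1.NewControllerRef(catalog, gvk)
    fmt.Printf("%+v\n", *ref)
}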
The OLM team has moved to Jira. Closing this as won't fix since it has not seen any action since 05Apr2022. Please open an OCPBUGS Jira, component OLM, if this is still an issue.
I've ported this to Jira as https://issues.redhat.com/browse/OCPBUGS-7431