Bug 2057740
| Field | Value |
|---|---|
| Summary | openshift-marketplace pods with no 'controller: true' ownerReferences |
| Product | OpenShift Container Platform |
| Component | OLM |
| OLM sub component | OLM |
| Version | 4.11 |
| Status | CLOSED WONTFIX |
| Severity | low |
| Priority | low |
| Reporter | W. Trevor King <wking> |
| Assignee | Per da Silva <pegoncal> |
| QA Contact | Bruno Andrade <bandrade> |
| CC | aos-bugs, jiazha, jkeister, kramraja, mbargenq, mimccune, oarribas, pegoncal, pmagotra, wking |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Clone Of | 2053343 |
|  | 2057756 (view as bug list) |
| Last Closed | 2023-02-13 21:57:16 UTC |
Description
W. Trevor King
2022-02-24 00:11:24 UTC
(In reply to W. Trevor King from comment #0)

> Dropping into Loki, machine-config-daemon-zk9tj logs have:
>
>   E0223 16:07:08.199572 195651 daemon.go:340] WARNING: ignoring DaemonSet-managed Pods: ..., openshift-marketplace/certified-operators-zbb6r, openshift-marketplace/community-operators-qpvff, openshift-marketplace/redhat-marketplace-dxpbn, openshift-marketplace/redhat-operators-mhlf5
>   ...
>   I0223 16:07:08.201839 195651 daemon.go:340] evicting pod openshift-marketplace/certified-operators-zbb6r
>   ...
>   I0223 16:07:19.831014 195651 daemon.go:325] Evicted pod openshift-marketplace/certified-operators-zbb6r
>
> That's... not entirely clear to me. It certainly doesn't look like a DaemonSet pod to me. But whatever, it seems like the MCO is able to drain this pod without the 'controller: true' setting.

Aha, this is because the MCO is forcing the drain [1]. So when we fix this bug and declare 'controller: true' on an ownerReferences entry, folks will no longer need to force when using the upstream drain library to drain these openshift-marketplace pods.

[1]: https://github.com/openshift/machine-config-operator/blob/b7f7bb950e1d1ee66c90ed6761a162d402b74664/pkg/daemon/daemon.go#L315

(In reply to W. Trevor King from comment #0)

> E0223 16:07:08.199572 195651 daemon.go:340] WARNING: ignoring DaemonSet-managed Pods: ..., openshift-marketplace/certified-operators-zbb6r, ...

A better ellipsis for this log line:

  E0223 16:07:08.199572 195651 daemon.go:340] WARNING: ignoring DaemonSet-managed Pods: ...; deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: openshift-kube-apiserver/kube-apiserver-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-kube-controller-manager/kube-controller-manager-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-kube-scheduler/openshift-kube-scheduler-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-marketplace/certified-operators-zbb6r, openshift-marketplace/community-operators-qpvff, openshift-marketplace/redhat-marketplace-dxpbn, openshift-marketplace/redhat-operators-mhlf5

I've filed [1] to clean up the messaging a bit. And it looks like I need to follow up with whoever creates those guard-ip pods too...

[1]: https://github.com/kubernetes/kubernetes/pull/108314

I've spun off bug 2057756 to track those guard pods.

The issue is that the ownerReference on the marketplace pods is the CatalogSource fronting the pod (so that when a user deletes the CatalogSource, the pod itself gets removed) instead of the underlying controller. This has been the behavior since the API was introduced, so users expect deletion to work that way.

It may make sense to update the ownerReference to refer to the marketplace operator explicitly, which would help for draining purposes, but then the CatalogSource would no longer own its own pod, and users who rely on the current behavior (deleting the CatalogSource deletes the underlying pod) may be impacted.

If we could have multiple ownerReferences on the pod (one for the catalog and one for the controller), that might resolve the issue. Since the metadata.ownerReferences field is plural, this might be OK, and it would be a backwards-compatible change.

(In reply to Daniel Sover from comment #4)

> If we could have multiple ownerrefs on the pod...

That sounds plausible to me.
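To make the forcing behavior from the first reply above concrete, here is a minimal sketch using the equivalent oc adm drain flags. This is an illustration only: the MCO drives the upstream drain library directly rather than shelling out to this command, and the node name is just a placeholder borrowed from the guard-pod names in the log line above.

  # Without a 'controller: true' ownerReference, the drain library treats the
  # catalog pods as unmanaged, so a plain drain refuses with an error along the
  # lines of "cannot delete Pods not managed by ReplicationController,
  # ReplicaSet, Job, DaemonSet or StatefulSet":
  oc adm drain ip-10-0-151-30.us-west-1.compute.internal --ignore-daemonsets --delete-emptydir-data

  # Forcing downgrades that refusal to the WARNING quoted above and evicts the
  # pods anyway, which is effectively what the MCO does:
  oc adm drain ip-10-0-151-30.us-west-1.compute.internal --ignore-daemonsets --delete-emptydir-data --force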
But looking for precedent in a Kube-core controller (ReplicaSet -> Pod):

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680/artifacts/e2e-aws-upgrade/gather-extra/artifacts/pods.json \
      | jq -r '.items[].metadata | select(.name | startswith("cluster-version-operator-")).ownerReferences[]'
  {
    "apiVersion": "apps/v1",
    "blockOwnerDeletion": true,
    "controller": true,
    "kind": "ReplicaSet",
    "name": "cluster-version-operator-66bcf76d6b",
    "uid": "49a02058-a7c8-4edd-8b61-cfe6aba21d85"
  }

So at least the ReplicaSet controller (which consumes the ReplicaSet and spits out Pods) doesn't point at itself; it claims the ReplicaSet itself as the controller. And I see no pods today that declare multiple owners:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680/artifacts/e2e-aws-upgrade/gather-extra/artifacts/pods.json \
      | jq -r '.items[].metadata | select(.ownerReferences | length > 1) | .namespace + " " + .name'
  ...no hits...

But yeah, I'd personally have expected things like:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680/artifacts/e2e-aws-upgrade/gather-extra/artifacts/pods.json \
      | jq -r '.items[].metadata | select(.name | startswith("cluster-version-operator-")).ownerReferences[]'
  {
    "apiVersion": "apps/v1",
    "blockOwnerDeletion": true,
    "controller": false,
    "kind": "ReplicaSet",
    "name": "cluster-version-operator-66bcf76d6b",
    "uid": "49a02058-a7c8-4edd-8b61-cfe6aba21d85"
  }
  {
    "apiVersion": "apps/v1",
    "blockOwnerDeletion": true,
    "controller": true,
    "kind": "Deployment",
    "namespace": "openshift-kube-controller-manager",
    "name": "openshift-controller-manager-operator",
    "uid": "..."
  }

That would read as "created by the Deployment openshift-controller-manager-operator, via ReplicaSet cluster-version-operator-66bcf76d6b". I'm not actually sure that's where the ReplicaSet controller lives, but whatever, that's the idea. However, that's not going to work because [1]:

> Cross-namespace owner references are disallowed by design. Namespaced dependents can specify cluster-scoped or namespaced owners. A namespaced owner must exist in the same namespace as the dependent. If it does not, the owner reference is treated as absent, and the dependent is subject to deletion once all owners are verified absent.
>
> Cluster-scoped dependents can only specify cluster-scoped owners.

So in situations where we can't actually point at the controller, but where we can point at the triggering resource, claiming that the triggering resource is a controller seems like a reasonable hack. The alternative would be to find some other resource upstream of the controller (like a ClusterOperator or some such), and glossing over the fact that the actual controller is a subsystem beneath that more-central resource seems like a reasonable hack too.

[1]: https://kubernetes.io/docs/concepts/architecture/garbage-collection/#owners-dependents

The OLM team has moved to Jira. Closing this as won't fix, since it has not bubbled up for action since 05 Apr 2022. Please open an OCPBUGS Jira, component OLM, if this is still an issue.

I've ported this to Jira as https://issues.redhat.com/browse/OCPBUGS-7431
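For anyone revisiting this from the Jira side, here is a quick way to check whether the catalog pods on a live cluster still lack a 'controller: true' owner, in the same spirit as the gather-extra jq queries above. This is a sketch assuming oc access to the cluster and jq installed locally.

  # List openshift-marketplace pods that have no ownerReference with controller: true.
  # Non-empty output means the drain library still considers them unmanaged.
  oc get pods -n openshift-marketplace -o json \
    | jq -r '.items[].metadata
             | select(([.ownerReferences[]? | select(.controller == true)] | length) == 0)
             | .name'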