+++ This bug was initially created as a clone of Bug #2057740 +++

+++ This bug was initially created as a clone of Bug #2053343 +++

--- Additional comment from W. Trevor King on 2022-02-24 00:21:08 UTC ---

(In reply to W. Trevor King from comment #0)
> Dropping into Loki, machine-config-daemon-zk9tj logs have:
>
>   E0223 16:07:08.199572  195651 daemon.go:340] WARNING: ignoring
> DaemonSet-managed Pods: ...,
> openshift-marketplace/certified-operators-zbb6r,
> openshift-marketplace/community-operators-qpvff,
> openshift-marketplace/redhat-marketplace-dxpbn,
> openshift-marketplace/redhat-operators-mhlf5
>   ...
>   I0223 16:07:08.201839  195651 daemon.go:340] evicting pod
> openshift-marketplace/certified-operators-zbb6r
>   ...
>   I0223 16:07:19.831014  195651 daemon.go:325] Evicted pod
> openshift-marketplace/certified-operators-zbb6r
>
> That's... not entirely clear to me. Certainly doesn't look like a DaemonSet
> pod to me. But whatever, seems like MCO is able to drain this pod without
> the 'controller: true' setting.

Aha, this is because the MCO is forcing the drain [1]. So when we fix this bug and declare 'controller: true' on an ownerReferences entry, folks will no longer need to force when using the upstream drain library to drain these openshift-marketplace pods.

[1]: https://github.com/openshift/machine-config-operator/blob/b7f7bb950e1d1ee66c90ed6761a162d402b74664/pkg/daemon/daemon.go#L315

--- Additional comment from W. Trevor King on 2022-02-24 02:36:41 UTC ---

(In reply to W. Trevor King from comment #0)
> E0223 16:07:08.199572  195651 daemon.go:340] WARNING: ignoring
> DaemonSet-managed Pods: ...,
> openshift-marketplace/certified-operators-zbb6r,
> ...

Better ellipsis for this log line:

  E0223 16:07:08.199572  195651 daemon.go:340] WARNING: ignoring DaemonSet-managed Pods: ...; deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: openshift-kube-apiserver/kube-apiserver-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-kube-controller-manager/kube-controller-manager-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-kube-scheduler/openshift-kube-scheduler-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-marketplace/certified-operators-zbb6r, openshift-marketplace/community-operators-qpvff, openshift-marketplace/redhat-marketplace-dxpbn, openshift-marketplace/redhat-operators-mhlf5

I've filed [1] to clean up the messaging a bit. And it looks like I need to follow up with whoever creates those guard-ip pods too...

[1]: https://github.com/kubernetes/kubernetes/pull/108314

---

Bug 2057740 covers a lack of 'controller: true' ownerReferences keeping some openshift-marketplace pods from being drained without --force. This bug tracks the new-in-4.10 guard pods, which lack ownerReferences entirely. Ideally they'd be marked so that it's clear to drain (and everyone else) that there is a controller in charge of creating those Pods, with the reference pointing at some resource associated with that controller.
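For illustration, here's a minimal Go sketch (not the actual guard-pod controller code) of stamping a 'controller: true' ownerReference onto a Pod via metav1.NewControllerRef, which is the shape of fix this bug is asking for. The owner object and names below are hypothetical; the real guard pods would point at whatever resource their creating controller actually manages.

  // Sketch only: stamp a controller ownerReference on a guard-style Pod.
  package main

  import (
  	"fmt"

  	corev1 "k8s.io/api/core/v1"
  	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  )

  func main() {
  	// Hypothetical owner; in practice this would be the resource managed
  	// by the controller that creates the guard pods.
  	owner := &corev1.ConfigMap{
  		ObjectMeta: metav1.ObjectMeta{
  			Name: "guard-owner",
  			UID:  "00000000-0000-0000-0000-000000000000",
  		},
  	}

  	pod := &corev1.Pod{
  		ObjectMeta: metav1.ObjectMeta{
  			Name:      "kube-apiserver-guard-example",
  			Namespace: "openshift-kube-apiserver",
  		},
  	}

  	// metav1.NewControllerRef sets Controller: true (and
  	// BlockOwnerDeletion: true) on the returned reference.
  	pod.OwnerReferences = append(pod.OwnerReferences,
  		*metav1.NewControllerRef(owner, corev1.SchemeGroupVersion.WithKind("ConfigMap")))

  	fmt.Printf("%+v\n", pod.OwnerReferences)
  }

With a reference like that in place, the drain library's filters would classify the guard pods as controller-managed instead of lumping them into the "Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet" warning above.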
Definition of done is emptying out the following query:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680/artifacts/e2e-aws-upgrade/gather-extra/artifacts/pods.json | jq -r '.items[].metadata | select((.name | contains("-guard-")) and ([(.ownerReferences // [])[] | select(.controller)] | length == 0)) | .namespace + " " + .name + " " + (.ownerReferences | tostring)'
  openshift-kube-apiserver kube-apiserver-guard-ip-10-0-151-30.us-west-1.compute.internal null
  openshift-kube-apiserver kube-apiserver-guard-ip-10-0-184-130.us-west-1.compute.internal null
  openshift-kube-apiserver kube-apiserver-guard-ip-10-0-193-89.us-west-1.compute.internal null
  openshift-kube-controller-manager kube-controller-manager-guard-ip-10-0-151-30.us-west-1.compute.internal null
  openshift-kube-controller-manager kube-controller-manager-guard-ip-10-0-184-130.us-west-1.compute.internal null
  openshift-kube-controller-manager kube-controller-manager-guard-ip-10-0-193-89.us-west-1.compute.internal null
  openshift-kube-scheduler openshift-kube-scheduler-guard-ip-10-0-151-30.us-west-1.compute.internal null
  openshift-kube-scheduler openshift-kube-scheduler-guard-ip-10-0-184-130.us-west-1.compute.internal null
  openshift-kube-scheduler openshift-kube-scheduler-guard-ip-10-0-193-89.us-west-1.compute.internal null

where I'm using [1] as an example CI run.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680
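For reference, a rough Go equivalent of that jq filter (a sketch, not actual CI tooling; the placeholder pod stands in for pods.json content): metav1.GetControllerOf is the same kind of check the drain library applies when deciding whether a pod is controller-managed, so this should come back empty once the guard pods grow a controller ownerReference.

  // Sketch only: list guard pods that lack a 'controller: true' ownerReference.
  package main

  import (
  	"fmt"
  	"strings"

  	corev1 "k8s.io/api/core/v1"
  	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  )

  // podsMissingController returns guard pods with no controller ownerReference,
  // mirroring the jq select() above.
  func podsMissingController(pods []corev1.Pod) []string {
  	var missing []string
  	for i := range pods {
  		if !strings.Contains(pods[i].Name, "-guard-") {
  			continue
  		}
  		if metav1.GetControllerOf(&pods[i]) == nil {
  			missing = append(missing, pods[i].Namespace+" "+pods[i].Name)
  		}
  	}
  	return missing
  }

  func main() {
  	// Placeholder standing in for one entry from the pods.json artifact.
  	pod := corev1.Pod{ObjectMeta: metav1.ObjectMeta{
  		Namespace: "openshift-kube-apiserver",
  		Name:      "kube-apiserver-guard-ip-10-0-151-30.us-west-1.compute.internal",
  	}}
  	fmt.Println(podsMissingController([]corev1.Pod{pod}))
  }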
Due to higher priority tasks I have not been able to resolve this issue in time. Moving to the next sprint.
Ported to https://issues.redhat.com/browse/WRKLDS-646