
Bug 2057756

Summary: static-pod guard pods lack ownerReferences
Product: OpenShift Container Platform
Component: kube-controller-manager
Version: 4.10
Status: CLOSED DEFERRED
Severity: low
Priority: low
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: W. Trevor King <wking>
Assignee: Jan Chaloupka <jchaloup>
QA Contact: zhou ying <yinzhou>
CC: jiazha, kramraja, mbargenq, mfojtik, mimccune, oarribas, pegoncal, pmagotra, wking
Doc Type: If docs needed, set a value
Story Points: ---
Clone Of: 2057740
Last Closed: 2023-01-16 09:35:42 UTC
Type: ---
Regression: ---
Documentation: ---
Category: ---

Description W. Trevor King 2022-02-24 02:47:55 UTC
+++ This bug was initially created as a clone of Bug #2057740 +++

+++ This bug was initially created as a clone of Bug #2053343 +++

--- Additional comment from W. Trevor King on 2022-02-24 00:21:08 UTC ---

(In reply to W. Trevor King from comment #0)
> Dropping into Loki, machine-config-daemon-zk9tj logs have:
> 
>   E0223 16:07:08.199572  195651 daemon.go:340] WARNING: ignoring
> DaemonSet-managed Pods: ...,
> openshift-marketplace/certified-operators-zbb6r,
> openshift-marketplace/community-operators-qpvff,
> openshift-marketplace/redhat-marketplace-dxpbn,
> openshift-marketplace/redhat-operators-mhlf5
>   ...
>   I0223 16:07:08.201839  195651 daemon.go:340] evicting pod
> openshift-marketplace/certified-operators-zbb6r
>   ...
>   I0223 16:07:19.831014  195651 daemon.go:325] Evicted pod
> openshift-marketplace/certified-operators-zbb6r
> 
> That's... not entirely clear to me.  Certainly doesn't look like a DaemonSet
> pod to me.  But whatever, seems like MCO is able to drain this pod without
> the 'controller: true' setting.


Aha, this is because the MCO is forcing the drain [1].  So when we fix this bug and declare 'controller: true' on an ownerReferences entry, folks will no longer need to force when using the upstream drain library to drain these openshift-marketplace pods.

[1]: https://github.com/openshift/machine-config-operator/blob/b7f7bb950e1d1ee66c90ed6761a162d402b74664/pkg/daemon/daemon.go#L315
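For context, here is a minimal sketch of driving the upstream drain library (k8s.io/kubectl/pkg/drain) the way a component like the MCO does.  The package layout, function name, and flag values are illustrative assumptions, not the MCO's actual code (see pkg/daemon/daemon.go in openshift/machine-config-operator for the real thing):

  // drainexample is a minimal sketch of a caller using the upstream
  // drain library, under the assumptions noted above.
  package drainexample

  import (
  	"context"
  	"fmt"
  	"os"

  	"k8s.io/client-go/kubernetes"
  	"k8s.io/kubectl/pkg/drain"
  )

  // drainNode evicts the pods on nodeName via the upstream drain helper.
  // With Force false, RunNodeDrain refuses to evict pods that carry no
  // 'controller: true' ownerReference, which is why the MCO sets Force today.
  func drainNode(client kubernetes.Interface, nodeName string) error {
  	helper := &drain.Helper{
  		Ctx:                 context.TODO(),
  		Client:              client,
  		Force:               true, // needed while guard pods lack a controller ownerReference
  		IgnoreAllDaemonSets: true,
  		DeleteEmptyDirData:  true,
  		GracePeriodSeconds:  -1,
  		Out:                 os.Stdout,
  		ErrOut:              os.Stderr,
  	}
  	if err := drain.RunNodeDrain(helper, nodeName); err != nil {
  		return fmt.Errorf("draining %s: %w", nodeName, err)
  	}
  	return nil
  }

Once the guard pods carry a 'controller: true' ownerReference, callers could stop setting Force just to get past them.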

--- Additional comment from W. Trevor King on 2022-02-24 02:36:41 UTC ---

(In reply to W. Trevor King from comment #0)
>   E0223 16:07:08.199572  195651 daemon.go:340] WARNING: ignoring
> DaemonSet-managed Pods: ...,
> openshift-marketplace/certified-operators-zbb6r,
> ...

Better ellipsis for this log line:

  E0223 16:07:08.199572  195651 daemon.go:340] WARNING: ignoring DaemonSet-managed Pods: ...; deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: openshift-kube-apiserver/kube-apiserver-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-kube-controller-manager/kube-controller-manager-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-kube-scheduler/openshift-kube-scheduler-guard-ip-10-0-151-30.us-west-1.compute.internal, openshift-marketplace/certified-operators-zbb6r, openshift-marketplace/community-operators-qpvff, openshift-marketplace/redhat-marketplace-dxpbn, openshift-marketplace/redhat-operators-mhlf5

I've filed [1] to clean up the messaging a bit.  And it looks like I need to follow up with whoever creates those guard pods too...

[1]: https://github.com/kubernetes/kubernetes/pull/108314

---

Bug 2057740 covers how a lack of 'controller: true' ownerReferences keeps some openshift-marketplace pods from being drained without --force.  This bug tracks the new-in-4.10 guard pods, which lack ownerReferences entirely.  Ideally they'd be marked so that it was clear to drain logic (and everyone else) that there is a controller in charge of creating those Pods, pointing at some resource associated with that controller (a hypothetical sketch of such a reference follows the query output below).  Definition of done is emptying out the following query:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680/artifacts/e2e-aws-upgrade/gather-extra/artifacts/pods.json | jq -r '.items[].metadata | select((.name | contains("-guard-")) and ([(.ownerReferences // [])[] | select(.controller)] | length == 0)) | .namespace + " " + .name + " " + (.ownerReferences | tostring)'
openshift-kube-apiserver kube-apiserver-guard-ip-10-0-151-30.us-west-1.compute.internal null
openshift-kube-apiserver kube-apiserver-guard-ip-10-0-184-130.us-west-1.compute.internal null
openshift-kube-apiserver kube-apiserver-guard-ip-10-0-193-89.us-west-1.compute.internal null
openshift-kube-controller-manager kube-controller-manager-guard-ip-10-0-151-30.us-west-1.compute.internal null
openshift-kube-controller-manager kube-controller-manager-guard-ip-10-0-184-130.us-west-1.compute.internal null
openshift-kube-controller-manager kube-controller-manager-guard-ip-10-0-193-89.us-west-1.compute.internal null
openshift-kube-scheduler openshift-kube-scheduler-guard-ip-10-0-151-30.us-west-1.compute.internal null
openshift-kube-scheduler openshift-kube-scheduler-guard-ip-10-0-184-130.us-west-1.compute.internal null
openshift-kube-scheduler openshift-kube-scheduler-guard-ip-10-0-193-89.us-west-1.compute.internal null

where I'm using [1] as an example CI run.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade/1496494490028871680
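To make the fix concrete, here is a hypothetical sketch of stamping such a reference onto a guard pod at creation time.  Everything here is an assumption: the helper name is invented, and whether the owner should be the static pod itself, the node, or some operator-owned resource is exactly what this bug leaves to the implementer:

  package guardexample

  import (
  	corev1 "k8s.io/api/core/v1"
  	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  )

  // addGuardOwnerReference (hypothetical) points a guard pod at the static
  // pod it watches.  metav1.NewControllerRef sets Controller and
  // BlockOwnerDeletion to true, which is enough for the upstream drain
  // library to treat the guard pod as controller-managed.
  func addGuardOwnerReference(guard, staticPod *corev1.Pod) {
  	guard.OwnerReferences = []metav1.OwnerReference{
  		*metav1.NewControllerRef(staticPod, corev1.SchemeGroupVersion.WithKind("Pod")),
  	}
  }

With an entry like that in place, the jq query above would stop matching, since each guard pod would have exactly one ownerReferences entry with controller set.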

Comment 1 Jan Chaloupka 2022-03-11 14:25:01 UTC
Due to higher priority tasks I have not been able to resolve this issue in time. Moving to the next sprint.

Comment 2 Jan Chaloupka 2022-04-25 13:26:34 UTC
Due to higher priority tasks I have not been able to resolve this issue in time. Moving to the next sprint.

Comment 5 Jan Chaloupka 2023-01-16 09:35:42 UTC
Ported to https://issues.redhat.com/browse/WRKLDS-646