Force deleting a pod is not allowed for automated processes within Kube or OpenShift; it may only be done by a human. That action is reserved because it effectively bypasses the safety mechanisms of a cluster that ensure only one instance of a given pod is running at a time, and it leaves the state of the system inconsistent between the apiserver and the node (the node may still run the old process indefinitely).

OLM is force deleting (grace period zero) the community and marketplace operator pods. It must not do so, and should instead delete with a grace period of 1 if it wants "the pod to be deleted ASAP".

This was caught by debug code we added while looking at another bug where pods were force deleted when they should not have been (in https://github.com/openshift/kubernetes/pull/613):

I0314 00:01:24.469711 18 store.go:926] DEBUG: Consumer that is not node system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount requested delete of pods openshift-marketplace/community-operators-sdjjv with explicit grace period zero (deletionTimestamp=<nil>)
I0314 00:01:25.069608 18 store.go:926] DEBUG: Consumer that is not node system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount requested delete of pods openshift-marketplace/redhat-operators-px42b with explicit grace period zero (deletionTimestamp=<nil>)

This must not be deferred from 4.8.
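To illustrate the difference between a force delete and the requested "delete ASAP" behavior, here is a minimal sketch of the request body a client could send with a pod DELETE call. It is not OLM's actual code or fix. It uses only the Go standard library to build a Kubernetes v1 DeleteOptions object with gracePeriodSeconds set to 1, and the helper name gracefulDeleteBody and its rejection of values below 1 are assumptions made for this example. A client-go caller would typically set the same field through metav1.DeleteOptions instead.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// deleteOptions mirrors only the fields of the Kubernetes v1 DeleteOptions
// object that this sketch needs.
type deleteOptions struct {
	Kind               string `json:"kind"`
	APIVersion         string `json:"apiVersion"`
	GracePeriodSeconds *int64 `json:"gracePeriodSeconds,omitempty"`
}

// gracefulDeleteBody is a hypothetical helper. It refuses a grace period
// below 1 because zero means "force delete", which automated callers
// must not request.
func gracefulDeleteBody(seconds int64) ([]byte, error) {
	if seconds < 1 {
		return nil, errors.New("grace period must be at least 1 second for automated deletes")
	}
	return json.Marshal(deleteOptions{
		Kind:               "DeleteOptions",
		APIVersion:         "v1",
		GracePeriodSeconds: &seconds,
	})
}

func main() {
	body, err := gracefulDeleteBody(1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// This JSON would be sent as the body of the pod DELETE request.
	fmt.Println(string(body))
}
```

With a grace period of 1 the kubelet still confirms that the containers have stopped before the pod object is removed, which is what keeps the apiserver and the node consistent.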
For OCP 4.8, OLM is built from the https://github.com/openshift/operator-framework-olm repo, no longer from the https://github.com/operator-framework/operator-lifecycle-manager repo. I don't see that the fix PR referenced above, https://github.com/operator-framework/operator-lifecycle-manager/pull/2047, was cherry-picked to the https://github.com/openshift/operator-framework-olm repo, so I'm changing the status to ASSIGNED first.
I think I'm still seeing this:

Apr 22 15:21:57.646 W ns/openshift-marketplace pod/redhat-operators-4lscq node/ip-10-0-193-52.ec2.internal reason/DeleteWithoutGracePeriod
Apr 22 15:21:57.646 I ns/openshift-marketplace pod/redhat-operators-4lscq node/ip-10-0-193-52.ec2.internal reason/Deleted
Apr 22 15:21:58.362 W ns/openshift-marketplace pod/community-operators-w4lhz node/ip-10-0-188-231.ec2.internal reason/DeleteWithoutGracePeriod
Apr 22 15:21:58.362 I ns/openshift-marketplace pod/community-operators-w4lhz node/ip-10-0-188-231.ec2.internal reason/Deleted

This is a new test condition I was checking, but the test condition may be wrong. Are we positive that the operator code inside CI is up to date?
I just double-checked that the downstream repository still contains the bug fixes that were introduced in the PR(s) linked in this bug. It sounds like either those fixes weren't enough, the test condition needs to be updated, or our CI wasn't properly set up when building up the downstream prow configuration. I'd like to rule out the latter as quickly as possible, but we've been promoting images from the downstream repository for a couple of weeks now and have already verified a couple of other bugs at this point.
I think based on logs my test condition just may not be accurate, because I do see this:

INFO[2021-04-22T20:41:23Z] Apr 22 20:33:16.081 W ns/openshift-marketplace pod/community-operators-ql69j node/ip-10-0-166-178.us-east-2.compute.internal reason/GracefulDelete in 1s
INFO[2021-04-22T20:41:23Z] Apr 22 20:34:02.593 W ns/openshift-marketplace pod/redhat-operators-wjcfm node/ip-10-0-166-178.us-east-2.compute.internal reason/GracefulDelete in 1s
INFO[2021-04-22T20:41:23Z] Apr 22 20:40:07.831 W ns/openshift-marketplace pod/redhat-marketplace-2vjpt node/ip-10-0-166-178.us-east-2.compute.internal reason/GracefulDelete in 1s

So I also consider this verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438