Description of problem:
Multiple projects are stuck in the "Terminating" state due to the loss of a node. Each has an underlying pod stuck in an "Unknown" state. These objects cannot be deleted or renamed.

Version-Release number of selected component (if applicable):
3.7.23-1

How reproducible:
Frequent

Steps to Reproduce:
1. Delete a project containing many resources (Coolstore MSA is one example)
2. Fail a node running a pod
3. oc delete --force project <<project_name>>
4. oc delete --force pod <<pod_name>> -n <<project_name>>

Actual results:
3. Error from server (Conflict): Operation cannot be fulfilled on namespaces "<<project_name>>": The system is ensuring all content is removed from this namespace. Upon completion, this namespace will automatically be purged by the system.
4. pod "<<pod_name>>" deleted
^ But the pod is never deleted, even after several days.

Expected results:
Both the resource and the project are promptly deleted.

Additional info:
$ oc get project <<project_name>>
<<project_name>>   CoolStore   Terminating

$ oc get all -n <<project_name>>
po/<<pod_name>>   1/1   Unknown   0   3d

Having a method to rename the errant project would also be acceptable (though not preferred).
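A quick way to confirm the namespace is wedged in termination (a diagnostic sketch, not from the original report; the placeholder name and abridged output are illustrative):

$ oc get namespace <<project_name>> -o yaml
...
spec:
  finalizers:
  - kubernetes
status:
  phase: Terminating

A stuck namespace reports status.phase "Terminating" indefinitely, because the pod in "Unknown" state prevents the system from confirming that all content has been removed.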
cc'ing Jordan for advice
Possible workaround (see the sketch below):
* oc project <<project_name>>
* oc delete <<resource_type>> <<resource_name>> --grace-period=0 --force
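A sketch of the workaround with hypothetical names (project "demo", pod "mypod-1-abcde"):

$ oc project demo
$ oc delete pod mypod-1-abcde --grace-period=0 --force

The --grace-period=0 flag is what makes the deletion immediate. Note that force deletion only removes the API object; the pod's containers may still be running on the lost node.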
Can you provide the full YAML of the pod that is stuck in the Unknown state?
Sorry, not at the moment - the workaround in comment 2 removed it. You can see the source at https://github.com/jbossdemocentral/coolstore-microservice, but it's a bit of a slog to find the underlying YAML. If it recurs, I'll update this bz.
Lowering severity per the workaround in comment 2. Will investigate this a bit more once it can be reproduced again.
My mistake, there are multiple KCS articles (now attached) and related bzs. Since we have valid ways of handling the reported case, closing this bz.
Added the pod YAML in a private attachment from a similar situation. In this case, attempting a node upgrade resulted in the "Drain Node" step hanging overnight on a 3.6 system. As above, following comment 2 to delete the offending pod allowed the Ansible script to continue. Reopening.
--force on its own won't do anything. There's a PR upstream [1] that updates the documentation to make it clear that forceful deletion requires specifying both --force and --grace-period=0; otherwise the deletion is nothing more than a regular oc delete call.

[1] https://github.com/kubernetes/kubernetes/pull/61378
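To illustrate the distinction (hypothetical pod name; a sketch, not verified output):

$ oc delete pod mypod-1-abcde --force
# grace period still applies; behaves like a regular oc delete
$ oc delete pod mypod-1-abcde --force --grace-period=0
# removes the API object immediately, without waiting for the kubelet to confirm termination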
Picked the PR from comment 9 into Origin [1]. It adds warnings to the `delete` command when --force is used with a non-zero grace period.

[1] https://github.com/openshift/origin/pull/19213
Verified in oc v3.10.0-0.54.0. Using --force alone, or --force with a non-zero --grace-period, now prints a warning:

$ oc delete --force pod mydc-1-jh6pv
warning: --force is ignored because --grace-period is not 0
pod "mydc-1-jh6pv" deleted

$ oc delete --force --grace-period=5 pod mydc-1-jh6pv
warning: --force is ignored because --grace-period is not 0.
pod "mydc-1-jh6pv" deleted

$ oc get pod
NAME           READY     STATUS        RESTARTS   AGE
mydc-1-jh6pv   1/1       Terminating   0          9m

$ oc delete --force --grace-period=0 pod mydc-1-jh6pv
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "mydc-1-jh6pv" force deleted

$ oc get pod | grep mydc-1-jh6pv
# none
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0405