Bug 1562961 - Unable to force-delete zombie resources (including projects)
Summary: Unable to force-delete zombie resources (including projects)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 3.7.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 3.10.0
Assignee: Maciej Szulik
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks: 1724792
 
Reported: 2018-04-02 19:09 UTC by Thom Carlin
Modified: 2019-06-28 16:04 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The force-delete option was not properly passed to the server. Consequence: Some resources could not be removed. Fix: Properly pass the force-deletion option to the server when removing an object. Result: Force delete works as expected.
Clone Of:
Environment:
Last Closed: 2019-03-14 02:15:34 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1489082 0 unspecified CLOSED Deleting a project with a pod that uses a CNS PV results in pod stuck in terminating state 2021-06-10 12:57:48 UTC
Red Hat Bugzilla 1548311 0 medium CLOSED Zombie project issue - while delete a project that contain provision failed instance 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1563673 0 medium CLOSED [RFE] Add timeout when draining a node for update 2021-12-10 15:54:29 UTC
Red Hat Knowledge Base (Solution) 2317401 0 None None None 2018-04-04 15:22:57 UTC
Red Hat Knowledge Base (Solution) 3263161 0 None None None 2018-04-04 15:21:37 UTC
Red Hat Product Errata RHBA-2019:0405 0 None None None 2019-03-14 02:15:41 UTC

Internal Links: 1489082 1548311 1563673

Description Thom Carlin 2018-04-02 19:09:32 UTC
Description of problem:

Multiple projects are stuck in the "Terminating" state due to the loss of a node.  Each has an underlying pod that is stuck in an "Unknown" state.  These objects cannot be deleted or renamed.

Version-Release number of selected component (if applicable):

3.7.23-1

How reproducible:

Frequent

Steps to Reproduce:
1. Delete a project containing many resources (the Coolstore MSA demo is one example)
2. Fail a node that is running one of the project's pods
3. oc delete --force project <<project_name>>
4. oc delete --force pod <<pod_name>> -n <<project_name>>

Actual results:

3. Error from server (Conflict): Operation cannot be fulfilled on namespaces "<<project_name>>": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.

4. pod "<<pod_name>>" deleted 
^ But the pod is never deleted, even after several days

Expected results:

Both resource and project promptly deleted

Additional info:

oc get project <<project_name>>

<<project_name>>          CoolStore      Terminating

oc get all -n <<project_name>>

po.<<pod_name>>   1/1       Unknown   0          3d

Having a method to rename the errant project would also be acceptable (though not preferred)

Comment 1 Juan Vallejo 2018-04-02 20:31:09 UTC
cc'ing Jordan for advice

Comment 2 Thom Carlin 2018-04-03 14:32:17 UTC
Possible workaround:
* oc project <<project_name>>
* oc delete <<resource_type>> <<resource_name>> --grace-period=0 --force
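
For illustration, a minimal walk-through of this workaround (project and pod names are placeholders; take the actual names from the oc get output):

$ oc project <<project_name>>                            # switch into the stuck project
$ oc get pods                                            # identify the pod stuck in Unknown/Terminating
$ oc delete pod <<pod_name>> --grace-period=0 --force    # force-delete without waiting for the lost node

Once the last stuck resource is gone, the system purges the Terminating namespace on its own (as the error message in the description says).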

Comment 3 Maciej Szulik 2018-04-03 15:13:43 UTC
Can you provide the full yaml of the pod which is stuck in unknown state?

Comment 4 Thom Carlin 2018-04-03 15:20:47 UTC
Sorry, not at the moment - the workaround in comment 2 removed it.

You can see the source at https://github.com/jbossdemocentral/coolstore-microservice but it is a bit of a slog to find the underlying yaml.

If it reoccurs, I'll update this bz.

Comment 5 Juan Vallejo 2018-04-03 15:36:44 UTC
Lowering severity per the workaround from comment 2.
Will investigate this a bit more once it can be reproduced again.

Comment 6 Thom Carlin 2018-04-03 16:00:17 UTC
My mistake, there are multiple KCS articles (now attached) and related bzs.

Since we have valid ways of handling the reported case, closing this bz.

Comment 8 Thom Carlin 2018-04-04 10:35:12 UTC
Added the pod YAML as a private attachment from a similar situation.  In this case, attempting a node upgrade resulted in the "Drain Node" task hanging overnight on a 3.6 system.  As above, following comment 2 to delete the offending pod allowed the Ansible script to continue.
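
Purely for illustration (placeholder names), applying the comment 2 workaround from a second terminal while the drain hangs looks like this:

$ oc get pods --all-namespaces | grep -E 'Unknown|Terminating'              # locate the pod blocking the drain
$ oc delete pod <<pod_name>> -n <<project_name>> --grace-period=0 --force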

Reopening

Comment 9 Maciej Szulik 2018-04-04 14:05:48 UTC
--force on its own won't do anything. There's an upstream PR [1] that updates the documentation to make it clear that forceful deletion requires specifying both --force and --grace-period=0; otherwise the deletion is nothing more than a regular oc delete call.


[1] https://github.com/kubernetes/kubernetes/pull/61378
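
Illustrative commands (placeholder pod name) showing the distinction described above; see comment 11 below for the verified warnings on 3.10:

$ oc delete pod <<pod_name>> --force                     # --force alone: still an ordinary graceful delete
$ oc delete pod <<pod_name>> --force --grace-period=0    # both flags: immediate, forceful deletion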

Comment 10 Juan Vallejo 2018-04-04 15:04:33 UTC
Picked the PR from comment 9 into Origin [1].
It adds additional warnings to the `delete` command when --force is used with a non-zero grace period.

1. https://github.com/openshift/origin/pull/19213

Comment 11 Xingxing Xia 2018-05-31 06:31:41 UTC
Verified in oc v3.10.0-0.54.0: --force alone, or --force with a non-zero --grace-period, produces a warning:
$ oc delete --force pod mydc-1-jh6pv
warning: --force is ignored because --grace-period is not 0
pod "mydc-1-jh6pv" deleted

$ oc delete --force --grace-period=5 pod mydc-1-jh6pv
warning: --force is ignored because --grace-period is not 0.
pod "mydc-1-jh6pv" deleted

$ oc get pod
NAME                  READY     STATUS        RESTARTS   AGE
mydc-1-jh6pv          1/1       Terminating   0          9m

$ oc delete --force --grace-period=0 pod mydc-1-jh6pv
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "mydc-1-jh6pv" force deleted

$ oc get pod | grep mydc-1-jh6pv # none

Comment 18 errata-xmlrpc 2019-03-14 02:15:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0405

