Bug 1562961 - Unable to force-delete zombie resources (including projects)
Summary: Unable to force-delete zombie resources (including projects)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 3.7.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 3.10.0
Assignee: Maciej Szulik
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks: 1724792
 
Reported: 2018-04-02 19:09 UTC by Thom Carlin
Modified: 2019-06-28 16:04 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The force-delete option was not properly passed to the server. Consequence: Some resources could not be removed. Fix: Properly pass the force-deletion option to the server when removing an object. Result: Force delete works as expected.
Clone Of:
Environment:
Last Closed: 2019-03-14 02:15:34 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1489082 0 unspecified CLOSED Deleting a project with a pod that uses a CNS PV results in pod stuck in terminating state 2021-06-10 12:57:48 UTC
Red Hat Bugzilla 1548311 0 medium CLOSED Zombie project issue - while delete a project that contain provision failed instance 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1563673 0 medium CLOSED [RFE] Add timeout when draining a node for update 2021-12-10 15:54:29 UTC
Red Hat Knowledge Base (Solution) 2317401 0 None None None 2018-04-04 15:22:57 UTC
Red Hat Knowledge Base (Solution) 3263161 0 None None None 2018-04-04 15:21:37 UTC
Red Hat Product Errata RHBA-2019:0405 0 None None None 2019-03-14 02:15:41 UTC

Internal Links: 1489082 1548311 1563673

Description Thom Carlin 2018-04-02 19:09:32 UTC
Description of problem:

Multiple projects are stuck in the "Terminating" state due to the loss of a node.  Each has an underlying pod that is stuck in an "Unknown" state.  These objects cannot be deleted or renamed.

Version-Release number of selected component (if applicable):

3.7.23-1

How reproducible:

Frequent

Steps to Reproduce:
1. Delete a project containing many resources (the Coolstore MSA demo is one example)
2. Fail a node that is running one of the project's pods
3. oc delete --force project <<project_name>>
4. oc delete --force pod <<pod_name>> -n <<project_name>>

Actual results:

3. Error from server (Conflict): Operation cannot be fulfilled on namespaces "<<project_name>>": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.

4. pod "<<pod_name>>" deleted 
^ But the pod is never deleted, even after several days

Expected results:

Both resource and project promptly deleted

Additional info:

oc get project <<project_name>>

<<project_name>>          CoolStore      Terminating

oc get all -n <<project_name>>

po.<<pod_name>>   1/1       Unknown   0          3d

Having a method to rename the errant project would also be acceptable (though not preferred)

Comment 1 Juan Vallejo 2018-04-02 20:31:09 UTC
cc'ing Jordan for advice

Comment 2 Thom Carlin 2018-04-03 14:32:17 UTC
Possible workaround:
* oc project <<project_name>>
* oc delete <<resource_type>> <<resource_name>> --grace-period=0 --force
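
For illustration, a minimal walk-through of this workaround (project and pod names are placeholders; take the actual names from the oc get output):

$ oc project <<project_name>>                            # switch into the stuck project
$ oc get pods                                            # identify the pod stuck in Unknown/Terminating
$ oc delete pod <<pod_name>> --grace-period=0 --force    # force-delete without waiting for the lost node

Once the last stuck resource is gone, the system purges the Terminating namespace on its own (as the error message in the description says).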

Comment 3 Maciej Szulik 2018-04-03 15:13:43 UTC
Can you provide the full yaml of the pod which is stuck in unknown state?

Comment 4 Thom Carlin 2018-04-03 15:20:47 UTC
Sorry, not at the moment - the workaround in comment 2 removed it.

You can see the source at https://github.com/jbossdemocentral/coolstore-microservice but it is a bit of a slog to find the underlying yaml.

If it reoccurs, I'll update this bz.

Comment 5 Juan Vallejo 2018-04-03 15:36:44 UTC
Lowering severity per the workaround from comment 2.
Will investigate this a bit more once it can be reproduced again.

Comment 6 Thom Carlin 2018-04-03 16:00:17 UTC
My mistake, there are multiple KCS articles (now attached) and related bzs.

Since we have valid ways of handling the reported case, closing this bz.

Comment 8 Thom Carlin 2018-04-04 10:35:12 UTC
Added the pod YAML as a private attachment from a similar situation.  In this case, attempting a node upgrade resulted in the "Drain Node" task hanging overnight on a 3.6 system.  As above, following comment 2 to delete the offending pod allowed the Ansible script to continue.
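
Purely for illustration (placeholder names), applying the comment 2 workaround from a second terminal while the drain hangs looks like this:

$ oc get pods --all-namespaces | grep -E 'Unknown|Terminating'              # locate the pod blocking the drain
$ oc delete pod <<pod_name>> -n <<project_name>> --grace-period=0 --force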

Reopening

Comment 9 Maciej Szulik 2018-04-04 14:05:48 UTC
--force on its own won't do anything. There's an upstream PR [1] that updates the documentation to make it clear that forceful deletion requires specifying both --force and --grace-period=0; otherwise the deletion is nothing more than a regular oc delete call.


[1] https://github.com/kubernetes/kubernetes/pull/61378
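
Illustrative commands (placeholder pod name) showing the distinction described above; see comment 11 below for the verified warnings on 3.10:

$ oc delete pod <<pod_name>> --force                     # --force alone: still an ordinary graceful delete
$ oc delete pod <<pod_name>> --force --grace-period=0    # both flags: immediate, forceful deletion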

Comment 10 Juan Vallejo 2018-04-04 15:04:33 UTC
Picked the PR from comment 9 into Origin [1].
It adds additional warnings to the `delete` command when --force is used with a non-zero grace period.

1. https://github.com/openshift/origin/pull/19213

Comment 11 Xingxing Xia 2018-05-31 06:31:41 UTC
Verified in oc v3.10.0-0.54.0: --force alone, or --force with a non-zero --grace-period, produces a warning:
$ oc delete --force pod mydc-1-jh6pv
warning: --force is ignored because --grace-period is not 0
pod "mydc-1-jh6pv" deleted

$ oc delete --force --grace-period=5 pod mydc-1-jh6pv
warning: --force is ignored because --grace-period is not 0.
pod "mydc-1-jh6pv" deleted

$ oc get pod
NAME                  READY     STATUS        RESTARTS   AGE
mydc-1-jh6pv          1/1       Terminating   0          9m

$ oc delete --force --grace-period=0 pod mydc-1-jh6pv
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "mydc-1-jh6pv" force deleted

$ oc get pod | grep mydc-1-jh6pv # none

Comment 18 errata-xmlrpc 2019-03-14 02:15:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0405

