Bug 1559987 - Unable to delete deploymentconfig

Summary: Unable to delete deploymentconfig

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Master
Sub Component:
Version:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	3.7.z
Assignee:	Michal Fojtik
QA Contact:	Wang Haoran
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1267746 1678028

Reported:	2018-03-23 16:29 UTC by Robert Bost
Modified:	2021-12-10 15:50 UTC
CC List:	21 users
Fixed In Version:	v3.7.49-1
Doc Type:	Bug Fix
Doc Text:	Cause: In some cases the shared informer cache was not initialized properly or failed to initialize. Consequence: Controllers such as the garbage collector got stuck waiting for the caches to be initialized. Fix: If the cache is stuck, don't wait for it to initialize; forward the request to storage (etcd) directly to unblock the controllers. Result: Controllers can reach the resources without being stuck waiting for the cache to initialize.
Clone Of:
Clones:	1678028
Environment:
Last Closed:	2018-07-11 09:57:57 UTC
Target Upstream Version:
Embargoed:
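
The fix summarized in the Doc Text field above (read from storage when the cache has not initialized) can be illustrated with a minimal sketch. This is not the OpenShift implementation, which is written in Go; `CachedReader`, `has_synced`, and `read_storage` are hypothetical names chosen for illustration only.

```python
from typing import Callable, Dict, Optional


class CachedReader:
    """Hypothetical reader: use the cache once it has synced, else go to storage."""

    def __init__(
        self,
        cache: Dict[str, dict],
        has_synced: Callable[[], bool],
        read_storage: Callable[[str], Optional[dict]],
    ) -> None:
        self._cache = cache
        self._has_synced = has_synced
        self._read_storage = read_storage

    def get(self, key: str) -> Optional[dict]:
        if self._has_synced():
            # Normal path: the cache is initialized, so answer from it.
            return self._cache.get(key)
        # Cache is stuck or failed to initialize: do not wait for it,
        # forward the read to the backing storage instead.
        return self._read_storage(key)
```

With a cache that never syncs, `CachedReader({}, lambda: False, storage.get).get(key)` returns whatever the storage lookup returns instead of blocking the caller.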

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	3391401	0	None	None	None	2018-03-24 00:07:51 UTC
Red Hat Product Errata	RHBA-2018:1798	0	None	None	None	2018-06-26 06:43:51 UTC

Description Robert Bost 2018-03-23 16:29:20 UTC

Description of problem: 

Unable to delete a deploymentconfig resource. The deploymentconfig has a finalizer, but we are unable to find the resource that is blocking its deletion:

# oc get dc -o yaml NAME_OF_DC
apiVersion: v1
kind: DeploymentConfig
metadata:
  creationTimestamp: 2018-03-12T00:49:16Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2018-03-21T18:41:29Z
  finalizers:
  - foregroundDeletion
  generation: 29
  labels:
    app: firsttestgateway
  name: firsttestgateway
  namespace: first-dt
  resourceVersion: "125930908"
  selfLink: /oapi/v1/namespaces/first-dt/deploymentconfigs/firsttestgateway
  ...

Version-Release number of selected component (if applicable): atomic-openshift-3.7.23-1.git.0.8edc154.el7.x86_64


How reproducible: Reproducer steps unclear

Actual results: Running `oc delete dc/firsttestgateway` returned a success message, but running `oc get dc` afterwards still showed the deploymentconfig. The deploymentconfig remained for days until the finalizer was manually deleted from the dc YAML and `oc delete` was run again.
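
For illustration, the manual edit described above (dropping the finalizer from the dc YAML) can be modeled on a parsed manifest. This is a minimal sketch: `without_finalizer` is a hypothetical helper, it operates only on an in-memory dict, and it does not talk to the cluster.

```python
import copy


def without_finalizer(manifest: dict, finalizer: str) -> dict:
    """Return a copy of the manifest with one finalizer removed."""
    result = copy.deepcopy(manifest)
    metadata = result.get("metadata", {})
    remaining = [f for f in metadata.get("finalizers", []) if f != finalizer]
    if remaining:
        metadata["finalizers"] = remaining
    else:
        # No finalizers left: drop the key entirely.
        metadata.pop("finalizers", None)
    return result
```

Calling `without_finalizer(dc, "foregroundDeletion")` on the manifest shown earlier would yield a copy with no `finalizers` entry; the edited manifest would still have to be applied to the cluster separately.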

Comment 1 Michal Fojtik 2018-03-26 08:22:32 UTC

Can you please provide a dump of the pods/replication controllers associated with this DC? Additionally, the API server and controllers journals would be helpful for the analysis.

Comment 2 Robert Bost 2018-03-26 15:40:15 UTC

We do not currently have those details but are requesting them now from another instance of the issue. Leaving needinfo set.

Comment 4 Maciej Szulik 2018-03-28 10:19:50 UTC

We would need to see controller logs from the time this removal was invoked (covering at least 1h after the initial oc delete invocation). It looks like there were some problems removing the dependent objects (either replication controllers or pods) the DC owned. Without the dependents being properly removed, the actual DC won't be removed either. I'd like to investigate the logs to further confirm that theory and examine what might be causing this problem.
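
The theory above can be sketched as a check over parsed objects: an owner under foreground deletion stays until no dependent still references it with `blockOwnerDeletion` set. This is a simplified, single-level model (pods are normally owned by replication controllers rather than by the DC directly), and the function names and the `uid` values used with it are hypothetical.

```python
from typing import List


def blocking_dependents(owner: dict, objects: List[dict]) -> List[dict]:
    """Return objects whose ownerReferences point at `owner` and block its deletion."""
    owner_uid = owner["metadata"]["uid"]
    blockers = []
    for obj in objects:
        refs = obj["metadata"].get("ownerReferences", [])
        if any(
            ref.get("uid") == owner_uid and ref.get("blockOwnerDeletion")
            for ref in refs
        ):
            blockers.append(obj)
    return blockers


def owner_can_be_removed(owner: dict, objects: List[dict]) -> bool:
    """True once nothing in `objects` blocks the owner's deletion."""
    return not blocking_dependents(owner, objects)
```

Running `blocking_dependents` over the full YAML of the deployment config, replication controllers, and pods would list the objects that still hold the DC in place under this model.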

Comment 6 Maciej Szulik 2018-04-05 07:44:17 UTC

I've reviewed the attached logs and unfortunately I can't figure out exactly what's going on. The logs suggest that everything is working as expected (by that I mean I don't see any errors), but I can't verify any theory without the full yaml of the dependent resources or garbage collector logs at a higher level.

I'd suggest the next time this situation happens, before applying the workaround, please gather the following data:

- controller logs with loglevel 2 or higher (this is the level at which the garbage collector produces valuable output)
- full yamls for all the resources involved; in a case similar to the one described in comment 1 that will be: the deployment config, replication controllers and pods.
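
As a rough sketch, the YAML collection requested above could be scripted by building the `oc get` invocations up front. The helper name `collection_commands` is hypothetical, the commands dump every replication controller and pod in the namespace rather than only those owned by the DC, and controller log collection is left out because it depends on how the master is installed.

```python
from typing import List


def collection_commands(namespace: str, dc_name: str) -> List[List[str]]:
    """Build `oc get ... -o yaml` argument lists for the DC, RCs, and pods."""
    return [
        ["oc", "get", "dc", dc_name, "-n", namespace, "-o", "yaml"],
        ["oc", "get", "rc", "-n", namespace, "-o", "yaml"],
        ["oc", "get", "pods", "-n", namespace, "-o", "yaml"],
    ]
```

Each list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a host with a logged-in `oc` client, with the output saved before any workaround is applied.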

Comment 29 Robert Bost 2018-05-21 13:44:47 UTC

Resetting needinfo

Comment 43 errata-xmlrpc 2018-06-26 06:43:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1798
