Bug 1559987 - Unable to delete deploymentconfig
Summary: Unable to delete deploymentconfig
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.7.z
Assignee: Michal Fojtik
QA Contact: Wang Haoran
URL:
Whiteboard:
Depends On:
Blocks: 1267746 1678028
Reported: 2018-03-23 16:29 UTC by Robert Bost
Modified: 2021-12-10 15:50 UTC (History)

Fixed In Version: v3.7.49-1
Doc Type: Bug Fix
Doc Text:
Cause: In some cases the shared informer caches were not initialized properly or failed to initialize. Consequence: Controllers such as the garbage collector got stuck waiting for the caches to be initialized. Fix: If the cache is stuck, don't wait for it to initialize; forward the request to storage (etcd) directly to unblock the controllers. Result: Controllers can reach the resources without being stuck waiting for the cache to initialize.
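The idea in the Fix above can be illustrated with a minimal sketch. This is not the actual OpenShift code (which is written in Go); `FallbackReader`, `mark_synced`, and `fetch_from_storage` are hypothetical names, and the storage callable merely stands in for a direct etcd read. It only shows the pattern of waiting a bounded time for a cache and then bypassing it.

```python
import threading
from typing import Callable, Dict, Optional


class FallbackReader:
    """Serve reads from a cache once it is synced, otherwise read storage directly."""

    def __init__(self, fetch_from_storage: Callable[[str], Optional[dict]]):
        # Hypothetical stand-in for a direct read against storage (etcd).
        self._fetch_from_storage = fetch_from_storage
        self._cache: Dict[str, dict] = {}
        self._synced = threading.Event()

    def mark_synced(self, snapshot: Dict[str, dict]) -> None:
        """Would be called by the cache owner once its initial load succeeds."""
        self._cache = dict(snapshot)
        self._synced.set()

    def get(self, key: str, wait_seconds: float = 0.0) -> Optional[dict]:
        # Wait only for a bounded time instead of blocking forever on the cache.
        if self._synced.wait(timeout=wait_seconds):
            return self._cache.get(key)
        # The cache never became ready: bypass it and ask storage directly.
        return self._fetch_from_storage(key)
```

With a reader like this, a caller that would otherwise hang on an uninitialized cache still gets an answer, at the cost of a direct storage read.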
Clone Of:
Clones: 1678028
Environment:
Last Closed: 2018-07-11 09:57:57 UTC
Target Upstream Version:
Embargoed:


Links
System                             ID              Private  Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  3391401         0        None      None    None     2018-03-24 00:07:51 UTC
Red Hat Product Errata             RHBA-2018:1798  0        None      None    None     2018-06-26 06:43:51 UTC

Description Robert Bost 2018-03-23 16:29:20 UTC
Description of problem: 

Unable to delete a deploymentconfig resource. The deploymentconfig has a finalizer, but we are unable to find the resource that is blocking its deletion:

# oc get dc -o yaml NAME_OF_DC
apiVersion: v1
kind: DeploymentConfig
metadata:
  creationTimestamp: 2018-03-12T00:49:16Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2018-03-21T18:41:29Z
  finalizers:
  - foregroundDeletion
  generation: 29
  labels:
    app: firsttestgateway
  name: firsttestgateway
  namespace: first-dt
  resourceVersion: "125930908"
  selfLink: /oapi/v1/namespaces/first-dt/deploymentconfigs/firsttestgateway
  ...

Version-Release number of selected component (if applicable): atomic-openshift-3.7.23-1.git.0.8edc154.el7.x86_64


How reproducible: Reproducer steps unclear

Actual results: Running `oc delete dc/firsttestgateway` returned a success message, but running `oc get dc` afterwards still showed the deploymentconfig. The deploymentconfig remained for days, until the finalizer was manually removed from the dc YAML and `oc delete` was run again.
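For reference, the stuck state and the manual workaround described above can be sketched as follows. This is only an illustration: the helper names are hypothetical, the input is the parsed output of `oc get dc -o yaml`, and the generated `oc patch` command clears all finalizers with a JSON merge patch rather than editing the YAML by hand. Removing the finalizer bypasses the wait for dependents, so leftover replication controllers and pods should be checked afterwards.

```python
import json
from typing import List

FOREGROUND = "foregroundDeletion"


def is_pending_foreground_deletion(obj: dict) -> bool:
    """True if the object is marked for deletion but still held by the finalizer."""
    meta = obj.get("metadata", {})
    finalizers = meta.get("finalizers") or []
    return bool(meta.get("deletionTimestamp")) and FOREGROUND in finalizers


def finalizer_removal_command(obj: dict) -> List[str]:
    """Build (but do not run) an `oc patch` command that clears metadata.finalizers."""
    meta = obj["metadata"]
    # In a JSON merge patch, setting a field to null removes it.
    patch = json.dumps({"metadata": {"finalizers": None}})
    return ["oc", "patch", "dc", meta["name"], "-n", meta["namespace"],
            "--type=merge", "-p", patch]
```

The command is returned as an argument list so it can be reviewed before being executed, for example with `subprocess.run`.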

Comment 1 Michal Fojtik 2018-03-26 08:22:32 UTC
Can you please provide a dump of the pods and replication controllers associated with this DC? Additionally, the API server and controllers journal logs would be helpful for the analysis.

Comment 2 Robert Bost 2018-03-26 15:40:15 UTC
We do not currently have those details, but we are requesting them now from another instance of the issue. Leaving needinfo set.

Comment 4 Maciej Szulik 2018-03-28 10:19:50 UTC
We would need to see controller logs from the time this removal was invoked (covering at least 1h after the initial oc delete invocation). It looks like there were some problems removing the dependent objects (either replication controllers or pods) that the DC owned. Unless the dependents are properly removed, the DC itself won't be removed either. I'd like to investigate the logs to confirm that theory and examine what might be causing this problem.
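As a rough illustration of that theory: with foreground deletion, the owner stays until every dependent whose ownerReference has `blockOwnerDeletion: true` is gone. A minimal sketch for spotting such dependents in already-dumped objects follows; the function name and the input shape (a list of parsed YAML objects) are assumptions made for the example.

```python
from typing import Iterable, List


def blocking_dependents(owner_uid: str, objects: Iterable[dict]) -> List[str]:
    """List kind/name of objects whose ownerReferences block deletion of the owner."""
    blocked = []
    for obj in objects:
        meta = obj.get("metadata", {})
        for ref in meta.get("ownerReferences") or []:
            # Only references to this owner that set blockOwnerDeletion hold it back.
            if ref.get("uid") == owner_uid and ref.get("blockOwnerDeletion"):
                blocked.append(f'{obj.get("kind", "?")}/{meta.get("name", "?")}')
                break
    return blocked
```

An empty result for a DC that is still stuck would suggest the problem lies in the garbage collector itself rather than in a lingering dependent.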

Comment 6 Maciej Szulik 2018-04-05 07:44:17 UTC
I've reviewed the attached logs and unfortunately I can't figure out what exactly is going on. The logs suggest that everything is working as expected (by which I mean I don't see any errors), but I can't verify any theory without the full yaml of the dependent resources or garbage collector logs at a higher log level.

I'd suggest that the next time this situation happens, before applying the workaround, you gather the following data:

- controller logs with loglevel 2 or higher (this is the level at which the garbage collector produces valuable output)
- full yamls for all the resources involved; in a case similar to the one described in comment 1, that will be the deployment config, replication controllers and pods.
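The yaml part of this request could be collected along these lines. The helper name is hypothetical, and the sketch only builds `oc get ... -o yaml` commands for the DC and for all replication controllers and pods in its namespace; it does not cover the controller logs.

```python
from typing import List


def collection_commands(namespace: str, dc_name: str) -> List[List[str]]:
    """Build the `oc get` commands that dump the DC and its likely dependents."""
    return [
        ["oc", "get", "dc", dc_name, "-n", namespace, "-o", "yaml"],
        # All RCs and pods in the namespace; filter to the DC's dependents afterwards.
        ["oc", "get", "rc", "-n", namespace, "-o", "yaml"],
        ["oc", "get", "pods", "-n", namespace, "-o", "yaml"],
    ]
```

For the case in this report that would be `collection_commands("first-dt", "firsttestgateway")`, with each command's output saved to a file before the workaround is applied.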

Comment 29 Robert Bost 2018-05-21 13:44:47 UTC
Resetting needinfo

Comment 43 errata-xmlrpc 2018-06-26 06:43:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1798

