Description of problem:
In an OCP 3.7.54-1 cluster recently upgraded from 3.5 (starting with a migration to 3.6.173), objects deleted via the web console are marked with:

  finalizers:
  - foregroundDeletion

but they are never garbage collected.

This might be related to https://bugzilla.redhat.com/show_bug.cgi?id=1559987 (CLOSED ERRATA).

Version-Release number of selected component (if applicable):
oadm v3.7.54
kubernetes v1.7.6+a08f5eeb62
openshift v3.7.54
kubernetes v1.7.6+a08f5eeb62

How reproducible:
Every time a resource is deleted from the web console. Deleting the same resource from the CLI works perfectly.

Steps to Reproduce:
1.
2.
3.

Actual results:
Objects remain undeleted.

Expected results:
Objects should be deleted.

Additional info:
The garbage collector maintains a graph of all objects and their ownerReferences. Before processing any deletion that involves inter-object relationships, it must have a complete graph of all resources. Deletion with foregroundDeletion means "delete the objects whose ownerReferences point to this object, then delete this object". The worker responsible for doing that is not started until the caches for all object types are filled and the graph is complete. Persistent failure to list/watch HPA objects prevented that graph from ever becoming ready.
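A toy model may help show why the finalizer never clears. The Python sketch below is a simplification under stated assumptions, not the kube-controller-manager implementation: Obj, GraphSketch, sync_kind and foreground_delete are hypothetical names, each resource kind counts as synced once a single list succeeds, and deletion is collapsed into returning an order (dependents first, owner last). It shows that while one kind (here HPA) has never synced, the foreground worker does nothing, so the owner keeps its foregroundDeletion finalizer.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    uid: str
    kind: str
    owners: list = field(default_factory=list)  # UIDs taken from ownerReferences


class GraphSketch:
    """Hypothetical toy model of the GC dependency graph (not the real controller code)."""

    def __init__(self, kinds):
        # One "informer" per resource kind; none has synced yet.
        self.synced = {kind: False for kind in kinds}
        self.objects = {}

    def sync_kind(self, kind, objs):
        # A successful initial list of one kind adds its objects to the graph.
        for obj in objs:
            self.objects[obj.uid] = obj
        self.synced[kind] = True

    def ready(self):
        # The graph is only complete once every kind has been listed.
        return all(self.synced.values())

    def dependents(self, uid):
        return [o for o in self.objects.values() if uid in o.owners]

    def foreground_delete(self, uid):
        """Return the deletion order (dependents first, owner last), or None if not ready."""
        if not self.ready():
            return None  # worker not running: the foregroundDeletion finalizer stays
        order = []
        self._collect(uid, order, set())
        return order

    def _collect(self, uid, order, seen):
        if uid in seen:
            return
        seen.add(uid)
        for dep in self.dependents(uid):
            self._collect(dep.uid, order, seen)  # dependents go before their owner
        order.append(uid)


g = GraphSketch(["Deployment", "ReplicaSet", "Pod", "HorizontalPodAutoscaler"])
g.sync_kind("Deployment", [Obj("d1", "Deployment")])
g.sync_kind("ReplicaSet", [Obj("rs1", "ReplicaSet", ["d1"])])
g.sync_kind("Pod", [Obj("p1", "Pod", ["rs1"])])

before = g.foreground_delete("d1")  # HPA list/watch keeps failing
print(before)  # None

g.sync_kind("HorizontalPodAutoscaler", [])  # HPA list finally succeeds
after = g.foreground_delete("d1")
print(after)  # ['p1', 'rs1', 'd1']
```

Returning an order keeps the sketch short; in the real collector, dependents are deleted through the API server and the owner's finalizer is removed once its blocking dependents are gone.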
Closing; the issue was caused by an incorrect upgrade procedure. https://github.com/openshift/openshift-docs/issues/10015 is open to improve the documentation for manual upgrades.
Need an update on the progress of this bug.
Yes. As @liggitt pointed out above in the chain, foreground GC requires building a complete graph of all objects before removing the target object. Because an owner reference can point from any object to any other object, every resource type must be list/watchable and must have been successfully listed and watched before any foreground deletion is processed. Otherwise a reference could be missed and foreground deletion would fail.
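The sketch below illustrates why a partially listed cluster cannot be trusted. It is a hypothetical Python model, not Kubernetes code: CLUSTER, list_kinds, find_dependents and safe_to_process are made-up names, and an HPA owned by a Deployment is contrived on purpose, since ownerReferences may point from any kind to any kind. When the HPA list fails, the dependent scan silently misses hpa1, which is why processing has to wait until every kind has been listed.

```python
# Hypothetical cluster contents: kind -> {uid: [owner UIDs from ownerReferences]}.
# The HPA owned by a Deployment is contrived: ownerReferences may point from any
# kind to any kind, so dependents can live in any resource type.
CLUSTER = {
    "Deployment": {"d1": []},
    "ReplicaSet": {"rs1": ["d1"]},
    "HorizontalPodAutoscaler": {"hpa1": ["d1"]},
}


def list_kinds(failing):
    """Simulate list calls; kinds in `failing` cannot be listed at all."""
    return {kind: objs for kind, objs in CLUSTER.items() if kind not in failing}


def find_dependents(owner_uid, listed):
    """Scan every listed kind for objects whose ownerReferences include owner_uid."""
    return sorted(
        uid
        for objs in listed.values()
        for uid, owners in objs.items()
        if owner_uid in owners
    )


def safe_to_process(listed):
    """Foreground deletion is only safe once every known kind has been listed."""
    return set(CLUSTER) <= set(listed)


partial = list_kinds(failing={"HorizontalPodAutoscaler"})
full = list_kinds(failing=set())
print(find_dependents("d1", partial), safe_to_process(partial))  # ['rs1'] False
print(find_dependents("d1", full), safe_to_process(full))  # ['hpa1', 'rs1'] True
```

Acting on the partial result would delete d1 while hpa1 still references it, which is exactly the missed reference the all-kinds readiness check guards against.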
Closing based on the previous comment. To summarize, the problem was that the customer did not run the migration during upgrades. As a result, old objects (stored in a version no longer recognized by the server) lingered in etcd and blocked proper GC cycles. The solution is to downgrade the server to the previous version and run the migration.