Bug 1508716
| Summary: | [master_public_1033]Can't release the used quota when delete the resource | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | zhou ying <yinzhou> |
| Component: | Master | Assignee: | Dan Mace <dmace> |
| Status: | CLOSED ERRATA | QA Contact: | Wang Haoran <haowang> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | unspecified | CC: | aos-bugs, jokerman, mfojtik, mmccomas |
| Target Milestone: | --- | ||
| Target Release: | 3.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-01-30 15:10:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
zhou ying
2017-11-02 02:59:38 UTC
Can't reproduce this issue every time.
[root@ip-172-18-11-218 kubernetes]# cluster/kubectl.sh get po
No resources found.
[root@ip-172-18-11-218 kubernetes]# cluster/kubectl.sh describe quota
Name: test
Namespace: default
Resource Used Hard
-------- ---- ----
count/deployments.extensions 0 2
count/pods 2 3
count/replicasets.extensions 0 4
count/secrets 1 4
Are you sure the pods were still not terminating?
Derek Carr: Yes, please see "https://bugzilla.redhat.com/show_bug.cgi?id=1508716#c2": listing pods returns nothing, but the used quota is not released. After the cluster is restarted, the issue can no longer be reproduced.
I was able to reproduce suspicious behavior on master with a simpler setup:
$ openshift start master ...
$ oc new-project test
$ oc create quota cm --hard=count/configmaps=1
$ oc create cm test --from-literal=foo=bar && oc delete cm/test && oc describe quota
Repeat the `oc create cm test ...` command until quota becomes stale; the user will be overcharged for what seems like forever:
$ oc get cm && oc describe quota
No resources found.
Name: cm
Namespace: test
Resource Used Hard
-------- ---- ----
count/configmaps 1 1
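The repeat-until-stale step can be scripted. What follows is a minimal sketch rather than a supported tool: it assumes the `test` project and `cm` quota from the steps above already exist, that `oc` is logged in to that project, and that a 5-second pause is long enough for usage to be recalculated. The helper names `quota_used` and `repro_loop` are hypothetical.

```shell
#!/bin/sh
# Sketch only: quota name "cm" and configmap name "test" come from the
# reproduction steps; the helper functions and the sleep are assumptions.

# Print the "Used" column for one resource, reading `oc describe quota`
# output on stdin. Prints nothing if the resource is not listed.
quota_used() {
  awk -v res="$1" '$1 == res { print $2 }'
}

# Run up to $1 create/delete cycles (default 50) and stop at the first
# reading where the quota charges for more configmaps than actually exist.
repro_loop() {
  max="${1:-50}"
  i=1
  while [ "$i" -le "$max" ]; do
    oc create cm test --from-literal=foo=bar >/dev/null 2>&1
    oc delete cm/test >/dev/null 2>&1
    sleep 5   # assumed grace period for the controller to recalculate usage
    remaining="$(oc get cm -o name 2>/dev/null | wc -l | tr -d ' ')"
    used="$(oc describe quota cm | quota_used count/configmaps)"
    if [ "${used:-0}" -gt "$remaining" ]; then
      echo "stale quota after $i iterations: used=$used, configmaps=$remaining"
      return 0
    fi
    i=$((i + 1))
  done
  echo "no stale quota seen in $max iterations"
  return 1
}

# Only talk to the cluster when explicitly asked: ./repro.sh run [iterations]
if [ "${1:-}" = "run" ]; then
  repro_loop "${2:-50}"
fi
```

Comparing the used count against the number of remaining configmaps, instead of against zero, keeps the check meaningful in projects that already contain other configmaps.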
Restarting the master corrects the issue. Setting the quota resync period to something shorter had no apparent effect:
kubernetesMasterConfig:
  controllerArguments:
    resource-quota-sync-period:
    - "1m"
I'll begin debugging.
There are actually several confounding issues here which need untangled:
1. Origin is starting up two ResourceQuotaController instances (one in its own controller manager, and another via kube's controller manager)
2. The ResourceQuotaController has a worker pool deadlock condition I just discovered, related to periodic discovery resync
3. The Origin ResourceQuotaController instance isn't setting up periodic discovery resync, avoiding the deadlock issue (but breaking discovery sync)
4. The other ResourceQuotaController instance via the kube controller manager has discovery resync enabled, triggering the deadlock bug, and often the kube ResourceQuotaController will steal all the work from the Origin instance (as they share informers)
There will probably be at least three things to do from here:
1. Fix the upstream ResourceQuotaController deadlock issue
2. Fix ResourceQuotaController bootstrapping in origin so we're only starting one controller
3. Ensure ResourceQuotaController discovery sync is enabled during bootstrapping for the one controller we decide to start
I'll provide updates here with origin and kube issues/PRs as I get them sorted out.
Upstream PR to fix the deadlock: https://github.com/kubernetes/kubernetes/pull/58107
Upstream 1.9 backport: https://github.com/kubernetes/kubernetes/pull/58158
(In reply to Dan Mace from comment #8)
> There are actually several confounding issues here which need untangled:
>
> 1. Origin is starting up two ResourceQuotaController instances (one in its
> own controller manager, and another via kube's controller manager)
Turns out this is normal.
> 3. The Origin ResourceQuotaController instance isn't setting up periodic
> discovery resync, avoiding the deadlock issue (but breaking discovery sync)
Turns out sync isn't necessary for origin's controller (as the upstream controller already handles custom resources via discovery sync).
So, the only work product resulting from this bug will be the deadlock fix.
Verified with:
openshift v3.9.0-0.42.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:0098