Bug 1730434
Summary: | api.ci ran out of UID ranges to assign to newly created namespaces | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Petr Muller <pmuller> |
Component: | apiserver-auth | Assignee: | Stefan Schimanski <sttts> |
Status: | CLOSED DUPLICATE | QA Contact: | Wei Sun <wsun> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 3.11.0 | CC: | aos-bugs, ccoleman, maszulik, mfojtik, nagrawal, padillon, skuznets |
Target Milestone: | --- | ||
Target Release: | 3.11.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-06-04 15:03:49 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Petr Muller
2019-07-16 17:40:15 UTC
This problem reoccurred today. This is a failed test from the first failing PR I see: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.3/750

Hongkai Liu restarted the master controllers and now tests are running. Slack thread: https://coreos.slack.com/archives/CEKNRGF25/p1578188048085100

This bug hasn't had any engineering activity in the last ~30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale". If you have further information on the current state of the bug, please update it and remove the "LifecycleStale" keyword, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

Apparently this is still a problem on the api.ci cluster; the last logged occurrence seems to be Patrick's comment above on 2020-05-01.

Heh, Patrick's comment was actually from January, not May, but because the world likes its stories, we just had an occurrence *today*.

This happens when a namespace is created and deleted. CI creates ~12-14k namespaces per week. The default config allows 100k namespaces to be uniquely allocated. Therefore this will fail roughly every 8 weeks. If you restart the controller, it restarts the clock. If this is not fixed in 4.4... it needs to be fixed. Bumping both urgency and severity because this is a "cluster stops working after 8 weeks" problem. This must not be marked stale.

I don't think we'll be fixing this in 3.11, and there's already bug 1808588, which is tracking that exact same thing. I'm closing this as a duplicate. The quick and dirty workaround up to 4.5 is to restart kube-controller-manager, which triggers the repair method responsible for cleaning the unused ranges.

*** This bug has been marked as a duplicate of bug 1808588 ***
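
For reference, the "fails every 8 weeks" estimate in the comments can be checked with a quick calculation. This is only a rough sketch: the function name is made up for illustration, and it assumes that each created namespace consumes one range and that nothing is reclaimed until the controller is restarted, as the thread describes.

```python
# Back-of-the-envelope check of the exhaustion estimate from the comments.
# Assumption: every created namespace consumes one UID range and no range is
# reclaimed until the controller is restarted.

def weeks_until_exhaustion(total_ranges: int, namespaces_per_week: int) -> float:
    """Return how many weeks it takes to hand out every available range."""
    if namespaces_per_week <= 0:
        raise ValueError("namespaces_per_week must be positive")
    return total_ranges / namespaces_per_week


TOTAL_RANGES = 100_000           # default allocatable count quoted in the bug
WEEKLY_RATES = (12_000, 14_000)  # CI namespace creation rate quoted in the bug

for rate in WEEKLY_RATES:
    weeks = weeks_until_exhaustion(TOTAL_RANGES, rate)
    print(f"{rate} namespaces/week -> exhausted after {weeks:.1f} weeks")
```

With the numbers quoted above, this gives between 7 and 8.5 weeks, consistent with the "every 8 weeks" figure.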