Are we sure this belongs to the openshift-apiserver component? It was originally under the openshift-controller-manager component. The only reference I see to the "scc-uid" RangeAllocation is in pkg/security/controller/namespace_scc_allocation_controller.go in openshift-controller-manager [1]. I suspect the problem is in:

func (c *NamespaceSCCAllocationController) allocate(ns *corev1.Namespace)
  // do uid allocation
  // We reserve the UID we want first, lock it in etcd, then update the namespace.
  // We allocate by reading in a giant big int bitmap (one bit per offset location), finding the next free bit,
  // then calculating the offset location

I believe the function above updates the bitmap in the "scc-uid" RangeAllocation, and it is only the "scc-uid" RangeAllocation showing this behaviour.

[1] https://github.com/openshift/openshift-controller-manager/blob/79fb7a5f3d8417766529150986eb5648bf65b733/pkg/security/controller/namespace_scc_allocation_controller.go#L139
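For reference, here is a minimal sketch of how a bitmap-backed range allocator of this kind typically behaves (illustrative Go only, not the actual openshift-controller-manager code; the constant and type names are made up):

package main

import (
	"errors"
	"fmt"
	"math/big"
)

// rangeSize is the number of allocatable UID blocks; illustrative value only.
const rangeSize = 10000

// bitmapAllocator tracks which offsets in a contiguous range are in use.
// One bit per offset: bit i set == offset i allocated.
type bitmapAllocator struct {
	bitmap *big.Int
}

// allocateNext finds the lowest unset bit, sets it, and returns its offset.
// A real controller would persist the updated bitmap (the RangeAllocation
// object) before handing the resulting UID range to the namespace.
func (a *bitmapAllocator) allocateNext() (int, error) {
	for i := 0; i < rangeSize; i++ {
		if a.bitmap.Bit(i) == 0 {
			a.bitmap.SetBit(a.bitmap, i, 1)
			return i, nil
		}
	}
	return 0, errors.New("uid range exceeded")
}

// release clears the bit for a previously allocated offset so it can be
// reused. If nothing ever clears bits when a namespace is deleted, the bitmap
// only grows, which matches the behaviour reported here.
func (a *bitmapAllocator) release(offset int) {
	a.bitmap.SetBit(a.bitmap, offset, 0)
}

func main() {
	a := &bitmapAllocator{bitmap: big.NewInt(0)}
	first, _ := a.allocateNext()
	second, _ := a.allocateNext()
	fmt.Println("allocated offsets:", first, second) // 0 1
	a.release(first)
	third, _ := a.allocateNext()
	fmt.Println("reused offset:", third) // 0
}

Once every bit in the range is set and nothing gets released, allocateNext can only fail, which is where the "uid range exceeded" errors come from.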
There's a repair method that is called on restart, so while not ideal, I could suggest restarting the kube-controller-manager pods, which contain the cluster-policy-controller. That repair method is responsible for scanning the current namespaces and fixing the existing range allocations to match the namespace state. That's all I can suggest for now. I doubt we'll be able to fix this in the short term; from what I've learned it's not an easy task to do right away, so we'll have to tackle it as an RFE if needed, since that functionality never existed, other than the Repair mentioned above.
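To make that concrete, here is a rough sketch of what such a repair pass amounts to (illustrative Go only; the function and type names are made up, and I'm assuming the usual openshift.io/sa.scc.uid-range annotation format of "<start>/<size>"): it walks the namespaces that actually exist, reads the UID range each one already holds, and rebuilds the set of in-use offsets so stale entries left behind by deleted namespaces are dropped.

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// namespace is a stand-in for corev1.Namespace carrying only the annotation
// relevant here (the pre-allocated UID range, e.g. "1000140000/10000").
type namespace struct {
	name        string
	annotations map[string]string
}

const uidRangeAnnotation = "openshift.io/sa.scc.uid-range"

// repair rebuilds the in-use block offsets from the namespaces that exist,
// so the persisted allocation matches the real namespace state.
func repair(namespaces []namespace, rangeStart, blockSize int) map[int]bool {
	inUse := make(map[int]bool)
	for _, ns := range namespaces {
		r, ok := ns.annotations[uidRangeAnnotation]
		if !ok {
			continue // not allocated yet; the allocation controller will handle it
		}
		parts := strings.SplitN(r, "/", 2)
		start, err := strconv.Atoi(parts[0])
		if err != nil {
			continue // malformed annotation, skip it
		}
		inUse[(start-rangeStart)/blockSize] = true
	}
	return inUse
}

func main() {
	nss := []namespace{
		{name: "default", annotations: map[string]string{uidRangeAnnotation: "1000000000/10000"}},
		{name: "calico-system", annotations: map[string]string{uidRangeAnnotation: "1000020000/10000"}},
	}
	fmt.Println(repair(nss, 1000000000, 10000)) // map[0:true 2:true]
}

Restarting the kube-controller-manager pods (which host the cluster-policy-controller container) triggers that repair on startup, which is why it can serve as a workaround.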
*** Bug 1730434 has been marked as a duplicate of this bug. ***
This is being actively worked on.
*** Bug 1850144 has been marked as a duplicate of this bug. ***
controller-manager logs:

E0701 15:13:21.517402 1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
E0701 15:23:21.516920 1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
E0701 15:33:21.518902 1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
E0701 15:43:21.518227 1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
E0701 15:53:21.519810 1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
E0701 16:03:21.519510 1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
E0701 16:13:21.520017 1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
E0701 16:18:50.805760 1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
E0701 16:28:50.806520 1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
E0701 16:38:50.807027 1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
I0620 06:49:57.542182 1 ingress.go:294] Starting controller
I0620 06:49:57.565701 1 factory.go:85] deploymentconfig controller caches are synced. Starting workers.
E0620 06:49:57.567247 1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
goroutine 38416 [running]:
github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1ef6e00, 0x25b9310)
	/go/src/github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1ef6e00, 0x25b9310)
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522 +0x1b5
github.com/openshift/openshift-controller-manager/vendor/github.com/openshift/library-go/pkg/apps/appsutil.SetCancelledByNewerDeployment(...)
	/go/src/github.com/openshift/openshift-controller-manager/vendor/github.com/openshift/library-go/pkg/apps/appsutil/util.go:313
github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig.(*DeploymentConfigController).cancelRunningRollouts.func1(0x7f7d07243eb0, 0x0)
	/go/src/github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig/deploymentconfig_controller.go:382 +0x19c
github.com/openshift/openshift-controller-manager/vendor/k8s.io/client-go/util/retry.OnError.func1(0x2038500, 0x2221f01, 0xc00068a080)
	/go/src/github.com/openshift/openshift-controller-manager/vendor/k8s.io/client-go/util/retry/util.go:64 +0x3c
github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x4014000000000000, 0x3fb999999999999a, 0x4, 0x0, 0xc00068a080, 0x413798, 0x30)
	/go/src/github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:292 +0x51
github.com/openshift/openshift-controller-manager/vendor/k8s.io/client-go/util/retry.OnError(0x989680, 0x4014000000000000, 0x3fb999999999999a, 0x4, 0x0, 0x234b3f8, 0xc0022ae480, 0xc0022ae450, 0xc0019bb548)
	/go/src/github.com/openshift/openshift-controller-manager/vendor/k8s.io/client-go/util/retry/util.go:63 +0xb2
github.com/openshift/openshift-controller-manager/vendor/k8s.io/client-go/util/retry.RetryOnConflict(...)
	/go/src/github.com/openshift/openshift-controller-manager/vendor/k8s.io/client-go/util/retry/util.go:83
github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig.(*DeploymentConfigController).cancelRunningRollouts(0xc000642280, 0xc0008e87e0, 0xc00190c058, 0x1, 0x1, 0xc0021fc120, 0x1, 0x0)
	/go/src/github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig/deploymentconfig_controller.go:364 +0x1c2
github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig.(*DeploymentConfigController).Handle(0xc000642280, 0xc0008e87e0, 0x0, 0x0)
	/go/src/github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig/deploymentconfig_controller.go:152 +0x2180
github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig.(*DeploymentConfigController).work(0xc000642280, 0xc00044bb00)
	/go/src/github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig/factory.go:222 +0x1f8
github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig.(*DeploymentConfigController).worker(0xc000642280)
	/go/src/github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig/factory.go:195 +0x2b
github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0019b6050)
	/go/src/github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0019b6050, 0x3b9aca00, 0x0, 0x1, 0xc00045c660)
	/go/src/github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc0019b6050, 0x3b9aca00, 0xc00045c660)
	/go/src/github.com/openshift/openshift-controller-manager/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig.(*DeploymentConfigController).Run
	/go/src/github.com/openshift/openshift-controller-manager/pkg/apps/deploymentconfig/factory.go:88 +0x1af
I0620 06:49:57.590185 1 buildconfig_controller.go:200] Starting buildconfig controller
I0620 06:49:57.630722 1 templateinstance_finalizer.go:193] Starting TemplateInstanceFinalizer controller
I0620 06:49:57.639704 1 templateinstance_controller.go:296] Starting TemplateInstance controller
Apologies, please ignore my previous update; it seems unrelated to this issue. No further logs to provide currently.
kube-controller-manager pod, cluster-policy-controller container:

2020-06-20T06:52:49.42737378Z E0620 06:52:49.427336 1 namespace_scc_allocation_controller.go:334] error syncing namespace, it will be retried: the server could not find the requested resource (get rangeallocations.security.openshift.io scc-uid)
2020-06-20T06:52:49.433217038Z E0620 06:52:49.433190 1 namespace_scc_allocation_controller.go:334] error syncing namespace, it will be retried: the server could not find the requested resource (get rangeallocations.security.openshift.io scc-uid)
2020-06-30T20:52:51.274460847Z E0630 20:52:51.274409 1 namespace_scc_allocation_controller.go:334] error syncing namespace, it will be retried: Operation cannot be fulfilled on namespaces "calico-system": the object has been modified; please apply your changes to the latest version and try again
2020-07-01T10:04:21.311371413Z E0701 10:04:21.311315 1 namespace_scc_allocation_controller.go:334] error syncing namespace, it will be retried: uid range exceeded
2020-07-01T10:04:21.323634787Z E0701 10:04:21.323608 1 namespace_scc_allocation_controller.go:334] error syncing namespace, it will be retried: uid range exceeded
2020-07-01T10:04:21.339554799Z E0701 10:04:21.339525 1 namespace_scc_allocation_controller.go:334] error syncing namespace, it will be retried: uid range exceeded
Created attachment 1700107 [details]
cluster-policy-controller log
Some of the retry errors will be fixed by https://bugzilla.redhat.com/show_bug.cgi?id=1829327 (4.4) and https://bugzilla.redhat.com/show_bug.cgi?id=1829328 (4.3). The uid range error is a separate issue and is currently being worked on.
I'm currently working on addressing the comments Clayton left on that PR.
Created attachment 1700506 [details]
scc-uid rangeallocation

The scc-uid RangeAllocation from a cluster affected by the "uid range exceeded" issue.
Verified on 4.6.0-0.nightly-2020-07-25-091217.

On 4.5.z, after creating and deleting thousands of projects multiple times, the count from

oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l

continuously increased. Running on 4.6.0-0.nightly-2020-07-25-091217, after deleting the projects the same command returned to 17, which was the pre-test number.
I have a case of this where the uid count continually increases in a brand new 4.5 cluster. I initially ran into it right after cluster creation with 4.5.3 and had to reset the count manually right away. Now a new 4.5.7 cluster is exhibiting the same behavior, but the increase in the uid count is a little slower.

oc get rangeallocations scc-uid -o yaml | grep -o "/" | wc -l
4882

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.7     True        False         141m    Cluster version is 4.5.7
> I have a case of this where the uid continually increases in a brand new 4.5
> cluster. Initially ran into it right after cluster creation with 4.5.3 and
> would have to reset the count manually right away. Now a new 4.5.7
> exhibiting the same behavior but the increase in uid count is a little
> slower.

The increase will always happen, as that's how the mechanism works; it only releases the unused entries once they are released, i.e. when the namespace/project is removed.
So what is the solution for a brand new cluster encountering this? The stats I posted are typical, and about one time in ten the count will be north of 10k and nothing can be deployed on the cluster.
(In reply to aamirian from comment #38)
> So what is the solution for a brand new cluster encountering this? The stats
> I posted are typical and the odd time (1/10) it will be north of 10k and
> nothing can be deployed on the cluster.

This was backported to all previous versions, so I'd suggest upgrading the cluster.
Backported to which version? We are running 4.5.7 and the issue still exists.
The 4.5 backport landed in 4.5.5 [1], so please open a new bug for the issues you are seeing on 4.5.7.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1858798#c8
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196