Description of problem:
Resource sync among multiple clusters is very slow.

Version-Release number of selected component (if applicable):
KubeFed controller-manager version: version.Info{Version:"v4.2.0", GitCommit:"b8ae65cee603cc9c746911debd3dc23b922222d8", GitTreeState:"clean", BuildDate:"2019-08-08T18:58:34Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
kubefedctl version: version.Info{Version:"v4.2.0", GitCommit:"b8ae65cee603cc9c746911debd3dc23b922222d8", GitTreeState:"clean", BuildDate:"2019-08-08T18:18:54Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:
100%

Steps to Reproduce:
1. Deploy a cluster-scoped KubeFed controller manager.

2. Add two clusters to the federation:
$ oc get kubefedcluster -n openshift-operators
NAME       READY   AGE
cluster1   True    77m
cluster2   True    77m

3. Sync a namespace to both clusters:
$ oc create ns test-namespace
$ kubefedctl federate ns test-namespace -e --kubefed-namespace=openshift-operators

4. Sync a secret to both clusters (this is not slow):
$ oc create -f secret.yaml -n test-namespace
$ cat secret.yaml
apiVersion: v1
data:
  token: xxxx
kind: Secret
metadata:
  name: kubefed
type: Opaque
$ kubefedctl federate secret -n test-namespace -e --kubefed-namespace=openshift-operators
$ oc get federatedsecret -n test-namespace --context=cluster1
NAME      AGE
kubefed   66m
$ for c in cluster1 cluster2; do oc get secret kubefed -n test-namespace --context=$c; done
NAME      TYPE     DATA   AGE
kubefed   Opaque   1      71m
NAME      TYPE     DATA   AGE
kubefed   Opaque   1      66m

5. Sync another secret to both clusters (this is slow: it takes about 5 minutes for the secret to reach cluster2):
$ oc create ns nofed-namespace
$ oc create -f secret.yaml -n nofed-namespace
$ kubefedctl federate secret -n nofed-namespace --kubefed-namespace=openshift-operators
$ kubefedctl federate ns nofed-namespace --kubefed-namespace=openshift-operators
$ oc get ns | grep nofed-namespace
nofed-namespace   Active   52m
$ oc get federatednamespace -n nofed-namespace
NAME              AGE
nofed-namespace   40m
$ oc get ns nofed-namespace --context=cluster2
NAME              STATUS   AGE
nofed-namespace   Active   40m
$ oc get federatedsecret -n nofed-namespace
NAME      AGE
kubefed   52m
$ for c in cluster1 cluster2; do oc get secret kubefed -n nofed-namespace --context=$c; done
NAME      TYPE     DATA   AGE
kubefed   Opaque   1      55m
NAME      TYPE     DATA   AGE
kubefed   Opaque   1      35m

6. Sync secret resources again:
$ oc get federatedsecret kubefed-test -n nofed-namespace
NAME           AGE
kubefed-test   41m
$ for c in cluster1 cluster2; do oc get secret kubefed-test -n nofed-namespace --context=$c; done
NAME           TYPE     DATA   AGE
kubefed-test   Opaque   1      41m
NAME           TYPE     DATA   AGE
kubefed-test   Opaque   1      7m29s
(this resource was synced after about 33 minutes)
$ oc get federatedsecret -n test1
NAME      AGE
kubefed   43m
$ for c in cluster1 cluster2; do oc get secret kubefed -n test1 --context=$c; done
(these secrets were synced to the target clusters after about 40 minutes)
NAME      TYPE     DATA   AGE
kubefed   Opaque   1      2m26s
NAME      TYPE     DATA   AGE
kubefed   Opaque   1      2m29s

Actual results:
Secrets are synced to the target clusters more and more slowly; each newly federated resource takes longer to propagate than the previous one.

Expected results:
Newly federated resources are propagated to the member clusters promptly.

Additional info:
There seems to be no helpful information in the kubefed controller manager log; the relevant excerpt is included below.
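For reference, `kubefedctl federate` in steps 3-5 wraps the source object in a federated type. The following is only a rough sketch of what the generated FederatedSecret for step 4 might look like (an assumption on my part; the actual object can be checked with `oc get federatedsecret kubefed -n test-namespace -o yaml`, and exact fields can differ by KubeFed version):

apiVersion: types.kubefed.io/v1beta1
kind: FederatedSecret
metadata:
  name: kubefed
  namespace: test-namespace
spec:
  template:            # the Secret body that gets propagated
    data:
      token: xxxx
    type: Opaque
  placement:
    clusters:          # member clusters the Secret is propagated to
    - name: cluster1
    - name: cluster2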
$ oc logs kubefed-controller-manager-54f578454b-v96zg -n openshift-operators | grep "test1\/kubefed"
I0809 08:48:04.473679 1 controller.go:268] Starting to reconcile FederatedSecret "test1/kubefed"
I0809 08:48:04.473688 1 controller.go:276] Ensuring finalizer exists on FederatedSecret "test1/kubefed"
I0809 08:48:04.473697 1 controller.go:638] Adding finalizer kubefed.io/sync-controller to FederatedSecret "test1/kubefed"
I0809 08:48:04.482318 1 controller.go:303] Syncing Secret "test1/kubefed" in underlying clusters, selected clusters are: map[cluster1:{} cluster2:{}]
I0809 08:48:09.147178 1 controller.go:283] Finished reconciling FederatedSecret "test1/kubefed" (duration: 93ns)
I0809 08:48:37.134601 1 controller.go:268] Starting to reconcile FederatedSecret "test1/kubefed"
I0809 08:48:37.134606 1 controller.go:276] Ensuring finalizer exists on FederatedSecret "test1/kubefed"
I0809 08:48:37.134710 1 controller.go:303] Syncing Secret "test1/kubefed" in underlying clusters, selected clusters are: map[cluster1:{} cluster2:{}]
I0809 08:48:41.819347 1 controller.go:283] Finished reconciling FederatedSecret "test1/kubefed" (duration: 68ns)
I0809 08:48:46.456534 1 controller.go:268] Starting to reconcile FederatedSecret "test1/kubefed"
I0809 08:48:46.456559 1 controller.go:276] Ensuring finalizer exists on FederatedSecret "test1/kubefed"
I0809 08:48:46.456676 1 controller.go:303] Syncing Secret "test1/kubefed" in underlying clusters, selected clusters are: map[cluster1:{} cluster2:{}]
I0809 08:48:51.101569 1 controller.go:283] Finished reconciling FederatedSecret "test1/kubefed" (duration: 136ns)
I0809 08:48:55.740040 1 controller.go:268] Starting to reconcile FederatedSecret "test1/kubefed"
I0809 08:48:55.740048 1 controller.go:276] Ensuring finalizer exists on FederatedSecret "test1/kubefed"
I0809 08:48:55.740177 1 controller.go:303] Syncing Secret "test1/kubefed" in underlying clusters, selected clusters are: map[cluster1:{} cluster2:{}]
I0809 08:49:00.418007 1 controller.go:283] Finished reconciling FederatedSecret "test1/kubefed" (duration: 100ns)
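To see how much of the controller's time goes into re-reconciling the same objects, here is a quick helper (my own addition, not part of the original report) that counts reconcile starts per FederatedSecret from the same log:

$ oc logs kubefed-controller-manager-54f578454b-v96zg -n openshift-operators \
    | grep 'Starting to reconcile FederatedSecret' \
    | awk -F'"' '{print $2}' | sort | uniq -c | sort -rn

Each output line is a count followed by a namespace/name; in the excerpt above, "test1/kubefed" alone accounts for four reconcile starts in under a minute.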
I am able to replicate the behavior on the test clusters Ping was using:

- a FederatedSecret named `test-ns2/kubefed` propagates fine, and changes to its template or placement are propagated within ~30s (the host is very slow)
- I created a new FederatedSecret named `test-ns2/kubefed-maru`, copied from the `test-ns2/kubefed` secret, and it took ~53m to be noticed by the controller manager

This suggests that the problem is not with the controller but with the watch: updates to existing resources are being noticed in a timely manner by the KubeFed controllers, but new resources are only noticed after a considerable interval. That suggests to me that the KubeFed watches are misconfigured, or that watches are somehow broken on the build of OCP being used for testing.
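One way to test that hypothesis (my own suggestion, not something from the report) is to run a client-side watch on the federated type while creating another copy; if the new object shows up immediately in `oc get -w` but the controller log stays silent, the API server is delivering watch events fine and the backlog is inside the controller. The pod name below is taken from the log excerpt above, and `kubefed-maru-2.yaml` is a hypothetical file containing another copy of the FederatedSecret:

# terminal 1: client-side watch on the federated type in the host cluster
$ oc get federatedsecret -n test-ns2 -w

# terminal 2: follow the controller log for the new resource
$ oc logs -f kubefed-controller-manager-54f578454b-v96zg -n openshift-operators \
    | grep 'test-ns2/kubefed-maru-2'

# terminal 3: create another copy of the FederatedSecret
$ oc create -f kubefed-maru-2.yaml -n test-ns2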
The problem was not the watch (i.e. select is not broken: https://blog.codinghorror.com/the-first-rule-of-programming-its-always-your-fault/). KubeFed was performing unnecessary updates and this was bogging down the cluster. A fix has been proposed upstream: https://github.com/kubernetes-sigs/kubefed/pull/1099
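A rough way to gauge that update churn from the existing logs (my own helper, not part of the report) is to count, over a fixed window, how often the sync controller re-pushes a single unchanged resource. On a build containing the fix, the expectation is that this count stays flat once the resource is in sync (pod name taken from the log above; it will change after the controller is redeployed):

$ oc logs kubefed-controller-manager-54f578454b-v96zg -n openshift-operators --since=10m \
    | grep -c 'Syncing Secret "test1/kubefed" in underlying clusters'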
Ping,

The fix should be available in the latest downstream image.
Verified with KubeFed controller-manager version: version.Info{Version:"v4.2.0", GitCommit:"7f002471b9dd8366e1e0f080b46bc79864682f71", GitTreeState:"clean", BuildDate:"2019-08-25T20:09:07Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922