Description of problem:

Operations cluster reporting out of space due to core files being generated. Attaching example created by relatively recent 3.7 install.

[/var/log/origin] ls
-rw-------. 1 root root 522891264 Aug 29 03:22 core.104479

Version-Release number of selected component (if applicable):

oc v3.7.0-0.104.0
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://internal.api.free-int.openshift.com:443
openshift v3.7.0-0.104.0
kubernetes v1.7.0+695f48a16f

How reproducible:
?

Steps to Reproduce:
1. Little to no load on this cluster at the time.

Actual results:

http://file.rdu.redhat.com/~jupierce/core/
core.104479.tgz - The gzipped core which was created
core-logs.tgz - Logs from all masters

Expected results:

No cores
Quick update while I continue investigating. Here are the stacks of two goroutines obtained from the provided core dump:

(dlv) goroutine 14266 bt 1000
 0  0x000000000045b5b0 in runtime.systemstack_switch at /usr/lib/golang/src/runtime/asm_amd64.s:281
 1  0x000000000042ef51 in runtime.dopanic at /usr/lib/golang/src/runtime/panic.go:579
 2  0x000000000042f045 in runtime.throw at /usr/lib/golang/src/runtime/panic.go:596
 3  0x000000000040ce41 in runtime.mapassign at /usr/lib/golang/src/runtime/hashmap.go:589
 4  0x00000000010e64b5 in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/api/v1.Convert_v1_Pod_To_api_Pod at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/api/v1/conversion.go:625
 5  0x00000000038e2988 in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core.toInternalPodOrError at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core/pods.go:200
 6  0x00000000038e2a49 in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core.podMatchesScopeFunc at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core/pods.go:213
 7  0x00000000038ddf92 in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/generic.CalculateUsageStats at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/generic/evaluator.go:104
 8  0x00000000038e1e72 in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core.(*podEvaluator).UsageStats at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core/pods.go:156
 9  0x00000000038dd75b in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota.CalculateUsage at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/resources.go:241
10  0x0000000003b2006a in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.(*ResourceQuotaController).syncResourceQuota at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota/resource_quota_controller.go:304
11  0x0000000003b1fd0b in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.(*ResourceQuotaController).syncResourceQuotaFromKey at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota/resource_quota_controller.go:280
12  0x0000000003b21b7e in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.(*ResourceQuotaController).(github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.syncResourceQuotaFromKey)-fm at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota/resource_quota_controller.go:98
13  0x0000000003b2165c in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.(*ResourceQuotaController).worker.func1 at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota/resource_quota_controller.go:212
14  0x0000000003b2176b in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.(*ResourceQuotaController).worker.func2 at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota/resource_quota_controller.go:224
15  0x0000000000578ede in github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1 at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:97
16  0x00000000005784cd in github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:98
17  0x000000000057839d in github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.Until at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:52
18  0x000000000045e191 in runtime.goexit at /usr/lib/golang/src/runtime/asm_amd64.s:2197

(dlv) goroutine 14260 bt 1000
 0  0x00000000004b700c in reflect.Value.Field at /usr/lib/golang/src/reflect/value.go:757
 1  0x000000000074c4b3 in encoding/json.(*decodeState).object at /usr/lib/golang/src/encoding/json/decode.go:708
 2  0x000000000074aca4 in encoding/json.(*decodeState).value at /usr/lib/golang/src/encoding/json/decode.go:402
 3  0x000000000074b663 in encoding/json.(*decodeState).array at /usr/lib/golang/src/encoding/json/decode.go:555
 4  0x000000000074ac37 in encoding/json.(*decodeState).value at /usr/lib/golang/src/encoding/json/decode.go:399
 5  0x000000000074a11a in encoding/json.(*decodeState).unmarshal at /usr/lib/golang/src/encoding/json/decode.go:184
 6  0x0000000000749ad8 in encoding/json.Unmarshal at /usr/lib/golang/src/encoding/json/decode.go:104
 7  0x00000000010e63d9 in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/api/v1.Convert_v1_Pod_To_api_Pod at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/api/v1/conversion.go:629
 8  0x00000000038e2988 in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core.toInternalPodOrError at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core/pods.go:200
 9  0x00000000038e2a49 in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core.podMatchesScopeFunc at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core/pods.go:213
10  0x00000000038ddf92 in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/generic.CalculateUsageStats at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/generic/evaluator.go:104
11  0x00000000038e1e72 in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core.(*podEvaluator).UsageStats at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/evaluator/core/pods.go:156
12  0x00000000038dd75b in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota.CalculateUsage at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/quota/resources.go:241
13  0x0000000003b2006a in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.(*ResourceQuotaController).syncResourceQuota at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota/resource_quota_controller.go:304
14  0x0000000003b1fd0b in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.(*ResourceQuotaController).syncResourceQuotaFromKey at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota/resource_quota_controller.go:280
15  0x0000000003b21b7e in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.(*ResourceQuotaController).(github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.syncResourceQuotaFromKey)-fm at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota/resource_quota_controller.go:98
16  0x0000000003b2165c in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.(*ResourceQuotaController).worker.func1 at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota/resource_quota_controller.go:212
17  0x0000000003b2176b in github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota.(*ResourceQuotaController).worker.func2 at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/resourcequota/resource_quota_controller.go:224
18  0x0000000000578ede in github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1 at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:97
19  0x00000000005784cd in github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:98
20  0x000000000057839d in github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.Until at /builddir/build/BUILD/atomic-openshift-git-0.c420cf9/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:52
21  0x000000000045e191 in runtime.goexit at /usr/lib/golang/src/runtime/asm_amd64.s:2197

My theory is a race between resource quota controllers when the component configuration ConcurrentResourceQuotaSyncs value is > 2 and shared informers are used. Each controller will:

1. Receive the same *api.Pod instance for processing
2. Pass the *api.Pod to Convert_v1_Pod_To_api_Pod
3. Panic on a concurrent map write to the *api.Pod (as Convert_v1_Pod_To_api_Pod unsafely mutates the input)

I'll try to reproduce in a test.
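For illustration, here is a minimal, self-contained Go sketch of that failure mode (hypothetical code, not the vendored Kubernetes conversion): several workers call a conversion-like function against the same shared object, and the function writes to the object's annotation map in place. The runtime detects the unsynchronized map writes and aborts the process, matching the runtime.mapassign/runtime.throw frames at the top of the first stack.

```go
// Minimal standalone sketch of the suspected failure mode, not the actual
// Kubernetes code. This program is expected to abort with
// "fatal error: concurrent map writes" when run with GOMAXPROCS > 1,
// which is the kind of unrecoverable error that dumps core.
package main

import "sync"

// pod is a stand-in for the shared *api.Pod held by the informer cache.
type pod struct {
	Annotations map[string]string
}

// convert mimics the relevant property of Convert_v1_Pod_To_api_Pod: it
// mutates the annotation map of its *input* instead of working on a copy.
func convert(in *pod) {
	in.Annotations["converted"] = "true"
	delete(in.Annotations, "pod.beta.kubernetes.io/init-containers")
}

func main() {
	shared := &pod{Annotations: map[string]string{
		"pod.beta.kubernetes.io/init-containers": "[]",
	}}

	var wg sync.WaitGroup
	for i := 0; i < 5; i++ { // several workers, like concurrent quota syncs
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100000; j++ {
				convert(shared) // unsynchronized writes to the shared map
			}
		}()
	}
	wg.Wait()
}
```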
The kube-controller-manager `--concurrent-resource-quota-syncs` flag (default: 5) controls the number of resource quota controller worker threads. In Kube 1.7, competing resource quota controller workers can crash the process through concurrent write attempts during pod type conversion, because the conversion function unsafely mutates its input[1]. The most obvious conditions for triggering the race would be the quota workers analyzing pods carrying any of the following annotations:

* PodInitContainersBetaAnnotationKey
* PodInitContainersAnnotationKey
* PodInitContainerStatusesBetaAnnotationKey
* PodInitContainerStatusesAnnotationKey

This list covers only the conditions I easily identified and isn't assumed to be exhaustive.

The problematic pod conversion mutations have been removed in Kube 1.8. However, it's not yet clear to me whether conversion functions should generally be considered thread-safe. If not, the controller code may need to be refactored to deal with conversion functions more defensively. I'll open a discussion upstream.

In the meantime, a temporary stabilizing workaround would be to reduce `--concurrent-resource-quota-syncs` to zero, at the cost of quota calculation performance.

[1] https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/api/v1/conversion.go#L592
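If conversion functions cannot generally be treated as thread-safe, the defensive refactoring mentioned above amounts to giving each worker a private deep copy of the shared object before conversion. A rough sketch of that pattern, reusing the hypothetical pod type from the earlier example rather than the real Kubernetes types (the actual upstream change may differ):

```go
// Hedged sketch of the "be defensive in the caller" option: deep-copy the
// shared object before handing it to a conversion that may mutate its
// input. Hypothetical types, not the real Kubernetes API machinery.
package main

import "sync"

type pod struct {
	Annotations map[string]string
}

// deepCopy returns an independent copy of p, including its annotation map.
func deepCopy(p *pod) *pod {
	out := &pod{Annotations: make(map[string]string, len(p.Annotations))}
	for k, v := range p.Annotations {
		out.Annotations[k] = v
	}
	return out
}

// convert still mutates its input, as the Kube 1.7 conversion does.
func convert(in *pod) {
	delete(in.Annotations, "pod.beta.kubernetes.io/init-containers")
}

func main() {
	shared := &pod{Annotations: map[string]string{
		"pod.beta.kubernetes.io/init-containers": "[]",
	}}

	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			convert(deepCopy(shared)) // each worker mutates only its own copy
		}()
	}
	wg.Wait()
}
```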
(In reply to Dan Mace from comment #2)
> The kube-controller-manager `--concurrent-resource-quota-syncs` flag
> (default: 5) controls the number of resource quota controller worker
> threads. In Kube 1.7, competing quota resource controller instances can
> crash due to concurrent write attempts during pod type conversion using an
> unsafe conversion function that mutates input[1]. The most obvious
> conditions for triggering the race would be the quota controllers analyzing
> pods with any of the following annotations:
>
> * PodInitContainersBetaAnnotationKey
> * PodInitContainersAnnotationKey
> * PodInitContainerStatusesBetaAnnotationKey
> * PodInitContainerStatusesAnnotationKey

I failed to provide the serialized annotation key names to look for:

* pod.beta.kubernetes.io/init-containers
* pod.alpha.kubernetes.io/init-containers
* pod.beta.kubernetes.io/init-container-statuses
* pod.alpha.kubernetes.io/init-container-statuses
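To gauge exposure, a small hypothetical helper (not code from the repository) could check a pod's annotation map for those serialized keys:

```go
// Hypothetical helper, not part of Origin or Kubernetes: reports whether a
// pod's annotations contain any of the serialized init-container keys that
// can drive the racy conversion path.
package main

import "fmt"

var triggerAnnotations = []string{
	"pod.beta.kubernetes.io/init-containers",
	"pod.alpha.kubernetes.io/init-containers",
	"pod.beta.kubernetes.io/init-container-statuses",
	"pod.alpha.kubernetes.io/init-container-statuses",
}

func hasInitContainerAnnotation(annotations map[string]string) bool {
	for _, key := range triggerAnnotations {
		if _, ok := annotations[key]; ok {
			return true
		}
	}
	return false
}

func main() {
	example := map[string]string{"pod.beta.kubernetes.io/init-containers": "[]"}
	fmt.Println(hasInitContainerAnnotation(example)) // prints: true
}
```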
Upstream Kubernetes 1.7 PR: https://github.com/kubernetes/kubernetes/pull/52092
OpenShift PR: https://github.com/openshift/origin/pull/16241
*** Bug 1477233 has been marked as a duplicate of this bug. ***
Can't reproduce this issue with latest OCP 3.7, will verify it.

openshift version
openshift v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd
etcd 3.2.1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188