Description of problem:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/3149

Jun 26 12:15:06 ip-10-0-12-118 bootkube.sh[1389]: https://etcd-2.ci-op-0dr21wv1-77109.origin-ci-int-aws.dev.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp 10.0.165.46:2379: connect: connection refused
Jun 26 12:15:06 ip-10-0-12-118 bootkube.sh[1389]: https://etcd-0.ci-op-0dr21wv1-77109.origin-ci-int-aws.dev.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp 10.0.130.178:2379: connect: connection refused
Jun 26 12:15:06 ip-10-0-12-118 bootkube.sh[1389]: https://etcd-1.ci-op-0dr21wv1-77109.origin-ci-int-aws.dev.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp 10.0.145.70:2379: connect: connection refused
Jun 26 12:15:06 ip-10-0-12-118 bootkube.sh[1389]: Error: unhealthy cluster

Version-Release number of the following components:
4.2.0-0.nightly-2019-06-26-111723

How reproducible:
Only saw one example of this so far, while serving as today's build cop.
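For reference, the same etcd endpoints can be probed by hand from the bootstrap node with something along these lines (a sketch only; the client certificate paths are placeholders, and whether bootkube uses the v2 or v3 etcdctl API is not verified here):

  # Sketch: check each etcd member's health directly; output should mirror the
  # "is unhealthy: failed to connect" lines above. Cert paths are placeholders.
  export ETCDCTL_API=3
  etcdctl endpoint health \
    --endpoints=https://etcd-0.ci-op-0dr21wv1-77109.origin-ci-int-aws.dev.rhcloud.com:2379,https://etcd-1.ci-op-0dr21wv1-77109.origin-ci-int-aws.dev.rhcloud.com:2379,https://etcd-2.ci-op-0dr21wv1-77109.origin-ci-int-aws.dev.rhcloud.com:2379 \
    --cacert=/path/to/etcd-ca.crt \
    --cert=/path/to/client.crt \
    --key=/path/to/client.key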
From Sam on slack: looks like it might be DNS? It creates the records at what feels like a really late point.
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/3149/artifacts/e2e-aws-upgrade/installer/.openshift_install.log

time="2019-06-26T11:37:42Z" level=debug msg="2019/06/26 11:37:42 [DEBUG] Resource instance state not found for node \"module.dns.aws_route53_record.etcd_a_nodes[0]\", instance module.dns.aws_route53_record.etcd_a_nodes[0]"

then 8 minutes later:

time="2019-06-26T11:45:02Z" level=debug msg="module.dns.aws_route53_record.etcd_a_nodes[0]: Creation complete after 48s [id=Z068822582SCO3XXVJLR_etcd-0.ci-op-0dr21wv1-77109.origin-ci-int-aws.dev.rhcloud.com_A]"
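To see when those records actually start resolving, one could poll them from the bootstrap node; a minimal sketch (the hostnames are the ones from this job, and running it repeatedly to watch for the ~8 minute gap is left to the reader):

  # Check whether the etcd A records resolve yet; purely an illustrative timing check.
  for i in 0 1 2; do
    dig +short "etcd-${i}.ci-op-0dr21wv1-77109.origin-ci-int-aws.dev.rhcloud.com" A
  done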
I found a bit more in the bootstrap must-gather that might be of more interest:

curl https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/3149/artifacts/e2e-aws-upgrade/installer/log-bundle-20190626121617.tar | gunzip

(lines 1079-38678 of the decompressed output)

Jun 26 11:43:55 ip-10-0-12-118 systemd[1]: Started Kubernetes Kubelet.
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: E0626 11:43:55.775558 1316 kubelet.go:1282] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: W0626 11:43:55.775599 1316 kubelet.go:1394] No api server defined - no node status update will be sent.
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: I0626 11:43:55.776681 1316 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: I0626 11:43:55.777382 1316 status_manager.go:148] Kubernetes client is nil, not starting status manager.
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: I0626 11:43:55.797933 1316 server.go:141] Starting to listen on 0.0.0.0:10250
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: I0626 11:43:55.801752 1316 server.go:347] Adding debug handlers to kubelet server.
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: E0626 11:43:55.813077 1316 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /usr/lib/golang/src/runtime/asm_amd64.s:522
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /usr/lib/golang/src/runtime/panic.go:513
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/tools/cache/reflector.go:189
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/tools/cache/reflector.go:214
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/tools/cache/reflector.go:125
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/tools/cache/reflector.go:124
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/tools/cache/controller.go:122
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:54
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /usr/lib/golang/src/runtime/asm_amd64.s:1333
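For anyone re-checking this, a sketch of pulling the bundle down and locating the panic in the decompressed stream (the grep pattern is just the string from the trace above):

  # Fetch the log bundle and find the kubelet panic in the decompressed stream.
  curl -sL https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/3149/artifacts/e2e-aws-upgrade/installer/log-bundle-20190626121617.tar \
    | gunzip \
    | grep -n "Observed a panic"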
A little later, this starts looping as part of the above:

Jun 26 11:48:35 ip-10-0-12-118 hyperkube[1316]: E0626 11:48:35.805079 1316 remote_runtime.go:332] ContainerStatus of "9af3a36f500219ad37fad4cf1a286f900bb2b8f315a4cf7364687a9b14d947cc" failed: Image is not set
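A quick way to gauge how often that error repeats in the same bundle (again just a sketch, reusing the same decompressed stream as above):

  # Count how many times the looping ContainerStatus error shows up in the bundle.
  curl -sL https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/3149/artifacts/e2e-aws-upgrade/installer/log-bundle-20190626121617.tar \
    | gunzip \
    | grep -c "Image is not set"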
Bootstrap logs from this job don't mention "Image is not set":
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/901/pull-ci-openshift-machine-config-operator-master-e2e-aws/4112
but they do contain the same panic.
From the logs this looks like a kubelet issue; assigning to the Node team.
Upstream issue: https://github.com/kubernetes/kubernetes/issues/76075
The upstream issue was filed against kubeadm, but it looks like a generic issue with the 1.14 kubelet (the logs match as well).
Fixed as of RHCOS build 420.8.20190708.1
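To confirm a node has picked up that build, the deployed RHCOS version can be read on the node itself, e.g. (a sketch; assumes the reported ostree version corresponds directly to the RHCOS build ID):

  # Show the deployed RHCOS/ostree version on the node; it should report
  # 420.8.20190708.1 or newer once the fix has landed.
  rpm-ostree status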
My fault... Moved into the wrong state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922