Bug 1724346
| Summary: | Install failure flake - timed out waiting for etcd - delayed DNS record creation? | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mark McLoughlin <markmc> |
| Component: | Node | Assignee: | Ryan Phillips <rphillips> |
| Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.2.0 | CC: | amurdaca, aos-bugs, jokerman, kgarriso, mmccomas, sjenning, xtian |
| Target Milestone: | --- | | |
| Target Release: | 4.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-16 06:32:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Mark McLoughlin
2019-06-26 20:23:00 UTC
From Sam on Slack: looks like it might be DNS? It creates the records at what feels like a really late point.

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/3149/artifacts/e2e-aws-upgrade/installer/.openshift_install.log

time="2019-06-26T11:37:42Z" level=debug msg="2019/06/26 11:37:42 [DEBUG] Resource instance state not found for node \"module.dns.aws_route53_record.etcd_a_nodes[0]\", instance module.dns.aws_route53_record.etcd_a_nodes[0]"

then 8 minutes later:

time="2019-06-26T11:45:02Z" level=debug msg="module.dns.aws_route53_record.etcd_a_nodes[0]: Creation complete after 48s [id=Z068822582SCO3XXVJLR_etcd-0.ci-op-0dr21wv1-77109.origin-ci-int-aws.dev.rhcloud.com_A]"

I found a bit more in the bootstrap must-gather that might be of more interest:

curl https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/3149/artifacts/e2e-aws-upgrade/installer/log-bundle-20190626121617.tar | gunzip

Lines 1079-38678:

Jun 26 11:43:55 ip-10-0-12-118 systemd[1]: Started Kubernetes Kubelet.
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: E0626 11:43:55.775558 1316 kubelet.go:1282] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: W0626 11:43:55.775599 1316 kubelet.go:1394] No api server defined - no node status update will be sent.
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: I0626 11:43:55.776681 1316 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: I0626 11:43:55.777382 1316 status_manager.go:148] Kubernetes client is nil, not starting status manager.
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: I0626 11:43:55.797933 1316 server.go:141] Starting to listen on 0.0.0.0:10250
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: I0626 11:43:55.801752 1316 server.go:347] Adding debug handlers to kubelet server.
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: E0626 11:43:55.813077 1316 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /usr/lib/golang/src/runtime/asm_amd64.s:522
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /usr/lib/golang/src/runtime/panic.go:513
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/tools/cache/reflector.go:189
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/tools/cache/reflector.go:214
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/tools/cache/reflector.go:125
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/tools/cache/reflector.go:124
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/tools/cache/controller.go:122
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:54
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /builddir/build/BUILD/openshift-git-0.04ae0f4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
Jun 26 11:43:55 ip-10-0-12-118 hyperkube[1316]: /usr/lib/golang/src/runtime/asm_amd64.s:1333
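To make the trace above easier to read: the frames show apimachinery's crash handler (runtime.go) recovering a nil pointer dereference raised inside a reflector retry loop (reflector.go / wait.go), which lines up with the kubelet running without an API client ("Kubernetes client is nil" a few lines earlier). The following is a minimal, self-contained Go sketch of that failure mode; the apiClient type and syncOnce helper are hypothetical stand-ins, not kubelet code.

```go
package main

import (
	"fmt"
	"time"
)

// apiClient stands in for the kubelet's API-server client, which is never
// constructed in standalone mode ("No api server defined").
type apiClient struct {
	endpoint string
}

func (c *apiClient) list() string {
	// With a nil receiver, reading c.endpoint is the "invalid memory
	// address or nil pointer dereference" seen in the stack trace.
	return c.endpoint
}

// handleCrash plays the role of apimachinery's runtime.HandleCrash
// (the runtime.go frames in the trace): recover, log the panic, move on.
func handleCrash() {
	if r := recover(); r != nil {
		fmt.Printf("Observed a panic: %v\n", r)
	}
}

// syncOnce is one iteration of a wait.Until-style retry loop (the wait.go
// frames in the trace).
func syncOnce(c *apiClient) {
	defer handleCrash()
	fmt.Println("listed:", c.list())
}

func main() {
	var c *apiClient // nil: no API server was configured

	// Because the loop restarts after every recovered panic, the same
	// crash repeats, which is why it recurs in the journal.
	for i := 0; i < 3; i++ {
		syncOnce(c)
		time.Sleep(10 * time.Millisecond)
	}
	// The shape of a fix for this class of bug is a guard before the
	// loop ever starts, e.g.:
	//   if c == nil { return } // skip API-dependent loops when standalone
}
```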
A little later this starts looping as part of the above:

Jun 26 11:48:35 ip-10-0-12-118 hyperkube[1316]: E0626 11:48:35.805079 1316 remote_runtime.go:332] ContainerStatus of "9af3a36f500219ad37fad4cf1a286f900bb2b8f315a4cf7364687a9b14d947cc" failed: Image is not set

The bootstrap logs from this job don't complain about "Image is not set":

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/901/pull-ci-openshift-machine-config-operator-master-e2e-aws/4112

but they have the same panic.

This looks like a kubelet issue from the logs; assigning to the Node team.

https://github.com/kubernetes/kubernetes/issues/76075

The upstream issue is from kubeadm, but it looks like a generic issue with the 1.14 kubelet (the logs match as well).

Fixed as of RHCOS build 420.8.20190708.1.

My fault... Moved into the wrong state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922