Bug 1747246
| Summary: | [osp] machine-api-controller pod stuck in CrashLoopBackOff | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | sunzhaohua <zhsun> |
| Component: | Cloud Compute | Assignee: | Andrew McDermott <amcdermo> |
| Status: | CLOSED ERRATA | QA Contact: | sunzhaohua <zhsun> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.2.0 | CC: | agarcial, jhou |
| Target Milestone: | --- | | |
| Target Release: | 4.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | osp | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-16 06:38:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Verified on clusterversion 4.2.0-0.nightly-2019-09-02-172410. The issue did not recur, so marking as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:2922

Description of problem:
After a cluster has been running for a while, the machine-api-controller pod gets stuck in CrashLoopBackOff. This has been seen on 2 clusters.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-14-211610

How reproducible:
Sometimes

Steps to Reproduce:
1. Run a cluster for a while
2. Check the machine-api-controller pod

Actual results:
The machine-api-controller pod is stuck in CrashLoopBackOff.

$ oc get pod
NAME                                          READY   STATUS             RESTARTS   AGE
cluster-autoscaler-operator-fff44d57f-vrgcp   1/1     Running            0          17h
machine-api-controllers-5fb9f8f668-gw6xt      3/4     CrashLoopBackOff   46         17h
machine-api-operator-6f89c74764-khxbr         1/1     Running            1          17h

$ oc edit pod machine-api-controllers-5fb9f8f668-gw6xt
name: nodelink-controller
ready: false
restartCount: 47
state:
  waiting:
    message: Back-off 5m0s restarting failed container=nodelink-controller pod=machine-api-controllers-5fb9f8f668-gw6xt_openshift-machine-api(3d500681-c9ab-11e9-ad00-fa163ea99686)
    reason: CrashLoopBackOff

$ oc logs -f machine-api-controllers-5fb9f8f668-gw6xt -c nodelink-controller
I0829 08:47:03.348079 1 nodelink_controller.go:92] Adding internal IP "192.168.0.34" for node "preserve-groupg-4cf4r-master-1" to indexer
I0829 08:47:03.396320 1 nodelink_controller.go:188] Reconciling Node /preserve-groupg-4cf4r-worker-f6ffp
I0829 08:47:03.396606 1 nodelink_controller.go:409] Finding machine from node "preserve-groupg-4cf4r-worker-f6ffp"
I0829 08:47:03.396659 1 nodelink_controller.go:426] Finding machine from node "preserve-groupg-4cf4r-worker-f6ffp" by ProviderID
I0829 08:47:03.396721 1 nodelink_controller.go:449] Finding machine from node "preserve-groupg-4cf4r-worker-f6ffp" by IP
I0829 08:47:03.396764 1 nodelink_controller.go:454] Found internal IP for node "preserve-groupg-4cf4r-worker-f6ffp": "192.168.0.26"
E0829 08:47:03.397032 1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/asm_amd64.s:522
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:513
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:82
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/signal_unix.go:390
/go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/meta.go:135
/go/src/github.com/openshift/machine-api-operator/pkg/controller/nodelink/nodelink_controller.go:205
/go/src/github.com/openshift/machine-api-operator/pkg/controller/nodelink/nodelink_controller.go:205
/go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:210
/go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
/go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152
/go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153
/go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/asm_amd64.s:1333
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x10753f7]

goroutine 438 [running]:
github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x108
panic(0x11c6a80, 0x1fee7d0)
    /opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:513 +0x1b9
github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/apis/meta/v1.(*ObjectMeta).GetName(...)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/meta.go:135
github.com/openshift/machine-api-operator/pkg/controller/nodelink.(*ReconcileNodeLink).findMachineFromNode(0xc000443230, 0xc0003d6580, 0xc0000c6008, 0x0, 0x0)
    /go/src/github.com/openshift/machine-api-operator/pkg/controller/nodelink/nodelink_controller.go:420 +0x267
github.com/openshift/machine-api-operator/pkg/controller/nodelink.(*ReconcileNodeLink).Reconcile(0xc000443230, 0x0, 0x0, 0xc000044e40, 0x22, 0x2001c40, 0xc0001ced50, 0x444e37, 0x8)
    /go/src/github.com/openshift/machine-api-operator/pkg/controller/nodelink/nodelink_controller.go:205 +0x324
github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0000ce1e0, 0x0)
    /go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:210 +0x17d
github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
    /go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158 +0x36
github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000514ac0)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000514ac0, 0x3b9aca00, 0x0, 0x13b2201, 0xc0003d8120)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xbe
github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc000514ac0, 0x3b9aca00, 0xc0003d8120)
    /go/src/github.com/openshift/machine-api-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
    /go/src/github.com/openshift/machine-api-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157 +0x32a

Expected results:
The machine-api-controller pod works normally.

Additional info:
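
The trace above shows (*ObjectMeta).GetName panicking inside findMachineFromNode (nodelink_controller.go:420), which is consistent with GetName being called on a nil machine pointer after a lookup found no match. The following is only a minimal sketch of that failure mode and of a nil guard, not the actual controller code or the shipped fix. The types ObjectMeta and Machine are simplified stand-ins, and lookupByIP, machineNameForIP, unguardedName and errNoMachine are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectMeta and Machine are simplified stand-ins for illustration only;
// they are NOT the real apimachinery or machine-api types.
type ObjectMeta struct {
	Name string
}

// GetName reads a field through the receiver, so it panics on a nil receiver.
func (m *ObjectMeta) GetName() string { return m.Name }

type Machine struct {
	ObjectMeta
}

// errNoMachine is a hypothetical sentinel error for "lookup found nothing".
var errNoMachine = errors.New("no machine found for node")

// lookupByIP is a hypothetical lookup; a missing key yields a nil *Machine.
func lookupByIP(index map[string]*Machine, ip string) *Machine {
	return index[ip]
}

// machineNameForIP checks for nil before dereferencing the lookup result.
func machineNameForIP(index map[string]*Machine, ip string) (string, error) {
	machine := lookupByIP(index, ip)
	if machine == nil {
		return "", errNoMachine
	}
	return machine.GetName(), nil
}

// unguardedName calls GetName without a nil check. The deferred recover
// converts the resulting panic into an error so the demo can keep running.
func unguardedName(index map[string]*Machine, ip string) (name string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return lookupByIP(index, ip).GetName(), nil
}

func main() {
	// IPs are taken from the log above; the machine name is shortened.
	index := map[string]*Machine{
		"192.168.0.34": &Machine{ObjectMeta: ObjectMeta{Name: "master-1"}},
	}

	name, err := machineNameForIP(index, "192.168.0.34")
	fmt.Println("guarded, known IP:", name, err)

	name, err = machineNameForIP(index, "192.168.0.26")
	fmt.Println("guarded, unknown IP:", name, err)

	name, err = unguardedName(index, "192.168.0.26")
	fmt.Println("unguarded, unknown IP:", name, err)
}
```

In this sketch the guarded path returns an error for an IP with no matching machine, while the unguarded path hits the same class of nil pointer dereference seen in the log. In a controller, an unrecovered panic of this kind terminates the container, which would explain the repeated restarts.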