Bug 1389770

Summary:

Node panics repeatedly with unkeyable object error after migrating storage etcd2->etcd3 with node up

Product:

OpenShift Container Platform

Reporter:

Mike Fiedler <mifiedle>

Component:

Networking

Assignee:

Dan Williams <dcbw>

Status:

CLOSED ERRATA

QA Contact:

Mike Fiedler <mifiedle>

Severity:

high

Docs Contact:

Priority:

medium

Version:

3.4.0

CC:

aos-bugs, bbennett, bmeng, jokerman, mifiedle, mmccomas, tdawson, tstclair, wmeng

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

No Doc Update

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2017-01-18 12:47:45 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
Node log	none

Description Mike Fiedler 2016-10-28 14:25:58 UTC

Created attachment 1215014
Node log

Description of problem:

After migrating etcd storage from v2 to v3 and configuring the API servers to use storage-backend=etcd3, the nodes (which were not stopped during this time) started panicking repeatedly when the API servers came back up.

Sample:

Oct 27 19:59:52 localhost atomic-openshift-node: E1027 19:59:52.799799   16576 runtime.go:64] Observed a panic: "unkeyable object: {svt664 &TypeMeta{Kind:,APIVersion:,}}, object has no meta: object does not implement the Object interfaces" (unkeyable object: {svt664 &TypeMeta{Kind:,APIVersion:,}}, object has no meta: object does not implement the Object interfaces)
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:70
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:63
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:49
Oct 27 19:59:52 localhost atomic-openshift-node: /usr/lib/golang/src/runtime/asm_amd64.s:479
Oct 27 19:59:52 localhost atomic-openshift-node: /usr/lib/golang/src/runtime/panic.go:458
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/pkg/sdn/plugin/eventqueue.go:187
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/pkg/sdn/plugin/eventqueue.go:34
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:573
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:312
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:490
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:343
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:271
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:202
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:88
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:89
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:49
Oct 27 19:59:52 localhost atomic-openshift-node: /usr/lib/golang/src/runtime/asm_amd64.s:2086


I let it run for about 10 minutes and there was no recovery. I'll run it again and see whether the nodes eventually recover.


Version-Release number of selected component (if applicable): 3.4.0.16


How reproducible: always


Steps to Reproduce:
1. Install an HA cluster (3 masters, 3 etcd) with OCP 3.4.0.16 + etcd 2.3.7
2. Create projects with running deployments
3. Shut down masters and etcd. Leave OpenShift nodes running.
4. On each etcd: yum swap etcd3 etcd to install etcd3 3.0.12-3.
5. On each etcd: etcdctl migrate --data-dir /var/lib/etcd
6. Start etcd on each
7. Start OpenShift masters

Actual results:

Nodes panic repeatedly (see above). The cluster is inoperable: no operations involving nodes work.

Expected results:

Nodes recover and re-set their watches/lists when an etcd API version change occurs without having to restart the entire cluster.

Comment 1 Timothy St. Clair 2016-10-28 15:19:02 UTC

It's expected that the ResourceVersion will be out of date and force a re-list. A panic or an outage is not expected.
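
To illustrate the expected behavior, here is a minimal, self-contained sketch of a client that re-lists when its resource version is stale instead of failing. Everything in it (`source`, `fakeServer`, `errStaleVersion`, `recoverWatch`) is a hypothetical stand-in invented for this example; it is not the Kubernetes reflector code and not the code touched by the fix.

```go
package main

import (
	"errors"
	"fmt"
)

// errStaleVersion is a hypothetical error returned when the client's
// resource version is no longer known to the server.
var errStaleVersion = errors.New("resource version too old")

// source is a hypothetical, minimal view of an API server.
type source interface {
	List() (items []string, version int, err error)
	Watch(fromVersion int) error
}

// fakeServer is a toy implementation used only to exercise recoverWatch.
type fakeServer struct {
	oldest  int // oldest version that can still be watched
	current int // version returned by a fresh list
	items   []string
}

func (f *fakeServer) List() ([]string, int, error) {
	return f.items, f.current, nil
}

func (f *fakeServer) Watch(from int) error {
	if from < f.oldest {
		return errStaleVersion
	}
	return nil
}

// recoverWatch tries to watch from version. If the server reports that the
// version is stale, it re-lists and watches again from the fresh version.
// It returns the version the watch ended up using.
func recoverWatch(src source, version int) (int, error) {
	err := src.Watch(version)
	if !errors.Is(err, errStaleVersion) {
		return version, err
	}
	// Stale version: drop local state and start over from a fresh list.
	_, fresh, err := src.List()
	if err != nil {
		return version, err
	}
	return fresh, src.Watch(fresh)
}

func main() {
	srv := &fakeServer{oldest: 100, current: 120, items: []string{"svt664"}}
	version, err := recoverWatch(srv, 7)
	fmt.Println("watching from version", version, "error:", err)
}
```

The point of the sketch is only that a stale version is an ordinary, recoverable condition: the client discards what it knew and starts again from a list.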

Comment 2 Timothy St. Clair 2016-10-31 18:47:19 UTC

This panic is inside the OpenShift SDN code, on an object conversion. Reassigning.

Comment 3 Dan Williams 2016-11-04 18:54:14 UTC

Upstream fix: https://github.com/openshift/origin/pull/11792

Comment 4 Troy Dawson 2016-11-09 19:43:23 UTC

This has been merged into OSE and is in OSE v3.4.0.24 or newer.

Comment 8 Mike Fiedler 2016-11-10 08:50:04 UTC

Verified in 3.4.0.24. The non-restarted node no longer panics when the master is brought up in etcd3 storage mode. There are other issues with the node communicating with the master, but this issue is gone.

Comment 10 errata-xmlrpc 2017-01-18 12:47:45 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066