Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1389770

Summary: Node panics repeatedly with unkeyable object error after migrating storage etcd2->etcd3 with node up
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: Networking
Assignee: Dan Williams <dcbw>
Status: CLOSED ERRATA
QA Contact: Mike Fiedler <mifiedle>
Severity: high
Docs Contact:
Priority: medium
Version: 3.4.0
CC: aos-bugs, bbennett, bmeng, jokerman, mifiedle, mmccomas, tdawson, tstclair, wmeng
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-18 12:47:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Node log none

Description Mike Fiedler 2016-10-28 14:25:58 UTC
Created attachment 1215014
Node log

Description of problem:

After migrating etcd storage from v2 to v3 and configuring the API servers to use storage-backend=etcd3, the nodes (which were not stopped during the migration) started panicking repeatedly once the API servers came back up.

Sample:

Oct 27 19:59:52 localhost atomic-openshift-node: E1027 19:59:52.799799   16576 runtime.go:64] Observed a panic: "unkeyable object: {svt664 &TypeMeta{Kind:,APIVersion:,}}, object has no meta: object does not implement the Object interfaces" (unkeyable object: {svt664 &TypeMeta{Kind:,APIVersion:,}}, object has no meta: object does not implement the Object interfaces)
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:70
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:63
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:49
Oct 27 19:59:52 localhost atomic-openshift-node: /usr/lib/golang/src/runtime/asm_amd64.s:479
Oct 27 19:59:52 localhost atomic-openshift-node: /usr/lib/golang/src/runtime/panic.go:458
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/pkg/sdn/plugin/eventqueue.go:187
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/pkg/sdn/plugin/eventqueue.go:34
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:573
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:312
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:490
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:343
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:271
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:202
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:88
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:89
Oct 27 19:59:52 localhost atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.cc70b72/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:49
Oct 27 19:59:52 localhost atomic-openshift-node: /usr/lib/golang/src/runtime/asm_amd64.s:2086


I let it run for about 10 minutes and there was no recovery.   I'll run again and see if there is eventual recovery.


Version-Release number of selected component (if applicable): 3.4.0.16


How reproducible: always


Steps to Reproduce:
1.  Install an HA cluster (3 masters, 3 etcd) with OCP 3.4.0.16 + etcd 2.3.7
2.  Create projects with running deployments
3.  Shut down the masters and etcd.   Leave the OpenShift nodes running.
4.  On each etcd:  yum swap etcd3 etcd to install etcd3 3.0.12-3. 
5.  On each etcd:  etcdctl migrate --data-dir /var/lib/etcd
6.  Start etcd on each 
7.  Start OpenShift masters

Actual results:

Nodes panic repeatedly (see above).  The cluster is inoperable: no operations involving nodes work.

Expected results:

Nodes recover and re-establish their watches and lists when the etcd API version changes, without requiring a restart of the entire cluster.

Comment 1 Timothy St. Clair 2016-10-28 15:19:02 UTC
It's expected that ResourceVersion be out of date and force a re-list.  It's not expected to panic or cause an outage.
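
A minimal Go sketch of the behavior described in this comment: a client that tries to resume a watch from a stale resource version falls back to a fresh list instead of panicking. Every name here (store, syncOnce, errStaleVersion) is hypothetical and written for illustration only; this is not the Kubernetes reflector code or its API.

```go
package main

import (
	"errors"
	"fmt"
)

// errStaleVersion is a hypothetical sentinel meaning "the resource version
// the client asked to watch from is no longer available on the server".
var errStaleVersion = errors.New("resource version too old")

// store is a hypothetical stand-in for the API server's view of storage.
type store struct {
	oldest  int      // oldest resource version that can still be watched
	current int      // latest resource version
	items   []string // names of the objects currently stored
}

// list returns a full snapshot plus the version it was taken at.
func (s *store) list() ([]string, int) {
	return append([]string(nil), s.items...), s.current
}

// watch fails when asked to resume from a version the store has dropped.
func (s *store) watch(from int) error {
	if from < s.oldest {
		return errStaleVersion
	}
	return nil
}

// syncOnce tries to resume a watch at lastSeen. On a stale version it
// falls back to a fresh list; any other error is returned to the caller.
func syncOnce(s *store, lastSeen int) (items []string, version int, relisted bool, err error) {
	if err = s.watch(lastSeen); err == nil {
		return nil, lastSeen, false, nil
	}
	if !errors.Is(err, errStaleVersion) {
		return nil, lastSeen, false, err
	}
	items, version = s.list()
	return items, version, true, nil
}

func main() {
	// The client last saw version 42, but the store only goes back to 100.
	s := &store{oldest: 100, current: 120, items: []string{"svt1", "svt2"}}
	items, version, relisted, err := syncOnce(s, 42)
	fmt.Println(items, version, relisted, err)
}
```

The point of the sketch is only that a stale version is an ordinary error path with a defined recovery (re-list), not a condition that should crash the process.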

Comment 2 Timothy St. Clair 2016-10-31 18:47:19 UTC
This panic is inside the OpenShift SDN code, on an object conversion. Reassigning.

Comment 3 Dan Williams 2016-11-04 18:54:14 UTC
upstream fix: https://github.com/openshift/origin/pull/11792

Comment 4 Troy Dawson 2016-11-09 19:43:23 UTC
This has been merged into ose and is in OSE v3.4.0.24 or newer.

Comment 8 Mike Fiedler 2016-11-10 08:50:04 UTC
Verified in 3.4.0.24.  The non-restarted node no longer panics when the master is brought up in etcd3 storage mode.   There are other issues with the node communicating with the master, but this issue is gone.

Comment 10 errata-xmlrpc 2017-01-18 12:47:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066