Bug 1501319

Summary: Panic error "cap out of range" on node when deleting other node in the cluster
Product: OpenShift Container Platform Reporter: Meng Bo <bmeng>
Component: Networking Assignee: Dan Winship <danw>
Status: CLOSED ERRATA QA Contact: Meng Bo <bmeng>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.7.0 CC: aos-bugs, bbennett
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Unchecked boundary condition.
Consequence: In a cluster with two nodes, deleting one of them would cause the other one to crash (though it would be fine again after restarting).
Fix: The erroneous code was fixed.
Result: No crash.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-28 22:16:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Meng Bo 2017-10-12 11:22:52 UTC
Description of problem:
Set up a multi-node environment with at least two nodes.
Delete any one of the nodes, then check the node log on another node.
A panic related to the deletion appears in that log.

Version-Release number of selected component (if applicable):
v3.7.0-0.147.1

How reproducible:
always

Steps to Reproduce:
1. Set up a multi-node environment with at least two nodes.
2. Delete node1 from the master:
# oc delete node node1
3. Check the node log on node2.

Actual results:
The node log on node2 shows the panic `makeslice: cap out of range`.

Expected results:
No panic occurs on the remaining node.

Additional info:
Node logs on the other node:

Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: E1012 19:15:17.299497   29328 runtime.go:66] Observed a panic: "makeslice: cap out of range" (runtime error: makeslice: cap out of range)
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /usr/lib/golang/src/runtime/asm_amd64.s:514
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /usr/lib/golang/src/runtime/panic.go:489
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /usr/lib/golang/src/runtime/slice.go:51
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/pkg/network/node/subnets.go:23
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/pkg/network/node/subnets.go:66
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/pkg/network/common/eventqueue.go:110
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:451
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/pkg/network/common/eventqueue.go:116
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/pkg/network/common/common.go:223
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/pkg/network/node/subnets.go:68
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/pkg/network/node/subnets.go:16
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:97
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:98
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:52
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /builddir/build/BUILD/atomic-openshift-git-0.f5af375/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:43
Oct 12 19:15:17 ose-node2.bmeng.local atomic-openshift-node[29328]: /usr/lib/golang/src/runtime/asm_amd64.s:2197
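
For readers unfamiliar with this message: `makeslice: cap out of range` is what the Go runtime reports when `make` is asked for a slice with an invalid (for example negative) capacity. The report does not show the offending code in subnets.go, so the following is only a minimal sketch of how an unchecked boundary condition can produce this class of panic. The helper names `uncheckedPeers`, `checkedPeers`, and `tryPeers` are hypothetical and are not taken from openshift/origin.

```go
package main

import "fmt"

// uncheckedPeers is a hypothetical helper (NOT the code from subnets.go).
// It sizes its result as "all nodes except me", which becomes -1 when the
// node list is empty, so make panics at runtime.
func uncheckedPeers(nodes []string, self string) []string {
	peers := make([]string, 0, len(nodes)-1) // invalid capacity if len(nodes) == 0
	for _, name := range nodes {
		if name != self {
			peers = append(peers, name)
		}
	}
	return peers
}

// checkedPeers is the same hypothetical helper with the boundary checked:
// the capacity hint is clamped so it can never be negative.
func checkedPeers(nodes []string, self string) []string {
	n := len(nodes) - 1
	if n < 0 {
		n = 0
	}
	peers := make([]string, 0, n)
	for _, name := range nodes {
		if name != self {
			peers = append(peers, name)
		}
	}
	return peers
}

// tryPeers calls f and returns the recovered panic text, or "" if f
// returned normally.
func tryPeers(f func([]string, string) []string, nodes []string, self string) (panicMsg string) {
	defer func() {
		if r := recover(); r != nil {
			panicMsg = fmt.Sprint(r)
		}
	}()
	f(nodes, self)
	return ""
}

func main() {
	// An empty node list stands in for the state after the only other node is gone.
	fmt.Printf("unchecked: %q\n", tryPeers(uncheckedPeers, nil, "node2"))
	fmt.Printf("checked:   %q\n", tryPeers(checkedPeers, nil, "node2"))
}
```

In this sketch only the unchecked variant panics on the empty list; the clamped variant returns an empty slice. Whether the actual fix took this shape is not stated in this report.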

Comment 1 Dan Winship 2017-10-12 14:56:51 UTC
https://github.com/openshift/origin/pull/16831

Comment 2 Meng Bo 2017-10-13 07:09:02 UTC
Code merged in v3.7.0-0.150.0.

Comment 3 Meng Bo 2017-10-13 07:09:56 UTC
Tested on v3.7.0-0.150.0.
The issue has been fixed; no panic occurs during node deletion.

Marking the bug as verified.

Comment 4 Dan Winship 2017-10-15 13:22:19 UTC
(Removed the egress IP Trello ID from the summary because this bug wasn't related to egress IPs at all; it was just a random bug introduced in the 3.7 cycle.)

Comment 7 errata-xmlrpc 2017-11-28 22:16:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188