Bug 1284018 - Multitenant SDN plugin crashes with 'index out of range' error
Summary: Multitenant SDN plugin crashes with 'index out of range' error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Networking
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-11-20 14:50 UTC by Ben Bennett
Modified: 2016-02-17 17:04 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-17 17:04:18 UTC
Target Upstream Version:
Embargoed:


Attachments:
The full trace (312.03 KB, text/plain)
2015-11-20 14:50 UTC, Ben Bennett

Description Ben Bennett 2015-11-20 14:50:09 UTC
Created attachment 1097232 [details]
The full trace

Description of problem:

Reported by Diego Spinola Castro <spinolacastro>:
 I have an all-in-one origin-1.0.8-0 install running about 30 pods on a node.
 After a machine reboot, the origin-node service didn't start [the trace is
 attached].

 The only way to bring it back was changing the networkPluginName to 
 redhat/openshift-ovs-subnet on node and master configurations.


Version-Release number of selected component (if applicable):
 origin v1.0.8-1-g8f1868d, kubernetes v1.1.0-origin-1107-g4c8e6f4

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Relevant bit from the logs:
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: panic: runtime error: index out of range
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: goroutine 82 [running]:
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: runtime.gopanic(0x2622a20, 0xc20802a000)
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: /usr/lib/golang/src/runtime/panic.go:425 +0x2a3 fp=0xc20901f300 sp=0xc20901f298
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: runtime.panicindex()
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: /usr/lib/golang/src/runtime/panic.go:12 +0x4e fp=0xc20901f328 sp=0xc20901f300
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: github.com/openshift/openshift-sdn/plugins/osdn.newSDNPod(0xc20901f538, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: /builddir/build/BUILD/origin-git-7227.8f1868d/_thirdpartyhacks/src/github.com/openshift/openshift-sdn/plugins/osdn/osdn.go:122 +0x151 fp=0xc20901f3c0 sp=0xc20901f328
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: github.com/openshift/openshift-sdn/plugins/osdn.(*OsdnRegistryInterface).GetPods(0xc20866e370, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: /builddir/build/BUILD/origin-git-7227.8f1868d/_thirdpartyhacks/src/github.com/openshift/openshift-sdn/plugins/osdn/osdn.go:142 +0x428 fp=0xc20901f920 sp=0xc20901f3c0
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: github.com/openshift/openshift-sdn/pkg/ovssubnet.(*OvsController).watchAndGetResource(0xc20870d050, 0x2a5d850, 0x3, 0x0, 0x0, 0x0, 0x0)
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: /builddir/build/BUILD/origin-git-7227.8f1868d/_thirdpartyhacks/src/github.com/openshift/openshift-sdn/pkg/ovssubnet/common.go:798 +0x98c fp=0xc20901fab0 sp=0xc20901f920
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: github.com/openshift/openshift-sdn/pkg/ovssubnet.(*OvsController).StartNode(0xc20870d050, 0x22f7, 0x0, 0x0)
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: /builddir/build/BUILD/origin-git-7227.8f1868d/_thirdpartyhacks/src/github.com/openshift/openshift-sdn/pkg/ovssubnet/common.go:487 +0xf69 fp=0xc20901fed0 sp=0xc20901fab0
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: github.com/openshift/openshift-sdn/plugins/osdn/multitenant.Node(0xc20866e370, 0xc2084cf640, 0x1c, 0xc2084c34a0, 0xa, 0xc20865e180, 0x7f27b9657890, 0xc208660560, 0x22f7)
Nov 20 12:38:23 origin.v3.ops.getupcloud.com origin-node[9040]: /builddir/build/BUILD/origin-git-7227.8f1868d/_thirdpartyhacks/src/github.com/openshift/openshift-sdn/plugins/osdn/multitenant/multitenant.go:47 +0x254 fp=0xc20901ff98 sp=0xc20901fed0

Comment 1 Ben Bennett 2015-11-20 14:55:07 UTC
func newSDNPod(kPod *kapi.Pod) osdnapi.Pod {
	containerID := ""
	if len(kPod.Status.ContainerStatuses) > 0 {
		// Extract only container ID, pod.Status.ContainerStatuses[0].ContainerID is of the format: docker://<containerID>
		containerID = strings.Split(kPod.Status.ContainerStatuses[0].ContainerID, "://")[1]
	}

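The panic is on the Split line.  My hunch is that, because there are so many pods on the node, it hasn't managed to start them all: a container status exists, but the container has not started completely, so ContainerID is still empty.  Splitting an empty string on "://" returns a single element, so indexing [1] is out of range.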
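
A minimal sketch of a more defensive parse, assuming the only goal is to avoid the panic when ContainerID is empty or has no "://" separator. The helper name containerIDFromURI is hypothetical, and this is not necessarily how the actual openshift-sdn code handles it:

```go
package main

import (
	"fmt"
	"strings"
)

// containerIDFromURI is a hypothetical helper. It extracts the ID from a
// string of the form "<runtime>://<containerID>" and returns "" when the
// separator is missing, instead of indexing past the end of the slice.
func containerIDFromURI(uri string) string {
	// SplitN with n=2 yields at most two parts: the runtime and the ID.
	parts := strings.SplitN(uri, "://", 2)
	if len(parts) != 2 {
		// Empty or malformed value, e.g. the container has not started yet.
		return ""
	}
	return parts[1]
}

func main() {
	fmt.Printf("%q\n", containerIDFromURI("docker://abc123")) // well-formed value
	fmt.Printf("%q\n", containerIDFromURI(""))                // status exists, no ID yet
}
```

With a helper like this, a pod whose first container status has no ID yet would get an empty containerID rather than crashing the node at startup.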

Comment 2 Ravi Sankar 2015-11-23 18:59:11 UTC
Fixed in https://github.com/openshift/openshift-sdn/pull/214

Comment 3 Meng Bo 2015-11-26 08:30:33 UTC
@Ben

I have hit this panic several times before, with traces similar to the one in the attachment.

However, all the panics I hit, including the one in this bug, happened right after the OpenFlow rules were added by *multitenant.go*, like:
"Oct 26 13:18:28 node2 openshift-node: I1026 13:18:28.117049   16948 multitenant.go:82] Output of adding table=4,priority=200,tcp,nw_dst=172.30.0.1,tp_dst=443,actions=output:2:  (<nil>)
Oct 26 13:18:28 node2 openshift-node: panic: runtime error: index out of range
Oct 26 13:18:28 node2 openshift-node: goroutine 44 [running]:
"


In the current build (v1.1-224-gb994599), these rules are added by controller.go instead, like:
Nov 23 14:46:32 node1 openshift-node: I1123 14:46:32.414260    6787 controller.go:82] Output of adding table=4,tcp,nw_dst=172.30.0.1,tp_dst=443,priority=200,actions=output:2:  (<nil>)
Nov 23 14:46:32 node1 openshift-node: I1123 14:46:32.416477    6787 controller.go:82] Output of adding table=4,udp,nw_dst=172.30.0.1,tp_dst=53,priority=200,actions=output:2:  (<nil>)
Nov 23 14:46:32 node1 openshift-node: I1123 14:46:32.418416    6787 controller.go:82] Output of adding table=4,tcp,nw_dst=172.30.0.1,tp_dst=53,priority=200,actions=output:2:  (<nil>)


I can no longer reproduce the panic, even without the fix in comment#2, so I suspect the issue may also have been fixed by some other refactoring,
such as this commit: https://github.com/openshift/openshift-sdn/commit/c298d4776b6fc29e522e90d1bb01bc57d2307f14


Do you think it is ok to mark this bug as VERIFIED since the issue cannot be reproduced anymore?

Comment 4 Ben Bennett 2015-11-30 19:04:12 UTC
I think it is fine to mark it verified.

Comment 5 Meng Bo 2015-12-02 05:20:49 UTC
Moving the bug to VERIFIED since the 'index out of range' issue cannot be reproduced.

Build number: v1.1-266-gba7f510-dirty

Comment 6 Josep 'Pep' Turro Mauri 2016-02-17 17:04:18 UTC
A fix for this was released downstream some time ago (bug 1288014), and the last few comments here suggest it was verified upstream too, so I believe this can be closed.

Please reopen if needed.

