Bug 1532060 - Router Panic: panic: runtime error: index out of range in cockroachdb
Summary: Router Panic: panic: runtime error: index out of range in cockroachdb
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.9.0
Assignee: Rajat Chopra
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1552742 1582818
Depends On:
Blocks: 1590826
 
Reported: 2018-01-07 20:11 UTC by Eric Paris
Modified: 2022-08-04 22:20 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1590826
Environment:
Last Closed: 2018-02-22 23:25:32 UTC
Target Upstream Version:
Embargoed:


Attachments
oc logs -p router-1142-d5s4f (3.23 KB, text/plain)
2018-01-07 20:11 UTC, Eric Paris


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Origin (Github) | 18423 | 0 | None | None | None | 2018-02-05 18:47:24 UTC
Red Hat Product Errata | RHBA-2018:0489 | 0 | normal | SHIPPED_LIVE | Red Hat OpenShift Container Platform 3.9 RPM Release Advisory | 2018-03-28 18:06:38 UTC

Description Eric Paris 2018-01-07 20:11:39 UTC
Created attachment 1378189
oc logs -p router-1142-d5s4f

registry.reg-aws.openshift.com:443/openshift3/ose-haproxy-router:v3.7.9-1

Found this in us-east-1:

$ oc get pod -n default -l router=router
NAME                READY     STATUS    RESTARTS   AGE
router-1142-5zmgr   1/1       Running   34         23d
router-1142-8zkgv   1/1       Running   20         23d
router-1142-d5s4f   1/1       Running   9          23d
router-1142-vkqq7   1/1       Running   8          23d
router-1142-xpjhw   1/1       Running   37         23d

Looking at `oc logs -p` for all 5 routers, I see this at the end of every one:

panic: runtime error: index out of range

goroutine 174195 [running]:
github.com/openshift/origin/vendor/github.com/cockroachdb/cmux.(*ptNode).match(0xc420c75dd0, 0xc420c84658, 0x0, 0x8, 0x1, 0x0)
	/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/cockroachdb/cmux/patricia.go:148 +0x197
github.com/openshift/origin/vendor/github.com/cockroachdb/cmux.(*patriciaTree).matchPrefix(0xc420c7fa00, 0xf287600, 0xc44ea815b8, 0xf2a0700)
	/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/cockroachdb/cmux/patricia.go:38 +0x90
github.com/openshift/origin/vendor/github.com/cockroachdb/cmux.(*patriciaTree).(github.com/openshift/origin/vendor/github.com/cockroachdb/cmux.matchPrefix)-fm(0xf287600, 0xc44ea815b8, 0xc431896ac8)
	/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/cockroachdb/cmux/matchers.go:23 +0x3e
github.com/openshift/origin/vendor/github.com/cockroachdb/cmux.(*cMux).serve(0xc420c654c0, 0xf2fa960, 0xc431896ac8, 0xc420c82360, 0xc420ff9190)
	/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/cockroachdb/cmux/cmux.go:129 +0x265
created by github.com/openshift/origin/vendor/github.com/cockroachdb/cmux.(*cMux).Serve
	/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/cockroachdb/cmux/cmux.go:119 +0x16c

Comment 1 Ben Bennett 2018-01-17 21:53:28 UTC
This was introduced by: https://github.com/openshift/origin/pull/16975

Comment 2 Rajat Chopra 2018-02-01 02:07:13 UTC
The fix belongs in the vendor tree. We need to update cockroachdb/cmux to at least commit b64f5908f4945f4b11ed4a0a9d3cc1e23350866d.

The fix addresses the patricia tree overflowing on a boundary condition in the HTTP/1.1 fast request match. We can either switch to a slower but more accurate match (i.e. not use FastMatch) or update the vendored repo to include this fix.
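To make the boundary condition concrete, here is a minimal Go sketch of a compressed prefix tree used for prefix matching. It is hypothetical: `node`, `build`, `match`, `matchUnchecked`, and `panics` are illustrative names, not cmux's actual patricia.go code, and the exact condition fixed by the upstream commit may differ. The unchecked variant indexes the byte after a node's prefix without first checking that the input extends past it. An input that ends exactly at a non-terminal node (here a lone `P`, the prefix shared by `POST` and `PUT`) therefore panics with `index out of range`, the same class of error as in the trace above. The checked variant treats that case as a non-match.

```go
package main

import (
	"bytes"
	"fmt"
)

// node is a hypothetical, simplified compressed-prefix-tree node, loosely
// modeled on the idea behind cmux's ptNode (not its actual code).
type node struct {
	prefix   []byte         // bytes every string below this node shares
	terminal bool           // true if a stored string ends right after prefix
	next     map[byte]*node // children keyed by the byte following prefix
}

// commonPrefix returns the longest common prefix of a and b.
func commonPrefix(a, b []byte) []byte {
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return a[:i]
}

// build constructs a tree for strs: the node keeps their longest common
// prefix, and the remainders are grouped by their first byte.
func build(strs [][]byte) *node {
	n := &node{next: map[byte]*node{}}
	if len(strs) == 0 {
		return n
	}
	p := strs[0]
	for _, s := range strs[1:] {
		p = commonPrefix(p, s)
	}
	n.prefix = p
	groups := map[byte][][]byte{}
	for _, s := range strs {
		rest := s[len(p):]
		if len(rest) == 0 {
			n.terminal = true // s ends exactly at this node
			continue
		}
		groups[rest[0]] = append(groups[rest[0]], rest[1:])
	}
	for b, g := range groups {
		n.next[b] = build(g)
	}
	return n
}

// matchUnchecked reports whether data starts with a stored string. It has the
// boundary bug: if data ends exactly where a non-terminal node's prefix ends,
// rest[0] is out of range and the call panics.
func (n *node) matchUnchecked(data []byte) bool {
	if len(data) < len(n.prefix) || !bytes.Equal(data[:len(n.prefix)], n.prefix) {
		return false
	}
	if n.terminal {
		return true
	}
	rest := data[len(n.prefix):]
	child, ok := n.next[rest[0]] // BUG: no check that len(rest) > 0
	if !ok {
		return false
	}
	return child.matchUnchecked(rest[1:])
}

// match is the same walk with the boundary check added.
func (n *node) match(data []byte) bool {
	if len(data) < len(n.prefix) || !bytes.Equal(data[:len(n.prefix)], n.prefix) {
		return false
	}
	if n.terminal {
		return true
	}
	rest := data[len(n.prefix):]
	if len(rest) == 0 {
		return false // input ended at the boundary: treat as no match
	}
	child, ok := n.next[rest[0]]
	if !ok {
		return false
	}
	return child.match(rest[1:])
}

// panics reports whether f panics; used only to demonstrate the bug.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	root := build([][]byte{[]byte("POST"), []byte("PUT")})
	fmt.Println(root.match([]byte("POST /x HTTP/1.1")))               // true
	fmt.Println(root.match([]byte("PATCH /x HTTP/1.1")))              // false
	fmt.Println(root.match([]byte("P")))                              // false
	fmt.Println(panics(func() { root.matchUnchecked([]byte("P")) })) // true
}
```

The only difference between the two walks is the `len(rest) == 0` check. A short read that happens to stop on a node boundary is the kind of input that makes such a bug rare and hard to reproduce on demand.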

Glide update PR coming up.

Comment 4 zhaozhanqi 2018-02-22 07:16:02 UTC
Hi @Eric @Ben @Rajat,

I still don't understand how to reproduce this issue. We did several rounds of testing on 3.9 but never hit it. Could you give some clues or steps to reproduce it, so that we can catch the same issue in the future? Thanks.

Comment 5 Eric Paris 2018-02-22 15:31:53 UTC
I honestly have no idea how to reproduce it other than running it in Online. I'm OK with QA just verifying that the code has changed; we'll see whether the panics continue in Online. Rajat, which version has the fix?

Comment 6 Rajat Chopra 2018-02-22 23:25:32 UTC
The master branch has the fix; in any case, the fix was in an upstream package. It's a timing-sensitive bug that is difficult to reproduce, so I support comment #5.
Closing this bug as fixed upstream.

Comment 7 Ben Bennett 2018-03-07 20:07:48 UTC
*** Bug 1552742 has been marked as a duplicate of this bug. ***

Comment 14 Ben Bennett 2018-05-29 14:14:45 UTC
*** Bug 1582818 has been marked as a duplicate of this bug. ***

