Bug 2058086 - oc get rolebindings -A fails reproducible with apiserver panic'd on GET runtime error: index out of range
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 3.11.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Arda Guclu
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-02-24 11:33 UTC by Arne Gogala
Modified: 2022-03-31 05:17 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-31 05:17:19 UTC
Target Upstream Version:
Embargoed:
agogala: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 26922 0 None Merged Bug 2058086: Add item length check in legacyPrinterToTable 2022-06-06 08:46:20 UTC
Red Hat Product Errata RHBA-2022:1033 0 None None None 2022-03-31 05:17:23 UTC

Description Arne Gogala 2022-02-24 11:33:23 UTC
Description of problem:

# `oc get rolebindings --all-namespaces` fails reproducibly with:

F0201 14:15:42.906900  209818 helpers.go:119] Error from server (InternalError): an error on the server ("This request caused apiserver to panic. Look in the logs for details.") has prevented the request from succeeding (get rolebindings.authorization.openshift.io)

Version-Release number of selected component (if applicable):

3.11.380

How reproducible:

Always, when using `oc get rolebindings --all-namespaces`

Steps to Reproduce:
1. `$ oc get rolebindings --all-namespaces --loglevel=10`


Actual results:

I0201 14:15:42.906879  209818 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "an error on the server (\"This request caused apiserver to panic. Look in the logs for details.\") has prevented the request from succeeding (get rolebindings.authorization.openshift.io)",
  "reason": "InternalError",
  "details": {
    "group": "authorization.openshift.io",
    "kind": "rolebindings",
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "This request caused apiserver to panic. Look in the logs for details."
      }
    ]
  },
  "code": 500
}]
F0201 14:15:42.906900  209818 helpers.go:119] Error from server (InternalError): an error on the server ("This request caused apiserver to panic. Look in the logs for details.") has prevented the request from succeeding (get rolebindings.authorization.openshift.io)

###### master-logs_api_api ###### -> full message in attached file

     1 runtime.go:67] Observed a panic: runtime error: index out of range
goroutine 598166367 [running]:
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1(0xc4d75496e0)
        /builddir/build/BUILD/atomic-openshift-git-0.f875174/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:104 +0xe1
panic(0x47e4d20, 0xa35caa0)
        /opt/rh/go-toolset-1.10/root/usr/lib/go-toolset-1.10-golang/src/runtime/panic.go:502 +0x229
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1.1(0xc44b413400, 0x7fa7cafe7cb8, 0xc421e2a080, 0xa6c23b0, 0x0, 0x0, 0x0, 0x0)
        /builddir/build/BUILD/atomic-openshift-git-0.f875174/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/audit.go:84 +0x1f8
panic(0x47e4d20, 0xa35caa0)
        /opt/rh/go-toolset-1.10/root/usr/lib/go-toolset-1.10-golang/src/runtime/panic.go:502 +0x229
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/printers.(*HumanReadablePrinter).legacyPrinterToTable(0xc421a36300, 0x7814f20, 0xc45edb3420, 0xc4216bc320, 0x1, 0x0, 0xc4d8b6f7e0)
        /builddir/build/BUILD/atomic-openshift-git-0.f875174/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/printers/humanreadable.go:671 +0xc96
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/printers.(*HumanReadablePrinter).PrintTable(0xc421a36300, 0x7814f20, 0xc45edb3420, 0x0, 0x0, 0x0, 0x0, 0x1000000, 0x0, 0x0, ...)
        /builddir/build/BUILD/atomic-openshift-git-0.f875174/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/printers/humanreadable.go:502 +0x97f
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/printers/storage.TableConvertor.ConvertToTable(0x780ba20, 0xc421a36300, 0x78621a0, 0xc48e58cc90, 0x7814f20, 0xc45edb3420, 0x7821fa0, 0xc469196f30, 0x1, 0x7, ...)
        /builddir/build/BUILD/atomic-openshift-git-0.f875174/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/printers/storage/storage.go:32 +0x8e
github.com/openshift/origin/pkg/util/registry.(*noWatchStorageErrWrapper).ConvertToTable(0xc4203e7430, 0x78621a0, 0xc48e58cc90, 0x7814f20, 0xc45edb3420, 0x7821fa0, 0xc469196f30, 0xc469196f30, 0x0, 0x0)
        /builddir/build/BUILD/atomic-openshift-git-0.f875174/_output/local/go/src/github.com/openshift/origin/pkg/util/registry/wrapper.go:51 +0x7c
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers.transformResponseObject(0x78621a0, 0xc48e58cc90, 0x787aea0, 0xc4234d5400, 0x785cea0, 0xc421445080, 0x7822120, 0xc42010c940, 0x7809dc0, 0xc42015bf80, ...)
        /builddir/build/BUILD/atomic-openshift-git-0.f875174/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/response.go:117 +0x49f
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers.ListResource.func1(0x7855820, 0xc478624d58, 0xc452015000)
        /builddir/build/BUILD/atomic-openshift-git-0.f875174/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/get.go:282 +0xa02
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints.restfulListResource.func1(0xc48e58cb70, 0xc4e5d03b00)
        /builddir/build/BUILD/atomic-openshift-git-0.f875174/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/installer.go:1010 +0xd0


Expected results:

A list of all rolebindings.

Additional info:

1. The command runs for 5-6 seconds before failing. The failure is always reproducible and always occurs around the same role in the same project, "XYZ".
2. `oc get rolebindings -n XYZ` runs fine.
3. Looping over all projects (for i in $(oc get projects -o json | jq -r '.items[].metadata.name'); do echo "Working on Project $i:" && oc get rolebindings.rbac -n "$i"; done) runs fine.
4. etcd seems to be okay: no performance problems and no suspicious error messages.
5. Testing with different oc client releases, up to 3.11.570, does not change anything.
6. Removed the .kube directory for the oc client to rule out caching issues.
7. The issue is observed on only one of more than 25 clusters running the same RHOCP 3.11.380 release.

Comment 7 Anand Paladugu 2022-03-15 18:37:15 UTC
So the command does not fail immediately, and it works when scoped to a specific namespace or when iterating over namespaces.

The command only fails with --all-namespaces, and the "index out of range" error likely implies that there are more rolebindings than we can handle. Can we get a count of the role bindings from the customer's iteration loop, so we can try to reproduce this internally and see at what number the command fails?

I guess a code review could help us pinpoint that too.
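
One possible way to produce that count, as a minimal sketch: the Go helper below tallies items per namespace from a JSON list such as the one `oc get ... -o json` returns. The `objectList` type and the `countByNamespace` function are hypothetical names introduced for illustration, and the sample JSON in `main` is made up, not customer data.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// objectList models only the fields needed for counting: each item's
// metadata.namespace. All other fields in the JSON are ignored.
type objectList struct {
	Items []struct {
		Metadata struct {
			Namespace string `json:"namespace"`
		} `json:"metadata"`
	} `json:"items"`
}

// countByNamespace returns the number of items per namespace found in a
// JSON-encoded list.
func countByNamespace(data []byte) (map[string]int, error) {
	var list objectList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, item := range list.Items {
		counts[item.Metadata.Namespace]++
	}
	return counts, nil
}

func main() {
	// Made-up sample input standing in for real `-o json` output.
	sample := []byte(`{"items":[
		{"metadata":{"name":"admin","namespace":"ns-a"}},
		{"metadata":{"name":"view","namespace":"ns-a"}},
		{"metadata":{"name":"edit","namespace":"ns-b"}}
	]}`)

	counts, err := countByNamespace(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// Print namespaces in a stable order, followed by the overall total.
	names := make([]string, 0, len(counts))
	for ns := range counts {
		names = append(names, ns)
	}
	sort.Strings(names)
	total := 0
	for _, ns := range names {
		fmt.Printf("%s\t%d\n", ns, counts[ns])
		total += counts[ns]
	}
	fmt.Printf("total\t%d\n", total)
}
```

Counting from the JSON items rather than from printed lines avoids including the table header row in the total.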

Comment 9 Anand Paladugu 2022-03-16 17:36:38 UTC
@agogala

I am from the OpenShift support team, trying to see whether the issue is due to the number of role bindings.

The customer had 181 namespaces with 3585 role bindings.

I have tested 231 namespaces with 4269 role bindings, with one namespace having 200 role bindings. So it does not look like either an overall role-binding count issue or a per-namespace threshold issue. It could be a column/field content issue in the table, and that would be hard to reproduce.

[anandpaladugu@localhost github]$ oc-311 get rolebindings.rbac --all-namespaces | wc -l
4269


OCP versions in my setup are as below:

[anandpaladugu@localhost github]$ oc-311 version
oc v3.11.117
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://openshift.sharedocp311cns.lab.upshift.rdu2.redhat.com:443
openshift v3.11.570
kubernetes v1.11.0+d4cacc0


What versions is the customer using?



@maszulik seems to have tagged this with target release 3.11.Z, and I am wondering if he has more insights.

Comment 12 Maciej Szulik 2022-03-17 15:19:12 UTC
> @maszulik   seems to have tagged this with target release 3.11.Z, and I am wondering if he has more insights. 

Based on where the code panics, it looks like the problem is specific to this particular dataset, which triggers the index out of range error.
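
To illustrate the class of failure, here is a minimal Go sketch; it is not the actual code in humanreadable.go. It assumes the conversion pairs each printed line with the item at the same index, so a dataset that yields more printed lines than items would index past the end of the slice. The title of the linked pull request ("Add item length check in legacyPrinterToTable") suggests a guard of this kind. The `row` type and the `linesToRows` function are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// row is a hypothetical stand-in for a table row: the printed cells plus the
// object the row was rendered from (empty when no object can be matched).
type row struct {
	Cells  []string
	Object string
}

// linesToRows splits tab-separated printed output into rows and pairs each
// row with the item at the same index. The length check is the point of the
// sketch: rows beyond len(items) are kept without an object instead of
// indexing past the end of the slice.
func linesToRows(printed string, items []string) []row {
	printed = strings.TrimRight(printed, "\n")
	if printed == "" {
		return nil
	}
	var rows []row
	for i, line := range strings.Split(printed, "\n") {
		r := row{Cells: strings.Split(line, "\t")}
		if i < len(items) { // guard against "index out of range"
			r.Object = items[i]
		}
		rows = append(rows, r)
	}
	return rows
}

func main() {
	// Two items, but three printed lines: one item rendered across two lines.
	printed := "ns-a\tadmin\tClusterRole/admin\nns-b\tview\nextra\tClusterRole/view\n"
	for _, r := range linesToRows(printed, []string{"admin", "view"}) {
		fmt.Printf("cells=%d object=%q\n", len(r.Cells), r.Object)
	}
}
```

Without the `i < len(items)` check, the third line in this input would cause the same kind of runtime panic seen in the stack trace, while namespaces whose output has one line per item would be unaffected.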

Comment 21 zhou ying 2022-03-24 02:09:56 UTC
Can't reproduce the issue now:


oc  get rolebindings --all-namespaces
NAMESPACE                                          NAME                                                              ROLE                                                                   AGE
default                                            machine-config-controller-events                                  ClusterRole/machine-config-controller-events                           122m
default                                            machine-config-daemon-events                                      ClusterRole/machine-config-daemon-events                               123m
default                                            prometheus-k8s                                                    Role/prometheus-k8s                                                    116m
default                                            system:deployers                                                  ClusterRole/system:deployer                                            118m
default                                            system:image-builders                                             ClusterRole/system:image-builder                                       118m
default                                            system:image-pullers                                              ClusterRole/system:image-puller                                        118m
......                                        
[root@localhost roottest]# echo $?
0


[root@localhost roottest]# oc version 
oc v3.11.664
kubernetes v1.11.0+d4cacc0

Comment 23 errata-xmlrpc 2022-03-31 05:17:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.664 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1033

