Bug 1609463 - oc get on a custom resource is fetching discovery every time [NEEDINFO]
Summary: oc get on a custom resource is fetching discovery every time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.1.0
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-28 05:50 UTC by Clayton Coleman
Modified: 2019-06-04 10:40 UTC
CC: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Discovery data was ignored during some kubectl invocations. Consequence: Every operation against CRDs downloaded entire discovery. Fix: Refactor the code to use cached data always. Result: Discovery data is fetched less frequently.
Clone Of:
Environment:
Last Closed: 2019-06-04 10:40:22 UTC
Target Upstream Version:
jvallejo: needinfo? (deads)




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 0 None None None 2019-06-04 10:40:28 UTC

Description Clayton Coleman 2018-07-28 05:50:18 UTC
With a recent oc against a recent server, accessing a custom resource (CR) via the CLI always fetches discovery info.

$ oc version
oc v3.11.0-alpha.0+fd71d11-547
kubernetes v1.11.0+d4cacc0
features: Basic-Auth

Server https://api.ci.openshift.org:443
openshift v3.11.0-alpha.0+7e5415c-595
kubernetes v1.11.0+d4cacc0

Run

$ oc get prowjob.prow.k8s.io/96d05744-9229-11e8-9d5c-f218980cdfde --loglevel=6
I0728 01:49:32.690802   51191 loader.go:359] Config loaded from file /Users/clayton/.kube/ci.kubeconfig
I0728 01:49:32.691347   51191 loader.go:359] Config loaded from file /Users/clayton/.kube/ci.kubeconfig
I0728 01:49:32.695777   51191 discovery.go:215] Invalidating discovery information
I0728 01:49:32.934911   51191 round_trippers.go:405] GET https://api.ci.openshift.org:443/api?timeout=32s 200 OK in 239 milliseconds
I0728 01:49:32.963856   51191 round_trippers.go:405] GET https://api.ci.openshift.org:443/apis?timeout=32s 200 OK in 28 milliseconds
I0728 01:49:32.993381   51191 round_trippers.go:405] GET https://api.ci.openshift.org:443/api/v1?timeout=32s 200 OK in 28 milliseconds
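
For context, the GETs above are the discovery sequence: /api and /apis return the API group list, and each group-version endpoint (here /api/v1) returns its resources. A minimal client-go sketch that issues the same requests (the kubeconfig path is illustrative):

```
package main

import (
	"fmt"
	"log"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Illustrative kubeconfig path; substitute your own.
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		log.Fatal(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// GET /api and /apis -- the first two requests in the log above.
	groups, err := dc.ServerGroups()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("server exposes %d API groups\n", len(groups.Groups))

	// GET /api/v1 -- resources for a single group-version.
	res, err := dc.ServerResourcesForGroupVersion("v1")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("v1 has %d resources\n", len(res.APIResources))
}
```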

Comment 1 Juan Vallejo 2018-08-10 17:41:42 UTC
It appears that when the Kind for a custom resource is looked up by the resource builder [1], the discovery RESTMapper returns a `no matches for GVK` error every time. This, in turn, causes the cachedDiscoveryClient's cache to be invalidated [2] every time we attempt to look up GVK information for custom resources.

Apparently, for any resource (not just custom resources), the discovery client's cache is always stale at this step [3], which is what prompts the "no matches for GVK" error in the case of a custom resource. It is only after the cache is invalidated and we attempt to "discover" the custom resource a second time that we end up successfully discovering our custom resource.

Not sure why this happens. David, I was hoping you would have some insight on this.

1. https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/genericclioptions/resource/builder.go#L692

2. https://github.com/kubernetes/client-go/blob/master/restmapper/discovery.go#L233

3. https://github.com/kubernetes/client-go/blob/master/restmapper/discovery.go#L232
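
For readers following along, the invalidate-and-retry behavior referenced at [2] and [3] boils down to the pattern below: ask the cached mapper for a Kind, and on a "no matches" error reset (invalidate) the cached discovery data and retry against live discovery. This is a minimal sketch assuming client-go's restmapper and apimachinery's meta packages; kindForWithRetry is a hypothetical helper, not the actual kubectl code path.

```
package restmappersketch

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/restmapper"
)

// kindForWithRetry mirrors the behavior described above: try the cached
// mapper first; only on a "no matches" error reset the cached discovery
// data and retry, which forces a live re-download.
func kindForWithRetry(m *restmapper.DeferredDiscoveryRESTMapper, gvr schema.GroupVersionResource) (schema.GroupVersionKind, error) {
	gvk, err := m.KindFor(gvr)
	if meta.IsNoMatchError(err) {
		m.Reset() // drops cached discovery data; the next call re-fetches it
		gvk, err = m.KindFor(gvr)
	}
	if err != nil {
		return schema.GroupVersionKind{}, fmt.Errorf("no kind for %v: %v", gvr, err)
	}
	return gvk, nil
}
```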

Comment 2 Juan Vallejo 2018-08-10 19:23:17 UTC
It also appears that when we attempt to look up the Kind for a custom resource (I am using [1] as my example), by the time we get to [2], the partiallySpecifiedResource contains an incorrect Group and Version, but the correct resource.

It is only after the cache has been invalidated that we end up with the correct GVR. For example:

```
$ oc get foo.samplecontroller.k8s.io/example-foo --loglevel 5
# GVR passed to [2]: {G: "k8s.io" V: "samplecontroller" R: "foo"}
I0810 15:20:26.841015   26443 discovery.go:215] Invalidating discovery information
# GVR passed to [2]: {G: "k8s.io" V: "samplecontroller" R: "foo"}
# GVR passed to [2]: {G: "samplecontroller.k8s.io" V: "" R: "foo"}
I0810 15:20:26.930256   26443 get.go:443] no kind is registered for the type v1beta1.Table in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29"
NAME          REPLICAS
example-foo   1
```

Not sure why we end up passing correct GVR information to the priority RESTMapper _after_ invalidating caches.

1. https://gist.github.com/soltysh/b1c38b1660eea4a4c4741b722774aede
2. https://github.com/kubernetes/apimachinery/blob/master/pkg/api/meta/priority.go#L92
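
The two GVR guesses above line up with how `schema.ParseResourceArg` splits such an argument: "a.b.c" is treated as resource.version.group first, with resource.group as the fallback. A standalone sketch using only apimachinery:

```
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
	// "a.b.c" is first treated as resource.version.group, then as resource.group.
	gvr, gr := schema.ParseResourceArg("foo.samplecontroller.k8s.io")

	// First guess: {Group:"k8s.io" Version:"samplecontroller" Resource:"foo"}
	fmt.Printf("fully-specified guess: %#v\n", *gvr)

	// Fallback: {Group:"samplecontroller.k8s.io" Resource:"foo"}
	fmt.Printf("group-resource fallback: %#v\n", gr)
}
```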

Comment 3 Jordan Liggitt 2018-08-10 20:17:37 UTC
Happens with kubectl as well... we should open an upstream issue to track it.

Interestingly, a partial name with no group (oc get foo/example-foo) does not trigger it.

Comment 4 Juan Vallejo 2018-08-13 17:49:35 UTC
Using the same CRD linked in comment 2, I am able to confirm locally that this problem only occurs when a Resource is not the same as an object's Kind:

```
$ oc get DeploymentConfig.apps.openshift.io/mydc --loglevel 5
# Calculated GVR: openshift.io/apps, Resource=DeploymentConfig
# cache is not invalidated while trying to lookup KindFor
NAME      REVISION   DESIRED   CURRENT   TRIGGERED BY
pictre2   1          1         1         config,image(pictre2:latest)
```

```
$ oc get Foo.samplecontroller.k8s.io/example-foo --loglevel 5
# Calculated GVR: k8s.io/samplecontroller, Resource=Foo
# cache _is_ invalidated while looking up KindFor
I0813 13:35:00.363452   30946 discovery.go:215] Invalidating discovery information
I0813 13:35:00.443568   30946 get.go:443] no kind is registered for the type v1beta1.Table in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29"
NAME          REPLICAS
example-foo   1
```

Because there is no resource "Foo", this line [1] gets a "not found" error, invalidates caches, and then does a live discovery on a retry.

This only appears to happen when a fully qualified "Kind.Group/name" format is given. Since we attempt to look up GVR information first, the "Kind" is always assumed to be the Resource name until a failure occurs.

Lowering the severity of this bug, as not specifying the fully qualified Kind.Group/Name format does not cause the cache to be invalidated:

```
$ oc get foos --loglevel 5
I0813 13:49:05.217210    5468 get.go:443] no kind is registered for the type v1beta1.Table in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29"
NAME          REPLICAS
example-foo   1
```

1. https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/genericclioptions/resource/builder.go#L692
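
To make the Resource-vs-Kind distinction above concrete: discovery lists each entry under its lowercase plural resource name (e.g. "foos"), with the Kind ("Foo") carried as a separate field on the same entry. A minimal sketch that prints both for the sample controller's group-version; the kubeconfig path and the v1alpha1 version are assumptions based on the gist in comment 2, not taken from this bug.

```
package main

import (
	"fmt"
	"log"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Kubeconfig path and group-version are illustrative assumptions.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		log.Fatal(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	list, err := dc.ServerResourcesForGroupVersion("samplecontroller.k8s.io/v1alpha1")
	if err != nil {
		log.Fatal(err)
	}
	// Discovery keys each entry by its lowercase plural resource name ("foos");
	// the Kind ("Foo") is a separate field on the same entry.
	for _, r := range list.APIResources {
		fmt.Printf("resource=%q singular=%q kind=%q\n", r.Name, r.SingularName, r.Kind)
	}
}
```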

Comment 6 Maciej Szulik 2019-02-28 15:15:24 UTC
https://github.com/openshift/origin/pull/22020 and a few other PRs landed to fix this issue. Moving to QA.

Comment 8 Maciej Szulik 2019-03-01 14:52:54 UTC
This is because you're passing the full name; if you pass the name of the resource (it can even be the short name), it will get matched and won't re-fetch discovery.

Comment 9 zhou ying 2019-03-04 02:41:41 UTC
Confirmed that using the short name won't re-fetch discovery:
[root@preserve-master-yinzhou auth]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-27-213933   True        False         27m     Cluster version is 4.0.0-0.nightly-2019-02-27-213933


[root@dhcp-140-138 ~]# oc get dc/ruby-ex --loglevel=6
I0304 10:40:47.214075   17326 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:40:47.215367   17326 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:40:47.225579   17326 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:40:48.359508   17326 round_trippers.go:405] GET https://api.qe-yinzhou.qe.devcluster.openshift.com:6443/apis/apps.openshift.io/v1/namespaces/zhouy/deploymentconfigs/ruby-ex 200 OK in 1133 milliseconds
I0304 10:40:48.360464   17326 get.go:558] no kind is registered for the type v1beta1.Table in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29"
NAME      REVISION   DESIRED   CURRENT   TRIGGERED BY
ruby-ex   1          1         1         config,image(ruby-ex:latest)

[root@dhcp-140-138 ~]# oc get deploymentconfig/ruby-ex --loglevel=6
I0304 10:41:22.256238   17343 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:41:22.257534   17343 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:41:22.266430   17343 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:41:23.275211   17343 round_trippers.go:405] GET https://api.qe-yinzhou.qe.devcluster.openshift.com:6443/apis/apps.openshift.io/v1/namespaces/zhouy/deploymentconfigs/ruby-ex 200 OK in 1008 milliseconds
I0304 10:41:23.276064   17343 get.go:558] no kind is registered for the type v1beta1.Table in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29"
NAME      REVISION   DESIRED   CURRENT   TRIGGERED BY
ruby-ex   1          1         1         config,image(ruby-ex:latest)

Comment 12 errata-xmlrpc 2019-06-04 10:40:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

