Bug 1609463

Summary: oc get on a custom resource is fetching discovery every time
Product: OpenShift Container Platform Reporter: Clayton Coleman <ccoleman>
Component: oc    Assignee: Maciej Szulik <maszulik>
Status: CLOSED ERRATA QA Contact: zhou ying <yinzhou>
Severity: low Docs Contact:
Priority: medium    
Version: 3.11.0    CC: aos-bugs, deads, jokerman, maszulik, mfojtik, mmccomas
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Cached discovery data was ignored during some kubectl invocations. Consequence: Every operation against custom resources downloaded the entire discovery document. Fix: The code was refactored to always use the cached data. Result: Discovery data is fetched less frequently.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-04 10:40:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Clayton Coleman 2018-07-28 05:50:18 UTC
Running a recent oc against a recent server, accessing a custom resource via the CLI always fetches discovery info.

$ oc version
oc v3.11.0-alpha.0+fd71d11-547
kubernetes v1.11.0+d4cacc0
features: Basic-Auth

Server https://api.ci.openshift.org:443
openshift v3.11.0-alpha.0+7e5415c-595
kubernetes v1.11.0+d4cacc0

Run

$ oc get prowjob.prow.k8s.io/96d05744-9229-11e8-9d5c-f218980cdfde --loglevel=6
I0728 01:49:32.690802   51191 loader.go:359] Config loaded from file /Users/clayton/.kube/ci.kubeconfig
I0728 01:49:32.691347   51191 loader.go:359] Config loaded from file /Users/clayton/.kube/ci.kubeconfig
I0728 01:49:32.695777   51191 discovery.go:215] Invalidating discovery information
I0728 01:49:32.934911   51191 round_trippers.go:405] GET https://api.ci.openshift.org:443/api?timeout=32s 200 OK in 239 milliseconds
I0728 01:49:32.963856   51191 round_trippers.go:405] GET https://api.ci.openshift.org:443/apis?timeout=32s 200 OK in 28 milliseconds
I0728 01:49:32.993381   51191 round_trippers.go:405] GET https://api.ci.openshift.org:443/api/v1?timeout=32s 200 OK in 28 milliseconds

Comment 1 Juan Vallejo 2018-08-10 17:41:42 UTC
It appears that when the Kind for a custom resource is looked up by the resource builder [1], the discovery RESTMapper returns a `no matches for GVK` error every time. This, in turn, causes the cachedDiscoveryClient's cache to be invalidated [2] every time we attempt to look up GVK information for a custom resource.

Apparently, for any resource (not just custom resources), the discovery client's cache is always considered stale at this step [3], which is what prompts the "no matches for GVK" error in the case of a custom resource. It is only after the cache is invalidated and we "discover" the custom resource a second time that the lookup succeeds.

Not sure why this happens. David, I was hoping you would have some insight on this.

1. https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/genericclioptions/resource/builder.go#L692

2. https://github.com/kubernetes/client-go/blob/master/restmapper/discovery.go#L233

3. https://github.com/kubernetes/client-go/blob/master/restmapper/discovery.go#L232
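
For context, the invalidate-and-retry behavior referenced in [2] and [3] can be sketched in a few lines of Go. This is not the actual kubectl code path; kindForWithRetry is a hypothetical helper that mirrors what DeferredDiscoveryRESTMapper.KindFor does in client-go: a failed lookup against non-fresh cached data resets the mapper, which invalidates the discovery cache, and then retries against live discovery.

```go
package sketch

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/restmapper"
)

// kindForWithRetry (hypothetical helper) asks the deferred mapper for the Kind
// backing a GroupVersionResource. If the lookup fails and the cached discovery
// data is not fresh, it resets the mapper -- which invalidates the cached
// discovery client and produces the "Invalidating discovery information" log
// line -- and retries, this time hitting live discovery.
func kindForWithRetry(cached discovery.CachedDiscoveryInterface,
	mapper *restmapper.DeferredDiscoveryRESTMapper,
	gvr schema.GroupVersionResource) (schema.GroupVersionKind, error) {

	gvk, err := mapper.KindFor(gvr)
	if err != nil && !cached.Fresh() {
		mapper.Reset() // invalidates the cached discovery client
		gvk, err = mapper.KindFor(gvr)
	}
	return gvk, err
}
```

In practice this logic lives inside the mapper itself; the point is that any lookup that misses the cached data pays for a full live discovery, which is what the log in the description shows for custom resources.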

Comment 2 Juan Vallejo 2018-08-10 19:23:17 UTC
It also appears that when we attempt to look up the Kind for a custom resource (I am using [1] as my example), by the time we get to [2], the partiallySpecifiedResource contains an incorrect Group and Version, but the correct Resource.

It is only after the cache has been invalidated that we end up with the correct GVR. For example:

```
$ oc get foo.samplecontroller.k8s.io/example-foo --loglevel 5
# GVR passed to [2]: {G: "k8s.io" V: "samplecontroller" R: "foo"}
I0810 15:20:26.841015   26443 discovery.go:215] Invalidating discovery information
# GVR passed to [2]: {G: "k8s.io" V: "samplecontroller" R: "foo"}
# GVR passed to [2]: {G: "samplecontroller.k8s.io" V: "" R: "foo"}
I0810 15:20:26.930256   26443 get.go:443] no kind is registered for the type v1beta1.Table in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29"
NAME          REPLICAS
example-foo   1
```

Not sure why we only end up passing the correct GVR information to the priority RESTMapper _after_ invalidating caches.

1. https://gist.github.com/soltysh/b1c38b1660eea4a4c4741b722774aede
2. https://github.com/kubernetes/apimachinery/blob/master/pkg/api/meta/priority.go#L92
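
As a side note, the bogus Group/Version in the first candidate falls directly out of how the argument string is split. A minimal sketch using schema.ParseResourceArg from apimachinery (the helper the resource builder relies on for this split) reproduces both candidates seen in the log above:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
	// ParseResourceArg first interprets the argument as "resource.version.group",
	// which yields the bogus candidate {G: "k8s.io", V: "samplecontroller", R: "foo"}
	// seen in the log; the fallback GroupResource keeps the whole suffix as the
	// group and is the correct {G: "samplecontroller.k8s.io", R: "foo"}.
	fullGVR, groupResource := schema.ParseResourceArg("foo.samplecontroller.k8s.io")
	fmt.Printf("first candidate: %#v\n", *fullGVR)
	fmt.Printf("fallback:        %v\n", groupResource)
}
```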

Comment 3 Jordan Liggitt 2018-08-10 20:17:37 UTC
Happens with kubectl as well... we should open an upstream issue to track this.

Interestingly, a partial name with no group (oc get foo/example-foo) does not trigger it.

Comment 4 Juan Vallejo 2018-08-13 17:49:35 UTC
Using the same CRD linked in comment 2, I am able to confirm locally that this problem only occurs when a Resource is not the same as an object's Kind:

```
$ oc get DeploymentConfig.apps.openshift.io/mydc --loglevel 5
# Calculated GVR: openshift.io/apps, Resource=DeploymentConfig
# cache is not invalidated while trying to lookup KindFor
NAME      REVISION   DESIRED   CURRENT   TRIGGERED BY
pictre2   1          1         1         config,image(pictre2:latest)
```

```
$ oc get Foo.samplecontroller.k8s.io/example-foo --loglevel 5
# Calculated GVR: k8s.io/samplecontroller, Resource=Foo
# cache _is_ invalidated while looking up KindFor
I0813 13:35:00.363452   30946 discovery.go:215] Invalidating discovery information
I0813 13:35:00.443568   30946 get.go:443] no kind is registered for the type v1beta1.Table in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29"
NAME          REPLICAS
example-foo   1
```

Because there is no resource "Foo", this line [1] gets a "not found" error, invalidates caches, and then does a live discovery on a retry.

This only appears to happen when a fully qualified "Kind.Group/name" format is given. Since we attempt to look up GVR information first, the "Kind" is always assumed to be the Resource name until a failure occurs.

Lowering the severity of this bug, as not specifying the fully qualified Kind.Group/Name format does not cause the cache to be invalidated:

```
$ oc get foos --loglevel 5
I0813 13:49:05.217210    5468 get.go:443] no kind is registered for the type v1beta1.Table in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29"
NAME          REPLICAS
example-foo   1
```

1. https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/genericclioptions/resource/builder.go#L692
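
To make the Resource-vs-Kind distinction above concrete, here is a sketch that feeds hand-written discovery data for the sample CRD into client-go's restmapper and repeats the two lookups. The group, version, kind, and resource names are assumed from the samplecontroller CRD in comment 2; everything else is illustrative and not the actual oc code path.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/restmapper"
)

func main() {
	// Hand-written discovery data standing in for the samplecontroller CRD;
	// only this one group is registered in the mapper.
	groups := []*restmapper.APIGroupResources{{
		Group: metav1.APIGroup{
			Name: "samplecontroller.k8s.io",
			Versions: []metav1.GroupVersionForDiscovery{
				{GroupVersion: "samplecontroller.k8s.io/v1alpha1", Version: "v1alpha1"},
			},
			PreferredVersion: metav1.GroupVersionForDiscovery{
				GroupVersion: "samplecontroller.k8s.io/v1alpha1", Version: "v1alpha1",
			},
		},
		VersionedResources: map[string][]metav1.APIResource{
			"v1alpha1": {
				{Name: "foos", SingularName: "foo", Namespaced: true, Kind: "Foo"},
			},
		},
	}}
	mapper := restmapper.NewDiscoveryRESTMapper(groups)

	// "oc get foos" style lookup: the resource name is known, so this resolves
	// from the supplied (i.e. cached) data.
	gvk, err := mapper.KindFor(schema.GroupVersionResource{Resource: "foos"})
	fmt.Println(gvk, err)

	// First candidate derived from "Foo.samplecontroller.k8s.io": wrong group
	// and version, so the mapper reports no match -- the point at which the
	// deferred mapper in kubectl invalidates the cache and retries.
	gvk, err = mapper.KindFor(schema.GroupVersionResource{
		Group: "k8s.io", Version: "samplecontroller", Resource: "Foo",
	})
	fmt.Println(gvk, err)
}
```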

Comment 6 Maciej Szulik 2019-02-28 15:15:24 UTC
https://github.com/openshift/origin/pull/22020 and a few other PRs landed to fix this issue. Moving to QA.

Comment 8 Maciej Szulik 2019-03-01 14:52:54 UTC
This is because you're passing the fully qualified name; if you pass the resource name (it can even be the short name), it will get matched and discovery won't be re-fetched.

Comment 9 zhou ying 2019-03-04 02:41:41 UTC
Confirmed that using the short name won't re-fetch discovery:
[root@preserve-master-yinzhou auth]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-27-213933   True        False         27m     Cluster version is 4.0.0-0.nightly-2019-02-27-213933


[root@dhcp-140-138 ~]# oc get dc/ruby-ex --loglevel=6
I0304 10:40:47.214075   17326 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:40:47.215367   17326 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:40:47.225579   17326 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:40:48.359508   17326 round_trippers.go:405] GET https://api.qe-yinzhou.qe.devcluster.openshift.com:6443/apis/apps.openshift.io/v1/namespaces/zhouy/deploymentconfigs/ruby-ex 200 OK in 1133 milliseconds
I0304 10:40:48.360464   17326 get.go:558] no kind is registered for the type v1beta1.Table in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29"
NAME      REVISION   DESIRED   CURRENT   TRIGGERED BY
ruby-ex   1          1         1         config,image(ruby-ex:latest)

[root@dhcp-140-138 ~]# oc get deploymentconfig/ruby-ex --loglevel=6
I0304 10:41:22.256238   17343 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:41:22.257534   17343 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:41:22.266430   17343 loader.go:359] Config loaded from file /root/.kube/config
I0304 10:41:23.275211   17343 round_trippers.go:405] GET https://api.qe-yinzhou.qe.devcluster.openshift.com:6443/apis/apps.openshift.io/v1/namespaces/zhouy/deploymentconfigs/ruby-ex 200 OK in 1008 milliseconds
I0304 10:41:23.276064   17343 get.go:558] no kind is registered for the type v1beta1.Table in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:29"
NAME      REVISION   DESIRED   CURRENT   TRIGGERED BY
ruby-ex   1          1         1         config,image(ruby-ex:latest)

Comment 12 errata-xmlrpc 2019-06-04 10:40:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

Comment 13 Red Hat Bugzilla 2023-09-14 04:32:18 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days