Bug 1981867 - [sig-cli] oc explain should contain proper fields description for special types [Suite:openshift/conformance/parallel]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard: tag-ci, non-multi-arch LifecycleReset
Depends On:
Blocks:
 
Reported: 2021-07-13 15:18 UTC by Micah Abbott
Modified: 2022-03-11 18:15 UTC
CC: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
[sig-cli] oc explain should contain proper fields description for special types [Suite:openshift/conformance/parallel]
Last Closed: 2022-03-11 18:15:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift origin pull 26728 | 0 | None | open | Bug 1981867: add backoff retries for oc explain tests | 2022-01-05 16:15:39 UTC

Description Micah Abbott 2021-07-13 15:18:35 UTC
test:
[sig-cli] oc explain should contain proper fields description for special types [Suite:openshift/conformance/parallel]

is failing frequently in CI, see search results:
https://search.ci.openshift.org/?search=%5C%5Bsig-cli%5C%5D+oc+explain+should+contain+proper+fields+description+for+special+types+%5C%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5C%5D&maxAge=168h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Saw this on `pull-ci-openshift-installer-master-e2e-metal-ipi-ovn-ipv6` for a PR to `openshift/installer`

https://github.com/openshift/installer/pull/5049 -> https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/5049/pull-ci-openshift-installer-master-e2e-metal-ipi-ovn-ipv6/1413171524772302848

In this particular case, the failure looked like:

```
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1453
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1453
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:59
[BeforeEach] [sig-cli] oc explain
  github.com/openshift/origin/test/extended/util/client.go:142
STEP: Creating a kubernetes client
[BeforeEach] [sig-cli] oc explain
  github.com/openshift/origin/test/extended/util/client.go:116
Jul  8 18:05:47.006: INFO: configPath is now "/tmp/configfile057833447"
Jul  8 18:05:47.006: INFO: The user is now "e2e-test-oc-explain-mw2vb-user"
Jul  8 18:05:47.006: INFO: Creating project "e2e-test-oc-explain-mw2vb"
Jul  8 18:05:47.129: INFO: Waiting on permissions in project "e2e-test-oc-explain-mw2vb" ...
Jul  8 18:05:47.134: INFO: Waiting for ServiceAccount "default" to be provisioned...
Jul  8 18:05:47.240: INFO: Waiting for ServiceAccount "deployer" to be provisioned...
Jul  8 18:05:47.345: INFO: Waiting for ServiceAccount "builder" to be provisioned...
Jul  8 18:05:47.452: INFO: Waiting for RoleBinding "system:image-pullers" to be provisioned...
Jul  8 18:05:47.458: INFO: Waiting for RoleBinding "system:image-builders" to be provisioned...
Jul  8 18:05:47.464: INFO: Waiting for RoleBinding "system:deployers" to be provisioned...
Jul  8 18:05:47.984: INFO: Project "e2e-test-oc-explain-mw2vb" has been fully provisioned.
[It] should contain proper fields description for special types [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/cli/explain.go:454
Jul  8 18:05:47.984: INFO: Checking apps.openshift.io/v1, Field=deploymentconfigs.status.replicas...
Jul  8 18:05:47.984: INFO: Running 'oc --namespace=e2e-test-oc-explain-mw2vb --kubeconfig=/tmp/configfile057833447 explain deploymentconfigs.status.replicas --api-version=apps.openshift.io/v1'
Jul  8 18:05:48.602: INFO: Checking route.openshift.io/v1, Field=route.metadata.name...
Jul  8 18:05:48.602: INFO: Running 'oc --namespace=e2e-test-oc-explain-mw2vb --kubeconfig=/tmp/configfile057833447 explain route.metadata.name --api-version=route.openshift.io/v1'
Jul  8 18:05:49.224: INFO: Checking authorization.openshift.io/v1, Field=clusterrolebindings.userNames...
Jul  8 18:05:49.224: INFO: Running 'oc --namespace=e2e-test-oc-explain-mw2vb --kubeconfig=/tmp/configfile057833447 explain clusterrolebindings.userNames --api-version=authorization.openshift.io/v1'
Jul  8 18:05:49.801: INFO: Checking authorization.openshift.io/v1, Field=clusterroles.rules...
Jul  8 18:05:49.801: INFO: Running 'oc --namespace=e2e-test-oc-explain-mw2vb --kubeconfig=/tmp/configfile057833447 explain clusterroles.rules --api-version=authorization.openshift.io/v1'
Jul  8 18:05:50.111: INFO: Error running /usr/local/bin/oc --namespace=e2e-test-oc-explain-mw2vb --kubeconfig=/tmp/configfile057833447 explain clusterroles.rules --api-version=authorization.openshift.io/v1:
StdOut>
error: couldn't find resource for "authorization.openshift.io/v1, Kind=ClusterRole"
StdErr>
error: couldn't find resource for "authorization.openshift.io/v1, Kind=ClusterRole"

[AfterEach] [sig-cli] oc explain
  github.com/openshift/origin/test/extended/util/client.go:140
STEP: Collecting events from namespace "e2e-test-oc-explain-mw2vb".
STEP: Found 1 events.
Jul  8 18:05:50.116: INFO: At 2021-07-08 18:05:46 +0000 UTC - event for e2e-test-oc-explain-mw2vb: {namespace-security-allocation-controller } CreatedSCCRanges: created SCC ranges
Jul  8 18:05:50.120: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Jul  8 18:05:50.120: INFO: 
Jul  8 18:05:50.136: INFO: skipping dumping cluster info - cluster too large
Jul  8 18:05:50.154: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-oc-explain-mw2vb-user}, err: <nil>
Jul  8 18:05:50.171: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-oc-explain-mw2vb}, err: <nil>
Jul  8 18:05:50.182: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  sha256~JZJH3T52V-40qjKYzfgPxCWBdV5ImI511AQtfSr7xn8}, err: <nil>
[AfterEach] [sig-cli] oc explain
  github.com/openshift/origin/test/extended/util/client.go:141
STEP: Destroying namespace "e2e-test-oc-explain-mw2vb" for this suite.
fail [github.com/openshift/origin/test/extended/cli/explain.go:460]: Unexpected error:
    <*errors.errorString | 0xc00325c2e0>: {
        s: "failed to explain [\"clusterroles.rules\" \"--api-version=authorization.openshift.io/v1\"]: exit status 1",
    }
    failed to explain ["clusterroles.rules" "--api-version=authorization.openshift.io/v1"]: exit status 1
occurred
```

Comment 1 Maciej Szulik 2021-07-21 13:09:06 UTC
Looks like a temporary glitch, but I'll need to have a closer look.

Comment 2 Maciej Szulik 2021-08-19 13:58:02 UTC
From what I've seen, this usually failed when one of the operands was down, causing a particular API to be unavailable during the test.

Comment 3 W. Trevor King 2021-10-27 23:17:42 UTC
Can we get another look at this?  Seems like it's not particularly rare for some release-informing and -blocking jobs:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=168h&type=junit&search=failed%20to%20explain.*exit%20status' | grep 'failures match' | grep '^periodic\|^release' | sort
periodic-ci-openshift-hypershift-main-e2e-aws-pooled-periodic-conformance (all) - 28 runs, 100% failed, 75% of failures match = 75% impact
periodic-ci-openshift-multiarch-master-nightly-4.6-ocp-e2e-remote-libvirt-s390x (all) - 14 runs, 64% failed, 11% of failures match = 7% impact
periodic-ci-openshift-multiarch-master-nightly-4.7-ocp-e2e-remote-libvirt-s390x (all) - 15 runs, 80% failed, 8% of failures match = 7% impact
periodic-ci-openshift-multiarch-master-nightly-4.8-ocp-e2e-remote-libvirt-s390x (all) - 14 runs, 36% failed, 40% of failures match = 14% impact
periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-e2e-remote-libvirt-s390x (all) - 14 runs, 93% failed, 8% of failures match = 7% impact
periodic-ci-openshift-ovn-kubernetes-release-4.8-e2e-metal-ipi-ovn-dualstack-periodic (all) - 7 runs, 43% failed, 33% of failures match = 14% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-upgrade (all) - 542 runs, 42% failed, 0% of failures match = 0% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-azure-upgrade (all) - 492 runs, 95% failed, 1% of failures match = 1% impact
periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-openstack-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-gcp (all) - 52 runs, 17% failed, 11% of failures match = 2% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade (all) - 66 runs, 52% failed, 3% of failures match = 2% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi (all) - 19 runs, 58% failed, 27% of failures match = 16% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-ovn-ipv6 (all) - 23 runs, 70% failed, 19% of failures match = 13% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-virtualmedia (all) - 10 runs, 60% failed, 33% of failures match = 20% impact
release-openshift-origin-installer-e2e-gcp-compact-4.7 (all) - 3 runs, 100% failed, 33% of failures match = 33% impact

In particular, I saw this in a 4.8 release-blocking [1]:

[sig-cli] oc explain should contain proper spec+status for CRDs [Suite:openshift/conformance/parallel] (21s)
fail [github.com/openshift/origin/test/extended/cli/explain.go:450]: Unexpected error:
    <*errors.errorString | 0xc002e7c7a0>: {
        s: "failed to explain [\"installplans\" \"--api-version=operators.coreos.com/v1alpha1\"]: exit status 1",
    }
    failed to explain ["installplans" "--api-version=operators.coreos.com/v1alpha1"]: exit status 1
occurred

where stdout has:

  Oct 27 17:13:48.066: INFO: Error running /usr/bin/oc --namespace=e2e-test-oc-explain-hx9f4 --kubeconfig=/tmp/configfile806360138 explain installplans --api-version=operators.coreos.com/v1alpha1:
  StdOut>
  error: couldn't find resource for "operators.coreos.com/v1alpha1, Kind=InstallPlan"

despite 17:13Z being almost an hour after the CRD was created:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-ovn-ipv6/1453388863056646144/artifacts/e2e-metal-ipi-ovn-ipv6/gather-must-gather/artifacts/must-gather.tar | tar xOz quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-6f386b0e4757af4e3d07c8dbd7593cc5c1f4463656ff9f476fcde262b5c7ca79/cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/installplans.operators.coreos.com.yaml | yaml2json | jq -r .metadata.creationTimestamp
  2021-10-27T16:16:17Z

Is this an API-server connectivity hiccup with a poorly-summarized error that makes the connection issue sound like a 404?

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-ovn-ipv6/1453388863056646144

Comment 4 Michal Fojtik 2021-11-26 23:58:56 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 5 Michal Fojtik 2021-12-29 13:22:28 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 8 Michal Fojtik 2022-01-11 01:52:20 UTC
The LifecycleStale keyword was removed because the bug moved to QE.
The bug assignee was notified.

