Bug 1862322 - [sig-operator] an end user can use OLM can subscribe to the operator: Timed out after 300.000s
Keywords:
Status: CLOSED DUPLICATE of bug 1857928
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-31 03:51 UTC by W. Trevor King
Modified: 2020-08-06 12:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
[sig-operator] an end user can use OLM can subscribe to the operator
Last Closed: 2020-08-06 12:07:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 25352 0 None closed Bug 1862322: e2e/extended: disable OLM 'can subscribe to the operator' test 2020-09-17 09:12:30 UTC
Github openshift origin pull 25353 0 None closed Bug 1857928: Re-enable olm tests 2020-09-17 09:12:30 UTC

Description W. Trevor King 2020-07-31 03:51:25 UTC
test:
[sig-operator] an end user can use OLM can subscribe to the operator 

is failing frequently in CI, see search results:
https://search.svc.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-operator%5C%5D+an+end+user+can+use+OLM+can+subscribe+to+the+operator

For example:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=12h&type=junit&search=%5C%5Bsig-operator%5C%5D+an+end+user+can+use+OLM+can+subscribe+to+the+operator' | grep 'failures match' | sort
promote-release-openshift-machine-os-content-e2e-aws-4.6 - 15 runs, 80% failed, 42% of failures match
pull-ci-cri-o-cri-o-master-e2e-aws - 49 runs, 82% failed, 10% of failures match
pull-ci-openshift-cloud-credential-operator-master-e2e-aws - 5 runs, 80% failed, 50% of failures match
...
pull-ci-operator-framework-operator-lifecycle-manager-master-e2e-gcp - 20 runs, 75% failed, 80% of failures match
pull-ci-operator-framework-operator-registry-master-e2e-aws - 7 runs, 86% failed, 33% of failures match
rehearse-10377-pull-ci-openshift-cluster-monitoring-operator-release-4.6-e2e - 1 runs, 100% failed, 100% of failures match
rehearse-8108-pull-ci-openshift-openshift-ansible-master-e2e-aws-scaleup-rhel7 - 5 runs, 60% failed, 67% of failures match
release-openshift-ocp-e2e-aws-scaleup-rhel7-4.6 - 2 runs, 100% failed, 100% of failures match
release-openshift-ocp-installer-e2e-aws-fips-4.6 - 2 runs, 100% failed, 50% of failures match
release-openshift-ocp-installer-e2e-aws-ovn-4.6 - 2 runs, 100% failed, 50% of failures match
release-openshift-ocp-installer-e2e-azure-4.6 - 2 runs, 100% failed, 50% of failures match
release-openshift-ocp-installer-e2e-gcp-4.6 - 2 runs, 100% failed, 50% of failures match
release-openshift-ocp-installer-e2e-gcp-ovn-4.6 - 2 runs, 100% failed, 50% of failures match
release-openshift-origin-installer-e2e-azure-4.6 - 15 runs, 87% failed, 31% of failures match
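The per-job percentages above can be turned into a rough count of runs that hit this test failure. The following is a minimal Python sketch, assuming every line has exactly the "N runs, X% failed, Y% of failures match" shape shown in the output. The function name `parse_summary_line` and the returned keys are hypothetical, and because the percentages are already rounded, the resulting count is only an estimate.

```python
import re

# One summary line as printed in the output above, e.g.
# "some-job - 20 runs, 75% failed, 80% of failures match"
LINE_RE = re.compile(
    r"^(?P<job>\S+) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match$"
)


def parse_summary_line(line):
    """Parse one summary line; return None for lines that do not match (such as '...')."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    runs = int(m.group("runs"))
    failed_pct = int(m.group("failed"))
    match_pct = int(m.group("match"))
    # Estimated runs that failed AND matched the search. The input
    # percentages are rounded, so this is an approximation only.
    estimate = runs * failed_pct / 100 * match_pct / 100
    return {
        "job": m.group("job"),
        "runs": runs,
        "failed_pct": failed_pct,
        "match_pct": match_pct,
        "estimated_matching_runs": round(estimate),
    }


if __name__ == "__main__":
    sample = (
        "pull-ci-operator-framework-operator-lifecycle-manager-master-e2e-gcp"
        " - 20 runs, 75% failed, 80% of failures match"
    )
    print(parse_summary_line(sample))
```

Sorting the parsed rows by `estimated_matching_runs` would show which jobs are hit most often in absolute terms, which the percentages alone do not show for jobs with only one or two runs.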

Looks like this is failing across the board on 4.6. An example release job is [1], which dies with:

STEP: Found 0 events.
Jul 31 02:01:33.785: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Jul 31 02:01:33.785: INFO: 
Jul 31 02:01:33.849: INFO: skipping dumping cluster info - cluster too large
Jul 31 02:01:33.908: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-olm-23440-bf62c-user}, err: <nil>
Jul 31 02:01:33.973: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-olm-23440-bf62c}, err: <nil>
Jul 31 02:01:34.008: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  0sOayY4ZSvmD5tbwFfzHXgAAAAAAAAAA}, err: <nil>
[AfterEach] [sig-operator] an end user can use OLM
  github.com/openshift/origin@/test/extended/util/client.go:134
Jul 31 02:01:34.008: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-test-olm-23440-bf62c" for this suite.
Jul 31 02:01:34.082: INFO: Running AfterSuite actions on all nodes
Jul 31 02:01:34.082: INFO: Running AfterSuite actions on node 1
fail [github.com/openshift/origin@/test/extended/operators/olm.go:199]: Timed out after 300.000s.
Expected
    <string>: 
not to equal
    <string>: 

[1]: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-e2e-aws-scaleup-rhel7-4.6/1288974231769452544
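The failure message above ("Timed out after 300.000s. Expected <string>: not to equal <string>:") has the shape of a polling assertion that gave up: the check appears to have waited up to 300 seconds for a string to become non-empty, and the string was still empty at the deadline. The actual test is Go code in test/extended/operators/olm.go and is not reproduced here. The following is only a minimal Python sketch of the general pattern; the name `poll_until_nonempty` and all of its parameters are hypothetical.

```python
import time


def poll_until_nonempty(fetch, timeout_s=300.0, interval_s=1.0):
    """Call fetch() repeatedly until it returns a non-empty string.

    Raises TimeoutError if the value is still empty once timeout_s has
    elapsed. fetch is any zero-argument callable returning a string; what
    the real test fetches is not shown in this report.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        value = fetch()
        if value != "":
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Timed out after {timeout_s:.3f}s. "
                f"Expected {value!r} not to equal ''"
            )
        time.sleep(interval_s)


if __name__ == "__main__":
    # Simulated resource that becomes non-empty on the third poll.
    values = iter(["", "", "ready"])
    print(poll_until_nonempty(lambda: next(values), timeout_s=5.0, interval_s=0.1))
```

With this pattern, an empty value on both sides of "not to equal" means the polled field never got populated before the deadline, rather than that it was populated with a wrong value.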

Comment 1 Stefan Schimanski 2020-07-31 05:42:43 UTC
Seems to perma-fail now. Disabled the test.

Comment 7 Evan Cordell 2020-08-06 11:59:18 UTC
The cause of the flakiness was resolved via other fixes in operator-registry; the attached PR re-enables the test.

Comment 8 Evan Cordell 2020-08-06 12:07:11 UTC

*** This bug has been marked as a duplicate of bug 1857928 ***

