1861201 – [sig-cli] oc adm must-gather runs successfully for audit logs

Summary: [sig-cli] oc adm must-gather runs successfully for audit logs

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	oc
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Jan Chaloupka
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-07-28 04:06 UTC by W. Trevor King
Modified:	2020-10-27 16:20 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-10-27 16:17:40 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 25336	0	None	closed	bug 1861201: make must-gather test resilient to failures and disk timing	2020-11-13 11:05:26 UTC
Red Hat Product Errata	RHBA-2020:4196	0	None	None	None	2020-10-27 16:20:58 UTC

Description W. Trevor King 2020-07-28 04:06:42 UTC

test:
[sig-cli] oc adm must-gather runs successfully for audit logs

is failing frequently in CI, see search results:
https://search.svc.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&search=%5C%5Bsig-cli%5C%5D+oc+adm+must-gather+runs+successfully+for+audit+logs

$ w3m -dump -cols 200 'https://search.svc.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&search=%5C%5Bsig-cli%5C%5D+oc+adm+must-gather+runs+successfully+for+audit+logs' | grep 'failures match' | sort
promote-release-openshift-machine-os-content-e2e-aws-4.6 - 546 runs, 56% failed, 1% of failures match
pull-ci-cri-o-cri-o-master-e2e-aws - 241 runs, 62% failed, 17% of failures match
pull-ci-cri-o-cri-o-release-1.19-e2e-aws - 6 runs, 50% failed, 67% of failures match
pull-ci-openshift-cloud-credential-operator-master-e2e-aws - 18 runs, 39% failed, 43% of failures match
pull-ci-openshift-cluster-api-provider-aws-master-e2e-aws - 6 runs, 50% failed, 33% of failures match
...
pull-ci-operator-framework-operator-lifecycle-manager-master-e2e-gcp - 94 runs, 57% failed, 24% of failures match
pull-ci-operator-framework-operator-registry-master-e2e-aws - 33 runs, 58% failed, 26% of failures match
rehearse-10454-pull-ci-cri-o-cri-o-master-e2e-aws - 3 runs, 33% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cloud-credential-operator-master-e2e-azure - 3 runs, 67% failed, 50% of failures match
rehearse-10454-pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-multi - 3 runs, 33% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-cluster-network-operator-master-e2e-azure - 3 runs, 33% failed, 100% of failures match
rehearse-10454-pull-ci-openshift-installer-master-e2e-gcp-shared-vpc - 3 runs, 33% failed, 100% of failures match
release-openshift-ocp-e2e-aws-scaleup-rhel7-4.6 - 28 runs, 61% failed, 24% of failures match
release-openshift-ocp-installer-e2e-azure-4.6 - 71 runs, 66% failed, 28% of failures match
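
Each line of the search output above gives, per job, the number of runs, the share of runs that failed, and the share of those failures that match this test. As a rough aid for reading those lines, the following sketch parses that format and estimates how many runs per job hit the matching failure. The function names are illustrative assumptions and are not part of the CI search tooling; because the reported percentages are already rounded, the result is only an estimate.

```python
import re

# Matches lines such as:
#   "pull-ci-cri-o-cri-o-master-e2e-aws - 241 runs, 62% failed, 17% of failures match"
LINE_RE = re.compile(
    r"^(?P<job>\S+) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match$"
)


def parse_line(line):
    """Return (job, runs, failed_pct, match_pct), or None if the line does not fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m["job"], int(m["runs"]), int(m["failed"]), int(m["match"])


def estimated_matching_runs(runs, failed_pct, match_pct):
    """Approximate count of runs whose failure matched the search.

    The percentages in the report are rounded, so this is an estimate.
    """
    return runs * failed_pct / 100 * match_pct / 100


sample = [
    "pull-ci-cri-o-cri-o-master-e2e-aws - 241 runs, 62% failed, 17% of failures match",
    "release-openshift-ocp-installer-e2e-azure-4.6 - 71 runs, 66% failed, 28% of failures match",
    "...",
]

for line in sample:
    parsed = parse_line(line)
    if parsed is None:
        continue  # skip elided or malformed lines
    job, runs, failed_pct, match_pct = parsed
    estimate = estimated_matching_runs(runs, failed_pct, match_pct)
    print(f"{job}: ~{estimate:.1f} matching runs")
```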

Picking [1] as a release-informing example, the test case flaked: it failed once and passed on retry.  The failure included:

STEP: Found 0 events.
Jul 27 21:22:40.551: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Jul 27 21:22:40.551: INFO: 
Jul 27 21:22:40.675: INFO: skipping dumping cluster info - cluster too large
Jul 27 21:22:40.724: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-oc-adm-must-gather-gb28p-user}, err: <nil>
Jul 27 21:22:40.774: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-oc-adm-must-gather-gb28p}, err: <nil>
Jul 27 21:22:40.825: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  QWtOCZRnTwiQ1Op-xQtGeAAAAAAAAAAA}, err: <nil>
[AfterEach] [sig-cli] oc adm must-gather
  github.com/openshift/origin@/test/extended/util/client.go:134
Jul 27 21:22:40.825: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-test-oc-adm-must-gather-gb28p" for this suite.
Jul 27 21:22:40.914: INFO: Running AfterSuite actions on all nodes
Jul 27 21:22:40.915: INFO: Running AfterSuite actions on node 1
fail [github.com/openshift/origin@/test/extended/cli/mustgather.go:248]: Expected
    <int>: 0
to be >
    <int>: 1000
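
The assertion above reports a measured value of 0 where the test expects a value greater than 1000. The linked pull request's title mentions making the test resilient to disk timing; one generic way to tolerate data that reaches disk late is to poll until the measurement crosses the threshold or a deadline passes. The sketch below illustrates that general pattern only. It is not the code from the pull request, and `wait_until_greater`, its parameters, and the simulated readings are hypothetical.

```python
import time


def wait_until_greater(measure, threshold, timeout=30.0, interval=1.0):
    """Call measure() until it returns a value > threshold or the timeout elapses.

    Returns the last measured value; the caller decides whether it is acceptable.
    """
    deadline = time.monotonic() + timeout
    value = measure()
    while value <= threshold and time.monotonic() < deadline:
        time.sleep(interval)
        value = measure()
    return value


# Simulated measurement (hypothetical): the count is 0 on the first two
# reads, as if the data had not been flushed yet, and 1500 on the third.
readings = iter([0, 0, 1500])
result = wait_until_greater(lambda: next(readings), threshold=1000,
                            timeout=5.0, interval=0.01)
print(result)
```

Returning the last value instead of raising keeps the helper usable with whatever assertion style the surrounding test already has.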

An example PR presubmit where the test case failed both times and was the only failure is [2].  This is possibly related to the PR which landed for bug 1859916.

[1]: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.6/1287846471231606784
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/406/pull-ci-openshift-cluster-version-operator-master-e2e/1287918162578247680

Comment 3 zhou ying 2020-07-29 06:46:17 UTC

Since verifying this bug requires checking the failure ratio, I will check again in a few days.

Comment 4 zhou ying 2020-08-06 01:28:23 UTC

w3m -dump -cols 200  'https://search.ci.openshift.org/?search=oc+adm+must-gather+runs+successfully+for+audit+logs&maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job' | grep 'failures match' | sort

pull-ci-openshift-cloud-credential-operator-master-e2e-aws - 14 runs, 71% failed, 10% of failures match
pull-ci-openshift-cluster-etcd-operator-master-e2e-azure - 50 runs, 96% failed, 2% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-azure - 94 runs, 94% failed, 6% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn - 121 runs, 90% failed, 1% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-openstack - 138 runs, 99% failed, 1% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-ovn-step-registry - 103 runs, 93% failed, 1% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-vsphere - 119 runs, 98% failed, 1% of failures match
pull-ci-openshift-machine-api-operator-master-e2e-azure - 49 runs, 67% failed, 3% of failures match
pull-ci-openshift-machine-config-operator-master-e2e-ovn-step-registry - 171 runs, 95% failed, 1% of failures match
release-openshift-ocp-installer-e2e-azure-ovn-4.6 - 59 runs, 83% failed, 2% of failures match
release-openshift-ocp-installer-e2e-openstack-4.6 - 74 runs, 88% failed, 2% of failures match
release-openshift-origin-installer-e2e-azure-4.6 - 221 runs, 64% failed, 3% of failures match


The failure ratio has gone down. I checked some of the failed logs and can't reproduce the issue now, so I will mark this as verified.
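
One job, pull-ci-openshift-cloud-credential-operator-master-e2e-aws, appears in both the listing from the description and the listing in this comment, so its numbers can be compared directly. The sketch below multiplies the two reported percentages to estimate what fraction of all runs of that job hit the matching failure. `hit_rate` is an illustrative helper, not part of any CI tooling, and with only 18 and 14 runs the comparison is indicative at best.

```python
def hit_rate(failed_pct, match_pct):
    """Estimated fraction of all runs that failed with the matching test failure."""
    return (failed_pct / 100) * (match_pct / 100)


# pull-ci-openshift-cloud-credential-operator-master-e2e-aws, as reported
# in the description (2020-07-28) and in comment 4 (2020-08-06).
before = hit_rate(failed_pct=39, match_pct=43)
after = hit_rate(failed_pct=71, match_pct=10)

print(f"before: {before:.1%} of runs, after: {after:.1%} of runs")
```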

Comment 6 errata-xmlrpc 2020-10-27 16:17:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 7 errata-xmlrpc 2020-10-27 16:20:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
