Bug 1710973

Summary: "oc adm must-gather" test in e2e-vsphere suite failing in origin repo
Product: OpenShift Container Platform Reporter: Matthew Staebler <mstaeble>
Component: Installer Assignee: Matthew Staebler <mstaeble>
Installer sub component: openshift-installer QA Contact: sheng.lao <shlao>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: bleanhar
Version: 4.1.0   
Target Milestone: ---   
Target Release: 4.1.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-15 14:24:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Matthew Staebler 2019-05-16 16:36:01 UTC
The "[cli] oc adm must-gather runs successfully [Suite:openshift/conformance/parallel]" test fails when running the e2e-vsphere tests in the origin repo.

---
[AfterEach] [cli] oc adm must-gather
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/cli.go:86
STEP: Collecting events from namespace "e2e-test-oc-adm-must-gather-4xm5c".
STEP: Found 0 events.
May 13 20:20:31.120: INFO: skipping dumping cluster info - cluster too large
STEP: Deleting namespaces
May 13 20:20:31.354: INFO: namespace : e2e-test-oc-adm-must-gather-4xm5c api call to delete is complete 
STEP: Waiting for namespaces to vanish
[AfterEach] [cli] oc adm must-gather
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:154
May 13 20:20:37.570: INFO: Waiting up to 3m0s for all (but 100) nodes to be ready
May 13 20:20:37.787: INFO: Running AfterSuite actions on all nodes
May 13 20:20:37.787: INFO: Running AfterSuite actions on node 1
fail [github.com/openshift/origin/test/extended/cli/mustgather.go:73]: Expected error:
    <*errors.errorString | 0xc0011d25a0>: {
        s: "expected files should not be empty: /tmp/test.oc-adm-must-gather.507599836/audit_logs/openshift-apiserver.audit_logs_listing",
    }
    expected files should not be empty: /tmp/test.oc-adm-must-gather.507599836/audit_logs/openshift-apiserver.audit_logs_listing
not to have occurred

May 13 20:18:33.993 I ns/openshift-must-gather-7nnqx pod/must-gather-q96cp node/ created
May 13 20:18:34.003 I ns/openshift-must-gather-7nnqx pod/must-gather-q96cp Successfully assigned openshift-must-gather-7nnqx/must-gather-q96cp to compute-2
May 13 20:18:42.070 I ns/openshift-must-gather-7nnqx pod/must-gather-q96cp Pulling image "registry.svc.ci.openshift.org/ci-op-dm0p7pq2/stable@sha256:2dff3143e6960606c097246f65250e8164674b2e10423244d9c5c7a8805fbdc0"
May 13 20:18:49.981 I ns/openshift-must-gather-7nnqx pod/must-gather-q96cp Successfully pulled image "registry.svc.ci.openshift.org/ci-op-dm0p7pq2/stable@sha256:2dff3143e6960606c097246f65250e8164674b2e10423244d9c5c7a8805fbdc0"
May 13 20:18:50.205 I ns/openshift-must-gather-7nnqx pod/must-gather-q96cp Created container gather
May 13 20:18:50.235 I ns/openshift-must-gather-7nnqx pod/must-gather-q96cp Started container gather
May 13 20:19:35.392 - 44s   W ns/openshift-must-gather-7nnqx pod/must-gather-q96cp node/compute-2 pod has been pending longer than a minute
May 13 20:20:21.556 I ns/openshift-must-gather-7nnqx pod/must-gather-q96cp Container image "registry.svc.ci.openshift.org/ci-op-dm0p7pq2/stable@sha256:2dff3143e6960606c097246f65250e8164674b2e10423244d9c5c7a8805fbdc0" already present on machine
May 13 20:20:21.812 I ns/openshift-must-gather-7nnqx pod/must-gather-q96cp Created container copy
May 13 20:20:21.836 I ns/openshift-must-gather-7nnqx pod/must-gather-q96cp Started container copy
May 13 20:20:36.288 W ns/openshift-must-gather-7nnqx pod/must-gather-q96cp node/compute-2 graceful deletion within 0s
May 13 20:20:36.293 W ns/openshift-must-gather-7nnqx pod/must-gather-q96cp node/compute-2 deleted

failed: (2m19s) 2019-05-13T20:20:37 "[cli] oc adm must-gather runs successfully [Suite:openshift/conformance/parallel]"

Comment 1 Matthew Staebler 2019-05-16 17:57:56 UTC
The test fails because the audit log listing is only 78 bytes, while the test requires each expected file to be at least 100 bytes.

Comment 3 sheng.lao 2019-07-30 02:40:13 UTC
Could you provide the URL of this PR so that I can verify it against the exact log?

Comment 4 sheng.lao 2019-08-01 02:09:20 UTC
@mstaeble Could you please give me more hints about how to reproduce the bug?

Comment 5 sheng.lao 2019-08-05 06:59:58 UTC
In the latest version, a file is treated as non-empty if its size is at least 50 bytes, as the following code shows:
stat, err := os.Stat(expectedFilePath)
o.Expect(err).ToNot(o.HaveOccurred())
// Files smaller than 50 bytes are reported as empty.
if size := stat.Size(); size < 50 {
    emptyFiles = append(emptyFiles, expectedFilePath)
}

Verification steps:
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS                                                                        
version   4.2.0-0.nightly-2019-08-01-113533   True        False         23m     Cluster version is 4.2.0-0.nightly-2019-08-01-113533

# openshift-tests run-test "[cli] oc adm must-gather runs successfully [Suite:openshift/conformance/parallel]"
...
INFO: Worker host service log collection to complete.
audit_logs/kube-apiserver.audit_logs_listing
audit_logs/openshift-apiserver.audit_logs_listing
...

Comment 7 errata-xmlrpc 2019-08-15 14:24:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2417