Bug 2021135

Summary: [azure-file-csi-driver] "make unit-test" returns non-zero code, but tests pass
Product: OpenShift Container Platform
Component: Storage
Storage sub component: Kubernetes External Components
Version: 4.10
Target Release: 4.10.0
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Fabio Bertinatto <fbertina>
Assignee: Jan Safranek <jsafrane>
QA Contact: Wei Duan <wduan>
CC: aos-bugs, jsafrane
Last Closed: 2022-03-10 16:26:09 UTC

Description Fabio Bertinatto 2021-11-08 12:58:08 UTC
Even though all tests pass, "make unit-test" occasionally exits with a non-zero return code.

This happens because the "vet" utility, which is called by "go test", gets killed and therefore fails:

go test -v -race ./pkg/... ./test/utils/credentials
/usr/lib/golang/pkg/tool/linux_amd64/vet: signal: killed
/usr/lib/golang/pkg/tool/linux_amd64/vet: signal: killed
/usr/lib/golang/pkg/tool/linux_amd64/vet: signal: killed
/usr/lib/golang/pkg/tool/linux_amd64/vet: signal: killed
(...)

Here is an example job:

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/23165/rehearse-23165-pull-ci-openshift-azure-file-csi-driver-master-unit2/1455228476146585600/build-log.txt

For the time being we have disabled "go vet" in our "unit" CI job, but that should be reverted and the underlying issue fixed:

https://github.com/openshift/release/commit/c8066e2e385e4563433cf06f08cae62bb73dd636#diff-75299b45d9fd8e4fae4211fb7c9dcba6d02cbf691985d9c1bf776895f3cd005aR45

Comment 1 Jan Safranek 2021-11-09 15:23:52 UTC
While the CI job was "fixed" to run `go test -vet=off`, here we want to investigate *why* CI kills our `make unit-test`.

Comment 2 Jan Safranek 2021-11-10 09:28:45 UTC
This is the test container as executed by CI:

    - resources:
        limits:
          memory: 4Gi
        requests:
          cpu: 100m
          memory: 200Mi


Running `/bin/time -v make unit-test` in the same pod, I got:

	Command being timed: "make unit-test"
	User time (seconds): 309.93
	System time (seconds): 30.03
	Percent of CPU this job got: 493%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:08.92
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1687828
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 45
	Minor (reclaiming a frame) page faults: 4179941
	Voluntary context switches: 443988
	Involuntary context switches: 55552
	Swaps: 0
	File system inputs: 15296
	File system outputs: 3266488
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0


The tests peak at about 1.6 GiB of resident memory (1687828 kbytes).

Comment 3 Jan Safranek 2021-11-10 10:31:22 UTC
This says that the current average is 2Gi: https://resources.ci.openshift.org/usage/pods?branch=master&container=test&org=openshift&repo=azure-file-csi-driver&target=unit&variant=
I'm restoring `make unit-test` and adding bigger limits in the linked PR.

Comment 8 errata-xmlrpc 2022-03-10 16:26:09 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056