Bug 1748150

Summary:	[ci] [sig-scheduling] SchedulerPreemption [Serial] Test Panicked: runtime error: invalid memory address or nil pointer dereference
Product:	OpenShift Container Platform	Reporter:	Wenjing Zheng <wzheng>
Component:	kube-scheduler	Assignee:	Mike Dame <mdame>
Status:	CLOSED DUPLICATE	QA Contact:	Xingxing Xia <xxia>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.2.0	CC:	aos-bugs, geliu, maszulik, mfojtik, xxia
Target Milestone:	---
Target Release:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-09-10 15:11:37 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Wenjing Zheng 2019-09-03 02:30:35 UTC

Description of problem:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/86:

[sig-scheduling] SchedulerPreemption [Serial] validates pod anti-affinity works in preemption [Suite:openshift/conformance/serial] [Suite:k8s]

We can see in the log:
[AfterEach] [sig-scheduling] SchedulerPreemption [Serial]
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:64
Sep 2 18:27:22.499: INFO: Running AfterSuite actions on all nodes
Sep 2 18:27:22.499: INFO: Running AfterSuite actions on node 1
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522 +0x1b5
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling.glob..func4.5()
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:415 +0x1792
github.com/openshift/origin/vendor/github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc00095ede0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:233 +0x113
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc003317680, 0xc0033b1600, 0x1, 0x1, 0xc003317680, 0xc0033b1600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:756 +0x465
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc003316f00, 0x62dd218, 0xa5716d0, 0xa5716d0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:846 +0x2ec
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:794
main.main.func1(0xc003316f00, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:71 +0x93
main.main()
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:72 +0x327
fail [runtime/panic.go:82]: Test Panicked: runtime error: invalid memory address or nil pointer dereference

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-02-172410

How reproducible:
sometimes

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Maciej Szulik 2019-09-03 08:09:57 UTC

Mike, it looks like your fix from https://github.com/openshift/origin/pull/23645 is causing pods not to be created; see the logs:

Sep  2 18:26:45.291: INFO: Current cpu usage and memory usage is 1410, 3176136704
STEP: verifying the node has the label node ci-op-s3mbwj7y-3a8ca-hshst-worker-centralus1-fqkwq
Sep  2 18:26:45.445: INFO: Created pod: pod0-sched-preemption-medium-priority
Sep  2 18:26:45.445: INFO: Current cpu usage and memory usage is 1560, 3582984192
Sep  2 18:26:45.445: INFO: Node is heavily utilized, let's not create a pod there
Sep  2 18:26:45.445: INFO: Current cpu usage and memory usage is 1710, 3759144960
Sep  2 18:26:45.445: INFO: Node is heavily utilized, let's not create a pod there

But further down, the code iterates over the pods array, which is always created with exactly 4 elements. The entries for nodes where no pod was created are left empty, and iterating over them might fail with the panic above:

	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:415 +0x1792
github.com/openshift/origin/vendor/github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc00095ede0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:233 +0x113
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc003317680, 0xc0033b1600, 0x1, 0x1, 0xc003317680, 0xc0033b1600)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:756 +0x465
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc003316f00, 0x62dd218, 0xa5716d0, 0xa5716d0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:846 +0x2ec
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:794
main.main.func1(0xc003316f00, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:71 +0x93
main.main()
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:72 +0x327

Fix this ASAP.
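A minimal Go sketch of the failure mode described in Comment 1, assuming the test allocates one pod slot per node up front and skips pod creation on heavily utilized nodes. The `Pod` and `node` types and the `createPods`, `unsafeNames` and `safeNames` helpers are hypothetical stand-ins, not the actual code in preemption.go.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for *v1.Pod; only Name matters here.
type Pod struct {
	Name string
}

// node is a hypothetical summary of a node's utilization, standing in
// for the e2e test's cpu/memory accounting.
type node struct {
	name            string
	heavilyUtilized bool
}

// createPods allocates one slot per node up front and skips heavily
// utilized nodes, so their slots stay nil.
func createPods(nodes []node) []*Pod {
	pods := make([]*Pod, len(nodes))
	for i, n := range nodes {
		if n.heavilyUtilized {
			continue // "Node is heavily utilized, let's not create a pod there"
		}
		pods[i] = &Pod{Name: fmt.Sprintf("pod%d-on-%s", i, n.name)}
	}
	return pods
}

// unsafeNames dereferences every slot; a nil slot triggers
// "invalid memory address or nil pointer dereference". The panic is
// recovered here only so the sketch can report it as an error.
func unsafeNames(pods []*Pod) (names []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	for _, p := range pods {
		names = append(names, p.Name)
	}
	return names, nil
}

// safeNames skips slots that were never filled.
func safeNames(pods []*Pod) []string {
	var names []string
	for _, p := range pods {
		if p == nil {
			continue
		}
		names = append(names, p.Name)
	}
	return names
}

func main() {
	nodes := []node{{"a", false}, {"b", true}, {"c", true}, {"d", false}}
	pods := createPods(nodes)

	_, err := unsafeNames(pods)
	fmt.Println(err)
	fmt.Println(safeNames(pods))
}
```

Whether preemption.go:415 dereferences the empty entries in exactly this way is not confirmed here; the sketch only shows why a fixed-size slice with skipped entries can panic, and one way to guard the loop (skipping nil entries, or appending only the pods that were actually created).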

Comment 3 Xingxing Xia 2019-09-10 09:58:23 UTC

Checked https://testgrid.k8s.io/redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-azure-serial-4.2&sort-by-flakiness&show-stale-tests= . The test "validates pod anti-affinity works in preemption" is shown as failed since 3 Sept and as not executed since 7 Sept.

Comment 4 Mike Dame 2019-09-10 15:10:05 UTC

That test is no longer executed because it was removed in https://github.com/openshift/origin/pull/23728, which also addresses https://bugzilla.redhat.com/show_bug.cgi?id=1743102. Because of that, I think this BZ is effectively a duplicate of that one, so I am closing it.

Comment 5 Mike Dame 2019-09-10 15:11:37 UTC


*** This bug has been marked as a duplicate of bug 1743102 ***