test: [sig-scheduling] SchedulerPriorities [Serial] Pod should be scheduled to node that don't match the PodAntiAffinity terms is failing frequently in CI, see search results: https://search.ci.openshift.org/?maxAge=168h&context=1&type=junit&maxMatches=5&maxBytes=20971520&groupBy=job&name=4.6.*serial|serial.*4.6&search=%5C%5Bsig-scheduling%5C%5D+SchedulerPriorities+%5C%5BSerial%5C%5D+Pod+should+be+scheduled+to+node+that+don%27t+match+the+PodAntiAffinity+terms This is failing over 5% of aws serial runs. https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.6/1299138542483542016
Checking https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.6/1299138542483542016: Pod got scheduled, from the scheduler logs: ``` I0828 00:58:06.407370 1 scheduler.go:598] "Successfully bound pod to node" pod="e2e-sched-priority-7789/9c5ef79c-7671-44e9-8f23-7840b599ddb5-0" node="ip-10-0-155-103.ec2.internal" evaluatedNodes=5 feasibleNodes=1 ``` kubelet did not run the pod though: ``` Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:08 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.41/23] Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:09 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:58:08Z" level=warning msg="signal: killed" time="2020-08-28T00:58:08Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:354: sending config to init process caused \\\"write init-p: broken pipe\\\"\"" container_linux.go:348: starting container process caused "process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\"" Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:10 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.42/23] Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:11 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:58:11Z" level=warning msg="signal: killed" time="2020-08-28T00:58:11Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:438: container init caused \\\"read init-p: connection reset by peer\\\"\"" container_linux.go:348: starting container process caused "process_linux.go:438: container init caused \"read init-p: connection reset by peer\"" Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:12 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.43/23] Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:14 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:58:13Z" level=warning msg="signal: killed" time="2020-08-28T00:58:13Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:354: sending config to init process caused \\\"write init-p: broken pipe\\\"\"" container_linux.go:348: starting container process caused "process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\"" Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:16 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.44/23] Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:17 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:58:16Z" level=warning msg="signal: killed" time="2020-08-28T00:58:16Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:438: container init caused \\\"read init-p: connection reset by peer\\\"\"" container_linux.go:348: starting container process caused "process_linux.go:438: container init caused \"read init-p: connection reset by peer\"" Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:19 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:58:19Z" level=warning msg="signal: killed" time="2020-08-28T00:58:19Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:354: sending config to init process caused \\\"write init-p: broken pipe\\\"\"" container_linux.go:348: starting container process caused "process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\"" Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:19 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.45/23] Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:22 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.46/23] Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:23 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:58:23Z" level=warning msg="signal: killed" time="2020-08-28T00:58:23Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:354: sending config to init process caused \\\"write init-p: broken pipe\\\"\"" container_linux.go:348: starting container process caused "process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\"" Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:26 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.47/23] Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:27 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:58:27Z" level=warning msg="signal: killed" time="2020-08-28T00:58:27Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:354: sending config to init process caused \\\"write init-p: broken pipe\\\"\"" container_linux.go:348: starting container process caused "process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\"" Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:28 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.48/23] Aug 28 01:08:06.523: INFO: At 2020-08-28 00:59:34 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:59:34Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:317: getting the final child's pid from pipe caused \\\"EOF\\\"\"" container_linux.go:348: starting container process caused "process_linux.go:317: getting the final child's pid from pipe caused \"EOF\"" Aug 28 01:08:06.523: INFO: At 2020-08-28 00:59:36 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.49/23] Aug 28 01:08:06.523: INFO: At 2020-08-28 00:59:37 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:59:37Z" level=warning msg="signal: killed" time="2020-08-28T00:59:37Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:438: container init caused \\\"read init-p: connection reset by peer\\\"\"" container_linux.go:348: starting container process caused "process_linux.go:438: container init caused \"read init-p: connection reset by peer\"" Aug 28 01:08:06.523: INFO: At 2020-08-28 00:59:39 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.50/23] Aug 28 01:08:06.523: INFO: At 2020-08-28 00:59:40 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T01:03:45Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:317: getting the final child's pid from pipe caused \\\"EOF\\\"\"" container_linux.go:348: starting container process caused "process_linux.go:317: getting the final child's pid from pipe caused \"EOF\"" ``` ``` Aug 28 01:08:06.542: INFO: POD NODE PHASE GRACE CONDITIONS Aug 28 01:08:06.542: INFO: 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0 ip-10-0-155-103.ec2.internal Pending [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-08-28 00:58:06 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2020-08-28 00:58:06 +0000 UTC ContainersNotReady containers with unready status: [9c5ef79c-7671-44e9-8f23-7840b599ddb5-0]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2020-08-28 00:58:06 +0000 UTC ContainersNotReady containers with unready status: [9c5ef79c-7671-44e9-8f23-7840b599ddb5-0]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-08-28 00:58:06 +0000 UTC }] ```
Checking other jobs from https://search.ci.openshift.org/?maxAge=168h&context=1&type=junit&maxMatches=5&maxBytes=20971520&groupBy=job&name=4.6.*serial|serial.*4.6&search=%5C%5Bsig-scheduling%5C%5D+SchedulerPriorities+%5C%5BSerial%5C%5D+Pod+should+be+scheduled+to+node+that+don%27t+match+the+PodAntiAffinity+terms, they are all failing with the same failure/reason. Switching to node team to check the kubelet side.
Assorted errors from runtime process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\" process_linux.go:438: container init caused \"read init-p: connection reset by peer\" process_linux.go:317: getting the final child's pid from pipe caused \"EOF\"
This is not a 4.6 blocker. Looking quickly at testgrid, this seems to fail in a group with ~7 other tests. There is likely some external cause.
Testgrid link https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-blocking#release-openshift-ocp-installer-e2e-aws-serial-4.6
1. Took a look at 10 recent failures (in 4.7) using this search: https://search.ci.openshift.org/?search=%5C%5Bsig-scheduling%5C%5D+SchedulerPriorities+%5C%5BSerial%5C%5D+Pod+should+be+scheduled+to+node+that+don%27t+match+the+PodAntiAffinity+terms&maxAge=336h&context=1&type=junit&name=4.7.*serial%7Cserial.*4.7&maxMatches=5&maxBytes=20971520&groupBy=job and they all have from 3 to 20 other failed tests as well, so the failure is not isolated. 2. There is also a few similar bugs: - https://bugzilla.redhat.com/show_bug.cgi?id=1852857 - https://bugzilla.redhat.com/show_bug.cgi?id=1749246 (read those, but was not able to make anything out of) 3. All these > process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\" > process_linux.go:438: container init caused \"read init-p: connection reset by peer\" > process_linux.go:317: getting the final child's pid from pipe caused \"EOF\" means the underlying containers are dying during early startup.