Bug 1873467 - [sig-scheduling] SchedulerPriorities [Serial] Pod should be scheduled to node that don't match the PodAntiAffinity terms
Summary: [sig-scheduling] SchedulerPriorities [Serial] Pod should be scheduled to node...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.7.0
Assignee: Kir Kolyshkin
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-08-28 12:22 UTC by David Eads
Modified: 2020-12-08 16:15 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-08 16:15:45 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Comment 1 Jan Chaloupka 2020-08-31 10:33:26 UTC
Checking https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.6/1299138542483542016:

Pod got scheduled, from the scheduler logs:
```
I0828 00:58:06.407370       1 scheduler.go:598] "Successfully bound pod to node" pod="e2e-sched-priority-7789/9c5ef79c-7671-44e9-8f23-7840b599ddb5-0" node="ip-10-0-155-103.ec2.internal" evaluatedNodes=5 feasibleNodes=1
```

kubelet did not run the pod though:

```
Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:08 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.41/23]
Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:09 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = 
container create failed: time="2020-08-28T00:58:08Z" level=warning msg="signal: killed"
time="2020-08-28T00:58:08Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:354: sending config to init process caused \\\"write init-p: broken pipe\\\"\""
container_linux.go:348: starting container process caused "process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\""

Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:10 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.42/23]
Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:11 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = 
container create failed: time="2020-08-28T00:58:11Z" level=warning msg="signal: killed"
time="2020-08-28T00:58:11Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:438: container init caused \\\"read init-p: connection reset by peer\\\"\""
container_linux.go:348: starting container process caused "process_linux.go:438: container init caused \"read init-p: connection reset by peer\""

Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:12 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.43/23]
Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:14 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = 
container create failed: time="2020-08-28T00:58:13Z" level=warning msg="signal: killed"
time="2020-08-28T00:58:13Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:354: sending config to init process caused \\\"write init-p: broken pipe\\\"\""
container_linux.go:348: starting container process caused "process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\""

Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:16 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.44/23]
Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:17 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = 
container create failed: time="2020-08-28T00:58:16Z" level=warning msg="signal: killed"
time="2020-08-28T00:58:16Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:438: container init caused \\\"read init-p: connection reset by peer\\\"\""
container_linux.go:348: starting container process caused "process_linux.go:438: container init caused \"read init-p: connection reset by peer\""

Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:19 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:58:19Z" level=warning msg="signal: killed"
time="2020-08-28T00:58:19Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:354: sending config to init process caused \\\"write init-p: broken pipe\\\"\""
container_linux.go:348: starting container process caused "process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\""

Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:19 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.45/23]
Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:22 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.46/23]
Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:23 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:58:23Z" level=warning msg="signal: killed"
time="2020-08-28T00:58:23Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:354: sending config to init process caused \\\"write init-p: broken pipe\\\"\""
container_linux.go:348: starting container process caused "process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\""

Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:26 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.47/23]
Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:27 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:58:27Z" level=warning msg="signal: killed"
time="2020-08-28T00:58:27Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:354: sending config to init process caused \\\"write init-p: broken pipe\\\"\""
container_linux.go:348: starting container process caused "process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\""

Aug 28 01:08:06.523: INFO: At 2020-08-28 00:58:28 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.48/23]
Aug 28 01:08:06.523: INFO: At 2020-08-28 00:59:34 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:59:34Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:317: getting the final child's pid from pipe caused \\\"EOF\\\"\""
container_linux.go:348: starting container process caused "process_linux.go:317: getting the final child's pid from pipe caused \"EOF\""

Aug 28 01:08:06.523: INFO: At 2020-08-28 00:59:36 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.49/23]
Aug 28 01:08:06.523: INFO: At 2020-08-28 00:59:37 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T00:59:37Z" level=warning msg="signal: killed"
time="2020-08-28T00:59:37Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:438: container init caused \\\"read init-p: connection reset by peer\\\"\""
container_linux.go:348: starting container process caused "process_linux.go:438: container init caused \"read init-p: connection reset by peer\""

Aug 28 01:08:06.523: INFO: At 2020-08-28 00:59:39 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {multus } AddedInterface: Add eth0 [10.131.0.50/23]
Aug 28 01:08:06.523: INFO: At 2020-08-28 00:59:40 +0000 UTC - event for 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0: {kubelet ip-10-0-155-103.ec2.internal} FailedCreatePodSandBox: (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-08-28T01:03:45Z" level=error msg="container_linux.go:348: starting container process caused \"process_linux.go:317: getting the final child's pid from pipe caused \\\"EOF\\\"\""
container_linux.go:348: starting container process caused "process_linux.go:317: getting the final child's pid from pipe caused \"EOF\""
```



```
Aug 28 01:08:06.542: INFO: POD                                     NODE                          PHASE    GRACE  CONDITIONS
Aug 28 01:08:06.542: INFO: 9c5ef79c-7671-44e9-8f23-7840b599ddb5-0  ip-10-0-155-103.ec2.internal  Pending         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-08-28 00:58:06 +0000 UTC  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2020-08-28 00:58:06 +0000 UTC ContainersNotReady containers with unready status: [9c5ef79c-7671-44e9-8f23-7840b599ddb5-0]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2020-08-28 00:58:06 +0000 UTC ContainersNotReady containers with unready status: [9c5ef79c-7671-44e9-8f23-7840b599ddb5-0]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-08-28 00:58:06 +0000 UTC  }]
```

Comment 3 Seth Jennings 2020-08-31 18:32:56 UTC
Assorted errors from runtime

process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\"
process_linux.go:438: container init caused \"read init-p: connection reset by peer\"
process_linux.go:317: getting the final child's pid from pipe caused \"EOF\"

Comment 6 Seth Jennings 2020-09-14 20:00:55 UTC
This is not a 4.6 blocker.

Looking quickly at testgrid, this seems to fail in a group with ~7 other tests.  There is likely some external cause.

Comment 12 Kir Kolyshkin 2020-11-25 05:28:43 UTC
1. Took a look at 10 recent failures (in 4.7) using this search:

https://search.ci.openshift.org/?search=%5C%5Bsig-scheduling%5C%5D+SchedulerPriorities+%5C%5BSerial%5C%5D+Pod+should+be+scheduled+to+node+that+don%27t+match+the+PodAntiAffinity+terms&maxAge=336h&context=1&type=junit&name=4.7.*serial%7Cserial.*4.7&maxMatches=5&maxBytes=20971520&groupBy=job

and they all have from 3 to 20 other failed tests as well, so the failure is not isolated.

2. There is also a few similar bugs:
 - https://bugzilla.redhat.com/show_bug.cgi?id=1852857
 - https://bugzilla.redhat.com/show_bug.cgi?id=1749246

(read those, but was not able to make anything out of)

3. All these

> process_linux.go:354: sending config to init process caused \"write init-p: broken pipe\"
> process_linux.go:438: container init caused \"read init-p: connection reset by peer\"
> process_linux.go:317: getting the final child's pid from pipe caused \"EOF\"

means the underlying containers are dying during early startup.


Note You need to log in before you can comment on or make changes to this bug.