Bug 1884697
| Summary: | suddenly new failure: [k8s.io] [sig-node] Pods Extended [k8s.io] Pod Container Status should never report success for a pending container | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | David Eads <deads> |
| Component: | Node | Assignee: | Seth Jennings <sjenning> |
| Status: | CLOSED DUPLICATE | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.6 | CC: | aos-bugs, jokerman, nagrawal, sjenning |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | [k8s.io] [sig-node] Pods Extended [k8s.io] Pod Container Status should never report success for a pending container |
| Last Closed: | 2020-10-08 14:57:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description by David Eads, 2020-10-02 17:31:07 UTC
Looking at https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.6/1312045708018192384 for a non-RT baseline:

```
$ grep status-1-5 test.log
Oct 2 15:53:51.310: FAIL: timed out waiting for watch events for pod-submit-status-1-5
Oct 2 15:53:51.326: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod-submit-status-1-5: { } Scheduled: Successfully assigned e2e-pods-9247/pod-submit-status-1-5 to ci-op-vyjjxg8x-95aca-pnr79-worker-d-9cptd
Oct 2 15:53:51.327: INFO: At 2020-10-02 15:48:54 +0000 UTC - event for pod-submit-status-1-5: {multus } AddedInterface: Add eth0 [10.128.2.118/23]
Oct 2 15:53:51.327: INFO: At 2020-10-02 15:48:55 +0000 UTC - event for pod-submit-status-1-5: {kubelet ci-op-vyjjxg8x-95aca-pnr79-worker-d-9cptd} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
Oct 2 15:53:51.344: INFO: pod-submit-status-1-5 ci-op-vyjjxg8x-95aca-pnr79-worker-d-9cptd Pending 1s [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-10-02 15:48:51 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2020-10-02 15:48:51 +0000 UTC ContainersNotReady containers with unready status: [busybox]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2020-10-02 15:48:51 +0000 UTC ContainersNotReady containers with unready status: [busybox]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-10-02 15:48:51 +0000 UTC }]

fail [k8s.io/kubernetes.0/test/e2e/node/pods.go:323]:
Oct 2 15:53:51.310: timed out waiting for watch events for pod-submit-status-1-5
```

pod-submit-status-1-5 seems to fail to start due to "error reading container (probably exited) json message: EOF":

```
$ grep EOF test.log
Oct 2 15:53:51.326: INFO: At 2020-10-02 15:48:37 +0000 UTC - event for pod-submit-status-2-2: {kubelet ci-op-vyjjxg8x-95aca-pnr79-worker-b-pxbvj} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
Oct 2 15:53:51.327: INFO: At 2020-10-02 15:48:55 +0000 UTC - event for pod-submit-status-1-5: {kubelet ci-op-vyjjxg8x-95aca-pnr79-worker-d-9cptd} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
Oct 2 15:53:51.327: INFO: At 2020-10-02 15:49:10 +0000 UTC - event for pod-submit-status-0-6: {kubelet ci-op-vyjjxg8x-95aca-pnr79-worker-d-9cptd} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
Oct 2 15:53:51.327: INFO: At 2020-10-02 15:49:47 +0000 UTC - event for pod-submit-status-0-13: {kubelet ci-op-vyjjxg8x-95aca-pnr79-worker-d-9cptd} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
```

3 other pods in that same test experience the same error:

```
$ grep -e status-2-2 -e status-1-5 -e status-0-6 -e status-0-13 test.log
Oct 2 15:48:39.015: INFO: watch delete seen for pod-submit-status-2-2
Oct 2 15:48:39.015: INFO: Pod pod-submit-status-2-2 on node ci-op-vyjjxg8x-95aca-pnr79-worker-b-pxbvj timings total=7.015599937s t=694ms run=0s execute=0s
Oct 2 15:49:08.559: INFO: watch delete seen for pod-submit-status-0-6
Oct 2 15:49:08.559: INFO: Pod pod-submit-status-0-6 on node ci-op-vyjjxg8x-95aca-pnr79-worker-d-9cptd timings total=7.595781401s t=371ms run=0s execute=0s
Oct 2 15:49:50.799: INFO: watch delete seen for pod-submit-status-0-13
Oct 2 15:49:50.799: INFO: Pod pod-submit-status-0-13 on node ci-op-vyjjxg8x-95aca-pnr79-worker-d-9cptd timings total=7.416990452s t=991ms run=0s execute=0s
Oct 2 15:53:51.310: FAIL: timed out waiting for watch events for pod-submit-status-1-5
```

However, the other 3 have the expected watch delete; only pod-submit-status-1-5 never recovers.
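The correlation above (which pods hit the EOF sandbox error, and which of those later produced the expected "watch delete seen" line) can be automated rather than done with repeated greps. A minimal sketch, assuming log lines in the test.log format shown above; the sample below uses abbreviated copies of lines from this run:

```python
import re

# Abbreviated sample lines, in the format of the test.log excerpts above.
LOG = """\
Oct 2 15:48:39.015: INFO: watch delete seen for pod-submit-status-2-2
Oct 2 15:49:08.559: INFO: watch delete seen for pod-submit-status-0-6
Oct 2 15:49:50.799: INFO: watch delete seen for pod-submit-status-0-13
Oct 2 15:53:51.326: INFO: event for pod-submit-status-2-2: FailedCreatePodSandBox: ... json message: EOF
Oct 2 15:53:51.327: INFO: event for pod-submit-status-1-5: FailedCreatePodSandBox: ... json message: EOF
Oct 2 15:53:51.327: INFO: event for pod-submit-status-0-6: FailedCreatePodSandBox: ... json message: EOF
Oct 2 15:53:51.327: INFO: event for pod-submit-status-0-13: FailedCreatePodSandBox: ... json message: EOF
"""

POD = re.compile(r"(pod-submit-status-\d+-\d+)")

def triage(log: str):
    """Partition pods that hit the EOF sandbox error by whether a
    'watch delete seen' line exists for them. Returns (recovered, stuck)."""
    eof_pods, deleted = set(), set()
    for line in log.splitlines():
        m = POD.search(line)
        if not m:
            continue
        if "json message: EOF" in line:
            eof_pods.add(m.group(1))
        if "watch delete seen for" in line:
            deleted.add(m.group(1))
    return sorted(eof_pods & deleted), sorted(eof_pods - deleted)

recovered, stuck = triage(LOG)
```

On this run the same partition emerges as in the grep output: three pods recover (their watch delete is seen) while pod-submit-status-1-5 stays stuck, which is what the test eventually times out on.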
Similar situation with the mentioned gcp-rt run:

```
$ grep pod-submit-status-2-10 test.log
Oct 2 12:54:47.949: FAIL: timed out waiting for watch events for pod-submit-status-2-10
Oct 2 12:54:47.958: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod-submit-status-2-10: { } Scheduled: Successfully assigned e2e-pods-4964/pod-submit-status-2-10 to ci-op-tbz0thi7-1a9c0-gkkhf-worker-c-vzd5b
Oct 2 12:54:47.958: INFO: At 2020-10-02 12:49:49 +0000 UTC - event for pod-submit-status-2-10: {multus } AddedInterface: Add eth0 [10.128.2.190/23]
Oct 2 12:54:47.958: INFO: At 2020-10-02 12:49:53 +0000 UTC - event for pod-submit-status-2-10: {kubelet ci-op-tbz0thi7-1a9c0-gkkhf-worker-c-vzd5b} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
Oct 2 12:54:47.964: INFO: pod-submit-status-2-10 ci-op-tbz0thi7-1a9c0-gkkhf-worker-c-vzd5b Pending 1s [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-10-02 12:49:46 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2020-10-02 12:49:46 +0000 UTC ContainersNotReady containers with unready status: [busybox]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2020-10-02 12:49:46 +0000 UTC ContainersNotReady containers with unready status: [busybox]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-10-02 12:49:46 +0000 UTC }]

fail [k8s.io/kubernetes.0/test/e2e/node/pods.go:323]:
Oct 2 12:54:47.949: timed out waiting for watch events for pod-submit-status-2-10

$ grep EOF test.log
Oct 2 12:54:47.958: INFO: At 2020-10-02 12:48:42 +0000 UTC - event for pod-submit-status-0-3: {kubelet ci-op-tbz0thi7-1a9c0-gkkhf-worker-d-wzhpc} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
Oct 2 12:54:47.958: INFO: At 2020-10-02 12:49:53 +0000 UTC - event for pod-submit-status-2-10: {kubelet ci-op-tbz0thi7-1a9c0-gkkhf-worker-c-vzd5b} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
Oct 2 12:54:47.958: INFO: At 2020-10-02 12:50:24 +0000 UTC - event for pod-submit-status-0-13: {kubelet ci-op-tbz0thi7-1a9c0-gkkhf-worker-c-vzd5b} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
```

The onset of the issue is correlated with a bump in runc. There were known issues with the bump, fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1883640.

There is also a race in the test itself, resolved by this upstream PR: https://github.com/kubernetes/kubernetes/pull/95261 (o/k pick: https://github.com/openshift/kubernetes/pull/385).

Leaving this in 4.6 over the weekend to observe the cleaned-up CI signal after the race is removed. I want to make sure the runc bump didn't do something bad. If the test is still failing at a higher rate, we can try reverting the runc bump and just pick forward the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1883640.

*** This bug has been marked as a duplicate of bug 1884035 ***
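For context on what the test at k8s.io/kubernetes.0/test/e2e/node/pods.go actually asserts: its name says a container in a Pending pod must never report success. A rough paraphrase of that invariant over a PodStatus-shaped dict (field names follow the Kubernetes PodStatus API; the helper itself is illustrative, not the actual e2e test code):

```python
def pending_container_reports_success(pod_status: dict) -> bool:
    """True if any container claims terminated-with-exit-code-0 while the
    pod is still Pending, i.e. it 'succeeded' before it could have started.
    Illustrative paraphrase of the invariant the e2e test name describes;
    not the actual upstream test code."""
    if pod_status.get("phase") != "Pending":
        return False
    for cs in pod_status.get("containerStatuses", []):
        terminated = cs.get("state", {}).get("terminated")
        if terminated and terminated.get("exitCode") == 0:
            return True
    return False

# The pod from this run never violates the invariant: after the sandbox
# creation error it simply stays Pending with a waiting container until
# the test's watch times out, which is the failure mode seen above.
stuck_pod_status = {
    "phase": "Pending",
    "containerStatuses": [
        {"name": "busybox", "ready": False,
         "state": {"waiting": {"reason": "ContainerCreating"}}},
    ],
}
```

This matches the logs above: the failure here is a watch timeout on a pod stuck Pending after FailedCreatePodSandBox, not a false "success" report.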