test: [sig-storage] PersistentVolumes-local [Volume type: block] Set fsGroup for local volume should set different fsGroup for second pod if first pod is deleted

is failing frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-storage%5C%5D+PersistentVolumes-local++%5C%5BVolume+type%3A+block%5C%5D+Set+fsGroup+for+local+volume+should+set+different+fsGroup+for+second+pod+if+first+pod+is+deleted

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.6/1303312111140605952

Error:
Sep 08 13:28:57.670 W ns/e2e-pods-4635 pod/pod-submit-status-1-3 node/ip-10-0-162-39.us-east-2.compute.internal reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = `/usr/bin/runc --root /run/runc start 8b30b801154feeef313d287d6470b80b333a9b7c00b22168136bd07cf289af4f` failed: time="2020-09-08T13:28:55Z" level=error msg="cannot start an already running container"
cannot start an already running container
(exit status 1)
Sep 08 13:28:57.680 I ns/e2e-statefulset-2707 pod/ss
Please file bugs against the component team responsible for the test unless there's a clear indication that there's another root cause.
The following failure seems to have the same cause. Adding it here so sippy will recognize them as having an associated bug:

[sig-storage] PersistentVolumes-local [Volume type: dir-link-bindmounted] Set fsGroup for local volume should set different fsGroup for second pod if first pod is deleted
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=4.6&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-storage%5C%5D+PersistentVolumes-local++%5C%5BVolume+type%3A+dir-link-bindmounted%5C%5D+Set+fsGroup+for+local+volume+should+set+different+fsGroup+for+second+pod+if+first+pod+is+deleted
*** Bug 1877469 has been marked as a duplicate of this bug. ***
Sippy bait: test: [sig-network] SCTP [Feature:SCTP] [LinuxOnly] should create a Pod with SCTP HostPort is failing frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-network%5C%5D+SCTP+%5C%5BFeature%3ASCTP%5C%5D+%5C%5BLinuxOnly%5C%5D+should+create+a+Pod+with+SCTP+HostPort

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/promote-release-openshift-machine-os-content-e2e-aws-4.6/1303469614914605056

Error:
Sep 09 00:03:29.961 W ns/e2e-gc-3203 pod/simpletest.deployment-7f7555f8bc-q9fwf node/ip-10-0-244-150.us-west-1.compute.internal reason/FailedMount MountVolume.SetUp failed for volume "default-token-lq2r4" : failed to sync secret cache: timed out waiting for the condition

The root cause seems to be the same for the SCTP and local storage tests: something's wrong with host_exec.go:

fail [k8s.io/kubernetes.0-rc.2/test/e2e/storage/utils/host_exec.go:110]: Unexpected error:
<*errors.errorString | 0xc0002da8b0>: {
    s: "timed out waiting for the condition",
}
timed out waiting for the condition
occurred
Looking at https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4132/pull-ci-openshift-installer-master-e2e-aws/1303747041729449984 (the job with the lowest number of failed tests), in the host_exec pod events (namespace e2e-persistent-local-volumes-test-1094):

Sep 9 18:31:05.575: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for hostexec-ip-10-0-138-34.us-west-2.compute.internal-ndrlb: { } FailedScheduling: 0/6 nodes are available: 5 node(s) didn't match node selector.

Where is the sixth node? The scheduler knows that there are 6 of them, but it evaluated only 5. For more context, host_exec sets NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution to pin its pod to the node where the test needs to create a directory / block device / symlink / ...: https://github.com/openshift/origin/blob/16abec0d471f3c40e04622210edba33d43f21704/vendor/k8s.io/kubernetes/test/e2e/storage/utils/host_exec.go#L72

Unfortunately, the test does not log the destination node name, and it is chosen randomly. You can check other tests that failed for the same reason (e.g. "[sig-network] Networking Granular Checks: Services should .*").
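Concretely, the affinity host_exec puts on its pod is roughly equivalent to the following manifest fragment (a sketch for readers unfamiliar with the helper; the node name and image are illustrative, not taken from the linked code):

```yaml
# Hypothetical pod manifest approximating what host_exec creates.
apiVersion: v1
kind: Pod
metadata:
  name: hostexec-example
spec:
  affinity:
    nodeAffinity:
      # "Required" means the scheduler must place the pod on a node
      # matching this term; if that one node is not evaluated or not
      # schedulable, the pod stays Pending with FailedScheduling.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - ip-10-0-138-34.us-west-2.compute.internal  # illustrative
  containers:
  - name: hostexec
    image: example.invalid/e2e-hostexec:latest  # illustrative image
    securityContext:
      privileged: true  # host_exec runs privileged to touch host paths
```

This is why "5 node(s) didn't match node selector" out of 6 is fatal here: the one node the pod is allowed to land on is exactly the one the scheduler gave no reason for.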
*** Bug 1878756 has been marked as a duplicate of this bug. ***
*** Bug 1878750 has been marked as a duplicate of this bug. ***
Another sippy bait:

test: [sig-storage] PersistentVolumes-local [Volume type: dir] Set fsGroup for local volume should set different fsGroup for second pod if first pod is deleted
test: [sig-storage] PersistentVolumes-local [Volume type: dir-link] Set fsGroup for local volume should set different fsGroup for second pod if first pod is deleted
*** Bug 1880125 has been marked as a duplicate of this bug. ***
[sig-storage] PersistentVolumes-local [Volume type: tmpfs] Set fsGroup for local volume should set different fsGroup for second pod if first pod is deleted
Checking a subset of the "Set fsGroup for local volume should set different fsGroup for second pod if first pod is deleted" failed tests:

- all testing pods were scheduled and got stuck in Pending due to "FailedCreatePodSandBox: Failed to create pod sandbox"
- failed to continue due to "PV Create API error: PersistentVolume \"local-pvbvxnr\" is invalid: spec.local.path: Required value"
- failed due to connection timed out

Sending this back to the Storage team for re-evaluation. I don't see any FailedScheduling error for `hostexec-XXX` pods anymore.
Upstream fix for missing reason messages for some nodes: https://github.com/kubernetes/kubernetes/pull/93355

In 4.5, fixed by https://github.com/openshift/origin/pull/25407 (backporting https://github.com/kubernetes/kubernetes/pull/93355)
In 4.6, fixed by https://github.com/openshift/kubernetes/pull/325 (1.19.0 rebase containing https://github.com/kubernetes/kubernetes/pull/93355)
Feel free to re-open if the issue re-occurs.