Bug 1959495 - InstallerPodContainerWaiting_ContainerCreating cannot find volume kubelet-dir
Summary: InstallerPodContainerWaiting_ContainerCreating cannot find volume kubelet-dir
Keywords:
Status: CLOSED DUPLICATE of bug 1952224
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-11 16:41 UTC by W. Trevor King
Modified: 2021-06-01 18:21 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-01 18:21:04 UTC
Target Upstream Version:
Embargoed:


Attachments: None


Links
System ID: Github kubernetes/kubernetes pull 101919
Status: open
Summary: kubelet: do not propogate an error with startContainer on deleted pods
Last Updated: 2021-05-11 19:59:56 UTC

Description W. Trevor King 2021-05-11 16:41:20 UTC
Canary job [1]:

  operator conditions kube-apiserver	0s
    Operator degraded (InstallerPodContainerWaiting_ContainerCreating): InstallerPodContainerWaitingDegraded: Pod "installer-6-ip-10-0-163-180.us-west-2.compute.internal" on node "ip-10-0-163-180.us-west-2.compute.internal" container "installer" is waiting since 2021-05-11 02:48:24 +0000 UTC because ContainerCreating

  [sig-arch][Early] Managed cluster should start all core operators [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
  Run #0: Failed	1s
    fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: May 11 03:02:44.881: Some cluster operators are not ready: kube-apiserver (Degraded=True InstallerPodContainerWaiting_ContainerCreating: InstallerPodContainerWaitingDegraded: Pod "installer-6-ip-10-0-163-180.us-west-2.compute.internal" on node "ip-10-0-163-180.us-west-2.compute.internal" container "installer" is waiting since 2021-05-11 02:48:24 +0000 UTC because ContainerCreating)
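
For the record, the operator's own report of the condition should also be recoverable from the gathered artifacts. An untested sketch, assuming gather-extra dumped a clusteroperators.json next to pods.json and events.json:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1391942499508948992/artifacts/e2e-aws-canary/gather-extra/artifacts/clusteroperators.json | jq -r '.items[] | select(.metadata.name == "kube-apiserver").status.conditions[] | select(.type == "Degraded") | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'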

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1391942499508948992/artifacts/e2e-aws-canary/gather-extra/artifacts/pods.json | jq -r '.items[] | select(.metadata | .namespace == "openshift-kube-apiserver" and .name == "installer-6-ip-10-0-163-180.us-west-2.compute.internal").status.containerStatuses[]'
{
  "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c17486313a82d91260da08357f5a3139723ad50ecd1078fdff5ad4b99f40a6a",
  "imageID": "",
  "lastState": {},
  "name": "installer",
  "ready": false,
  "restartCount": 0,
  "started": false,
  "state": {
    "waiting": {
      "reason": "ContainerCreating"
    }
  }
}
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1391942499508948992/artifacts/e2e-aws-canary/gather-extra/artifacts/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-kube-apiserver" and .involvedObject.name == "installer-6-ip-10-0-163-180.us-west-2.compute.internal") | .firstTimestamp + " " + (.count | tostring) + " " + .reason + ": " + .message' | sort
2021-05-11T02:48:27Z 1 AddedInterface: Add eth0 [10.128.0.36/23]
2021-05-11T02:48:27Z 1 Failed: Error: cannot find volume "kubelet-dir" to mount into container "installer"
2021-05-11T02:48:27Z 1 Pulled: Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c17486313a82d91260da08357f5a3139723ad50ecd1078fdff5ad4b99f40a6a" already present on machine
2021-05-11T02:48:28Z 1 AddedInterface: Add eth0 [10.128.0.37/23]

Not sure what's up with the kubelet-dir volume missing.
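
One way to start narrowing that down would be to rule out the pod spec itself. An untested sketch against the same pods.json, comparing the pod's declared volumes with the installer container's volumeMounts; if "kubelet-dir" shows up on both sides, the manifest is consistent and the missing volume is on the kubelet's side:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1391942499508948992/artifacts/e2e-aws-canary/gather-extra/artifacts/pods.json | jq '.items[] | select(.metadata | .namespace == "openshift-kube-apiserver" and .name == "installer-6-ip-10-0-163-180.us-west-2.compute.internal") | {declaredVolumes: [.spec.volumes[]?.name], installerMounts: [.spec.containers[] | select(.name == "installer") | .volumeMounts[]?.name]}'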

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1391942499508948992

Comment 1 W. Trevor King 2021-05-11 16:45:24 UTC
Not super-common, and the point of the canary job is to turn up that sort of rare flake so we can fix it before it gets lost in the noise:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=96h&type=junit&search=kube-apiserver.*InstallerPodContainerWaiting_ContainerCreating' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-openstack-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-ovirt-upgrade (all) - 7 runs, 43% failed, 67% of failures match = 29% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-gcp (all) - 57 runs, 95% failed, 2% of failures match = 2% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary (all) - 8 runs, 13% failed, 100% of failures match = 13% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-ovn-ipv6 (all) - 96 runs, 86% failed, 1% of failures match = 1% impact
pull-ci-openshift-installer-master-e2e-aws-fips (all) - 20 runs, 35% failed, 14% of failures match = 5% impact
pull-ci-openshift-kubernetes-master-e2e-gcp (all) - 13 runs, 31% failed, 25% of failures match = 8% impact
pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack (all) - 27 runs, 63% failed, 6% of failures match = 4% impact

Comment 2 Ben Parees 2021-05-11 17:13:52 UTC
Raising severity to reflect the fact that this disrupted a job that's otherwise rock solid.

Comment 3 Ryan Phillips 2021-06-01 18:21:04 UTC
This is related to the issues Clayton brought up in bug 1952224. Going to close this BZ as a duplicate with the understanding that we are tracking this there.

*** This bug has been marked as a duplicate of bug 1952224 ***

