Bug 1959495 - InstallerPodContainerWaiting_ContainerCreating cannot find volume kubelet-dir
Summary: InstallerPodContainerWaiting_ContainerCreating cannot find volume kubelet-dir
Keywords:
Status: CLOSED DUPLICATE of bug 1952224
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-11 16:41 UTC by W. Trevor King
Modified: 2021-06-01 18:21 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-01 18:21:04 UTC
Target Upstream Version:
Embargoed:


Attachments: None


Links
System ID: Github kubernetes/kubernetes pull 101919
Status: open
Summary: kubelet: do not propogate an error with startContainer on deleted pods
Last Updated: 2021-05-11 19:59:56 UTC

Description W. Trevor King 2021-05-11 16:41:20 UTC
Canary job [1]:

  operator conditions kube-apiserver	0s
    Operator degraded (InstallerPodContainerWaiting_ContainerCreating): InstallerPodContainerWaitingDegraded: Pod "installer-6-ip-10-0-163-180.us-west-2.compute.internal" on node "ip-10-0-163-180.us-west-2.compute.internal" container "installer" is waiting since 2021-05-11 02:48:24 +0000 UTC because ContainerCreating

  [sig-arch][Early] Managed cluster should start all core operators [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
  Run #0: Failed	1s
    fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: May 11 03:02:44.881: Some cluster operators are not ready: kube-apiserver (Degraded=True InstallerPodContainerWaiting_ContainerCreating: InstallerPodContainerWaitingDegraded: Pod "installer-6-ip-10-0-163-180.us-west-2.compute.internal" on node "ip-10-0-163-180.us-west-2.compute.internal" container "installer" is waiting since 2021-05-11 02:48:24 +0000 UTC because ContainerCreating)
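
For the record, the operator's own report of the condition should also be recoverable from the gathered artifacts. An untested sketch, assuming gather-extra dumped a clusteroperators.json next to pods.json and events.json:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1391942499508948992/artifacts/e2e-aws-canary/gather-extra/artifacts/clusteroperators.json | jq -r '.items[] | select(.metadata.name == "kube-apiserver").status.conditions[] | select(.type == "Degraded") | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'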

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1391942499508948992/artifacts/e2e-aws-canary/gather-extra/artifacts/pods.json | jq -r '.items[] | select(.metadata | .namespace == "openshift-kube-apiserver" and .name == "installer-6-ip-10-0-163-180.us-west-2.compute.internal").status.containerStatuses[]'
{
  "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c17486313a82d91260da08357f5a3139723ad50ecd1078fdff5ad4b99f40a6a",
  "imageID": "",
  "lastState": {},
  "name": "installer",
  "ready": false,
  "restartCount": 0,
  "started": false,
  "state": {
    "waiting": {
      "reason": "ContainerCreating"
    }
  }
}
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1391942499508948992/artifacts/e2e-aws-canary/gather-extra/artifacts/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-kube-apiserver" and .involvedObject.name == "installer-6-ip-10-0-163-180.us-west-2.compute.internal") | .firstTimestamp + " " + (.count | tostring) + " " + .reason + ": " + .message' | sort
2021-05-11T02:48:27Z 1 AddedInterface: Add eth0 [10.128.0.36/23]
2021-05-11T02:48:27Z 1 Failed: Error: cannot find volume "kubelet-dir" to mount into container "installer"
2021-05-11T02:48:27Z 1 Pulled: Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c17486313a82d91260da08357f5a3139723ad50ecd1078fdff5ad4b99f40a6a" already present on machine
2021-05-11T02:48:28Z 1 AddedInterface: Add eth0 [10.128.0.37/23]

Not sure what's up with the kubelet-dir volume missing.
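
One way to start narrowing that down would be to rule out the pod spec itself. An untested sketch against the same pods.json, comparing the pod's declared volumes with the installer container's volumeMounts; if "kubelet-dir" shows up on both sides, the manifest is consistent and the missing volume is on the kubelet's side:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1391942499508948992/artifacts/e2e-aws-canary/gather-extra/artifacts/pods.json | jq '.items[] | select(.metadata | .namespace == "openshift-kube-apiserver" and .name == "installer-6-ip-10-0-163-180.us-west-2.compute.internal") | {declaredVolumes: [.spec.volumes[]?.name], installerMounts: [.spec.containers[] | select(.name == "installer") | .volumeMounts[]?.name]}'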

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1391942499508948992

Comment 1 W. Trevor King 2021-05-11 16:45:24 UTC
Not super-common, and the point of the canary job is to turn up that sort of rare flake so we can fix it before it gets lost in the noise:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=96h&type=junit&search=kube-apiserver.*InstallerPodContainerWaiting_ContainerCreating' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-openstack-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-ovirt-upgrade (all) - 7 runs, 43% failed, 67% of failures match = 29% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-gcp (all) - 57 runs, 95% failed, 2% of failures match = 2% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary (all) - 8 runs, 13% failed, 100% of failures match = 13% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-ovn-ipv6 (all) - 96 runs, 86% failed, 1% of failures match = 1% impact
pull-ci-openshift-installer-master-e2e-aws-fips (all) - 20 runs, 35% failed, 14% of failures match = 5% impact
pull-ci-openshift-kubernetes-master-e2e-gcp (all) - 13 runs, 31% failed, 25% of failures match = 8% impact
pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack (all) - 27 runs, 63% failed, 6% of failures match = 4% impact

Comment 2 Ben Parees 2021-05-11 17:13:52 UTC
Raising severity to reflect the fact that this disrupted a job that's otherwise rock solid.

Comment 3 Ryan Phillips 2021-06-01 18:21:04 UTC
This is related to the issues Clayton brought up in bug 1952224. Going to close this BZ as a duplicate with the understanding that we are tracking this there.

*** This bug has been marked as a duplicate of bug 1952224 ***

