The upgrade of a cluster in https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.4/285 times out with:

    Cluster did not complete upgrade: timed out waiting for the condition: Cluster operator openshift-apiserver is still updating

Diving into the root cause, this is what I found:

- the openshift-apiserver-operator updated its openshift-apiserver/apiserver deployment at 05:24:07
- the pod openshift-apiserver/apiserver-5b9677479-z84j7 on node ci-op-prmz6-m-2.c.openshift-gce-devel-ci.internal is marked with deletionTimestamp "2020-04-21T05:24:56Z"
- 10 minutes later the deployment reports that the new ReplicaSet apiserver-7c9bf658b has stopped progressing:

```
  lastTransitionTime: "2020-04-21T05:34:04Z"
  lastUpdateTime: "2020-04-21T05:34:04Z"
  message: ReplicaSet "apiserver-7c9bf658b" has timed out progressing.
  reason: ProgressDeadlineExceeded
  status: "False"
  type: Progressing
```

- the pod openshift-apiserver/apiserver-5b9677479-z84j7 has not terminated:

```
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-04-21T05:09:57Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-04-21T05:11:51Z"
    message: 'containers with unready status: [openshift-apiserver]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-04-21T05:11:51Z"
    message: 'containers with unready status: [openshift-apiserver]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-04-21T05:00:36Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://b5775923b72e47cd18fa007edcdada6d9088dfd0ca9b500ef2977a2aeacff89c
    image: registry.svc.ci.openshift.org/ocp/4.4-2020-04-17-081422@sha256:6f08359ab47e36269a7247a545ac2ef36999782a77108abba90ed03f735a0bcd
    imageID: registry.svc.ci.openshift.org/ocp/4.4-2020-04-17-081422@sha256:6f08359ab47e36269a7247a545ac2ef36999782a77108abba90ed03f735a0bcd
    lastState: {}
    name: openshift-apiserver
    ready: false
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-04-21T05:09:58Z"
```

- the new pod openshift-apiserver/apiserver-7c9bf658b-nvzpx cannot be scheduled due to anti-affinity with the non-terminated pod:

```
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-04-21T05:24:03Z"
    message: '0/6 nodes are available: 3 node(s) didn''t match node selector, 3 node(s)
      didn''t match pod affinity/anti-affinity.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
```
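For anyone re-checking this from the CI artifacts or a live cluster, here is a minimal sketch of the queries behind the bullets above. The resource names are the ones from this particular run, and the jsonpath expressions are illustrative rather than taken from any tooling we ship:

```
# Minimal sketch of the checks behind the observations above; resource names
# are from this CI run, jsonpath expressions are illustrative.

# Deployment Progressing condition shows ProgressDeadlineExceeded:
oc -n openshift-apiserver get deployment apiserver \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'

# Old pod is marked for deletion but never goes away:
oc -n openshift-apiserver get pod apiserver-5b9677479-z84j7 \
  -o jsonpath='{.metadata.deletionTimestamp}{"\n"}'

# The anti-affinity rule that keeps two apiserver pods off the same master:
oc -n openshift-apiserver get deployment apiserver \
  -o jsonpath='{.spec.template.spec.affinity.podAntiAffinity}{"\n"}'

# The new pod's FailedScheduling events (events are at the end of describe output):
oc -n openshift-apiserver describe pod apiserver-7c9bf658b-nvzpx | tail -n 20
```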
I double-checked another test from https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-informing#release-openshift-origin-installer-e2e-gcp-upgrade-4.4: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.4/280

Here ingress, storage, and openshift-controller-manager are not finishing their upgrades. I checked the latter:

- the openshift-controller-manager/controller-manager DaemonSet does not finish a rollout started around 23:08:45
- the pod controller-manager-b5nlc starts deletion at 2020-04-19T23:08:45Z:

```
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-04-19T22:51:19Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-04-19T23:08:46Z"
    message: 'containers with unready status: [controller-manager]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-04-19T23:08:46Z"
    message: 'containers with unready status: [controller-manager]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-04-19T22:51:11Z"
    status: "True"
    type: PodScheduled
```

- it never terminates
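The same kind of check works for the DaemonSet case; a rough sketch, again using the names from this run:

```
# Rough sketch for the DaemonSet case; names are from this run.

# Rollout that never completes:
oc -n openshift-controller-manager rollout status daemonset/controller-manager --timeout=60s

# Pod has a deletionTimestamp set but is still reported as Running:
oc -n openshift-controller-manager get pod controller-manager-b5nlc \
  -o jsonpath='{.metadata.deletionTimestamp}{"\t"}{.status.phase}{"\n"}'
```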
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context, and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges. The expectation is that the assignee answers these questions.

Who is impacted?
- Customers upgrading from 4.y.z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
- All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact?
- Up to 2 minute disruption in edge routing
- Up to 90 seconds of API downtime
- etcd loses quorum and you have to restore from backup

How involved is remediation?
- Issue resolves itself after five minutes
- Admin uses oc to fix things
- Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression?
- No, it's always been like this, we just never noticed
- Yes, from 4.y.z to 4.y+1.z, or from 4.y.z to 4.y.z+1
Could this be related to BZ1825946?
To be clear, this looks like the root cause breaking 90% of gcp 4.4 upgrades in https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-informing#release-openshift-origin-installer-e2e-gcp-upgrade-4.4.
We see i/o timeouts on GCP, which is the first thing that leads us into this state. We have opened a bug for early context cancellation of CNI calls (https://bugzilla.redhat.com/show_bug.cgi?id=1826075), but we still need an explanation for:

```
238090:Apr 21 05:10:19.741093 ci-op-prmz6-m-2.c.openshift-gce-devel-ci.internal crio[1325]: 2020-04-21T05:10:19Z [error] SetNetworkStatus: failed to query the pod apiserver-5b9677479-z84j7 in out of cluster comm: Get https://api-int.ci-op-igpf6bw7-86ace.origin-ci-int-gce.dev.openshift.com:6443/api/v1/namespaces/openshift-apiserver/pods/apiserver-5b9677479-z84j7: dial tcp 10.0.0.2:6443: i/o timeout
238091:Apr 21 05:10:19.741706 ci-op-prmz6-m-2.c.openshift-gce-devel-ci.internal crio[1325]: 2020-04-21T05:10:19Z [error] Multus: error unsetting the networks status: SetNetworkStatus: failed to query the pod apiserver-5b9677479-z84j7 in out of cluster comm: Get https://api-int.ci-op-igpf6bw7-86ace.origin-ci-int-gce.dev.openshift.com:6443/api/v1/namespaces/openshift-apiserver/pods/apiserver-5b9677479-z84j7: dial tcp 10.0.0.2:6443: i/o timeout
```

https://search.svc.ci.openshift.org/?search=dial+tcp+10.0.0.2%3A6443%3A+i%2Fo+timeout&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job
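If anyone wants to look for the same signature in other runs or on a live node, a rough sketch (the masters-journal file name is an assumption about how this job lays out its gathered journals; adjust for your artifacts):

```
# Find the signature in a gathered node journal (file name is an assumption
# about the artifacts layout of this job):
grep -n 'dial tcp 10.0.0.2:6443: i/o timeout' masters-journal | head

# Or directly on a live node:
journalctl -u crio | grep 'SetNetworkStatus: failed to query the pod'
```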
*** Bug 1825946 has been marked as a duplicate of this bug. ***
We met earlier today and believe that fixing 1 below should fix the upgrade issue. 2 will prevent long i/o timeouts or other long API responses from getting us into such a state, and 3 is about the kubelet handling this state better when we do get into it. 2 may be fine for 4.4.z, and 3 may be fine for 4.5.0 with an eventual backport to 4.4.z. We will first have to see upgrade runs after 1 lands.

1. https://bugzilla.redhat.com/show_bug.cgi?id=1802534
2. https://bugzilla.redhat.com/show_bug.cgi?id=1826075
3. https://bugzilla.redhat.com/show_bug.cgi?id=1826498

I think we should continue using this bug as an overall tracker for GCP upgrade issues and create other bugs as needed.
Who is impacted?
- 90% of GCP 4.4 upgrade jobs in CI.

What is the impact?
- A node ends up in a bad state from which it doesn't recover automatically.

How involved is remediation?
- An admin will have to SSH to the affected node and run some commands to remove the pods/containers stuck in this state (a hedged sketch of what that might look like is below).

Is this a regression?
- See this comment from Stefan: https://bugzilla.redhat.com/show_bug.cgi?id=1822603#c12
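To make the remediation answer concrete, here is an illustrative sketch of the kind of cleanup an admin would run. This is not a validated support procedure for this bug, just the general shape of it; pod names and sandbox IDs obviously differ per cluster:

```
# Illustrative only - not a validated procedure; names/IDs differ per cluster.

# From a host with cluster-admin access: force-remove the pod object that is
# stuck terminating.
oc -n openshift-apiserver delete pod apiserver-5b9677479-z84j7 --grace-period=0 --force

# On the affected node (via SSH): find and remove the leftover pod sandbox.
sudo crictl ps -a | grep openshift-apiserver
sudo crictl stopp <pod-sandbox-id>
sudo crictl rmp <pod-sandbox-id>
```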
> Who is impacted?
> 90% of GCP 4.4 upgrade jobs in CI.

This doesn't tell us about the customer impact, which is what affects whether we pull edges from Cincinnati or not. Is this really 90% of 4.4 GCP clusters, or do we have some reason to think this is specific to CI and that customers will not be impacted?

> What is the impact?
> A node ends up in a bad state from which it doesn't recover automatically.

What is the impact of the bad state? Is it just blocking drain? Is it taking a node completely out of the scheduling pool? Is it leaving the node in the scheduling pool, but freezing some fraction of the pods scheduled there? Could this take down multiple control-plane machines and cause my etcd to lose quorum? If I were a customer and this bit my cluster, how afraid would I need to be?
> What is the impact of the bad state?

What I saw in CI would mean that the upgrade is blocked. Operators try to terminate the old-version pods and start new ones (the ones from the upgrade), and the new ones never launch, so the upgrade never finishes without manual intervention. I don't think this state means the cluster is unavailable, but it is at least degraded.
Also, it appears that we don't have a GCP 4.3-to-4.4 upgrade test, so we don't have data on whether this blocks that upgrade.
4.3.15 -> 4.4.0-rc.11 update job [1] (found via [2,3]). At least that job is green, but one job is not great statistics.

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/661
[2]: https://openshift-release.svc.ci.openshift.org/#4-stable
[3]: https://openshift-release.svc.ci.openshift.org/releasestream/4-stable/release/4.4.0-rc.11#upgrades-from
Checking via CI search over the past 14d (CI search history might not actually be that deep, depending on pod restarts) [1]:

    release-openshift-origin-installer-e2e-gcp-upgrade-4.4 - 52 runs, 77% failed, 10% of failures match

That's a lot better than 90%. Do we expect this to manifest differently than:

```
fail [github.com/openshift/origin/test/e2e/upgrade/upgrade.go:137]: during upgrade to registry.svc.ci.openshift.org/ocp/release:4.4.0-0.ci-2020-04-20-135811
Unexpected error:
    <*errors.errorString | 0xc001945ea0>: {
        s: "Cluster did not complete upgrade: timed out waiting for the condition: Cluster operator openshift-apiserver is still updating",
    }
    Cluster did not complete upgrade: timed out waiting for the condition: Cluster operator openshift-apiserver is still updating
occurred

failed: (1h18m40s) 2020-04-21T06:36:09 "[Disruptive] Cluster upgrade [Top Level] [Disruptive] Cluster upgrade should maintain a functioning cluster [Feature:ClusterUpgrade] [Serial] [Suite:openshift]"
```

which is how the originally-linked [2] died?

[1]: https://search.svc.ci.openshift.org/?search=Cluster+did+not+complete+upgrade.*Cluster+operator+openshift-apiserver+is+still+updating&maxAge=336h&context=1&type=bug%2Bjunit&name=release-openshift-origin-installer-e2e-gcp-upgrade-4.4&maxMatches=5&maxBytes=20971520&groupBy=job
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.4/285
Out of band, Stefan said he'd also seen openshift-controller-manager, marketplace, and other operators time out from this same root cause, so maybe that explains my 10% vs. your 90%. You could extend the "timed out waiting for the condition..." search to include those other operators, or all operators, but then you'd probably pull in a few other failure modes that also end up surfacing as operator timeouts. So this is just one of the things that is not getting bubbled up into a place CI search indexes, and it's hard to put good numbers on those.
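For the record, a broadened query along those lines could look like the sketch below. The regex is illustrative (it assumes the other operators produce the same "is still updating" message), and the host and query parameters are reused from the CI search links earlier in this bug:

```
# Illustrative broadened CI-search query; the regex and parameters are an
# assumption based on the search links earlier in this bug.
curl -sG 'https://search.svc.ci.openshift.org/' \
  --data-urlencode 'search=Cluster did not complete upgrade.*Cluster operator (openshift-apiserver|openshift-controller-manager|ingress|storage|marketplace) is still updating' \
  --data-urlencode 'maxAge=336h' \
  --data-urlencode 'type=bug+junit' \
  --data-urlencode 'name=release-openshift-origin-installer-e2e-gcp-upgrade-4.4' \
  --data-urlencode 'groupBy=job'
```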
This bug has been serving as a tracking bug. The ones that needed to be resolved in 4.4.0 have merged, and the others are set to 4.4.z/4.5 as appropriate. Closing this one out.
*** Bug 1828031 has been marked as a duplicate of this bug. ***
Reopening due to https://bugzilla.redhat.com/show_bug.cgi?id=1828031.
I was looking at https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/661, where in masters-journal:

There were 25 log entries indicating an addition attempt for an apiserver pod:

```
Apr 23 04:24:51.451805 ci-op-x5dhs-m-2.c.openshift-gce-devel-ci.internal hyperkube[1426]: I0423 04:24:51.451775 1426 kubelet.go:1913] SyncLoop (ADD, "api"): "apiserver-74fcb69cb5-p5qkp_openshift-apiserver(aa693864-82e3-4542-bade-b5ae9026c607)"
```

There were 25 log entries indicating that no sandbox could be found for an apiserver pod:

```
Apr 23 04:24:52.107027 ci-op-x5dhs-m-2.c.openshift-gce-devel-ci.internal hyperkube[1426]: I0423 04:24:52.107040 1426 kuberuntime_manager.go:424] No sandbox for pod "apiserver-74fcb69cb5-p5qkp_openshift-apiserver(aa693864-82e3-4542-bade-b5ae9026c607)" can be found. Need to start a new one
```
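For reference, the counts above can be reproduced with something like the following (the masters-journal file name is an assumption about how this job's artifacts are laid out):

```
# Reproduce the counts above from the gathered journal; the file name is an
# assumption about the artifacts layout.
grep 'SyncLoop (ADD, "api")' masters-journal | grep -c 'openshift-apiserver'
grep 'No sandbox for pod' masters-journal | grep -c 'openshift-apiserver'
```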
Two out of the 25 attempts encountered a CNI error of the following form:

```
Apr 23 03:27:53.309901 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal hyperkube[1390]: E0423 03:27:53.309301 1390 pod_workers.go:191] Error syncing pod 66360513-0108-40e3-92b1-d14a82561539 ("apiserver-d2z92_openshift-apiserver(66360513-0108-40e3-92b1-d14a82561539)"), skipping: failed to "CreatePodSandbox" for "apiserver-d2z92_openshift-apiserver(66360513-0108-40e3-92b1-d14a82561539)" with CreatePodSandboxError: "CreatePodSandbox for pod \"apiserver-d2z92_openshift-apiserver(66360513-0108-40e3-92b1-d14a82561539)\" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_apiserver-d2z92_openshift-apiserver_66360513-0108-40e3-92b1-d14a82561539_0(515be0aefea2ba1c31f5cca44c84a76e22df4bbc6506a7db472dd02d02fabf16): Multus: [openshift-apiserver/apiserver-d2z92]: error adding container to network \"openshift-sdn\": delegateAdd: error invoking DelegateAdd - \"openshift-sdn\": error in getting result from AddNetwork: CNI request failed with status 400: 'pods \"apiserver-d2z92\" not found\n'"

Apr 23 03:27:53.311157 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal hyperkube[1390]: I0423 03:27:53.310891 1390 event.go:255] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-apiserver", Name:"apiserver-d2z92", UID:"66360513-0108-40e3-92b1-d14a82561539", APIVersion:"v1", ResourceVersion:"6565", FieldPath:""}): type: 'Warning' reason: 'FailedCreatePodSandBox' Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_apiserver-d2z92_openshift-apiserver_66360513-0108-40e3-92b1-d14a82561539_0(515be0aefea2ba1c31f5cca44c84a76e22df4bbc6506a7db472dd02d02fabf16): Multus: [openshift-apiserver/apiserver-d2z92]: error adding container to network "openshift-sdn": delegateAdd: error invoking DelegateAdd - "openshift-sdn": error in getting result from AddNetwork: CNI request failed with status 400: 'pods "apiserver-d2z92" not found

Apr 23 03:27:53.316285 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal hyperkube[1390]: E0423 03:27:53.316227 1390 pod_workers.go:191] Error syncing pod 4bb79676-b9f5-4cb8-ab34-fe10893b054a ("apiserver-grphr_openshift-apiserver(4bb79676-b9f5-4cb8-ab34-fe10893b054a)"), skipping: failed to "CreatePodSandbox" for "apiserver-grphr_openshift-apiserver(4bb79676-b9f5-4cb8-ab34-fe10893b054a)" with CreatePodSandboxError: "CreatePodSandbox for pod \"apiserver-grphr_openshift-apiserver(4bb79676-b9f5-4cb8-ab34-fe10893b054a)\" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_apiserver-grphr_openshift-apiserver_4bb79676-b9f5-4cb8-ab34-fe10893b054a_0(e6903ac3505e788a27f43aa59b75f77b28d9140d9d9e76eff2dc152d99b1ad63): Multus: [openshift-apiserver/apiserver-grphr]: error adding container to network \"openshift-sdn\": delegateAdd: error invoking DelegateAdd - \"openshift-sdn\": error in getting result from AddNetwork: CNI request failed with status 400: 'pods \"apiserver-grphr\" not found\n'"

Apr 23 03:27:53.316285 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal hyperkube[1390]: I0423 03:27:53.316256 1390 event.go:255] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-apiserver", Name:"apiserver-grphr", UID:"4bb79676-b9f5-4cb8-ab34-fe10893b054a", APIVersion:"v1", ResourceVersion:"7645", FieldPath:""}): type: 'Warning' reason: 'FailedCreatePodSandBox' Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_apiserver-grphr_openshift-apiserver_4bb79676-b9f5-4cb8-ab34-fe10893b054a_0(e6903ac3505e788a27f43aa59b75f77b28d9140d9d9e76eff2dc152d99b1ad63): Multus: [openshift-apiserver/apiserver-grphr]: error adding container to network "openshift-sdn": delegateAdd: error invoking DelegateAdd - "openshift-sdn": error in getting result from AddNetwork: CNI request failed with status 400: 'pods "apiserver-grphr" not found
```

Checking what happened to the other 23 attempts.
For apiserver-xmnv6, here are some related log lines:

```
Apr 23 03:27:55.949836 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal hyperkube[1390]: I0423 03:27:55.949780 1390 event.go:255] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-apiserver", Name:"apiserver-xmnv6", UID:"f7ad4c71-51a9-408b-951b-de802c6be3ee", APIVersion:"v1", ResourceVersion:"7941", FieldPath:"spec.initContainers{fix-audit-permissions}"}): type: 'Normal' reason: 'Created' Created container fix-audit-permissions

Apr 23 03:27:56.065769 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal systemd[1]: crio-fa9cb99862cb516d0f3ffcfec12ab780699748727c488ad2e3b1a7258ed57678.scope: Consumed 57ms CPU time

Apr 23 03:28:10.993252 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal hyperkube[1390]: I0423 03:28:10.993142 1390 status_manager.go:575] Status for pod "apiserver-xmnv6_openshift-apiserver(f7ad4c71-51a9-408b-951b-de802c6be3ee)" updated successfully: (4, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-04-23 03:27:56 +0000 UTC Reason: Message:} {Type:Ready Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-04-23 03:28:10 +0000 UTC Reason: Message:} {Type:ContainersReady Status:True

Apr 23 03:28:18.843211 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal hyperkube[1390]: I0423 03:28:18.843169 1390 kubelet.go:1924] SyncLoop (DELETE, "api"): "apiserver-xmnv6_openshift-apiserver(f7ad4c71-51a9-408b-951b-de802c6be3ee)"

Apr 23 03:28:20.934208 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal hyperkube[1390]: I0423 03:28:20.934069 1390 event.go:255] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-apiserver", Name:"apiserver-xmnv6", UID:"f7ad4c71-51a9-408b-951b-de802c6be3ee", APIVersion:"v1", ResourceVersion:"7941", FieldPath:"spec.containers{openshift-apiserver}"}): type: 'Warning' reason: 'Unhealthy' Readiness probe failed: Get https://10.129.0.15:8443/healthz: dial tcp 10.129.0.15:8443: connect: connection refused

Apr 23 03:28:21.423394 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal crio[1285]: 2020-04-23T03:28:21Z [verbose] Del: openshift-apiserver:apiserver-xmnv6:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"}

Apr 23 03:28:23.953907 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal hyperkube[1390]: I0423 03:28:23.951231 1390 status_manager.go:568] Patch status for pod "apiserver-xmnv6_openshift-apiserver(f7ad4c71-51a9-408b-951b-de802c6be3ee)" with "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-04-23T03:28:23Z\",\"message\":\"containers with unready status: [openshift-apiserver]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-04-23T03:28:23Z\",\"message\":\"containers with unready status: [openshift-apiserver]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"image\":\"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:773f1aab586673797d55ee7379bbecd65df97d3301720f99e2b57c87fe67419c\",\"imageID\":\"\",\"lastState\":{},\"name\":\"openshift-apiserver\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"waiting\":{\"reason\":\"PodInitializing\"}}}],\"phase\":\"Pending\"}}"

Apr 23 03:28:27.715622 ci-op-x5dhs-m-0.c.openshift-gce-devel-ci.internal hyperkube[1390]: I0423 03:28:27.712227 1390 status_manager.go:588] Pod "apiserver-xmnv6_openshift-apiserver(f7ad4c71-51a9-408b-951b-de802c6be3ee)" fully terminated and removed from etcd
```

Please note the deletion at 03:28:18.843211 - I looked back until 03:28:10.993252, when the pod was still considered running. There is not much information in between. Since log level 4 was not enabled, we may not have enough information from this log file.
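In case it helps the follow-up, a trivial sketch for pulling the full timeline of a single pod out of the gathered journal (the masters-journal file name is again an assumption about the artifacts layout):

```
# Pull every journal line that mentions this pod, by name or UID.
grep -E 'apiserver-xmnv6|f7ad4c71-51a9-408b-951b-de802c6be3ee' masters-journal
```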
Mrunal, can we drop https://bugzilla.redhat.com/show_bug.cgi?id=1822603 from the "Depends on" list, since that bug was moved to 4.4.z?
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days