Bug 1877374 - Error: Failed to evict container: "": Failed to find container "etcd-signer" in state: no container with name or ID etcd-signer found: no such container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.z
Assignee: Dan Mace
QA Contact: ge liu
URL:
Whiteboard:
Depends On: 1876091
Blocks:
Reported: 2020-09-09 13:36 UTC by Dan Mace
Modified: 2020-11-10 14:54 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1876091
Environment:
Last Closed: 2020-11-10 14:53:52 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-etcd-operator pull 438 | 0 | None | closed | Bug 1877374: Render bootstrap certificates | 2021-02-20 06:31:33 UTC
Github | openshift installer pull 4150 | 0 | None | closed | Bug 1877374: Remove unused bootstrap etcd cert generation mechanism | 2021-02-20 06:31:33 UTC
Red Hat Product Errata | RHBA-2020:4425 | 0 | None | None | None | 2020-11-10 14:54:10 UTC

Description Dan Mace 2020-09-09 13:36:33 UTC
+++ This bug was initially created as a clone of Bug #1876091 +++

Description of problem:

Error: Failed to evict container: "": Failed to find container "etcd-signer" in state: no container with name or ID etcd-signer found: no such container

Version-Release number of the following components:

4.5.7
vSphere 6.7U3

How reproducible:

Unsure

Steps to Reproduce:

1. Follow the disconnected installation instructions for vSphere [1].
2. The bootstrap node fails with the error message shown under "Actual results" below.

[1] https://docs.openshift.com/container-platform/4.5/installing/installing_vsphere/installing-restricted-networks-vsphere.html#installation-initializing-manual_installing-restricted-networks-vsphere

Actual results:

[core@bootstrap ~]$ journalctl -b -f -u release-image.service -u bootkube.service
-- Logs begin at Fri 2020-09-04 19:24:26 UTC. --
[...]
Sep 04 20:29:10 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: Skipped "secret-kube-apiserver-to-kubelet-signer.yaml" secrets.v1./kube-apiserver-to-kubelet-signer -n openshift-kube-apiserver-operator as it already exists
Sep 04 20:29:11 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: Skipped "secret-loadbalancer-serving-signer.yaml" secrets.v1./loadbalancer-serving-signer -n openshift-kube-apiserver-operator as it already exists
Sep 04 20:29:11 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: Skipped "secret-localhost-serving-signer.yaml" secrets.v1./localhost-serving-signer -n openshift-kube-apiserver-operator as it already exists
Sep 04 20:29:11 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: Skipped "secret-service-network-serving-signer.yaml" secrets.v1./service-network-serving-signer -n openshift-kube-apiserver-operator as it already exists
Sep 04 20:29:21 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: E0904 20:29:21.735288       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
Sep 04 20:29:21 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: E0904 20:29:21.759732       1 reflector.go:251] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to watch *v1.Pod: Get https://localhost:6443/api/v1/pods?watch=true: dial tcp [::1]:6443: connect: connection refused
Sep 04 20:29:22 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: E0904 20:29:22.761579       1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://localhost:6443/api/v1/pods: dial tcp [::1]:6443: connect: connection refused
Sep 04 20:29:23 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: E0904 20:29:23.763460       1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://localhost:6443/api/v1/pods: dial tcp [::1]:6443: connect: connection refused
Sep 04 20:48:37 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: Error: error while checking pod status: timed out waiting for the condition
Sep 04 20:48:37 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: Tearing down temporary bootstrap control plane...
Sep 04 20:48:37 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: Error: error while checking pod status: timed out waiting for the condition
Sep 04 20:48:38 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: Error: Failed to evict container: "": Failed to find container "etcd-signer" in state: no container with name or ID etcd-signer found: no such container
Sep 04 20:48:38 bootstrap.discocp4.lab.msp.redhat.com systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Sep 04 20:48:38 bootstrap.discocp4.lab.msp.redhat.com systemd[1]: bootkube.service: Failed with result 'exit-code'.
Sep 04 20:48:43 bootstrap.discocp4.lab.msp.redhat.com systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Sep 04 20:48:43 bootstrap.discocp4.lab.msp.redhat.com systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 4.
Sep 04 20:48:43 bootstrap.discocp4.lab.msp.redhat.com systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Sep 04 20:48:43 bootstrap.discocp4.lab.msp.redhat.com systemd[1]: Started Bootstrap a Kubernetes cluster.
Sep 04 20:49:00 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[33924]: Starting etcd certificate signer...
Sep 04 20:49:01 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[33924]: 6a59a3e4c6a4a93d53756df48b49fea9e64149c059f105101d3b6262aabd9ac2
Sep 04 20:49:02 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[33924]: https://localhost:2379 is healthy: successfully committed proposal: took = 15.98495ms
Sep 04 20:49:02 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[33924]: etcd cluster up. Killing etcd certificate signer...
Sep 04 20:49:02 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[33924]: 6a59a3e4c6a4a93d53756df48b49fea9e64149c059f105101d3b6262aabd9ac2
Sep 04 20:49:02 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[33924]: Starting cluster-bootstrap...
Sep 04 20:49:03 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[33924]: Starting temporary bootstrap control plane...
Sep 04 20:49:03 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[33924]: Skipped "0000_00_cluster-version-operator_00_namespace.yaml" namespaces.v1./openshift-cluster-version -n  as it already exists
Sep 04 20:49:03 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[33924]: Skipped "0000_00_cluster-version-operator_01_clusteroperator.crd.yaml" customresourcedefinitions.v1beta1.apiextensions.k8s.io/clusteroperators.config.openshift.io -n  as it already exists
[...]
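
When triaging a journal excerpt like the one above, it can help to pull out only the `bootkube.sh` lines whose message begins with `Error:`. The following is a minimal sketch, not part of the original report; the regular expression assumes the `<timestamp> <host> <unit>[<pid>]: <message>` prefix seen in this log, and the function name `bootkube_errors` is hypothetical.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   "<Mon DD HH:MM:SS> <host> <unit>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<unit>[^\[\s]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def bootkube_errors(lines):
    """Return (timestamp, pid, message) for bootkube.sh lines starting with 'Error:'."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("unit") == "bootkube.sh" and m.group("msg").startswith("Error:"):
            hits.append((m.group("ts"), int(m.group("pid")), m.group("msg")))
    return hits


# Three lines copied from the journal excerpt in this report.
sample = [
    "Sep 04 20:48:37 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: "
    "Tearing down temporary bootstrap control plane...",
    "Sep 04 20:48:38 bootstrap.discocp4.lab.msp.redhat.com bootkube.sh[26740]: "
    'Error: Failed to evict container: "": Failed to find container "etcd-signer" '
    "in state: no container with name or ID etcd-signer found: no such container",
    "Sep 04 20:48:38 bootstrap.discocp4.lab.msp.redhat.com systemd[1]: "
    "bootkube.service: Main process exited, code=exited, status=1/FAILURE",
]

for ts, pid, msg in bootkube_errors(sample):
    print(ts, pid, msg)
```

Lines logged by other units (for example `systemd`) and the klog-style `E0904 ...` lines are deliberately skipped by this filter; widen the condition if those are of interest.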

[root@bootstrap ~]# crictl pods
POD ID              CREATED             STATE               NAME                                                                       NAMESPACE                             ATTEMPT
96913e0cd7de9       11 minutes ago      Ready               bootstrap-kube-apiserver-bootstrap.discocp4.lab.msp.redhat.com             kube-system                           1
71ce0c944231d       11 minutes ago      Ready               bootstrap-kube-scheduler-bootstrap.discocp4.lab.msp.redhat.com             kube-system                           1
2518d5fbaf566       11 minutes ago      Ready               bootstrap-kube-controller-manager-bootstrap.discocp4.lab.msp.redhat.com    kube-system                           1
560e173e96a84       11 minutes ago      Ready               bootstrap-cluster-version-operator-bootstrap.discocp4.lab.msp.redhat.com   openshift-cluster-version             1
1f59d9b34619f       11 minutes ago      Ready               cloud-credential-operator-bootstrap.discocp4.lab.msp.redhat.com            openshift-cloud-credential-operator   1
bf512c1a3af90       32 minutes ago      NotReady            bootstrap-kube-scheduler-bootstrap.discocp4.lab.msp.redhat.com             kube-system                           0
6887e35e74929       32 minutes ago      NotReady            bootstrap-kube-controller-manager-bootstrap.discocp4.lab.msp.redhat.com    kube-system                           0
dcc9121a53df1       32 minutes ago      NotReady            bootstrap-kube-apiserver-bootstrap.discocp4.lab.msp.redhat.com             kube-system                           0
c278592bc9851       32 minutes ago      NotReady            bootstrap-cluster-version-operator-bootstrap.discocp4.lab.msp.redhat.com   openshift-cluster-version             0
705aca2b677d9       32 minutes ago      Ready               bootstrap-machine-config-operator-bootstrap.discocp4.lab.msp.redhat.com    default                               0
f849fb2f7034b       33 minutes ago      Ready               etcd-bootstrap-member-bootstrap.discocp4.lab.msp.redhat.com                openshift-etcd                        0
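
The pod listing above mixes Ready and NotReady sandboxes from different attempts. A small sketch for splitting such output by state follows; it is an illustration only, assumes the column layout shown here (columns separated by two or more spaces), and uses hypothetical helper names `parse_crictl_pods` and `not_ready`.

```python
import re


def parse_crictl_pods(text):
    """Parse `crictl pods` table output into a list of dicts keyed by header name."""
    rows = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: columns are separated by runs of two or more spaces, while
    # values such as "11 minutes ago" and the header "POD ID" use single spaces.
    header = re.split(r"\s{2,}", rows[0].strip())
    return [dict(zip(header, re.split(r"\s{2,}", ln.strip()))) for ln in rows[1:]]


def not_ready(pods):
    """Return the names of pods whose STATE column is NotReady."""
    return [p["NAME"] for p in pods if p["STATE"] == "NotReady"]


# Header and three rows copied from the listing in this report.
sample = """\
POD ID              CREATED             STATE               NAME                                                                       NAMESPACE                             ATTEMPT
96913e0cd7de9       11 minutes ago      Ready               bootstrap-kube-apiserver-bootstrap.discocp4.lab.msp.redhat.com             kube-system                           1
dcc9121a53df1       32 minutes ago      NotReady            bootstrap-kube-apiserver-bootstrap.discocp4.lab.msp.redhat.com             kube-system                           0
f849fb2f7034b       33 minutes ago      Ready               etcd-bootstrap-member-bootstrap.discocp4.lab.msp.redhat.com                openshift-etcd                        0
"""

pods = parse_crictl_pods(sample)
print(not_ready(pods))
```

The whitespace split is fragile if a value ever fills its column so that only one space separates it from the next; treat it as a quick triage aid rather than a robust parser.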

[root@bootstrap ~]# crictl images
IMAGE                                                   TAG                 IMAGE ID            SIZE
quay.io/openshift-release-dev/ocp-release@sha256        <none>              e7c443017e821       306MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              790b38ec6f81b       307MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              f67097361498f       283MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              3fcd563edad3b       255MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              b0d508e56910d       305MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              7e44a17a2951a       282MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              c5072ae56904b       308MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              0c893df5a716e       308MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              793d4a1e7161c       305MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              eaff45a171adb       307MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              d1bb18c7027ae       432MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              5afa4eae3d651       311MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              d1eec47fd97e5       326MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              32b54e50bc4bc       288MB
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256   <none>              d8375a61d36e3       674MB

Expected results:

The installation should complete or, if it fails, report why it is failing.

Additional info:

Install config is as follows:

[root@tatooine ocp45]# cat install-config.yaml 
apiVersion: v1
baseDomain: lab.msp.redhat.com
compute:
- hyperthreading: Disabled   
  name: worker
  replicas: 2 
controlPlane:
  hyperthreading: Disabled   
  name: master 
  replicas: 3 
metadata:
  name: discocp4
platform:
  vsphere:
    vcenter: vcenter01.lab.msp.redhat.com
    username: ocp4
    password: OpenShift2020!
    datacenter: msp-lab
    defaultDatastore: storage03-iscsi-lun0 
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14 
    hostPrefix: 23 
  networkType: OpenShiftSDN
  serviceNetwork: 
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{"auths": ...}' 
sshKey: 'ssh-ed25519 AAAA...' 
imageContentSources: 
- mirrors:
  - registry.lab.msp.redhat.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.lab.msp.redhat.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
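
One detail visible in the pasted config is that `platform:` appears twice at the top level (once with `vsphere`, once with `none: {}`). This report does not say whether that has any bearing on the failure, and how a given YAML parser treats repeated keys varies. The sketch below is an illustration only: a standard-library check for repeated top-level keys, with the hypothetical function name `duplicate_top_level_keys` and a shortened copy of the config as input.

```python
import re
from collections import Counter

# A top-level key starts at column 0; indented keys and "- " list items do not match.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):")


def duplicate_top_level_keys(yaml_text):
    """Return the sorted top-level keys that occur more than once in yaml_text."""
    counts = Counter()
    for line in yaml_text.splitlines():
        m = TOP_LEVEL_KEY.match(line)  # match() anchors at the start of the line
        if m:
            counts[m.group(1)] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Shortened copy of the install-config.yaml pasted in this report.
sample = """\
apiVersion: v1
metadata:
  name: discocp4
platform:
  vsphere:
    datacenter: msp-lab
networking:
  networkType: OpenShiftSDN
platform:
  none: {}
imageContentSources:
- mirrors:
  - registry.lab.msp.redhat.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
"""

print(duplicate_top_level_keys(sample))
```

This is a line-based scan rather than a YAML parse, so it ignores quoted keys, multi-document files, and flow-style mappings.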

--- Additional comment from Scott Dodson on 2020-09-05 16:46:28 UTC ---

Please attach the log bundle generated by `openshift-install gather bootstrap`; see `--help` if you're not familiar with the command. When the installer failed, it should have attempted to gather the bundle or emitted instructions for doing so. That log bundle should be attached to any bug involving a bootstrap failure.

--- Additional comment from Sam Yangsao on 2020-09-06 00:01:22 UTC ---

Log bundle attached.

--- Additional comment from Sam Yangsao on 2020-09-08 15:47:55 UTC ---

I was able to reproduce the issue again this morning; log bundle 2 from the bootstrap node is attached.

--- Additional comment from Abhinav Dahiya on 2020-09-08 16:54:34 UTC ---

The etcd team creates the etcd signer, so I think they are best placed to help here.

Comment 4 ge liu 2020-11-03 07:41:59 UTC
Installed 4.5.0-0.nightly-2020-10-31-200727 with 19_Disconnected UPI on vSphere 7.0 with RHCOS and did not hit this issue.

Comment 6 errata-xmlrpc 2020-11-10 14:53:52 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.18 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4425

