Description of problem:
-----------------------
After restoring the cluster to its previous state, there are no etcd containers on 2 masters, and on the recovery master there is just 1 `etcd` container.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Installed: 4.4.0-0.nightly-2020-05-01-231319
Updated to 4.4.3
Restored to: 4.4.0-0.nightly-2020-05-01-231319

Steps to Reproduce:
1. https://url.corp.redhat.com/7fc8d2a

Actual results:
---------------
etcd containers aren't running on 2 of the 3 master nodes.
All nodes are in `SchedulingDisabled` state.
Some operators were not restored to the previous version:

oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-05-01-231319   True        False         False      17h
cloud-credential                           4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
cluster-autoscaler                         4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
console                                    4.4.0-0.nightly-2020-05-01-231319   True        False         False      3h8m
csi-snapshot-controller                    4.4.3                               False       True          False      72m
dns                                        4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
etcd                                       4.4.0-0.nightly-2020-05-01-231319   True        False         False      3h4m
image-registry                             4.4.3                               False       True          False      72m
ingress                                    4.4.0-0.nightly-2020-05-01-231319   True        False         False      3h13m
insights                                   4.4.3                               True        False         False      18h
kube-apiserver                             4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
kube-controller-manager                    4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
kube-scheduler                             4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
kube-storage-version-migrator              4.4.0-0.nightly-2020-05-01-231319   True        False         False      3h13m
machine-api                                4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
machine-config                             4.4.0-0.nightly-2020-05-01-231319   True        False         False      17h
marketplace                                4.4.0-0.nightly-2020-05-01-231319   True        False         False      3h8m
monitoring                                 4.4.0-0.nightly-2020-05-01-231319   True        False         False      17h
network                                    4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
node-tuning                                4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
openshift-apiserver                        4.4.0-0.nightly-2020-05-01-231319   True        False         False      3h13m
openshift-controller-manager               4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
openshift-samples                          4.4.0-0.nightly-2020-05-01-231319   True        True          False      178m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-05-01-231319   True        False         False      3h7m
service-ca                                 4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
service-catalog-apiserver                  4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h
service-catalog-controller-manager         4.4.3                               True        False         False      18h
storage                                    4.4.0-0.nightly-2020-05-01-231319   True        False         False      18h

Expected results:
-----------------
etcd containers are started on all masters.
Nodes are in Ready state.
All cluster operators are rolled back to the previous version.

Additional info:
----------------
Virtual setup: 3 masters + 2 workers + CNV

After manually uncordoning the 3 masters:

Name:         cluster
Namespace:
Labels:       <none>
Annotations:  release.openshift.io/create-only: true
API Version:  operator.openshift.io/v1
Kind:         Etcd
Metadata:
  Creation Timestamp:  2020-05-12T13:53:56Z
  Generation:          2
  Resource Version:    560342
  Self Link:           /apis/operator.openshift.io/v1/etcds/cluster
  UID:                 9c6368e4-3c28-4ae0-a565-2c72552fb396
Spec:
  Force Redeployment Reason:  recovery-2020-05-13 07:05:01.344000145+00:00
  Management State:           Managed
Status:
  Conditions:
    Last Transition Time:  2020-05-12T14:05:45Z
    Reason:                NoUnsupportedConfigOverrides
    Status:                True
    Type:                  UnsupportedConfigOverridesUpgradeable
    Last Transition Time:  2020-05-12T14:59:01Z
    Status:                False
    Type:                  InstallerControllerDegraded
    Last Transition Time:  2020-05-12T14:08:33Z
    Message:               3 nodes are active; 3 nodes are at revision 4; 0 nodes have achieved new revision 5
    Status:                True
    Type:                  StaticPodsAvailable
    Last Transition Time:  2020-05-13T08:39:48Z
    Message:               3 nodes are at revision 4; 0 nodes have achieved new revision 5
    Status:                True
    Type:                  NodeInstallerProgressing
    Last Transition Time:  2020-05-12T14:05:45Z
    Status:                False
    Type:                  NodeInstallerDegraded
    Last Transition Time:  2020-05-12T14:05:45Z
    Reason:                HostEndpoints2Updated
    Status:                False
    Type:                  HostEndpoints2Degraded
    Last Transition Time:  2020-05-12T14:14:43Z
    Status:                False
    Type:                  StaticPodsDegraded
    Last Transition Time:  2020-05-13T08:39:27Z
    Message:               The master nodes not ready: node "master-0-0" not ready since 2020-05-13 07:03:11 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.), node "master-0-2" not ready since 2020-05-13 07:03:11 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
    Reason:                MasterNodesReady
    Status:                True
    Type:                  NodeControllerDegraded
    Last Transition Time:  2020-05-13T08:39:29Z
    Reason:                AsExpected
    Status:                False
    Type:                  ScriptControllerDegraded
    Last Transition Time:  2020-05-12T14:05:47Z
    Status:                False
    Type:                  InstallerPodPendingDegraded
    Last Transition Time:  2020-05-12T14:05:47Z
    Status:                False
    Type:                  InstallerPodContainerWaitingDegraded
    Last Transition Time:  2020-05-12T14:05:47Z
    Status:                False
    Type:                  InstallerPodNetworkingDegraded
    Last Transition Time:  2020-05-12T14:05:48Z
    Reason:                AsExpected
    Status:                False
    Type:                  EnvVarControllerDegraded
    Last Transition Time:  2020-05-12T14:05:48Z
    Status:                False
    Type:                  ConfigObservationDegraded
    Last Transition Time:  2020-05-13T08:39:45Z
    Status:                False
    Type:                  RevisionControllerDegraded
    Last Transition Time:  2020-05-12T14:05:58Z
    Reason:                AsExpected
    Status:                False
    Type:                  BackingResourceControllerDegraded
    Last Transition Time:  2020-05-13T05:07:07Z
    Reason:                AsExpected
    Status:                False
    Type:                  ClusterMemberControllerDegraded
    Last Transition Time:  2020-05-12T14:06:04Z
    Status:                False
    Type:                  TargetConfigControllerDegraded
    Last Transition Time:  2020-05-12T14:06:07Z
    Status:                False
    Type:                  ResourceSyncControllerDegraded
    Last Transition Time:  2020-05-12T14:06:08Z
    Reason:                AsExpected
    Status:                False
    Type:                  EtcdCertSignerControllerDegraded
    Last Transition Time:  2020-05-12T14:06:08Z
    Last Transition Time:  2020-05-12T14:06:08Z
    Reason:                AsExpected
    Status:                False
    Type:                  EtcdStaticResourcesDegraded
    Last Transition Time:  2020-05-13T05:07:06Z
    Reason:                AsExpected
    Status:                False
    Type:                  EtcdMemberIPMigratorDegraded
    Last Transition Time:  2020-05-13T05:07:05Z
    Reason:                MembersReported
    Status:                False
    Type:                  EtcdMembersControllerDegraded
    Last Transition Time:  2020-05-13T05:07:05Z
    Reason:                AsExpected
    Status:                False
    Type:                  BootstrapTeardownDegraded
    Last Transition Time:  2020-05-13T05:20:04Z
    Message:               No unhealthy members found
    Reason:                AsExpected
    Status:                False
    Type:                  EtcdMembersDegraded
    Last Transition Time:  2020-05-12T14:09:31Z
    Message:               etcd-bootstrap member is already removed
    Reason:                BootstrapAlreadyRemoved
    Status:                True
    Type:                  EtcdRunningInCluster
    Last Transition Time:  2020-05-13T05:11:04Z
    Message:               master-0-1 members are available, have not started, are unhealthy, are unknown
    Reason:                EtcdQuorate
    Status:                True
    Type:                  EtcdMembersAvailable
    Last Transition Time:  2020-05-12T14:08:27Z
    Message:               all members have started
    Reason:                AsExpected
    Status:                False
    Type:                  EtcdMembersProgressing
  Latest Available Revision:         5
  Latest Available Revision Reason:
  Node Statuses:
    Current Revision:  4
    Node Name:         master-0-1
    Current Revision:  4
    Node Name:         master-0-0
    Target Revision:   5
    Current Revision:  4
    Node Name:         master-0-2
  Ready Replicas:      0
Events:                <none>
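
For anyone re-checking a restore like this one, the operator listing can be scanned mechanically instead of by eye. The following is only a rough sketch, not part of the original report: the helper name `stale_operators` is hypothetical, and it assumes the text is whitespace-separated `oc get co` output with the six columns shown above (rows with an empty VERSION column would be skipped).

```python
from typing import List, Tuple


def stale_operators(oc_get_co_output: str, expected_version: str) -> List[Tuple[str, str]]:
    """Return (name, version) for each operator whose VERSION differs from expected_version.

    Hypothetical helper: assumes six whitespace-separated columns per row
    (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE).
    """
    mismatched = []
    for line in oc_get_co_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and any row that does not have all six columns.
        if len(fields) < 6 or fields[0] == "NAME":
            continue
        name, version = fields[0], fields[1]
        if version != expected_version:
            mismatched.append((name, version))
    return mismatched


# A few rows copied from the listing in this report.
SAMPLE = """\
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
etcd 4.4.0-0.nightly-2020-05-01-231319 True False False 3h4m
image-registry 4.4.3 False True False 72m
insights 4.4.3 True False False 18h
"""

if __name__ == "__main__":
    for name, version in stale_operators(SAMPLE, "4.4.0-0.nightly-2020-05-01-231319"):
        print(f"{name} is still at {version}")
```

Run against the full listing above, this would flag the operators still reporting 4.4.3 after the restore.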
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409