Bug 1701316

Summary: Need operator deletion recovery process
Product: OpenShift Container Platform
Reporter: Mike Dame <mdame>
Component: Master
Assignee: Mike Dame <mdame>
Status: CLOSED ERRATA
QA Contact: zhou ying <yinzhou>
Severity: high
Priority: high
Version: 4.1.0
CC: aos-bugs, jokerman, mfojtik, mmccomas, yinzhou
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:47:47 UTC
Type: Bug

Description Mike Dame 2019-04-18 15:18:20 UTC
When an operator config is deleted, the revision number is lost when the config is recreated, which can cause conflict issues. Instead, the revision controller should check for a current revision status of 0 and, if found, search through the available revision configmaps for the last known revision and set the status to that revision number, which may just kick off a new revision anyway.
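
For reference, the "search through available revision configmaps" step can be approximated by hand. This is only a sketch assuming the kube-apiserver operator's layout, where the operand namespace openshift-kube-apiserver keeps one revision-status-<N> configmap per revision; other operators use their own namespaces:

# Find the highest revision recorded in revision-status-<N> configmaps; this is
# the value the controller would fall back to when the status reads 0.
oc -n openshift-kube-apiserver get configmaps -o name \
  | sed -n 's|^configmap/revision-status-||p' \
  | sort -n | tail -1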

Comment 1 Mike Dame 2019-04-24 19:30:14 UTC
PR: https://github.com/openshift/library-go/pull/368

Comment 2 Mike Dame 2019-04-29 13:38:18 UTC
This is actually waiting on the following three PRs before it can move to ON_QA:
- https://github.com/openshift/cluster-kube-controller-manager-operator/pull/238
- https://github.com/openshift/cluster-kube-scheduler-operator/pull/116
- https://github.com/openshift/cluster-kube-apiserver-operator/pull/438

Each of these pulls this change into the respective operator. Once they have all merged, this will be ready for testing.

Comment 3 Mike Dame 2019-04-30 13:04:54 UTC
Update: the bumps from those PRs have merged, and this is now ready for ON_QA testing.

Comment 4 zhou ying 2019-05-06 07:27:12 UTC
Confirmed with the latest OCP build that the issue has been fixed:
[root@dhcp-140-138 ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-05-070156   True        False         4h30m   Cluster version is 4.1.0-0.nightly-2019-05-05-070156

[root@dhcp-140-138 ~]# oc delete  kubeapiservers cluster
kubeapiserver.operator.openshift.io "cluster" deleted
[root@dhcp-140-138 ~]# oc get kubeapiservers cluster -o yaml
apiVersion: operator.openshift.io/v1
kind: KubeAPIServer
metadata:
......
  latestAvailableRevision: 9
  nodeStatuses:
  - currentRevision: 8
    nodeName: ip-172-31-128-180.sa-east-1.compute.internal
  - currentRevision: 0
    lastFailedRevision: 8
    nodeName: ip-172-31-131-38.sa-east-1.compute.internal
    targetRevision: 9
  - currentRevision: 0
    nodeName: ip-172-31-156-182.sa-east-1.compute.internal
  readyReplicas: 0


[root@dhcp-140-138 ~]# oc get po 
NAME                                                             READY   STATUS      RESTARTS   AGE
installer-3-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          4h43m
installer-3-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          4h45m
installer-3-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          4h45m
installer-4-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          4h43m
installer-5-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          4h41m
installer-5-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          4h39m
installer-5-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          4h37m
installer-6-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          4h32m
installer-6-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          4h34m
installer-6-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          4h36m
installer-7-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          3h55m
installer-7-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          3h58m
installer-7-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          3h56m
installer-8-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          58m
installer-8-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          61m
installer-8-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          59m
installer-9-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          6m43s
installer-9-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          10m
installer-9-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          8m26s
kube-apiserver-ip-172-31-128-180.sa-east-1.compute.internal      2/2     Running     0          6m31s
kube-apiserver-ip-172-31-131-38.sa-east-1.compute.internal       2/2     Running     0          9m57s
kube-apiserver-ip-172-31-156-182.sa-east-1.compute.internal      2/2     Running     0          8m14s
revision-pruner-3-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          4h43m
revision-pruner-3-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          4h43m
revision-pruner-3-ip-172-31-156-182.sa-east-1.compute.internal   0/1     Completed   0          4h43m
revision-pruner-4-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          4h41m
revision-pruner-5-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          4h39m
revision-pruner-5-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          4h37m
revision-pruner-5-ip-172-31-156-182.sa-east-1.compute.internal   0/1     Completed   0          4h36m
revision-pruner-6-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          4h30m
revision-pruner-6-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          4h32m
revision-pruner-6-ip-172-31-156-182.sa-east-1.compute.internal   0/1     Completed   0          4h34m
revision-pruner-7-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          3h53m
revision-pruner-7-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          3h56m
revision-pruner-7-ip-172-31-156-182.sa-east-1.compute.internal   0/1     OOMKilled   0          3h55m
revision-pruner-8-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          56m
revision-pruner-8-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          60m
revision-pruner-8-ip-172-31-156-182.sa-east-1.compute.internal   0/1     Completed   0          58m
revision-pruner-9-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          5m2s
revision-pruner-9-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          8m30s
revision-pruner-9-ip-172-31-156-182.sa-east-1.compute.internal   0/1     Completed   0          6m46s
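
Note: the two nodes still reporting currentRevision: 0 in the status above were simply mid-rollout to revision 9; the revision-9 installer pods have completed and the kube-apiserver pods are running. One way to spot-check that the node statuses converge on latestAvailableRevision, using the field paths from the status output above:

oc get kubeapiserver cluster \
  -o jsonpath='{range .status.nodeStatuses[*]}{.nodeName}{" "}{.currentRevision}{"\n"}{end}'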

Comment 6 errata-xmlrpc 2019-06-04 10:47:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758