Bug 1701316 - Need operator deletion recovery process
Summary: Need operator deletion recovery process
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Mike Dame
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-18 15:18 UTC by Mike Dame
Modified: 2019-06-04 10:47 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:47:47 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2019:0758 (last updated 2019-06-04 10:47:55 UTC)

Description Mike Dame 2019-04-18 15:18:20 UTC
When an operator config is deleted and then recreated, the revision number is lost, which can cause conflict issues. Instead, the revision controller should check for a current revision status of 0 and, if it finds one, search the available revision ConfigMaps for the last known revision and set the status to that revision number (which may simply kick off a new revision anyway).
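
For illustration, a minimal Go sketch of that recovery logic. The function name and the assumption that revision-status ConfigMaps are named "revision-status-<N>" are illustrative; this is not the actual library-go implementation:

// Hypothetical sketch of the recovery described above; names and structure
// are illustrative, not the real library-go code.
package revisionrecovery

import (
	"context"
	"strconv"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	corev1client "k8s.io/client-go/kubernetes/typed/core/v1"
)

// latestKnownRevision scans the revision-status ConfigMaps in the operand
// namespace and returns the highest revision number found. When the operator
// config has been deleted and recreated, its status reports revision 0, and
// this value can be used to re-seed the revision status (which may then
// simply trigger a new revision).
func latestKnownRevision(ctx context.Context, configMaps corev1client.ConfigMapInterface) (int32, error) {
	list, err := configMaps.List(ctx, metav1.ListOptions{})
	if err != nil {
		return 0, err
	}
	var latest int32
	for _, cm := range list.Items {
		if !strings.HasPrefix(cm.Name, "revision-status-") {
			continue
		}
		revision, err := strconv.Atoi(strings.TrimPrefix(cm.Name, "revision-status-"))
		if err != nil {
			// Ignore ConfigMaps that do not follow the naming scheme.
			continue
		}
		if int32(revision) > latest {
			latest = int32(revision)
		}
	}
	return latest, nil
}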

Comment 1 Mike Dame 2019-04-24 19:30:14 UTC
PR: https://github.com/openshift/library-go/pull/368

Comment 2 Mike Dame 2019-04-29 13:38:18 UTC
This is actually waiting on the following three PRs before it can move to ON_QA:
- https://github.com/openshift/cluster-kube-controller-manager-operator/pull/238
- https://github.com/openshift/cluster-kube-scheduler-operator/pull/116
- https://github.com/openshift/cluster-kube-apiserver-operator/pull/438

Each of these pulls the change into its respective operator. Once they have merged, this is ready for testing.

Comment 3 Mike Dame 2019-04-30 13:04:54 UTC
Update: the bumps from those PRs have merged, and this is now ready for ON_QA testing.

Comment 4 zhou ying 2019-05-06 07:27:12 UTC
Confirmed with the latest OCP; the issue has been fixed:
[root@dhcp-140-138 ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-05-070156   True        False         4h30m   Cluster version is 4.1.0-0.nightly-2019-05-05-070156

[root@dhcp-140-138 ~]# oc delete  kubeapiservers cluster
kubeapiserver.operator.openshift.io "cluster" deleted
[root@dhcp-140-138 ~]# oc get kubeapiservers cluster -o yaml
apiVersion: operator.openshift.io/v1
kind: KubeAPIServer
metadata:
......
  latestAvailableRevision: 9
  nodeStatuses:
  - currentRevision: 8
    nodeName: ip-172-31-128-180.sa-east-1.compute.internal
  - currentRevision: 0
    lastFailedRevision: 8
    nodeName: ip-172-31-131-38.sa-east-1.compute.internal
    targetRevision: 9
  - currentRevision: 0
    nodeName: ip-172-31-156-182.sa-east-1.compute.internal
  readyReplicas: 0


[root@dhcp-140-138 ~]# oc get po 
NAME                                                             READY   STATUS      RESTARTS   AGE
installer-3-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          4h43m
installer-3-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          4h45m
installer-3-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          4h45m
installer-4-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          4h43m
installer-5-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          4h41m
installer-5-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          4h39m
installer-5-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          4h37m
installer-6-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          4h32m
installer-6-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          4h34m
installer-6-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          4h36m
installer-7-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          3h55m
installer-7-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          3h58m
installer-7-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          3h56m
installer-8-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          58m
installer-8-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          61m
installer-8-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          59m
installer-9-ip-172-31-128-180.sa-east-1.compute.internal         0/1     Completed   0          6m43s
installer-9-ip-172-31-131-38.sa-east-1.compute.internal          0/1     Completed   0          10m
installer-9-ip-172-31-156-182.sa-east-1.compute.internal         0/1     Completed   0          8m26s
kube-apiserver-ip-172-31-128-180.sa-east-1.compute.internal      2/2     Running     0          6m31s
kube-apiserver-ip-172-31-131-38.sa-east-1.compute.internal       2/2     Running     0          9m57s
kube-apiserver-ip-172-31-156-182.sa-east-1.compute.internal      2/2     Running     0          8m14s
revision-pruner-3-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          4h43m
revision-pruner-3-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          4h43m
revision-pruner-3-ip-172-31-156-182.sa-east-1.compute.internal   0/1     Completed   0          4h43m
revision-pruner-4-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          4h41m
revision-pruner-5-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          4h39m
revision-pruner-5-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          4h37m
revision-pruner-5-ip-172-31-156-182.sa-east-1.compute.internal   0/1     Completed   0          4h36m
revision-pruner-6-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          4h30m
revision-pruner-6-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          4h32m
revision-pruner-6-ip-172-31-156-182.sa-east-1.compute.internal   0/1     Completed   0          4h34m
revision-pruner-7-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          3h53m
revision-pruner-7-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          3h56m
revision-pruner-7-ip-172-31-156-182.sa-east-1.compute.internal   0/1     OOMKilled   0          3h55m
revision-pruner-8-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          56m
revision-pruner-8-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          60m
revision-pruner-8-ip-172-31-156-182.sa-east-1.compute.internal   0/1     Completed   0          58m
revision-pruner-9-ip-172-31-128-180.sa-east-1.compute.internal   0/1     Completed   0          5m2s
revision-pruner-9-ip-172-31-131-38.sa-east-1.compute.internal    0/1     Completed   0          8m30s
revision-pruner-9-ip-172-31-156-182.sa-east-1.compute.internal   0/1     Completed   0          6m46s

Comment 6 errata-xmlrpc 2019-06-04 10:47:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

