Bug 1782683

Summary: [Disconnected] openshift-samples operator setting management state to Removed does not complete while Progressing==true
Product: OpenShift Container Platform Reporter: Johnny Liu <jialiu>
Component: Samples Assignee: Gabe Montero <gmontero>
Status: CLOSED ERRATA QA Contact: XiuJuan Wang <xiuwang>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.2.z CC: adam.kaplan, bparees, gmontero, jialiu, wzheng, xiuwang
Target Milestone: --- Keywords: Regression, Reopened
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The samples operator delayed moving to the Removed management state while imagestream imports were in progress.
Consequence: If those imagestream imports were doomed to fail and retry forever, for reasons like lack of connectivity to the source registry, the imports stayed in progress for a very long time and prevented Removed processing from occurring.
Fix: The samples operator was changed so that it no longer gates the move to the Removed state on imagestream imports that are still in progress.
Result: Administrators can now switch the samples operator to Removed quickly in cases where sample imagestream imports are doomed to fail, such as when connectivity to the source registry does not exist.
Story Points: ---
Clone Of: 1772178
: 1805615 1805815 (view as bug list) Environment:
Last Closed: 2020-05-13 21:54:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1805615, 1805815    
Attachments:
Description Flags
cluster-samples-operator.log none
samples config logs none
samples operator pod log none

Comment 1 Gabe Montero 2019-12-12 14:49:25 UTC
How much time passed between when the config object was created (2019-12-12T06:01:54Z)
and last updated (2019-12-12T06:02:39Z), and when those `oc get` calls were made?


Also, at this point I'm going to need the pod logs for the samples operator in the openshift-cluster-samples-operator namespace
when the error occurs.


Looking at the tests at https://openshift-release.svc.ci.openshift.org/releasestream/4.2.0-0.nightly/release/4.2.0-0.nightly-2019-12-11-171302
it is unclear to me which one might be the QE's CI job (in fact I suspect none of them are).

Comment 2 Johnny Liu 2019-12-13 02:29:31 UTC
QE's ci job is NOT on  https://openshift-release.svc.ci.openshift.org/releasestream/4.2.0-0.nightly/release/4.2.0-0.nightly-2019-12-11-171302.

I just tried to reproduce this bug with the same payload image and could not, so it may just be a flake. I will keep an eye on it.

Comment 3 Gabe Montero 2019-12-13 15:38:51 UTC
OK thanks for the update.

Yeah, let's keep this open for a bit and see what happens. If it happens again, get me the pod logs along with the samples config YAML.
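
For whoever hits this next, the data requested above could be gathered roughly as follows. This is only a sketch: the function name `collect_samples_debug` and the output directory are made up for illustration, and it simply grabs logs from every pod in the operator namespace rather than assuming a particular pod name.

```shell
# Namespace of the samples operator, as named in comment 1.
SAMPLES_NS="openshift-cluster-samples-operator"

# collect_samples_debug [DIR]
# Hypothetical helper: saves the samples Config YAML, the ClusterOperator
# status, and the logs of every pod in the operator namespace into DIR.
collect_samples_debug() {
  local out_dir="${1:-samples-debug}"
  local pod
  mkdir -p "$out_dir"

  # Samples operator config object (cluster-scoped, named "cluster").
  oc get configs.samples.operator.openshift.io cluster -o yaml \
    > "$out_dir/samples-config.yaml"

  # ClusterOperator status as reported to the cluster version operator.
  oc get clusteroperator openshift-samples -o yaml \
    > "$out_dir/clusteroperator.yaml"

  # "oc get pods -o name" prints entries like "pod/<name>"; one log file per pod.
  for pod in $(oc get pods -n "$SAMPLES_NS" -o name); do
    oc logs -n "$SAMPLES_NS" "$pod" --all-containers \
      > "$out_dir/${pod##*/}.log"
  done
}
```

Running `collect_samples_debug ./bz1782683` right after the error shows up should leave files that can be attached to the bug.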

Comment 4 Gabe Montero 2019-12-13 20:27:02 UTC
Adjusting severity given the intermittent nature.

Comment 9 Johnny Liu 2020-01-03 07:23:24 UTC
Created attachment 1649330 [details]
cluster-samples-operator.log

Comment 14 XiuJuan Wang 2020-01-16 07:17:00 UTC
Created attachment 1652647 [details]
samples config logs

Comment 15 XiuJuan Wang 2020-01-16 07:22:23 UTC
Created attachment 1652648 [details]
samples operator pod log

I did not hit an installation blocked by a samples operator failure this time, so the remove processing did not take as long as in comment #11; it took around 10 minutes.
If these logs are not enough, I will paste more when I next hit an installation blocked by a samples operator failure.

Comment 18 XiuJuan Wang 2020-02-03 07:39:49 UTC
Set the samples operator to Removed while Progressing=true; tried ten times, and all attempts succeeded on 4.4.0-0.nightly-2020-02-03-021633.
The longest duration was 5 minutes, which is acceptable.

$ oc get co openshift-samples -o yaml 
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2020-02-03T06:57:08Z"
  generation: 1
  name: openshift-samples
  resourceVersion: "32088"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/openshift-samples
  uid: 9fe06cb4-d4ba-4739-b8e0-a1340b0ceb68
spec: {}
status:
  conditions:
  - lastTransitionTime: "2020-02-03T07:28:52Z"
    message: Samples processing to 4.4.0-0.nightly-2020-02-03-021633
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-02-03T07:03:46Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2020-02-03T07:03:50Z"
    message: Samples installation successful at 4.4.0-0.nightly-2020-02-03-021633
    status: "True"
    type: Available
  extension: null
  relatedObjects:
  - group: samples.operator.openshift.io
    name: cluster
    resource: configs
  - group: ""
    name: openshift-cluster-samples-operator
    resource: namespaces
  - group: ""
    name: openshift
    resource: namespaces
  versions:
  - name: operator
    version: 4.4.0-0.nightly-2020-02-03-021633

$ oc get  config.samples -o yaml 
apiVersion: v1
items:
- apiVersion: samples.operator.openshift.io/v1
  kind: Config
  metadata:
    creationTimestamp: "2020-02-03T06:57:08Z"
    finalizers:
    - samples.operator.openshift.io/finalizer
    generation: 4
    name: cluster
    resourceVersion: "32307"
    selfLink: /apis/samples.operator.openshift.io/v1/configs/cluster
    uid: e07adbca-edab-4347-bfec-b527b4eaf9a0
  spec:
    architectures:
    - x86_64
    managementState: Removed
  status:
    architectures:
    - x86_64
    conditions:
    - lastTransitionTime: "2020-02-03T06:57:14Z"
      lastUpdateTime: "2020-02-03T06:57:14Z"
      status: "False"
      type: RemovePending
    - lastTransitionTime: "2020-02-03T06:57:10Z"
      lastUpdateTime: "2020-02-03T06:57:10Z"
      status: "True"
      type: ImportCredentialsExist
    - lastTransitionTime: "2020-02-03T07:03:46Z"
      lastUpdateTime: "2020-02-03T07:03:46Z"
      status: "True"
      type: SamplesExist
    - lastTransitionTime: "2020-02-03T07:28:49Z"
      lastUpdateTime: "2020-02-03T07:28:49Z"
      reason: 'jboss-eap72-openshift apicurito-ui jboss-webserver31-tomcat7-openshift
        jboss-eap70-openshift jenkins-agent-nodejs rhpam-kieserver-rhel8 ruby jboss-webserver30-tomcat8-openshift
        fis-karaf-openshift jboss-fuse70-java-openshift redhat-openjdk18-openshift
        postgresql rhpam-businesscentral-monitoring-rhel8 fis-java-openshift openjdk-8-rhel8
        redis rhpam-smartrouter-rhel8 jboss-webserver50-tomcat9-openshift jboss-datagrid71-openshift
        java mongodb redhat-sso73-openshift jboss-amq-63 jboss-datavirt64-openshift
        jboss-fuse70-console redhat-sso72-openshift jboss-webserver31-tomcat8-openshift
        dotnet-runtime eap-cd-openshift mariadb mysql jboss-processserver64-openshift
        openjdk-11-rhel7 openjdk-11-rhel8 jenkins jboss-datavirt64-driver-openshift
        jboss-eap71-openshift fuse7-eap-openshift fuse7-java-openshift jboss-fuse70-karaf-openshift
        perl apicast-gateway jboss-datagrid73-openshift nodejs jboss-eap64-openshift
        golang redhat-sso71-openshift jenkins-agent-maven modern-webapp rhpam-businesscentral-rhel8
        jboss-datagrid65-client-openshift jboss-datagrid71-client-openshift jboss-datagrid72-openshift
        jboss-decisionserver64-openshift httpd redhat-sso70-openshift jboss-webserver30-tomcat7-openshift
        python dotnet rhdm-optaweb-employee-rostering-rhel8 jboss-amq-62 jboss-datagrid65-openshift
        jboss-fuse70-eap-openshift rhdm-decisioncentral-rhel8 rhdm-kieserver-rhel8
        fuse-apicurito-generator fuse7-console fuse7-karaf-openshift nginx php '
      status: "True"
      type: ImageChangesInProgress
    - lastTransitionTime: "2020-02-03T07:28:49Z"
      lastUpdateTime: "2020-02-03T07:28:49Z"
      message: <imagestream/apicast-gateway>dockerimage.image.openshift.io "xiuwang-gcp-dis.mirror-registry.qe.gcp.devcluster.openshift.com:5000/3scale-amp21/apicast-gateway:1.4-2"
        not found<imagestream/apicast-gateway>
      reason: 'apicast-gateway '
      status: "True"
      type: ImportImageErrorsExist
    - lastTransitionTime: "2020-02-03T07:03:43Z"
      lastUpdateTime: "2020-02-03T07:03:43Z"
      status: "True"
      type: ConfigurationValid
    - lastTransitionTime: "2020-02-03T07:03:43Z"
      lastUpdateTime: "2020-02-03T07:03:43Z"
      status: "False"
      type: MigrationInProgress
    managementState: Managed
    version: 4.4.0-0.nightly-2020-02-03-021633
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc get co openshift-samples -o yaml 
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2020-02-03T06:57:08Z"
  generation: 1
  name: openshift-samples
  resourceVersion: "33548"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/openshift-samples
  uid: 9fe06cb4-d4ba-4739-b8e0-a1340b0ceb68
spec: {}
status:
  conditions:
  - lastTransitionTime: "2020-02-03T07:33:04Z"
    message: Samples installation was previously successful at 4.4.0-0.nightly-2020-02-03-021633
      but the samples operator is now Removed
    reason: CurrentlyRemoved
    status: "False"
    type: Progressing
  - lastTransitionTime: "2020-02-03T07:33:04Z"
    message: Samples installation was previously successful at 4.4.0-0.nightly-2020-02-03-021633
      but the samples operator is now Removed
    reason: CurrentlyRemoved
    status: "False"
    type: Degraded
  - lastTransitionTime: "2020-02-03T07:33:04Z"
    message: Samples installation was previously successful at 4.4.0-0.nightly-2020-02-03-021633
      but the samples operator is now Removed
    reason: CurrentlyRemoved
    status: "True"
    type: Available
  extension: null
  relatedObjects:
  - group: samples.operator.openshift.io
    name: cluster
    resource: configs
  - group: ""
    name: openshift-cluster-samples-operator
    resource: namespaces
  - group: ""
    name: openshift
    resource: namespaces
  versions:
  - name: operator
    version: 4.4.0-0.nightly-2020-02-03-021633
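
The verification above could be scripted along these lines. This is a rough sketch, not the exact procedure QE ran: the function names and the default timeout are illustrative assumptions, and the commands only rely on `oc patch`, `oc wait`, and `oc get -o jsonpath`.

```shell
# set_samples_removed
# Requests removal by patching spec.managementState on the samples Config.
set_samples_removed() {
  oc patch configs.samples.operator.openshift.io cluster \
    --type merge --patch '{"spec":{"managementState":"Removed"}}'
}

# wait_samples_removed [TIMEOUT]
# Blocks until the ClusterOperator reports Progressing=False, or fails
# once TIMEOUT (assumed default: 10m) has elapsed.
wait_samples_removed() {
  local timeout="${1:-10m}"
  oc wait clusteroperator/openshift-samples \
    --for=condition=Progressing=False --timeout="$timeout"
}

# samples_progressing_reason
# Prints the reason on the Progressing condition; in the output above it
# reads CurrentlyRemoved once removal has completed.
samples_progressing_reason() {
  oc get clusteroperator openshift-samples \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}'
}
```

A single run would be `set_samples_removed && wait_samples_removed 10m && samples_progressing_reason`, started while the Progressing condition is still "True" so that the case from the summary is actually exercised.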

Comment 20 errata-xmlrpc 2020-05-13 21:54:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581