Bug 1663406 - status.managementstate can't be changed to 'Removed' in samplesresources when some imagestreams import failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: Gabe Montero
QA Contact: XiuJuan Wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-04 08:29 UTC by XiuJuan Wang
Modified: 2019-06-04 10:41 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:41:28 UTC
Target Upstream Version:



Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:41:36 UTC

Description XiuJuan Wang 2019-01-04 08:29:04 UTC
Description of problem:
When some imagestream imports fail, the ImageChangesInProgress condition stays true indefinitely. If spec.managementState is then set to 'Removed', status.managementState stays 'Managed' forever. As a result, the imagestreams and templates are never removed.
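
To make the symptom concrete, here is a minimal Python sketch of the gating described above. It is not the operator's code (the operator is written in Go). The function name `status_after_sync` is hypothetical, and the rule "a switch to Removed is deferred while ImageChangesInProgress is true" is an assumption inferred from the observed behavior, not read from the source.

```python
def status_after_sync(spec_state, status_state, conditions):
    """Return the status.managementState that a sync would leave behind.

    Assumed rule (inferred from the symptom, not from the operator source):
    a switch to Removed is deferred while ImageChangesInProgress is "True".
    """
    in_progress = conditions.get("ImageChangesInProgress") == "True"
    if spec_state == "Removed" and in_progress:
        return status_state  # removal stays pending; status does not move
    return spec_state


# A condition that never clears pins the status at Managed.
stuck = status_after_sync("Removed", "Managed", {"ImageChangesInProgress": "True"})
done = status_after_sync("Removed", "Managed", {"ImageChangesInProgress": "False"})
print(stuck, done)
```

Under that assumption, one import that never completes and is never recognized as failed is enough to block the removal forever.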

Version-Release number of selected component (if applicable):

$ oc get clusterversion version 
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.alpha-2019-01-03-031244   True        False         2h        Cluster version is 4.0.0-0.alpha-2019-01-03-031244

How reproducible:
always

Steps to Reproduce:
1. Switch installType to rhel in samplesresources and create the credentials for registry.redhat.io.
2. Wait a few minutes, then check the imagestreams in the openshift project.
   The jenkins imagestream import fails because the image is not found.
3. Set managementState to Removed.
4. Check samplesresources.
5. Check the imagestreams and templates.

Actual results:
Step 4: ImageChangesInProgress has stayed true for 20+ minutes, and status.managementState is still Managed.

$ oc get samplesresources  -o yaml 
apiVersion: v1
items:
- apiVersion: samplesoperator.config.openshift.io/v1alpha1
  kind: SamplesResource
  metadata:
    creationTimestamp: 2019-01-04T05:35:26Z
    finalizers:
    - samplesoperator.config.openshift.io/finalizer
    generation: 1
    name: openshift-samples
    namespace: ""
    resourceVersion: "217615"
    selfLink: /apis/samplesoperator.config.openshift.io/v1alpha1/samplesresources/openshift-samples
    uid: 8a41e914-0fe2-11e9-a8af-029e36a3ae62
  spec:
    architectures:
    - x86_64
    installType: rhel
    managementState: Removed
    version: 4.0.0-alpha1-85ee5a974
  status:
    architectures:
    - x86_64
    conditions:
    - lastTransitionTime: 2019-01-04T07:41:26Z
      lastUpdateTime: 2019-01-04T07:41:26Z
      status: "True"
      type: SamplesExist
    - lastTransitionTime: 2019-01-04T07:37:40Z
      lastUpdateTime: 2019-01-04T07:37:40Z
      status: "True"
      type: ImportCredentialsExists
    - lastTransitionTime: 2019-01-04T05:47:51Z
      lastUpdateTime: 2019-01-04T05:47:51Z
      status: "True"
      type: ConfigurationValid
    - lastTransitionTime: 2019-01-04T08:13:52Z
      lastUpdateTime: 2019-01-04T08:13:52Z
      reason: 'jenkins dotnet-runtime '
      status: "True"
      type: ImageChangesInProgress
    - lastTransitionTime: 2019-01-04T07:49:20Z
      lastUpdateTime: 2019-01-04T07:49:20Z
      status: "True"
      type: PendingRemove
    - lastTransitionTime: 2019-01-04T05:35:23Z
      lastUpdateTime: 2019-01-04T05:35:23Z
      status: "False"
      type: MigrationInProgress
    - lastTransitionTime: 2019-01-04T08:13:52Z
      lastUpdateTime: 2019-01-04T08:13:52Z
      status: "False"
      type: ImportImageErrorsExist
    installType: rhel
    managementState: Managed
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Step 5: the imagestreams and templates both still exist.
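
For triage, the imagestreams that hold the removal back can be read from the `reason` field of the ImageChangesInProgress condition. The sketch below is illustrative only: it assumes the reason is a space-separated list of imagestream names (which is how it appears in the output above), and `pending_imagestreams` is a hypothetical helper name.

```python
def pending_imagestreams(conditions):
    """Return the imagestream names listed in the reason of a "True"
    ImageChangesInProgress condition, or an empty list."""
    for cond in conditions:
        if cond["type"] == "ImageChangesInProgress" and cond["status"] == "True":
            # Assumption: reason is a space-separated list of names.
            return cond.get("reason", "").split()
    return []


# Subset of the conditions shown in the samplesresources output above.
conditions = [
    {"type": "SamplesExist", "status": "True"},
    {"type": "ImageChangesInProgress", "status": "True",
     "reason": "jenkins dotnet-runtime "},
    {"type": "PendingRemove", "status": "True"},
    {"type": "ImportImageErrorsExist", "status": "False"},
]
print(pending_imagestreams(conditions))
```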

Expected results:


Additional info:

The operator log after changing managementState to Removed:

time="2019-01-04T07:41:31Z" level=error msg="error syncing key (openshift/jboss-datagrid72-openshift): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T07:46:29Z" level=info msg="watch event cli not part of operators inventory"
time="2019-01-04T07:47:00Z" level=info msg="processing secret watch event while in Managed state; deletion event: false"
time="2019-01-04T07:47:00Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T07:47:05Z" level=error msg="error syncing key (openshift/jboss-webserver31-tomcat8-openshift): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
ERROR: logging before flag.Parse: W0104 07:47:06.118674       1 reflector.go:341] github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:84: watch of *unstructured.Unstructured ended with: very short watch: github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:84: Unexpected watch close - watch lasted less than a second and no items received
ERROR: logging before flag.Parse: W0104 07:47:09.729494       1 reflector.go:341] github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:84: watch of *unstructured.Unstructured ended with: very short watch: github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:84: Unexpected watch close - watch lasted less than a second and no items received
time="2019-01-04T07:49:24Z" level=error msg="error syncing key (openshift/jboss-fuse70-karaf-openshift): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T07:49:37Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T07:49:37Z" level=info msg="creation of credential in openshift namespace recognized"
time="2019-01-04T07:50:21Z" level=info msg="watch event cli not part of operators inventory"
ERROR: logging before flag.Parse: W0104 07:55:52.248677       1 reflector.go:341] github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:84: watch of *unstructured.Unstructured ended with: very short watch: github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:84: Unexpected watch close - watch lasted less than a second and no items received
time="2019-01-04T07:56:08Z" level=info msg="watch event cli not part of operators inventory"
time="2019-01-04T07:57:00Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T07:57:00Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T07:57:05Z" level=error msg="error syncing key (openshift/fuse7-karaf-openshift): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T07:59:37Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T07:59:37Z" level=info msg="creation of credential in openshift namespace recognized"
ERROR: logging before flag.Parse: W0104 08:02:05.033031       1 reflector.go:341] github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:84: watch of *unstructured.Unstructured ended with: very short watch: github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:84: Unexpected watch close - watch lasted less than a second and no items received
time="2019-01-04T08:07:00Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:07:00Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T08:07:03Z" level=error msg="error syncing key (openshift-cluster-samples-operator/samples-registry-credentials): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T08:07:06Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:07:06Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T08:07:09Z" level=error msg="error syncing key (openshift-cluster-samples-operator/samples-registry-credentials): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T08:07:12Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:07:12Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T08:07:15Z" level=error msg="error syncing key (openshift-cluster-samples-operator/samples-registry-credentials): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T08:07:18Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:07:18Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T08:07:21Z" level=error msg="error syncing key (openshift-cluster-samples-operator/samples-registry-credentials): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T08:07:24Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:07:24Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T08:07:27Z" level=error msg="error syncing key (openshift-cluster-samples-operator/samples-registry-credentials): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T08:07:30Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:07:30Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T08:07:33Z" level=error msg="error syncing key (openshift-cluster-samples-operator/samples-registry-credentials): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T08:07:37Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:07:37Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T08:07:40Z" level=error msg="error syncing key (openshift-cluster-samples-operator/samples-registry-credentials): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T08:07:43Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:07:43Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T08:07:46Z" level=error msg="error syncing key (openshift-cluster-samples-operator/samples-registry-credentials): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T08:07:50Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:07:50Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T08:07:53Z" level=error msg="error syncing key (openshift-cluster-samples-operator/samples-registry-credentials): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T08:07:57Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:07:57Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
time="2019-01-04T08:08:02Z" level=error msg="error syncing key (openshift/rhdm71-optaweb-employee-rostering-openshift): Operation cannot be fulfilled on samplesresources.samplesoperator.config.openshift.io \"openshift-samples\": the object has been modified; please apply your changes to the latest version and try again"
time="2019-01-04T08:09:37Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:09:37Z" level=info msg="creation of credential in openshift namespace recognized"
ERROR: logging before flag.Parse: W0104 08:10:38.378999       1 reflector.go:341] github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:84: watch of *unstructured.Unstructured ended with: very short watch: github.com/openshift/cluster-samples-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:84: Unexpected watch close - watch lasted less than a second and no items received
time="2019-01-04T08:11:46Z" level=info msg="watch event cli not part of operators inventory"
time="2019-01-04T08:17:00Z" level=info msg="processing secret watch event while in Removed state; deletion event: false"
time="2019-01-04T08:17:00Z" level=info msg="updating dockerconfig secret samples-registry-credentials in openshift namespace"
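
The repeated "the object has been modified; please apply your changes to the latest version and try again" errors are the API server's optimistic-concurrency conflict response: an update was submitted with a stale resourceVersion. The log alone does not show how the operator handles these. The sketch below only illustrates the general pattern with a toy in-memory store; `Store`, `Conflict` and `update_with_retry` are hypothetical names and not part of any Kubernetes client.

```python
class Conflict(Exception):
    """Stand-in for the 'object has been modified' error in the log."""


class Store:
    """Toy in-memory object store with a resourceVersion check on update."""

    def __init__(self, obj):
        self._obj = dict(obj, resourceVersion=1)

    def get(self):
        return dict(self._obj)

    def update(self, obj):
        # Reject writes that were based on an older copy of the object.
        if obj["resourceVersion"] != self._obj["resourceVersion"]:
            raise Conflict("the object has been modified")
        self._obj = dict(obj, resourceVersion=obj["resourceVersion"] + 1)


def update_with_retry(store, mutate, attempts=5):
    """Re-read the latest copy and re-apply `mutate` until the write lands."""
    for _ in range(attempts):
        latest = store.get()
        mutate(latest)
        try:
            store.update(latest)
            return latest
        except Conflict:
            continue
    raise Conflict("gave up after %d attempts" % attempts)


store = Store({"managementState": "Managed"})
stale = store.get()          # copy taken before someone else writes
store.update(store.get())    # another writer bumps the resourceVersion
try:
    store.update(stale)      # stale write is rejected, as in the log
    stale_rejected = False
except Conflict:
    stale_rejected = True

update_with_retry(store, lambda o: o.update(managementState="Removed"))
print(stale_rejected, store.get())
```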

Comment 1 XiuJuan Wang 2019-01-04 08:32:37 UTC
CVO status

Every 2.0s: oc describe clusteroperator openshift-cluster-samples-operator                                                            dhcp-140-96.nay.redhat.com: Fri Jan  4 16:31:25 2019

Name:         openshift-cluster-samples-operator
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-01-04T05:35:26Z
  Generation:          1
  Resource Version:    240219
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/openshift-cluster-samples-operator
  UID:                 8a55e802-0fe2-11e9-a8af-029e36a3ae62
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-01-04T08:31:23Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-01-04T07:41:30Z
    Message:               Samples moving to 4.0.0-alpha1-85ee5a974
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-01-04T08:31:23Z
    Status:                False
    Type:                  Failing
  Extension:               <nil>
  Version:                 
Events:                    <none>

Comment 2 Gabe Montero 2019-01-04 16:24:53 UTC
Hey @XiuJuan,

Would you have the portion of the pod logs that includes the "Image import for imagestream %s failed with reason %s and detailed message %s" 
message from when the jenkins imagestream import failed?

Comment 3 Gabe Montero 2019-01-04 16:25:57 UTC
In case it was not obvious, the image stream name goes in place of the %s of the message I noted in my previous comment.
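
As a small illustration of how that message is filled in and how it could be searched for in the pod logs, here is a Python sketch. The reason and detail values are placeholders, and `import_failure_message` and `mentions_import_failure` are hypothetical helper names.

```python
# Format string quoted in the previous comment; each %s is filled positionally.
TEMPLATE = "Image import for imagestream %s failed with reason %s and detailed message %s"


def import_failure_message(name, reason, detail):
    """Render the log message for one failed imagestream import."""
    return TEMPLATE % (name, reason, detail)


def mentions_import_failure(log_line, name):
    """True if a log line reports an import failure for the given imagestream."""
    return ("Image import for imagestream %s failed" % name) in log_line


# Placeholder values, for illustration only.
line = import_failure_message("jenkins", "SomeReason", "some detail")
print(line)
```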

Comment 4 Gabe Montero 2019-01-04 16:30:26 UTC
Also provide the yaml for the jenkins image stream when the import error occurs.

Comment 5 Gabe Montero 2019-01-04 19:47:49 UTC
I might have a clue as to what is going on, based on re-checking the image API in openshift/origin.

The jenkins imagestream yaml, captured when you get the particular import failure you are producing, should 
be key here.

And if it is what I'm suspecting, I bet there is *NOT* a message anywhere in the pod logs like "Image import for imagestream %s failed with reason %s and detailed message %s"

Comment 6 XiuJuan Wang 2019-01-07 09:46:16 UTC
I was not able to install a new cluster with the next-gen installer today.
As far as I remember, there was no log message about the jenkins imagestream import failure in the samples operator pod.
I will paste more info after I install a new cluster.

Comment 7 Gabe Montero 2019-01-07 14:46:44 UTC
Hey @XiuJuan Wang

OK, if there is no log, then as I was suspecting in https://bugzilla.redhat.com/show_bug.cgi?id=1663406#c5
the samples operator did *NOT* properly detect the import failure.  Given what the ImageChanges condition
reports, that is what is happening.

That is where running "oc get is jenkins -n openshift -o yaml" after the import failure like I noted 
in https://bugzilla.redhat.com/show_bug.cgi?id=1663406#c4 is key.

Based on how the error you got is represented in the jenkins imagestream yaml, I can adjust the 
error detection logic accordingly.

Comment 8 XiuJuan Wang 2019-01-08 06:44:08 UTC
Confirmed that there are no logs about the import failure in the operator pod.
Here is the jenkins import error, caused by registry.redhat.io/openshift/jenkins-2-rhel7:v4.0 not existing.

$ oc get is  jenkins  -o yaml  -n openshift 
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    openshift.io/display-name: Jenkins
    openshift.io/image.dockerRepositoryCheck: 2019-01-08T06:33:20Z
    samplesoperator.config.openshift.io/version: 4.0.0-alpha1-85ee5a974
  creationTimestamp: 2019-01-08T06:32:42Z
  generation: 2
  labels:
    samplesoperator.config.openshift.io/managed: "true"
  name: jenkins
  namespace: openshift
  resourceVersion: "224166"
  selfLink: /apis/image.openshift.io/v1/namespaces/openshift/imagestreams/jenkins
  uid: 33933ac2-130f-11e9-b397-0a580a800010
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations:
      description: Provides a Jenkins 1.X server on RHEL 7. For more information about
        using this container image, including OpenShift considerations, see https://github.com/openshift/jenkins/blob/master/README.md.
      iconClass: icon-jenkins
      openshift.io/display-name: Jenkins 1.X
      openshift.io/provider-display-name: Red Hat, Inc.
      tags: hidden,jenkins
      version: 1.x
    from:
      kind: DockerImage
      name: registry.redhat.io/openshift3/jenkins-1-rhel7:latest
    generation: 2
    importPolicy: {}
    name: "1"
    referencePolicy:
      type: Local
  - annotations:
      description: Provides a Jenkins 2.X server on RHEL 7. For more information about
        using this container image, including OpenShift considerations, see https://github.com/openshift/jenkins/blob/master/README.md.
      iconClass: icon-jenkins
      openshift.io/display-name: Jenkins 2.X
      openshift.io/provider-display-name: Red Hat, Inc.
      tags: jenkins
      version: 2.x
    from:
      kind: DockerImage
      name: registry.redhat.io/openshift/jenkins-2-rhel7:v4.0
    generation: 2
    importPolicy: {}
    name: "2"
    referencePolicy:
      type: Local
  - annotations:
      description: |-
        Provides a Jenkins server on RHEL 7. For more information about using this container image, including OpenShift considerations, see https://github.com/openshift/jenkins/blob/master/README.md.

        WARNING: By selecting this tag, your application will automatically update to use the latest version of Jenkins available on OpenShift, including major versions updates.
      iconClass: icon-jenkins
      openshift.io/display-name: Jenkins (Latest)
      openshift.io/provider-display-name: Red Hat, Inc.
      tags: jenkins
    from:
      kind: ImageStreamTag
      name: "2"
    generation: 1
    importPolicy: {}
    name: latest
    referencePolicy:
      type: Local
status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/openshift/jenkins
  tags:
  - items:
    - created: 2019-01-08T06:33:20Z
      dockerImageReference: registry.redhat.io/openshift3/jenkins-1-rhel7@sha256:3ae2a9ea40f6dab95ce85febe7eaf36807dda14c8698d93afb6431a5077ed09b
      generation: 2
      image: sha256:3ae2a9ea40f6dab95ce85febe7eaf36807dda14c8698d93afb6431a5077ed09b
    tag: "1"
  - conditions:
    - generation: 2
      lastTransitionTime: 2019-01-08T06:33:20Z
      message: 'Internal error occurred: unknown: Not Found'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: "2"
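
In the yaml above, the failure is visible only in status.tags: tag "2" has an ImportSuccess condition with status "False" and no items. A minimal Python sketch of reading that out follows. It works on a plain dict that mirrors the yaml, `failed_imports` is a hypothetical helper name, and it is not the operator's detection logic.

```python
def failed_imports(imagestream):
    """Map tag name -> (reason, message) for every status tag whose
    ImportSuccess condition has status "False"."""
    failures = {}
    for tag in imagestream.get("status", {}).get("tags", []):
        for cond in tag.get("conditions") or []:
            if cond.get("type") == "ImportSuccess" and cond.get("status") == "False":
                failures[tag["tag"]] = (cond.get("reason"), cond.get("message"))
    return failures


# Status section copied from the jenkins imagestream above (fields trimmed).
jenkins = {
    "status": {
        "tags": [
            {"tag": "1", "items": [{"generation": 2}]},
            {
                "tag": "2",
                "items": None,
                "conditions": [{
                    "type": "ImportSuccess",
                    "status": "False",
                    "reason": "InternalError",
                    "message": "Internal error occurred: unknown: Not Found",
                    "generation": 2,
                }],
            },
        ]
    }
}
print(failed_imports(jenkins))
```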

Comment 9 XiuJuan Wang 2019-01-08 07:36:43 UTC
Another way to reproduce the issue of ImageChangesInProgress staying true:

Step 1: With installType set to centos, set managementState to Removed.
Step 2: After the imagestreams and templates are deleted, change managementState to Managed and installType to rhel.
During this change, samplesresources.status.installType is slow to update and still shows centos, while some imagestreams have already been created.
After saving step 2, the imagestreams being processed fail to import with the error "Internal error occurred: Get https://registry.redhat.io/v2/rhscl/*****/manifests/latest: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/articles/3399531". ImageChangesInProgress then stays stuck on the failed imagestreams.

spec.installType then no longer matches status.installType, and two errors appear: "Cannot create rhel imagestreams to registry.redhat.io without the credentials being available" and "cannot change installtype from centos to rhel".

To resolve the import error, delete the imagestream; the recreated one imports successfully.

Spec:
  Architectures:
    x86_64
  Install Type:      rhel
  Management State:  Managed
  Skipped Imagestreams:
    jenkins
  Version:  4.0.0-alpha1-85ee5a974
Status:
  Architectures:
    x86_64
  Conditions:
    Last Transition Time:  2019-01-08T07:02:16Z
    Last Update Time:      2019-01-08T07:02:16Z
    Status:                False
    Type:                  SamplesExist
    Last Transition Time:  2019-01-08T07:02:32Z
    Last Update Time:      2019-01-08T07:02:32Z
    Message:               Cannot create rhel imagestreams to registry.redhat.io without the credentials being available
    Status:                False
    Type:                  ImportCredentialsExists
    Last Transition Time:  2019-01-08T07:02:39Z
    Last Update Time:      2019-01-08T07:02:39Z
    Message:               cannot change installtype from centos to rhel
    Status:                False
    Type:                  ConfigurationValid
    Last Transition Time:  2019-01-08T07:08:02Z
    Last Update Time:      2019-01-08T07:08:02Z
    Reason:                nginx 
    Status:                True
    Type:                  ImageChangesInProgress
    Last Transition Time:  2019-01-08T06:32:25Z
    Last Update Time:      2019-01-08T06:32:25Z
    Status:                False
    Type:                  PendingRemove
    Last Transition Time:  2019-01-08T06:32:25Z
    Last Update Time:      2019-01-08T06:32:25Z
    Status:                False
    Type:                  MigrationInProgress
    Last Transition Time:  2019-01-08T07:08:02Z
    Last Update Time:      2019-01-08T07:08:02Z
    Status:                False
    Type:                  ImportImageErrorsExist
  Install Type:            centos
  Management State:        Managed
  Skipped Imagestreams:
    jenkins
Events:  <none>


More info : http://pastebin.test.redhat.com/692111

Comment 10 Gabe Montero 2019-01-08 16:24:12 UTC
OK, the yaml in https://bugzilla.redhat.com/show_bug.cgi?id=1663406#c8 is what I needed.

The *difference* in the imagestream yaml from the errors I produced during testing is that 
the status generation is getting updated even with errors.

One explanation could be that it was initially OK, then started having issues, say on a scheduled
import.  Or there has been a behavior change.  Or something went amiss during my original testing.

In any event, the change to the import error logic to address either situation is pretty 
straightforward.  Should have a PR up fairly soon.

The same basic thing occurred in the imagestream with the scenario from https://bugzilla.redhat.com/show_bug.cgi?id=1663406#c9

Also, per comment #c9, the delayed samplesresources.status.installtype won't get updated until
all the image-in-progress stuff is complete.
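
One way to picture an import error check that does not depend on whether the status generation was bumped is sketched below in Python. This is an assumption-laden illustration, not the change that went into the operator: `classify_tag` and its three result labels are hypothetical, and the only facts taken from this bug are the field names and the values in the comment #c8 yaml.

```python
def classify_tag(spec_generation, status_tag):
    """Classify one status tag as 'failed', 'imported', or 'pending'.

    A failure is reported whenever ImportSuccess is "False", whether or not
    the condition's generation has caught up with the spec tag's generation.
    """
    for cond in status_tag.get("conditions") or []:
        if cond.get("type") == "ImportSuccess" and cond.get("status") == "False":
            return "failed"
    items = status_tag.get("items") or []
    if any(item.get("generation", 0) >= spec_generation for item in items):
        return "imported"
    return "pending"


# Tag "2" from comment #c8: the condition generation matches the spec (2).
bumped = {"tag": "2", "items": None,
          "conditions": [{"type": "ImportSuccess", "status": "False", "generation": 2}]}
# Hypothetical variant where the condition generation lags behind the spec.
lagging = {"tag": "2", "items": None,
           "conditions": [{"type": "ImportSuccess", "status": "False", "generation": 1}]}
# Tag "1" from comment #c8: imported at generation 2.
ok = {"tag": "1", "items": [{"generation": 2}]}

print(classify_tag(2, bumped), classify_tag(2, lagging), classify_tag(2, ok))
```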

Comment 11 Gabe Montero 2019-01-08 19:50:42 UTC
OK I've got commit https://github.com/gabemontero/cluster-samples-operator/commit/b6ef465f4262c0aa4d3c2ea7cd962a76244cdd0e
pushed that should address the import errors noted in this bugzilla.

It is based on the current state of https://github.com/openshift/cluster-samples-operator/pull/71 (migrating off of the 
operator SDK), but that PR is still under review, so some underlying things may change, so I'm not going to create 
a PR for this fix just yet.  When PR 71 merges, I'll rebase the branch and get a PR up for this fix.

Comment 12 Gabe Montero 2019-01-09 14:07:48 UTC
PR https://github.com/openshift/cluster-samples-operator/pull/72 has been created

Comment 13 Gabe Montero 2019-01-10 01:18:21 UTC
PR merged

Comment 15 XiuJuan Wang 2019-01-11 12:12:06 UTC
Not fixed in 
registry.svc.ci.openshift.org/openshifr/origin-release:4.0.0-0.alpha-2019-01-11-075335
$ oc get clusterversion 
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.alpha-2019-01-11-075335   True        False         1h        Cluster version is 4.0.0-0.alpha-2019-01-11-075335

$ oc describe configs.samples.operator.openshift.io instance 
Name:         instance
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  samples.operator.openshift.io/v1
Kind:         Config
Metadata:
  Creation Timestamp:  2019-01-11T10:14:21Z
  Finalizers:
    samples.operator.openshift.io/finalizer
  Generation:        1
  Resource Version:  88637
  Self Link:         /apis/samples.operator.openshift.io/v1/configs/instance
  UID:               a9a48155-1589-11e9-9931-0258eced7de4
Spec:
  Architectures:
    x86_64
  Install Type:      rhel
  Management State:  Managed
  Version:           4.0.0-alpha1-137b53463
Status:
  Architectures:
    x86_64
  Conditions:
    Last Transition Time:  2019-01-11T11:57:32Z
    Last Update Time:      2019-01-11T11:57:32Z
    Status:                True
    Type:                  SamplesExist
    Last Transition Time:  2019-01-11T11:55:42Z
    Last Update Time:      2019-01-11T11:55:42Z
    Status:                True
    Type:                  ImportCredentialsExist
    Last Transition Time:  2019-01-11T10:14:17Z
    Last Update Time:      2019-01-11T10:14:17Z
    Status:                True
    Type:                  ConfigurationValid
    Last Transition Time:  2019-01-11T12:10:38Z
    Last Update Time:      2019-01-11T12:10:38Z
    Reason:                jenkins 
    Status:                True
    Type:                  ImageChangesInProgress
    Last Transition Time:  2019-01-11T10:14:17Z
    Last Update Time:      2019-01-11T10:14:17Z
    Status:                False
    Type:                  RemovePending
    Last Transition Time:  2019-01-11T10:14:17Z
    Last Update Time:      2019-01-11T10:14:17Z
    Status:                False
    Type:                  MigrationInProgress
    Last Transition Time:  2019-01-11T12:10:38Z
    Last Update Time:      2019-01-11T12:10:38Z
    Status:                False
    Type:                  ImportImageErrorsExist
  Install Type:            rhel
  Management State:        Managed
Events:                    <none>

$ oc describe clusteroperator openshift-cluster-samples-operator     dhcp-140-96.nay.redhat.com: Fri Jan 11 20:11:01 2019

Name:         openshift-cluster-samples-operator
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-01-11T10:14:21Z
  Generation:          1
  Resource Version:    88837
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/openshift-cluster-samples-operator
  UID:                 a9fb4fa4-1589-11e9-9931-0258eced7de4
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-01-11T12:10:59Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-01-11T11:57:32Z
    Message:               Samples moving to 4.0.0-alpha1-137b53463
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-01-11T12:10:59Z
    Status:                False
    Type:                  Failing
  Extension:               <nil>
  Version:
Events:                    <none>

$ oc describe is jenkins  -n openshift
Name:			jenkins
Namespace:		openshift
Created:		14 minutes ago
Labels:			samples.operator.openshift.io/managed=true
Annotations:		openshift.io/display-name=Jenkins
			openshift.io/image.dockerRepositoryCheck=2019-01-11T11:56:59Z
			samples.operator.openshift.io/version=4.0.0-alpha1-137b53463
Image Repository:	image-registry.openshift-image-registry.svc:5000/openshift/jenkins
Image Lookup:		local=false
Unique Images:		1
Tags:			3

1
  tagged from registry.redhat.io/openshift3/jenkins-1-rhel7:latest
    prefer registry pullthrough when referencing this tag

  Provides a Jenkins 1.X server on RHEL 7. For more information about using this container image, including OpenShift considerations, see https://github.com/openshift/jenkins/blob/master/README.md.
  Tags: hidden, jenkins

  * registry.redhat.io/openshift3/jenkins-1-rhel7@sha256:3ae2a9ea40f6dab95ce85febe7eaf36807dda14c8698d93afb6431a5077ed09b
      14 minutes ago

2 (latest)
  tagged from registry.redhat.io/openshift/jenkins-2-rhel7:v4.0
    prefer registry pullthrough when referencing this tag

  Provides a Jenkins 2.X server on RHEL 7. For more information about using this container image, including OpenShift considerations, see https://github.com/openshift/jenkins/blob/master/README.md.
  Tags: jenkins

  ! error: Import failed (InternalError): Internal error occurred: unknown: Not Found
      14 minutes ago

Comment 16 Gabe Montero 2019-01-11 15:49:10 UTC
I've cc:ed Ben Parees.

Ben - I'm having trouble finding the instructions you sent out last year for internal redhatters to get 
an actual set of credentials to the TBR.  Could you help refresh my memory?

It's possible I need to re-run the precise flow here to reproduce, as my unit tests and other error producing 
scenarios (like accessing the TBR *WITHOUT* any credentials) are not hitting this.

Comment 17 Gabe Montero 2019-01-11 21:17:03 UTC
OK, I've obtained TBR credentials and have reproduced this latest incarnation/form .... though why it is occurring is not obvious to me at all at first blush.

The jenkins v4.0 tag is missing as expected.  The other samples imagestreams successfully imported from the TBR.

Debugging has started.

Comment 18 Gabe Montero 2019-01-11 22:07:45 UTC
Debugging successful ....

Comment 19 Gabe Montero 2019-01-12 00:59:32 UTC
PR https://github.com/openshift/cluster-samples-operator/pull/79 is up

Comment 20 Gabe Montero 2019-01-12 22:22:02 UTC
PR has merged

Comment 21 XiuJuan Wang 2019-01-14 08:01:36 UTC
Yes, the jenkins v4.0 tag is missing as expected.

I have checked the latest fix 

oc get clusterversion 
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.alpha-2019-01-14-015843   True        False         37m       Cluster version is 4.0.0-0.alpha-2019-01-14-015843

configs.samples.operator.openshift.io 4.0.0-alpha1-f76f4f23b

This fix didn't resolve all issues.

A failed imagestream import does not block the status.managementstate sync. But ImportImageErrorsExist will be set true, and processing of cvo status always keep true with error.
$ oc describe configs.samples.operator.openshift.io instance

Status:
  Architectures:
    x86_64
  Conditions:
    Last Transition Time:  2019-01-14T06:05:49Z
    Last Update Time:      2019-01-14T06:05:49Z
    Status:                True
    Type:                  ConfigurationValid
    Last Transition Time:  2019-01-14T06:40:01Z
    Last Update Time:      2019-01-14T06:40:01Z
    Status:                False
    Type:                  ImageChangesInProgress
    Last Transition Time:  2019-01-14T06:33:06Z
    Last Update Time:      2019-01-14T06:33:06Z
    Status:                True
    Type:                  SamplesExist
    Last Transition Time:  2019-01-14T06:31:50Z
    Last Update Time:      2019-01-14T06:31:50Z
    Status:                True
    Type:                  ImportCredentialsExist
    Last Transition Time:  2019-01-14T06:05:49Z
    Last Update Time:      2019-01-14T06:05:49Z
    Status:                False
    Type:                  RemovePending
    Last Transition Time:  2019-01-14T06:05:49Z
    Last Update Time:      2019-01-14T06:05:49Z
    Status:                False
    Type:                  MigrationInProgress
    Last Transition Time:  2019-01-14T06:40:01Z
    Last Update Time:      2019-01-14T06:40:01Z
    Message:               imagestream/jenkins: Internal error occurred: unknown: Not Found; 
    Reason:                jenkins 
    Status:                True
    Type:                  ImportImageErrorsExist
  Install Type:            rhel
  Management State:        Managed
Events:                    <none>


$ oc describe clusteroperator openshift-cluster-samples-operator
Name:         openshift-cluster-samples-operator
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-01-14T06:05:36Z
  Generation:          1
  Resource Version:    79094
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/openshift-cluster-samples-operator
  UID:                 68ca591b-17c2-11e9-af4d-023542499316
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-01-14T07:58:26Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-01-14T06:33:53Z
    Message:               Samples installation in error at 4.0.0-alpha1-f76f4f23b: image import problem
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-01-14T07:58:26Z
    Message:               Samples installation in error at 4.0.0-alpha1-f76f4f23b: imagestream/jenkins: Internal error occurred: unknown: Not Found;
    Status:                True
    Type:                  Failing
  Extension:               <nil>


Even after I added the jenkins imagestream to the skipped list, ImportImageErrorsExist is still true. Meanwhile, processing of cvo status still stays true with the error.

$ oc describe is  jenkins  -n openshift  | grep  operator
Labels:			samples.operator.openshift.io/managed=false
			samples.operator.openshift.io/version=4.0.0-alpha1-f76f4f23b

Comment 22 Gabe Montero 2019-01-14 14:41:54 UTC
"But ImportImageErrorsExist will be set true, and processing of cvo status always keep true with error."   *IS* the intended 
behavior per https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#conditions
and 
https://godoc.org/github.com/openshift/api/config/v1#ClusterStatusConditionType

In particular, see the "If an error blocks reaching 4.0.1, the conditions might be:" portion of
https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#conditions

Please reset your test expectations @XiuJuan Wang for those points.

Your point on adding jenkins to the skipped list as a means of bypassing the failure and getting 
to an available state is interesting.  

I'm inclined to agree on that point.

@Ben - what do you think of the notion of adding to the skip list after the import failure occurs
as a means of ignoring the error and moving on?

Comment 23 Ben Parees 2019-01-14 15:09:43 UTC
skiplist question addressed in chat:
https://coreos.slack.com/archives/CE2HALN2W/p1547478309327400

tldr:  images in the skiplist should not be reported as import errors/should not block operator availability, even if they are added to the skiplist after the error occurs.
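
A minimal sketch of that rule, for illustration only. The function and variable names are hypothetical and are not taken from the samples operator code; the assumption is simply that import errors are tracked per imagestream name and filtered against the skip list each time status is computed, so a skip added after the failure still takes effect.

```python
# Hypothetical sketch; names are illustrative, not the operator's real code.
def reportable_import_errors(import_errors, skipped_imagestreams):
    """Return only the import errors for imagestreams not in the skip list.

    import_errors: dict mapping imagestream name -> error message.
    skipped_imagestreams: iterable of imagestream names to ignore.
    """
    skipped = set(skipped_imagestreams)
    return {name: msg for name, msg in import_errors.items() if name not in skipped}


errors = {"jenkins": "Internal error occurred: unknown: Not Found"}

# Before jenkins is skipped, the failure is reported.
before = reportable_import_errors(errors, [])

# After jenkins is added to the skip list (even post-failure), nothing is reported.
after = reportable_import_errors(errors, ["jenkins"])
```

Filtering at status-computation time, rather than only at import time, is what lets a skip list entry added after the failure clear the error.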

Comment 24 XiuJuan Wang 2019-01-15 02:44:25 UTC
OK, so the work to have the skipped list ignore the error is in progress.

What about the expected result after fixing the jenkins imagestream error manually?

Right now, the jenkins and perl imagestreams have been imported manually without error, but processing is still true with the error.

See http://pastebin.test.redhat.com/695464

My questions are:

When will processing be set to false after an import error occurs? Or only when the samples operator fixes the error automatically?

Comment 25 XiuJuan Wang 2019-01-15 03:14:25 UTC
Hmm, I misunderstood the skipped list point; I just thought you devs would not handle the skipped list...

So, to correct my question in comment #24:

How does Processing behave after the jenkins imagestream error is fixed manually?

Comment 26 Gabe Montero 2019-01-15 15:00:28 UTC
With the changes I am working on,

Once the import error condition is clean (false, no imagestreams listed) as a result of fixing jenkins or 
any other failed imports, Processing will go to false and Available to true, with both having messages 
stating that the samples are at the given level.

i.e. "the steady state" if you will

My changes will include an update to the readme on the possible corrective actions in addition to the code 
changes.
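
A rough sketch of the status mapping described above, for illustration only. The function name is hypothetical and this is not the operator's implementation; it assumes the only input is the set of outstanding import errors and ignores other states such as imports still in progress. The condition names are the ones shown in the `oc describe clusteroperator` output earlier in this bug.

```python
# Hypothetical sketch; a simplification, not the operator's real logic.
def cluster_operator_conditions(outstanding_import_errors):
    """Map outstanding import errors to ClusterOperator condition values.

    outstanding_import_errors: dict of imagestream name -> error message,
    empty once every failed import has been fixed (or skipped).
    """
    has_errors = bool(outstanding_import_errors)
    return {
        "Available": not has_errors,   # True only in the steady state
        "Progressing": has_errors,     # stays True while errors remain
        "Failing": has_errors,
    }


# While the jenkins import error is outstanding.
in_error = cluster_operator_conditions({"jenkins": "unknown: Not Found"})

# After the import error condition is clean.
steady_state = cluster_operator_conditions({})
```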

Comment 27 Gabe Montero 2019-01-15 23:27:29 UTC
PR https://github.com/openshift/cluster-samples-operator/pull/80 is up

Comment 28 Gabe Montero 2019-01-16 23:06:31 UTC
PR has merged

Comment 29 XiuJuan Wang 2019-01-17 11:40:06 UTC
After hitting an imagestream import error, you can add the imagestream(s) to skippedImagestreams, and the cvo status will then get past this error. Finally, processing will be false, available true, and failing false.
Marking this bug as verified.
# oc get clusterversion 
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.alpha-2019-01-17-070151   True        False         1h        Cluster version is 4.0.0-0.alpha-2019-01-17-070151

Comment 32 errata-xmlrpc 2019-06-04 10:41:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

