Bug 1722183 - openshift-cluster-samples-operator state is degraded
Summary: openshift-cluster-samples-operator state is degraded
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Templates
Version: 4.1.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.z
Assignee: Gabe Montero
QA Contact: XiuJuan Wang
URL:
Whiteboard: 4.1.4
Depends On:
Blocks: 1722214
 
Reported: 2019-06-19 15:44 UTC by Jeremy Eder
Modified: 2019-07-04 09:01 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A race condition existed when updating conditions on the samples operator config object.
Consequence: Duplicate conditions could result, not all of them would be updated properly, and the samples operator would incorrectly report that it was Degraded.
Fix: Proper synchronization was added so that conditions are no longer duplicated and the Degraded status is reported correctly.
Result:
Clone Of:
: 1722214
Environment:
Last Closed: 2019-07-04 09:01:40 UTC
Target Upstream Version:
Embargoed:


Links
System                   ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHBA-2019:1635  0        None      None    None     2019-07-04 09:01:47 UTC

Description Jeremy Eder 2019-06-19 15:44:23 UTC
Description of problem:
On a fresh 4.1.0 install, the openshift-cluster-samples-operator is in a degraded state.

jeder@jerms-wks: ~ $ oc version -o yaml
clientVersion:
  buildDate: "2019-05-19T21:13:58Z"
  compiler: gc
  gitCommit: cb455d664
  gitTreeState: clean
  gitVersion: v4.1.0
  goVersion: go1.11.5
  major: "4"
  minor: 1+
  platform: linux/amd64
serverVersion:
  buildDate: "2019-05-19T23:51:04Z"
  compiler: gc
  gitCommit: 838b4fa
  gitTreeState: clean
  gitVersion: v1.13.4+838b4fa
  goVersion: go1.11.5
  major: "1"
  minor: 13+
  platform: linux/amd64

jeder@jerms-wks: ~ $ oc get clusteroperators
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                       4.1.0     True        False         False      18h
cloud-credential                     4.1.0     True        False         False      18h
cluster-autoscaler                   4.1.0     True        False         False      18h
console                              4.1.0     True        False         False      18h
dns                                  4.1.0     True        False         False      18h
image-registry                       4.1.0     True        False         False      18h
ingress                              4.1.0     True        False         False      18h
kube-apiserver                       4.1.0     True        False         False      18h
kube-controller-manager              4.1.0     True        False         False      18h
kube-scheduler                       4.1.0     True        False         False      18h
machine-api                          4.1.0     True        False         False      18h
machine-config                       4.1.0     True        False         False      18h
marketplace                          4.1.0     True        False         False      18h
monitoring                           4.1.0     True        False         False      18h
network                              4.1.0     True        False         False      18h
node-tuning                          4.1.0     True        False         False      18h
openshift-apiserver                  4.1.0     True        False         False      18h
openshift-controller-manager         4.1.0     True        False         False      18h
openshift-samples                    4.1.0     True        True          True       18h
operator-lifecycle-manager           4.1.0     True        False         False      18h
operator-lifecycle-manager-catalog   4.1.0     True        False         False      18h
service-ca                           4.1.0     True        False         False      18h
service-catalog-apiserver            4.1.0     True        False         False      18h
service-catalog-controller-manager   4.1.0     True        False         False      18h
storage                              4.1.0     True        False         False      18h


jeder@jerms-wks: ~ $ oc get configs.samples.operator.openshift.io cluster -o yaml
apiVersion: samples.operator.openshift.io/v1
kind: Config
metadata:
  creationTimestamp: "2019-06-18T21:25:12Z"
  finalizers:
  - samples.operator.openshift.io/finalizer
  generation: 1
  name: cluster
  resourceVersion: "10972"
  selfLink: /apis/samples.operator.openshift.io/v1/configs/cluster
  uid: 8e5cb59e-920f-11e9-a0cb-06afd555ccc8
spec:
  architectures:
  - x86_64
  managementState: Managed
status:
  architectures:
  - x86_64
  conditions:
  - lastTransitionTime: "2019-06-18T21:25:12Z"
    lastUpdateTime: "2019-06-18T21:25:12Z"
    status: "True"
    type: ImportCredentialsExist
  - lastTransitionTime: "2019-06-18T21:25:12Z"
    lastUpdateTime: "2019-06-18T21:25:12Z"
    status: "False"
    type: ImportCredentialsExist
  - lastTransitionTime: "2019-06-18T21:25:29Z"
    lastUpdateTime: "2019-06-18T21:25:29Z"
    status: "True"
    type: ConfigurationValid
  - lastTransitionTime: "2019-06-18T21:25:22Z"
    lastUpdateTime: "2019-06-18T21:25:22Z"
    status: "False"
    type: ImportImageErrorsExist
  - lastTransitionTime: "2019-06-18T21:26:28Z"
    lastUpdateTime: "2019-06-18T21:26:28Z"
    status: "False"
    type: ImageChangesInProgress
  - lastTransitionTime: "2019-06-18T21:25:32Z"
    lastUpdateTime: "2019-06-18T21:25:32Z"
    status: "True"
    type: SamplesExist
  - lastTransitionTime: "2019-06-18T21:25:29Z"
    lastUpdateTime: "2019-06-18T21:25:29Z"
    status: "False"
    type: RemovePending
  - lastTransitionTime: "2019-06-18T21:25:29Z"
    lastUpdateTime: "2019-06-18T21:25:29Z"
    status: "False"
    type: MigrationInProgress
  managementState: Managed
  version: 4.1.0
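The status above lists `ImportCredentialsExist` twice, once as "True" and once as "False". A minimal sketch of detecting such duplicates, assuming the condition types have already been read into a list (for example, from the `oc get configs.samples.operator.openshift.io cluster -o yaml` output); the `duplicateTypes` helper is hypothetical and is not part of `oc` or the operator.

```go
package main

import "fmt"

// duplicateTypes returns each condition type that appears more than once,
// in the order of its second occurrence.
func duplicateTypes(types []string) []string {
	seen := make(map[string]int)
	var dups []string
	for _, t := range types {
		seen[t]++
		if seen[t] == 2 {
			dups = append(dups, t)
		}
	}
	return dups
}

func main() {
	// Condition types in the order they appear in the status above.
	types := []string{
		"ImportCredentialsExist", "ImportCredentialsExist", "ConfigurationValid",
		"ImportImageErrorsExist", "ImageChangesInProgress", "SamplesExist",
		"RemovePending", "MigrationInProgress",
	}
	fmt.Println(duplicateTypes(types)) // [ImportCredentialsExist]
}
```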



jeder@jerms-wks: ~ $ oc logs cluster-samples-operator-6b48ccf677-77chp -n openshift-cluster-samples-operator|grep -i error
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: Operation cannot be fulfilled on configs.samples.operator.openshift.io \"cluster\": the object has been modified; please apply your changes to the latest version and try again, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:15Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:16Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:16Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:16Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:17Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:17Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:20Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:20Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:22Z" level=error msg="unable to sync: Operation cannot be fulfilled on configs.samples.operator.openshift.io \"cluster\": the object has been modified; please apply your changes to the latest version and try again, requeuing"
time="2019-06-18T21:25:25Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:25Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:32Z" level=info msg="Imagestream mariadb watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:35Z" level=info msg="Imagestream rhdm73-decisioncentral-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:35Z" level=info msg="Imagestream modern-webapp watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:35Z" level=info msg="Imagestream redhat-sso70-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:35Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:35Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:37Z" level=info msg="Imagestream jenkins-agent-nodejs watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:38Z" level=info msg="Imagestream jboss-eap72-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:41Z" level=info msg="Imagestream perl watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:42Z" level=info msg="Imagestream fuse7-java-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:44Z" level=info msg="Imagestream rhpam73-businesscentral-monitoring-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:44Z" level=info msg="Imagestream jboss-decisionserver64-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:44Z" level=info msg="Imagestream jboss-webserver31-tomcat7-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:45Z" level=info msg="Imagestream rhpam73-kieserver-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:45Z" level=info msg="Imagestream jboss-eap64-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:47Z" level=info msg="Imagestream jboss-fuse70-eap-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:47Z" level=info msg="Imagestream jboss-amq-63 watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:48Z" level=info msg="Imagestream fis-karaf-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:49Z" level=info msg="Imagestream jboss-datagrid71-client-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:50Z" level=info msg="Imagestream redhat-sso73-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:51Z" level=info msg="Imagestream dotnet-runtime watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:51Z" level=info msg="Imagestream jboss-datagrid65-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:52Z" level=info msg="Imagestream httpd watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:54Z" level=info msg="Imagestream nginx watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:55Z" level=info msg="Imagestream jboss-eap71-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:56Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:56Z" level=error msg="unable to sync: retry secret event because in the middle of an sample upsert cycle, requeuing"
time="2019-06-18T21:25:58Z" level=info msg="Imagestream redhat-sso71-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:58Z" level=info msg="Imagestream jboss-processserver64-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:25:59Z" level=info msg="Imagestream php watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:00Z" level=info msg="Imagestream fuse7-console watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:01Z" level=info msg="Imagestream redis watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:01Z" level=info msg="Imagestream jboss-datavirt64-driver-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:01Z" level=info msg="Imagestream jenkins-agent-maven watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:03Z" level=info msg="Imagestream jenkins watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:03Z" level=info msg="Imagestream jboss-webserver30-tomcat8-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:06Z" level=info msg="Imagestream mysql watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:07Z" level=info msg="Imagestream fuse7-eap-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:08Z" level=info msg="Imagestream jboss-fuse70-console watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:09Z" level=info msg="Imagestream jboss-webserver31-tomcat8-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:10Z" level=info msg="Imagestream jboss-fuse70-java-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:10Z" level=info msg="Imagestream fuse7-karaf-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:10Z" level=info msg="Imagestream ruby watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:11Z" level=info msg="Imagestream mongodb watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:13Z" level=info msg="Imagestream fuse-apicurito-generator watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:16Z" level=info msg="Imagestream redhat-sso72-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:16Z" level=info msg="Imagestream java watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:17Z" level=info msg="Imagestream postgresql watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:17Z" level=info msg="Imagestream jboss-datavirt64-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:18Z" level=info msg="Imagestream jboss-fuse70-karaf-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:18Z" level=info msg="Imagestream apicast-gateway watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:20Z" level=info msg="Imagestream rhpam73-smartrouter-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:24Z" level=info msg="Imagestream redhat-openjdk18-openshift watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:28Z" level=info msg="Imagestream python watch event do upsert false; no errors in prep true,  possibly update operator conditions true"
time="2019-06-18T21:26:28Z" level=info msg="CRDUPDATE updating progress/error condition (within caching block) after results for python"

Comment 3 Gabe Montero 2019-06-19 17:25:25 UTC
4.2 clone created; initial PR work will be in 4.2/master. After it soaks, the cherry-pick to 4.1.z will be done via this bug.

Comment 4 Gabe Montero 2019-06-19 20:27:41 UTC
4.1 cherrypick PR https://github.com/openshift/cluster-samples-operator/pull/154

Comment 5 Naveen Malik 2019-06-21 11:40:25 UTC
Is there a way to recover from this state? I tried deleting the extra ImportCredentialsExist condition reported in configs.samples.operator.openshift.io, but the operator does not come out of the degraded state; the extra condition is restored.

Comment 6 Gabe Montero 2019-06-21 13:30:08 UTC
@Naveen - delete the samples operator's config object:

oc delete configs.samples.operator.openshift.io cluster

The samples operator will then delete all of its managed objects and recreate them. Assuming the race condition behind this bug does not occur consistently in your environment, the extra condition should go away.

Comment 9 Naveen Malik 2019-06-21 20:49:20 UTC
Deleting the config got the upgrade progressing again; the samples operator is now at version 4.1.2, not degraded, and available.

Comment 10 Gabe Montero 2019-06-25 18:53:31 UTC
The PR has merged; moving to MODIFIED.

Comment 12 XiuJuan Wang 2019-06-26 07:16:13 UTC
Did not hit this issue in version 4.1.0-0.nightly-2019-06-25-235846.
Also ran regression tests for the samples operator; no issues found.

Comment 14 errata-xmlrpc 2019-07-04 09:01:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1635

