Bug 1744153 - oc adm upgrade from 4.1.1 to 4.2.0-0.nightly-2019-08-21-040043 on loaded cluster stuck on marketplace-operator
Summary: oc adm upgrade from 4.1.1 to 4.2.0-0.nightly-2019-08-21-040043 on loaded cluster stuck on marketplace-operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Alexander Greene
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-21 13:02 UTC by Mike Fiedler
Modified: 2019-10-16 06:37 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:37:03 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2019:2922 (last updated 2019-10-16 06:37:11 UTC)

Description Mike Fiedler 2019-08-21 13:02:04 UTC
Description of problem:

Created a cluster with 500 projects.  Each project contains:

1 buildconfig
15 builds
10 imagestreams
1 deployment with 0 replicas
1 service
20 secrets
10 routes

NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.11    True        True          29m     Unable to apply 4.2.0-0.nightly-2019-08-21-040043: the update could not be applied


NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
cloud-credential                           4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
cluster-autoscaler                         4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
console                                    4.2.0-0.nightly-2019-08-21-040043   True        False         False      18m
dns                                        4.1.11                              True        False         False      5d15h
image-registry                             4.1.11                              True        True          False      5d15h
ingress                                    4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
insights                                   4.2.0-0.nightly-2019-08-21-040043   True        False         False      21m
kube-apiserver                             4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
kube-controller-manager                    4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
kube-scheduler                             4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
machine-api                                4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
machine-config                             4.1.11                              True        False         False      5d15h
marketplace                                4.1.11                              True        False         False      5d15h
monitoring                                 4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
network                                    4.1.11                              True        False         False      5d15h
node-tuning                                4.2.0-0.nightly-2019-08-21-040043   True        False         False      21m
openshift-apiserver                        4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
openshift-controller-manager               4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
openshift-samples                          4.2.0-0.nightly-2019-08-21-040043   True        False         False      11m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
operator-lifecycle-manager-catalog         4.1.11                              True        False         False      5d15h
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-08-21-040043   True        False         False      20m
service-ca                                 4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
service-catalog-apiserver                  4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
service-catalog-controller-manager         4.2.0-0.nightly-2019-08-21-040043   True        False         False      5d15h
storage                                    4.2.0-0.nightly-2019-08-21-040043   True        False         False      21m
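
A quick way to see which operators are still reporting the old version (a sketch using jq; the 4.1 version prefix is taken from the output above):

# List cluster operators whose reported "operator" version still starts with 4.1
oc get clusteroperators -o json \
  | jq -r '.items[] | select(any(.status.versions[]?; .name == "operator" and (.version | startswith("4.1")))) | .metadata.name'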


The clusterversion operator logs show that the marketplace-operator is stuck.

From clusterversion operator logs:

E0821 12:55:36.078647       1 task.go:77] error running apply for deployment "openshift-marketplace/marketplace-operator" (333 of 412): timed out waiting for the condition
I0821 12:55:36.078694       1 task_graph.go:588] Result of work: [Cluster operator image-registry is still updating Cluster operator operator-lifecycle-manager-catalog is still updating Could not update deployment "openshift-marketplace/marketplace-operator" (333 of 412)]
I0821 12:55:36.078734       1 sync_worker.go:740] Update error 333 of 412: UpdatePayloadFailed Could not update deployment "openshift-marketplace/marketplace-operator" (333 of 412) (*errors.errorString: timed out waiting for the condition)
E0821 12:55:36.078753       1 sync_worker.go:311] unable to synchronize image (waiting 2m52.525702462s): Could not update deployment "openshift-marketplace/marketplace-operator" (333 of 412)
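
(The excerpts above can be pulled with something like the following; the namespace and deployment name are the standard ones for the cluster-version operator.)

# Cluster-version operator logs, filtered to the stuck manifest
oc -n openshift-cluster-version logs deployment/cluster-version-operator | grep marketplace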



From marketplace-operator logs:

time="2019-08-21T12:56:05Z" level=info msg="Reconciling OperatorSource openshift-marketplace/redhat-operators\n"
time="2019-08-21T12:56:05Z" level=error msg="Unexpected error while creating CatalogSourceConfig: CatalogSourceConfig.operators.coreos.com \"redhat-operators\" is invalid: []: Invalid value: map[string]interface {}{\"apiVersion\":\"operat
ors.coreos.com/v1\", \"kind\":\"CatalogSourceConfig\", \"metadata\":map[string]interface {}{\"creationTimestamp\":\"2019-08-21T12:56:05Z\", \"generation\":1, \"labels\":map[string]interface {}{\"opsrc-datastore\":\"true\", \"opsrc-owner-n
ame\":\"redhat-operators\", \"opsrc-owner-namespace\":\"openshift-marketplace\", \"opsrc-provider\":\"redhat\"}, \"name\":\"redhat-operators\", \"namespace\":\"openshift-marketplace\", \"uid\":\"09cae082-c413-11e9-885d-068c43ae29a2\"}, \"
spec\":map[string]interface {}{\"csDisplayName\":\"Red Hat Operators\", \"csPublisher\":\"Red Hat\", \"packages\":\"kubevirt-hyperconverged,amq-online,3scale-operator,codeready-workspaces,businessautomation-operator,cluster-logging,elasti
csearch-operator,openshifttemplateservicebroker,amq-streams,openshiftansibleservicebroker,jaeger-product,amq7-cert-manager,amq7-interconnect-operator\", \"targetNamespace\":\"openshift-marketplace\"}, \"status\":map[string]interface {}{\"
currentPhase\":map[string]interface {}{\"lastTransitionTime\":interface {}(nil), \"lastUpdateTime\":interface {}(nil), \"phase\":map[string]interface {}{}}}}: validation failure list:\nspec.source in body is required" name=redhat-operator
s namespace=openshift-marketplace type=OperatorSource
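
The "spec.source in body is required" failure suggests the CatalogSourceConfig CRD schema on the cluster is newer than what the running (4.1) operator is submitting. One way to check which CRD versions the API server is serving (a sketch; the jsonpath assumes the apiextensions v1beta1 CRD layout used in 4.x):

# Show the served version(s) of the CatalogSourceConfig CRD
oc get crd catalogsourceconfigs.operators.coreos.com \
  -o jsonpath='{range .spec.versions[*]}{.name}{" served="}{.served}{" storage="}{.storage}{"\n"}{end}'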


Version-Release number of selected component (if applicable): upgrading from 4.1.11 to 4.2.0-0.nightly-2019-08-21-040043


Additional info:

Full oc adm must-gather info will be added shortly.

Comment 2 Alexander Greene 2019-08-27 14:19:39 UTC
Based on the information contained within the must-gather.zip, it appears that the 4.2 version of the marketplace operator was never deployed.

The Marketplace Operator logs match those of a 4.1 version. It appears that during the upgrade, the CatalogSourceConfig CRD was upgraded to `v2`, which requires the `source` field. Version 4.1 of the Marketplace Operator does not work with `v2` of the `CatalogSourceConfig` CRD.
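
For illustration, a hedged sketch of what a CatalogSourceConfig looks like under the v2 schema once spec.source is set (the exact value of source, here the owning OperatorSource name, is an assumption based on the validation error in the description; --dry-run keeps this from touching the cluster):

cat <<'EOF' | oc apply --dry-run -f -
apiVersion: operators.coreos.com/v2
kind: CatalogSourceConfig
metadata:
  name: redhat-operators
  namespace: openshift-marketplace
spec:
  targetNamespace: openshift-marketplace
  packages: cluster-logging,elasticsearch-operator
  source: redhat-operators
EOF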


This may be related to [0], which was fixed on the 21st, the same day the 4.2.0-0.nightly-2019-08-21-040043 build was created. I will attempt an upgrade from a 4.1 cluster to 4.2.0-0.nightly-2019-08-21-040043 to confirm this suspicion.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1743699

Comment 3 Mike Fiedler 2019-08-28 12:30:23 UTC
Re-testing today with 4.1.13 -> registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2019-08-28-083236
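
(For anyone reproducing: an upgrade to a CI/nightly payload like this is typically started with something along these lines; using --force to skip signature verification of the unsigned nightly is an assumption about how the test was run.)

# Point the cluster at the nightly release payload and start the upgrade
oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2019-08-28-083236 --force
# Watch progress
oc get clusterversion -w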

Comment 4 Alexander Greene 2019-09-04 17:30:57 UTC
@Mike (In reply to Mike Fiedler from comment #3)
> Re-testing today with 4.1.13 ->
> registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2019-08-28-083236

Could you please update this BZ with the results?

Comment 5 Dan Geoffroy 2019-09-04 17:44:31 UTC
Moving to ON_QA. The code was delivered previously; this bug was reopened due to an unrelated failure in another component. Waiting on QA (Mike F) to verify things are good here.

Comment 6 Mike Fiedler 2019-09-04 19:53:03 UTC
Sorry for the delay; this is all good now on 4.2.0-0.nightly-2019-09-03-102130.

Comment 7 Fan Jia 2019-09-05 07:59:24 UTC
Hit the same problem when upgrading the cluster from 4.1.14 to 4.2.0-0.nightly-2019-09-04-142146.

1. oc get clusteroperators
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-09-04-142146   True        False         False      6h44m
cloud-credential                           4.2.0-0.nightly-2019-09-04-142146   True        False         False      7h
cluster-autoscaler                         4.2.0-0.nightly-2019-09-04-142146   True        False         False      7h
console                                    4.2.0-0.nightly-2019-09-04-142146   True        False         False      40m
dns                                        4.1.14                              True        False         False      6h59m
image-registry                             4.2.0-0.nightly-2019-09-04-142146   True        False         False      138m
ingress                                    4.2.0-0.nightly-2019-09-04-142146   True        False         False      3h15m
insights                                   4.2.0-0.nightly-2019-09-04-142146   True        False         False      57m
kube-apiserver                             4.2.0-0.nightly-2019-09-04-142146   True        False         False      6h58m
kube-controller-manager                    4.2.0-0.nightly-2019-09-04-142146   True        False         False      6h57m
kube-scheduler                             4.2.0-0.nightly-2019-09-04-142146   True        False         False      6h57m
machine-api                                4.2.0-0.nightly-2019-09-04-142146   True        False         False      7h
machine-config                             4.1.14                              False       True          True       28m
marketplace                                4.1.14                              True        False         False      11m
monitoring                                 4.2.0-0.nightly-2019-09-04-142146   True        False         False      3h12m
network                                    4.2.0-0.nightly-2019-09-04-142146   True        True          False      7h
node-tuning                                4.2.0-0.nightly-2019-09-04-142146   True        False         False      11m
openshift-apiserver                        4.2.0-0.nightly-2019-09-04-142146   True        False         False      106s
openshift-controller-manager               4.2.0-0.nightly-2019-09-04-142146   True        False         False      6h58m
openshift-samples                          4.2.0-0.nightly-2019-09-04-142146   True        False         False      44m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-09-04-142146   True        False         False      6h59m
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-09-04-142146   True        False         False      6h59m
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-09-04-142146   True        False         False      10m
service-ca                                 4.2.0-0.nightly-2019-09-04-142146   True        False         False      7h
service-catalog-apiserver                  4.2.0-0.nightly-2019-09-04-142146   True        False         False      100s
service-catalog-controller-manager         4.2.0-0.nightly-2019-09-04-142146   True        False         False      107m
storage                                    4.2.0-0.nightly-2019-09-04-142146   True        False         False      43m


2. The clusterversion operator logs:
`
E0905 07:42:48.002084       1 task.go:77] error running apply for clusteroperator "openshift-marketplace/marketplace" (337 of 415): Cluster operator marketplace is still updating
I0905 07:42:48.002292       1 sync_worker.go:736] Summarizing 1 errors
I0905 07:42:48.002298       1 sync_worker.go:740] Update error 337 of 415: ClusterOperatorNotAvailable Cluster operator marketplace is still updating (*errors.errorString: cluster operator marketplace is still updating)
E0905 07:50:04.122092       1 task.go:77] error running apply for clusteroperator "openshift-marketplace/marketplace" (337 of 415): Cluster operator marketplace is still updating
I0905 07:50:04.122213       1 sync_worker.go:736] Summarizing 1 errors
I0905 07:50:04.122223       1 sync_worker.go:740] Update error 337 of 415: ClusterOperatorNotAvailable Cluster operator marketplace is still updating (*errors.errorString: cluster operator marketplace is still updating)

`

3. The marketplace operator logs
The marketplace image is still at the 4.1 commit after the upgrade:
marketplace commit: 6881ba35b74077c29e8791f26d04d2f7ec25e8de

`
time="2019-09-05T07:50:28Z" level=error msg="Unexpected error while creating CatalogSourceConfig: CatalogSourceConfig.operators.coreos.com \"certified-operators\" is invalid: []: Invalid value: map[string]interface {}{\"apiVersion\":\"operators.coreos.com/v1\", \"kind\":\"CatalogSourceConfig\", \"metadata\":map[string]interface {}{\"creationTimestamp\":\"2019-09-05T07:50:28Z\", \"generation\":1, \"labels\":map[string]interface {}{\"opsrc-datastore\":\"true\", \"opsrc-owner-name\":\"certified-operators\", \"opsrc-owner-namespace\":\"openshift-marketplace\", \"opsrc-provider\":\"certified\"}, \"name\":\"certified-operators\", \"namespace\":\"openshift-marketplace\", \"uid\":\"d44a8a02-cfb1-11e9-b840-0279035724fc\"}, \"spec\":map[string]interface {}{\"csDisplayName\":\"Certified Operators\", \"csPublisher\":\"Red Hat\", \"packages\":\"tidb-operator-certified,aqua-certified,twistlock-certified,cpx-cic-operator,robin-operator,insightedge-operator,instana-agent,portworx-certified,percona-xtradb-cluster-operator-certified,presto-operator,t8c-certified,oneagent-certified,storageos,orca,memql-certified,percona-server-mongodb-operator-certified,planetscale-certified,appdynamics-operator,anchore-engine,federatorai-certified,crunchy-postgres-operator,joget-openshift-operator,cic-operator,couchbase-enterprise-certified,openunison-ocp-certified,kong,sematext,hazelcast-enterprise-certified,nuodb-ce-certified,kubeturbo-certified,seldon-operator-certified,sysdig-certified,synopsys-certified,newrelic-infrastructure,mariadb,appsody-operator-certified,mongodb-enterprise\", \"targetNamespace\":\"openshift-marketplace\"}, \"status\":map[string]interface {}{\"currentPhase\":map[string]interface {}{\"lastTransitionTime\":interface {}(nil), \"lastUpdateTime\":interface {}(nil), \"phase\":map[string]interface {}{}}}}: validation failure list:\nspec.source in body is required" name=certified-operators namespace=openshift-marketplace type=OperatorSource

`
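
To confirm whether the marketplace deployment was actually rolled to the 4.2 payload, the running image can be compared against the one pinned in the release (a sketch):

# Image currently run by the marketplace-operator deployment
oc -n openshift-marketplace get deployment marketplace-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
# Marketplace image referenced by the 4.2 release payload
oc adm release info registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2019-09-04-142146 | grep marketplace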

Comment 9 Fan Jia 2019-09-06 06:15:42 UTC
Opened another bug to track the new problem since it has a different root cause: https://bugzilla.redhat.com/show_bug.cgi?id=1749643

Comment 10 errata-xmlrpc 2019-10-16 06:37:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

