Created attachment 1673616 [details]
cvo log from the above prow run

The Service Catalog removal Job was merged into the cluster-svcat-apiserver-operator. The upgrade failed because the CVO was trying to modify the pod template (spec.template) of the already-existing Job:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/24682/pull-ci-openshift-origin-master-e2e-gcp-upgrade/2812

I0313 19:49:13.579516 1 sync_worker.go:621] Running sync for job "openshift-service-catalog-removed/openshift-service-catalog-apiserver-remover" (495 of 571)
E0313 19:49:13.690824 1 task.go:81] error running apply for job "openshift-service-catalog-removed/openshift-service-catalog-apiserver-remover" (495 of 571): Job.batch "openshift-service-catalog-apiserver-remover" is invalid: spec.template: Invalid value: core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"controller-uid":"e5aae1bb-9262-4d5e-a697-b2b99c5223c2", "job-name":"openshift-service-catalog-apiserver-remover"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:core.PodSpec{Volumes:[]core.Volume(nil), InitContainers:[]core.Container(nil), Containers:[]core.Container{core.Container{Name:"openshift-service-catalog-apiserver-remover", Image:"registry.svc.ci.openshift.org/ci-op-1hk9l3tt/stable@sha256:f955103400800b00b24c53192d76a57dfd8c9df22368808bf3808c856c18f11b", Command:[]string{"/usr/bin/cluster-svcat-apiserver-remover"}, Args:[]string(nil), WorkingDir:"", Ports:[]core.ContainerPort(nil), EnvFrom:[]core.EnvFromSource(nil), Env:[]core.EnvVar(nil), Resources:core.ResourceRequirements{Limits:core.ResourceList(nil), 
Requests:core.ResourceList{"memory":resource.Quantity{i:resource.int64Amount{value:52428800, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"50Mi", Format:"BinarySI"}}}, VolumeMounts:[]core.VolumeMount(nil), VolumeDevices:[]core.VolumeDevice(nil), LivenessProbe:(*core.Probe)(nil), ReadinessProbe:(*core.Probe)(nil), StartupProbe:(*core.Probe)(nil), Lifecycle:(*core.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"IfNotPresent", SecurityContext:(*core.SecurityContext)(0xc0390fe000), Stdin:false, StdinOnce:false, TTY:false}}, EphemeralContainers:[]core.EphemeralContainer(nil), RestartPolicy:"Never", TerminationGracePeriodSeconds:(*int64)(0xc014ab8e50), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"openshift-service-catalog-apiserver-remover", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", SecurityContext:(*core.PodSecurityContext)(0xc033b3dd50), ImagePullSecrets:[]core.LocalObjectReference(nil), Hostname:"", Subdomain:"", Affinity:(*core.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]core.Toleration(nil), HostAliases:[]core.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil), PreemptionPolicy:(*core.PreemptionPolicy)(nil), DNSConfig:(*core.PodDNSConfig)(nil), ReadinessGates:[]core.PodReadinessGate(nil), RuntimeClassName:(*string)(nil), Overhead:core.ResourceList(nil), EnableServiceLinks:(*bool)(nil), TopologySpreadConstraints:[]core.TopologySpreadConstraint(nil)}}: field is immutable
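For context, the Job being re-applied looks roughly like this (a trimmed sketch reconstructed from the struct dump above, not the exact manifest from the operator repo). Note the controller-uid label in the dump's template labels: the apiserver injects it into spec.template.metadata.labels once the Job exists, and since a Job's spec.template is immutable, any subsequent apply that touches the template is rejected.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: openshift-service-catalog-apiserver-remover
  namespace: openshift-service-catalog-removed
spec:
  template:   # immutable once the Job exists; re-applying it fails
    spec:
      serviceAccountName: openshift-service-catalog-apiserver-remover
      restartPolicy: Never
      containers:
      - name: openshift-service-catalog-apiserver-remover
        # image digest taken from the failing CI run above; varies per payload
        image: registry.svc.ci.openshift.org/ci-op-1hk9l3tt/stable@sha256:f955103400800b00b24c53192d76a57dfd8c9df22368808bf3808c856c18f11b
        command: ["/usr/bin/cluster-svcat-apiserver-remover"]
        resources:
          requests:
            memory: 50Mi
```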
The PR currently tracking this is https://github.com/openshift/cluster-svcat-apiserver-operator/pull/80. I'm looking to see if there's anything else that could be related. In that PR's CI run, the job seems to run as expected:

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-svcat-apiserver-operator/80/pull-ci-openshift-cluster-svcat-apiserver-operator-master-e2e-aws-upgrade/1/artifacts/e2e-aws-upgrade/pods/openshift-service-catalog-removed_openshift-service-catalog-apiserver-remover-zpk4p_openshift-service-catalog-apiserver-remover.log
Mar 25 03:59:55.602 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p node/ created
Mar 25 03:59:55.609 I ns/openshift-service-catalog-removed job/openshift-service-catalog-apiserver-remover Created pod: openshift-service-catalog-apiserver-remover-zpk4p
Mar 25 03:59:55.613 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p Successfully assigned openshift-service-catalog-removed/openshift-service-catalog-apiserver-remover-zpk4p to ip-10-0-128-212.us-west-2.compute.internal
Mar 25 03:59:57.344 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p Pulling image "registry.svc.ci.openshift.org/ci-op-bx3s5p3m/stable@sha256:772889a5e98f50b38b3c91ecedc6f9834e12a597d1fd4508a393d0410e4544a8"
Mar 25 04:00:03.764 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p Successfully pulled image "registry.svc.ci.openshift.org/ci-op-bx3s5p3m/stable@sha256:772889a5e98f50b38b3c91ecedc6f9834e12a597d1fd4508a393d0410e4544a8"
Mar 25 04:00:03.844 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p Created container openshift-service-catalog-apiserver-remover
Mar 25 04:00:03.865 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p Started container openshift-service-catalog-apiserver-remover
Mar 25 04:00:03.918 W clusteroperator/service-catalog-apiserver deleted
Mar 25 04:00:09.208 W ns/openshift-service-catalog-apiserver-operator pod/openshift-service-catalog-apiserver-operator-67c4f669d4-bmv5h node/ip-10-0-143-215.us-west-2.compute.internal graceful deletion within 30s
Mar 25 04:00:09.736 W clusterversion/version cluster reached 0.0.1-2020-03-25-025035
Mar 25 04:00:09.736 W clusterversion/version changed Progressing to False: Cluster version is 0.0.1-2020-03-25-025035
Mar 25 04:00:09.806 E ns/openshift-service-catalog-apiserver-operator pod/openshift-service-catalog-apiserver-operator-67c4f669d4-bmv5h node/ip-10-0-143-215.us-west-2.compute.internal container=operator container exited with code 255 (Error): failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:04.507217 1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:05.146355 1 unsupportedconfigoverrides_controller.go:181] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:05.147368 1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:06.426543 1 unsupportedconfigoverrides_controller.go:181] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:06.427510 1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:08.048561 1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:08.986734 1 unsupportedconfigoverrides_controller.go:181] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:08.987919 1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:09.009437 1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:09.047967 1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:09.063697 1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
E0325 04:00:09.070455 1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found
I0325 04:00:09.198428 1 cmd.go:79] Received SIGTERM or SIGINT signal, shutting down controller.
F0325 04:00:09.198466 1 leaderelection.go:66] leaderelection lost
Mar 25 04:00:13.592 W ns/openshift-service-catalog-apiserver-operator pod/openshift-service-catalog-apiserver-operator-67c4f669d4-bmv5h node/ip-10-0-143-215.us-west-2.compute.internal deleted
The snippet from comment #2 is from the e2e-aws-upgrade CI job. Looking at that, I can see that my remover Job gets created and runs, since the pod logs are present. It then looks like the CVO is trying to resurrect the operator because my Job blew it away.
Original job linked from this bug was an unrelated origin PR, which flaked on the removal job [1]:

Cluster did not complete upgrade: timed out waiting for the condition: Could not update job "openshift-service-catalog-removed/openshift-service-catalog-apiserver-remover" (495 of 571): the object is invalid, possibly due to local cluster configuration

Then [2] landed, reverting the job. Now [3] is in flight to restore it.

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/24682/pull-ci-openshift-origin-master-e2e-gcp-upgrade/2812
[2]: https://github.com/openshift/cluster-svcat-apiserver-operator/pull/79
[3]: https://github.com/openshift/cluster-svcat-apiserver-operator/pull/80
PR 80 [1] landed and caused [2], so it was reverted again [3]. I have PR 82 [4] in flight to restore it again. Met with several folks today to try to track this issue down. We will be adding the .metadata.annotations["release.openshift.io/create-only"]="true" annotation [5].

[1] https://github.com/openshift/cluster-svcat-apiserver-operator/pull/80
[2] https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/289/pull-ci-openshift-cluster-etcd-operator-master-e2e-gcp-upgrade/1254
[3] https://github.com/openshift/cluster-svcat-apiserver-operator/pull/81
[4] https://github.com/openshift/cluster-svcat-apiserver-operator/pull/82
[5] https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md#what-if-i-only-want-the-cvo-to-create-my-resource-but-never-update-it
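The create-only behavior documented in [5] amounts to the CVO creating the manifest on first sync and then never updating it, which sidesteps the immutable spec.template check entirely. A minimal simulation of that sync decision (hypothetical Python for illustration, not the CVO's actual Go code; `live` stands for the object currently in the cluster, `desired` for the rendered manifest):

```python
CREATE_ONLY_ANNOTATION = "release.openshift.io/create-only"

def sync(live, desired):
    """Decide how a CVO-style sync loop handles one manifest.

    Objects are plain dicts standing in for Kubernetes resources.
    Returns "create" when the object does not exist yet, "skip" when
    the manifest is marked create-only and the object already exists,
    and "update" otherwise.
    """
    annotations = desired.get("metadata", {}).get("annotations", {})
    create_only = annotations.get(CREATE_ONLY_ANNOTATION) == "true"
    if live is None:
        return "create"   # first sync: always create the object
    if create_only:
        return "skip"     # never update, so immutable fields are never touched
    return "update"
```

With the annotation set on the removal Job, the CVO's second pass during the upgrade becomes a no-op instead of an update that trips the immutable-field validation.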
Moving this to MODIFIED, as the create-only annotation added to the Job in the PRs above seems to have addressed this issue.
Moving this to the Service Catalog component so QE can test and ensure Service Catalog is removed during upgrades to 4.5.
Added PR links
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409