Bug 1817226 - Upgrades broken when a removal Job is active.
Summary: Upgrades broken when a removal Job is active.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Catalog
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: Jesus M. Rodriguez
QA Contact: Fan Jia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-25 20:47 UTC by Jesus M. Rodriguez
Modified: 2020-07-13 17:24 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:23:43 UTC
Target Upstream Version:
Embargoed:


Attachments
cvo log from the above prow run (1.75 MB, text/plain)
2020-03-25 20:47 UTC, Jesus M. Rodriguez


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-svcat-apiserver-operator pull 82 0 None closed Re-add service catalog apiserver removal job 2020-07-24 06:54:57 UTC
Github openshift cluster-svcat-controller-manager-operator pull 68 0 None closed Add service catalog remover job 2020-07-24 06:54:57 UTC
Github openshift cluster-svcat-controller-manager-operator pull 74 0 None closed add create-only annotation 2020-07-24 06:54:56 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:23:59 UTC

Description Jesus M. Rodriguez 2020-03-25 20:47:08 UTC
Created attachment 1673616
cvo log from the above prow run

The Service Catalog removal Job was merged into the cluster-svcat-apiserver-operator. The upgrade then failed because the CVO tried to update the pod template (spec.template) of the existing Job, and that field is immutable.

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/24682/pull-ci-openshift-origin-master-e2e-gcp-upgrade/2812

I0313 19:49:13.579516       1 sync_worker.go:621] Running sync for job "openshift-service-catalog-removed/openshift-service-catalog-apiserver-remover" (495 of 571)
E0313 19:49:13.690824       1 task.go:81] error running apply for job "openshift-service-catalog-removed/openshift-service-catalog-apiserver-remover" (495 of 571): Job.batch "openshift-service-catalog-apiserver-remover" is invalid: spec.template: Invalid value: core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"controller-uid":"e5aae1bb-9262-4d5e-a697-b2b99c5223c2", "job-name":"openshift-service-catalog-apiserver-remover"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:core.PodSpec{Volumes:[]core.Volume(nil), InitContainers:[]core.Container(nil), Containers:[]core.Container{core.Container{Name:"openshift-service-catalog-apiserver-remover", Image:"registry.svc.ci.openshift.org/ci-op-1hk9l3tt/stable@sha256:f955103400800b00b24c53192d76a57dfd8c9df22368808bf3808c856c18f11b", Command:[]string{"/usr/bin/cluster-svcat-apiserver-remover"}, Args:[]string(nil), WorkingDir:"", Ports:[]core.ContainerPort(nil), EnvFrom:[]core.EnvFromSource(nil), Env:[]core.EnvVar(nil), Resources:core.ResourceRequirements{Limits:core.ResourceList(nil), Requests:core.ResourceList{"memory":resource.Quantity{i:resource.int64Amount{value:52428800, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"50Mi", Format:"BinarySI"}}}, VolumeMounts:[]core.VolumeMount(nil), VolumeDevices:[]core.VolumeDevice(nil), LivenessProbe:(*core.Probe)(nil), ReadinessProbe:(*core.Probe)(nil), StartupProbe:(*core.Probe)(nil), Lifecycle:(*core.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"IfNotPresent", 
SecurityContext:(*core.SecurityContext)(0xc0390fe000), Stdin:false, StdinOnce:false, TTY:false}}, EphemeralContainers:[]core.EphemeralContainer(nil), RestartPolicy:"Never", TerminationGracePeriodSeconds:(*int64)(0xc014ab8e50), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"openshift-service-catalog-apiserver-remover", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", SecurityContext:(*core.PodSecurityContext)(0xc033b3dd50), ImagePullSecrets:[]core.LocalObjectReference(nil), Hostname:"", Subdomain:"", Affinity:(*core.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]core.Toleration(nil), HostAliases:[]core.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil), PreemptionPolicy:(*core.PreemptionPolicy)(nil), DNSConfig:(*core.PodDNSConfig)(nil), ReadinessGates:[]core.PodReadinessGate(nil), RuntimeClassName:(*string)(nil), Overhead:core.ResourceList(nil), EnableServiceLinks:(*bool)(nil), TopologySpreadConstraints:[]core.TopologySpreadConstraint(nil)}}: field is immutable

Comment 2 Jesus M. Rodriguez 2020-03-25 21:34:42 UTC
Mar 25 03:59:55.602 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p node/ created
Mar 25 03:59:55.609 I ns/openshift-service-catalog-removed job/openshift-service-catalog-apiserver-remover Created pod: openshift-service-catalog-apiserver-remover-zpk4p
Mar 25 03:59:55.613 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p Successfully assigned openshift-service-catalog-removed/openshift-service-catalog-apiserver-remover-zpk4p to ip-10-0-128-212.us-west-2.compute.internal
Mar 25 03:59:57.344 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p Pulling image "registry.svc.ci.openshift.org/ci-op-bx3s5p3m/stable@sha256:772889a5e98f50b38b3c91ecedc6f9834e12a597d1fd4508a393d0410e4544a8"
Mar 25 04:00:03.764 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p Successfully pulled image "registry.svc.ci.openshift.org/ci-op-bx3s5p3m/stable@sha256:772889a5e98f50b38b3c91ecedc6f9834e12a597d1fd4508a393d0410e4544a8"
Mar 25 04:00:03.844 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p Created container openshift-service-catalog-apiserver-remover
Mar 25 04:00:03.865 I ns/openshift-service-catalog-removed pod/openshift-service-catalog-apiserver-remover-zpk4p Started container openshift-service-catalog-apiserver-remover
Mar 25 04:00:03.918 W clusteroperator/service-catalog-apiserver deleted
Mar 25 04:00:09.208 W ns/openshift-service-catalog-apiserver-operator pod/openshift-service-catalog-apiserver-operator-67c4f669d4-bmv5h node/ip-10-0-143-215.us-west-2.compute.internal graceful deletion within 30s
Mar 25 04:00:09.736 W clusterversion/version cluster reached 0.0.1-2020-03-25-025035
Mar 25 04:00:09.736 W clusterversion/version changed Progressing to False: Cluster version is 0.0.1-2020-03-25-025035
Mar 25 04:00:09.806 E ns/openshift-service-catalog-apiserver-operator pod/openshift-service-catalog-apiserver-operator-67c4f669d4-bmv5h node/ip-10-0-143-215.us-west-2.compute.internal container=operator container exited with code 255 (Error): failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 04:00:04.507217       1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 04:00:05.146355       1 unsupportedconfigoverrides_controller.go:181] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 04:00:05.147368       1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 04:00:06.426543       1 unsupportedconfigoverrides_controller.go:181] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 04:00:06.427510       1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 04:00:08.048561       1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 04:00:08.986734       1 unsupportedconfigoverrides_controller.go:181] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 04:00:08.987919       1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 04:00:09.009437       1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 04:00:09.047967       1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 04:00:09.063697       1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nE0325 
04:00:09.070455       1 resourcesync_controller.go:247] key failed with : servicecatalogapiserver.operator.openshift.io "cluster" not found\nI0325 04:00:09.198428       1 cmd.go:79] Received SIGTERM or SIGINT signal, shutting down controller.\nF0325 04:00:09.198466       1 leaderelection.go:66] leaderelection lost\n
Mar 25 04:00:13.592 W ns/openshift-service-catalog-apiserver-operator pod/openshift-service-catalog-apiserver-operator-67c4f669d4-bmv5h node/ip-10-0-143-215.us-west-2.compute.internal deleted

Comment 3 Jesus M. Rodriguez 2020-03-25 21:36:16 UTC
The snippet in comment #2 is from the e2e-aws-upgrade CI job. It shows that my remover job gets created and runs, which the pod logs also confirm. It then looks like the CVO is trying to resurrect the operator because my job blew it away.

Comment 4 W. Trevor King 2020-03-30 18:57:35 UTC
The job originally linked from this bug ran against an unrelated origin PR, and it flaked on the removal job [1]:

  Cluster did not complete upgrade: timed out waiting for the condition: Could not update job \"openshift-service-catalog-removed/openshift-service-catalog-apiserver-remover\" (495 of 571): the object is invalid, possibly due to local cluster configuration

Then [2] landed reverting the job.  Now [3] is in flight to restore it.

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/24682/pull-ci-openshift-origin-master-e2e-gcp-upgrade/2812
[2]: https://github.com/openshift/cluster-svcat-apiserver-operator/pull/79
[3]: https://github.com/openshift/cluster-svcat-apiserver-operator/pull/80

Comment 6 Jesus M. Rodriguez 2020-04-17 20:08:24 UTC
Moving this to MODIFIED, as the PRs that add the create-only annotation to the job seem to have addressed this issue.
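
A minimal Go sketch of how a create-only annotation can avoid the failed update. It is a model, not the CVO's implementation: the `Manifest` type and `apply` function are hypothetical, and the annotation key `release.openshift.io/create-only` is assumed here rather than quoted from this bug.

```go
package main

import "fmt"

// createOnlyAnnotation is the annotation key assumed for this sketch.
const createOnlyAnnotation = "release.openshift.io/create-only"

// Manifest is a hypothetical, reduced stand-in for a release manifest.
type Manifest struct {
	Name        string
	Annotations map[string]string
	Image       string
}

// apply models one sync step: create the object when it is missing,
// leave it untouched when it exists and is marked create-only,
// and overwrite it otherwise.
func apply(cluster map[string]Manifest, desired Manifest) string {
	_, exists := cluster[desired.Name]
	switch {
	case !exists:
		cluster[desired.Name] = desired
		return "created"
	case desired.Annotations[createOnlyAnnotation] == "true":
		return "skipped"
	default:
		cluster[desired.Name] = desired
		return "updated"
	}
}

func main() {
	cluster := map[string]Manifest{}
	job := Manifest{
		Name:        "openshift-service-catalog-apiserver-remover",
		Annotations: map[string]string{createOnlyAnnotation: "true"},
		Image:       "registry.example/remover@sha256:old", // hypothetical image reference
	}

	fmt.Println(apply(cluster, job)) // first sync: "created"

	job.Image = "registry.example/remover@sha256:new" // next release changes the image
	fmt.Println(apply(cluster, job))                  // later sync: "skipped"
}
```

Because the second sync never sends an update, the immutable spec.template of the existing Job is never touched and the upgrade can proceed.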

Comment 7 Jesus M. Rodriguez 2020-04-17 20:12:43 UTC
Moving this to the Service Catalog component so QE can test and ensure Service Catalog is removed in 4.5 during upgrades.

Comment 8 Jesus M. Rodriguez 2020-04-17 20:15:00 UTC
Added PR links

Comment 13 errata-xmlrpc 2020-07-13 17:23:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

