Description of problem: When installing using http proxy the installation hangs in "Finalizing" state Controller logs: time="2020-10-21T05:32:41Z" level=info msg="Cluster version is available:false , message:Working towards 4.6.0-rc.4: 78% complete" time="2020-10-21T05:33:11Z" level=info msg="Waiting for cluster version to be available" time="2020-10-21T05:33:11Z" level=info msg="Cluster version is available:false , message:Working towards 4.6.0-rc.4: 78% complete" time="2020-10-21T05:33:41Z" level=info msg="Waiting for cluster version to be available" time="2020-10-21T05:33:41Z" level=info msg="Cluster version is available:false , message:Working towards 4.6.0-rc.4: 78% complete" time="2020-10-21T05:34:11Z" level=info msg="Waiting for cluster version to be available" time="2020-10-21T05:34:11Z" level=info msg="Cluster version is available:false , message:Working towards 4.6.0-rc.4: 78% complete" Cluster Version Operator Logs: E1021 04:56:22.013534 1 event.go:273] Unable to write event: 'can't create an event with namespace 'default' in namespace 'openshift-cluster-version'' (may retry after sleeping) E1021 04:56:22.013635 1 event.go:218] Unable to write event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:".163fe8e75b843788", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"ConfigMap", Namespace:"", Name:"", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"LeaderElection", Message:"master-0-0_a2380518-ae40-4761-afaa-96c685400da2 stopped leading", Source:v1.EventSource{Component:"openshift-cluster-version", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbfdc0da22bf7e788, ext:128205563418, loc:(*time.Location)(0x26b0400)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbfdc0da22bf7e788, ext:128205563418, loc:(*time.Location)(0x26b0400)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}' (retry limit exceeded!) I1021 04:56:32.738209 1 start.go:260] Waiting on 1 outstanding goroutines. E1021 04:56:32.738259 1 metrics.go:160] Failed to gracefully shut down metrics server: accept tcp [::]:9099: use of closed network connection I1021 04:56:48.238826 1 discovery.go:214] Invalidating discovery information Version-Release number of selected component (if applicable): { "release_tag": "v1.0.10.1", "versions": { "assisted-ignition-generator": "", "assisted-installer": "registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-rhel8:v4.6.0-28", "assisted-installer-controller": "registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-reporter-rhel8:v4.6.0-25", "assisted-installer-service": "quay.io/app-sre/assisted-service:fd4fdf8", "discovery-agent": "quay.io/ocpmetal/assisted-installer-agent:latest", "image-builder": "quay.io/app-sre/assisted-iso-create:fd4fdf8" } } How reproducible: Intermittently Steps to Reproduce: 1. Create cluster and generate ISO with http proxy set 2. Begin cluster installation Actual results: Cluster install hangs on "Finalizing" Expected results: Cluster install completed Additional info:
The issue reproduced on other cluster deployment variations as well. The main problem that cluster is stacked on Finalizing stage See attached screenshots or verify cluster on Staging as example https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters/261d76cf-cbd4-4241-8d54-582cb34fbe68
Created attachment 1723210 [details] Screenshot 1
Created attachment 1723211 [details] Screenshot 2
Also happened in Production https://cloud.redhat.com/openshift/assisted-installer/clusters/91eee822-b2d0-4f99-bd19-c65d492ba9dc Cluster created at 9/26/2020, 4:44:12 AM
Created attachment 1723246 [details] Screenshot Production
Created attachment 1723259 [details] assisted-installer-controller-9zn8b.logs
Another proxy install hanging on "Finalizing" https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters/69080309-8fd1-4f9d-9e0b-f474e47aaeb3 Attached cluster version describe and controller logs
Created attachment 1723273 [details] output of "oc describe --all-namespaces clusterversion"
Created attachment 1723274 [details] controller logs for cluster 69080309-8fd1-4f9d-9e0b-f474e47aaeb3
Caused by 'Bug 1891143 - CVO deadlocked while shutting down, shortly after fresh cluster install (metrics goroutine)'
This was fixed for 4.7: https://bugzilla.redhat.com/show_bug.cgi?id=1892267 we need to see how bad is the impact of it on assisted-installer with 4.6.
Is this still reproducing? Looks like it was waiting for CVO (log seems incomplete), thus installation took longer. I recently merged this[1] PR, so we currently don't check for CVO status [1] https://github.com/openshift/assisted-installer/commit/3193963c83bf98fe3f877058a97ee14643aa0a8e
This indeed does not reproduce, anymore. As the check is disabled.