Bug 1889970 - [Assisted-4.6][Staging]Installation hanging in "Finalizing" state
Summary: [Assisted-4.6][Staging]Installation hanging in "Finalizing" state
Keywords:
Status: NEW
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: ---
: ---
Assignee: yevgeny shnaidman
QA Contact: Yuri Obshansky
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-10-21 06:44 UTC by nshidlin
Modified: 2020-10-28 10:54 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments (Terms of Use)
Screenshot 1 (41.43 KB, image/png)
2020-10-21 13:41 UTC, Yuri Obshansky
no flags Details
Screenshot 2 (68.32 KB, image/png)
2020-10-21 13:42 UTC, Yuri Obshansky
no flags Details
Screenshot Production (8.54 KB, image/png)
2020-10-21 15:17 UTC, Yuri Obshansky
no flags Details
assisted-installer-controller-9zn8b.logs (15.96 KB, text/plain)
2020-10-21 17:01 UTC, Yuri Obshansky
no flags Details
output of "oc describe --all-namespaces clusterversion" (3.01 KB, text/plain)
2020-10-21 18:21 UTC, nshidlin
no flags Details
controller logs for cluster 69080309-8fd1-4f9d-9e0b-f474e47aaeb3 (14.24 KB, text/plain)
2020-10-21 18:22 UTC, nshidlin
no flags Details

Description nshidlin 2020-10-21 06:44:26 UTC
Description of problem:
When installing using http proxy the installation hangs in "Finalizing" state

Controller logs:

time="2020-10-21T05:32:41Z" level=info msg="Cluster version is available:false , message:Working towards 4.6.0-rc.4: 78% complete"
time="2020-10-21T05:33:11Z" level=info msg="Waiting for cluster version to be available"
time="2020-10-21T05:33:11Z" level=info msg="Cluster version is available:false , message:Working towards 4.6.0-rc.4: 78% complete"
time="2020-10-21T05:33:41Z" level=info msg="Waiting for cluster version to be available"
time="2020-10-21T05:33:41Z" level=info msg="Cluster version is available:false , message:Working towards 4.6.0-rc.4: 78% complete"
time="2020-10-21T05:34:11Z" level=info msg="Waiting for cluster version to be available"
time="2020-10-21T05:34:11Z" level=info msg="Cluster version is available:false , message:Working towards 4.6.0-rc.4: 78% complete"

Cluster Version Operator Logs:

E1021 04:56:22.013534       1 event.go:273] Unable to write event: 'can't create an event with namespace 'default' in namespace 'openshift-cluster-version'' (may retry after sleeping)
E1021 04:56:22.013635       1 event.go:218] Unable to write event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:".163fe8e75b843788", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"ConfigMap", Namespace:"", Name:"", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"LeaderElection", Message:"master-0-0_a2380518-ae40-4761-afaa-96c685400da2 stopped leading", Source:v1.EventSource{Component:"openshift-cluster-version", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbfdc0da22bf7e788, ext:128205563418, loc:(*time.Location)(0x26b0400)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbfdc0da22bf7e788, ext:128205563418, loc:(*time.Location)(0x26b0400)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}' (retry limit exceeded!)
I1021 04:56:32.738209       1 start.go:260] Waiting on 1 outstanding goroutines.
E1021 04:56:32.738259       1 metrics.go:160] Failed to gracefully shut down metrics server: accept tcp [::]:9099: use of closed network connection
I1021 04:56:48.238826       1 discovery.go:214] Invalidating discovery information


Version-Release number of selected component (if applicable):
{
    "release_tag": "v1.0.10.1",
    "versions": {
        "assisted-ignition-generator": "",
        "assisted-installer": "registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-rhel8:v4.6.0-28",
        "assisted-installer-controller": "registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-reporter-rhel8:v4.6.0-25",
        "assisted-installer-service": "quay.io/app-sre/assisted-service:fd4fdf8",
        "discovery-agent": "quay.io/ocpmetal/assisted-installer-agent:latest",
        "image-builder": "quay.io/app-sre/assisted-iso-create:fd4fdf8"
    }
}

How reproducible:
Intermittently 

Steps to Reproduce:
1. Create cluster and generate ISO with http proxy set
2. Begin cluster installation

Actual results:
Cluster install hangs on "Finalizing"

Expected results:
Cluster install completed


Additional info:

Comment 1 Yuri Obshansky 2020-10-21 13:41:15 UTC
The issue reproduced on other cluster deployment variations as well.
The main problem that cluster is stacked on Finalizing stage
See attached screenshots or verify cluster on Staging as example
https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters/261d76cf-cbd4-4241-8d54-582cb34fbe68

Comment 2 Yuri Obshansky 2020-10-21 13:41:48 UTC
Created attachment 1723210 [details]
Screenshot 1

Comment 3 Yuri Obshansky 2020-10-21 13:42:16 UTC
Created attachment 1723211 [details]
Screenshot 2

Comment 5 Yuri Obshansky 2020-10-21 15:17:07 UTC
Also happened in Production 
https://cloud.redhat.com/openshift/assisted-installer/clusters/91eee822-b2d0-4f99-bd19-c65d492ba9dc
Cluster created at 9/26/2020, 4:44:12 AM

Comment 6 Yuri Obshansky 2020-10-21 15:17:40 UTC
Created attachment 1723246 [details]
Screenshot Production

Comment 7 Yuri Obshansky 2020-10-21 17:01:46 UTC
Created attachment 1723259 [details]
assisted-installer-controller-9zn8b.logs

Comment 8 nshidlin 2020-10-21 18:19:13 UTC
Another proxy install hanging on "Finalizing"
https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters/69080309-8fd1-4f9d-9e0b-f474e47aaeb3

Attached cluster version describe and controller logs

Comment 9 nshidlin 2020-10-21 18:21:37 UTC
Created attachment 1723273 [details]
output of "oc describe --all-namespaces clusterversion"

Comment 10 nshidlin 2020-10-21 18:22:52 UTC
Created attachment 1723274 [details]
controller logs for cluster 69080309-8fd1-4f9d-9e0b-f474e47aaeb3

Comment 11 Yuri Obshansky 2020-10-27 18:53:19 UTC
Caused by 'Bug 1891143 - CVO deadlocked while shutting down, shortly after fresh cluster install (metrics goroutine)'

Comment 12 Omri Hochman 2020-10-28 10:54:06 UTC
This was fixed for 4.7: https://bugzilla.redhat.com/show_bug.cgi?id=1892267

we need to see how bad is the impact of it on assisted-installer with 4.6.


Note You need to log in before you can comment on or make changes to this bug.