Bug 1860233

Summary: After installing RHACM 2.0, a managed cluster created through ACM is in the Pending Import state instead of the Ready state
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Neha Chugh <nchugh>
Component: Cluster Lifecycle
Assignee: Hao Liu <haoli>
Status: CLOSED NOTABUG
QA Contact: magchen@redthat.com <magchen>
Severity: high
Docs Contact: Christopher Dawson <cdawson>
Priority: unspecified
Version: rhacm-2.0
CC: bscalio, gghezzo, haoli, ming
Flags: ming: rhacm-2.0.z+
Target Release: rhacm-2.0.4
Hardware: All
OS: All
Last Closed: 2020-10-21 23:26:44 UTC
Type: Bug
Attachments:
- hive logs of google managed cluster showing installation as success. (flags: none)
- latest hive logs which is showing successful installation (flags: none)
- Showing pending import state though the installation is success (flags: none)

Description Neha Chugh 2020-07-24 04:50:57 UTC
Description of problem:
After installing RHACM 2.0, a managed cluster created through ACM stays in the Pending Import state instead of reaching the Ready state.

Version-Release number of selected component (if applicable):
2.0

How reproducible:
In my test environment

Steps to Reproduce:

After installing RHACM 2.0 on my bare-metal setup, cluster creation shows as successful in the hive logs, but the managed cluster's status is Pending Import rather than Ready.

The RHACM 2.0 installation itself succeeded: all pods in the open-cluster-management namespace are Running.


Actual results:

The managed cluster stays in the Pending Import state.

Expected results:

The managed cluster should reach the Ready state.

Additional info:

Attaching my hive logs and test environment details for reference.

Comment 1 Neha Chugh 2020-07-24 04:52:12 UTC
Created attachment 1702296
hive logs of google managed cluster showing installation as success.

Comment 3 Mike Ng 2020-07-24 20:44:20 UTC
G2Bsync 663726067 comment 
 hanqiuzh Fri, 24 Jul 2020 20:43:12 UTC 
 G2Bsync Used the kubeconfig at `/var/www/html/43/deploy02/auth/kubeconfig`. The cluster it points to is not the hub but the managed cluster.
The registration-agent log shows a missing `managedclusters` permission:
```
E0724 20:34:23.229043       1 reflector.go:178] k8s.io/client-go.3/tools/cache/reflector.go:125: Failed to list *v1.ManagedCluster: managedclusters.cluster.open-cluster-management.io "nchugh-gc" is forbidden: User "system:open-cluster-management:nchugh-import:b2vtk" cannot list resource "managedclusters" in API group "cluster.open-cluster-management.io" at the cluster scope
E0724 20:35:00.600143       1 reflector.go:178] k8s.io/client-go.3/tools/cache/reflector.go:125: Failed to list *v1.ManagedCluster: managedclusters.cluster.open-cluster-management.io "nchugh-gc" is forbidden: User "system:open-cluster-management:nchugh-import:b2vtk" cannot list resource "managedclusters" in API group "cluster.open-cluster-management.io" at the cluster scope
```
From the log, this may be an install of an older RHACM 2.0 build. @qiujian16, can you please take a look? Thanks.
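
To see exactly which permission a denied request was missing, a line like the two above can be split into user, verb, resource, and API group. Below is a minimal Python sketch, assuming the standard Kubernetes RBAC "forbidden" wording seen in this log (`User "..." cannot <verb> resource "..." in API group "..."`, followed by either `at the cluster scope` or `in the namespace "..."`). `parse_forbidden`, `unique_denials`, and `Denial` are hypothetical names, not part of any OCM or Kubernetes tool.

```python
import re
from typing import Iterable, NamedTuple, Optional, Set

# Hypothetical pattern written against the RBAC "forbidden" wording in the log above;
# it is an assumption about the message format, not an API.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)"'
    r' in API group "(?P<group>[^"]*)"'
    r'(?: in the namespace "(?P<namespace>[^"]+)"| at the cluster scope)'
)


class Denial(NamedTuple):
    user: str
    verb: str
    resource: str
    api_group: str
    namespace: Optional[str]  # None means the request was cluster-scoped


def parse_forbidden(line: str) -> Optional[Denial]:
    """Return the denied request described by one log line, or None."""
    m = FORBIDDEN_RE.search(line)
    if m is None:
        return None
    return Denial(m["user"], m["verb"], m["resource"], m["group"], m["namespace"])


def unique_denials(lines: Iterable[str]) -> Set[Denial]:
    """Collapse repeated reflector errors (one per retry) into distinct denials."""
    return {d for d in map(parse_forbidden, lines) if d is not None}
```

For the two lines above, both retries collapse into a single denial: `list` on `managedclusters` in `cluster.open-cluster-management.io` at cluster scope, for the `system:open-cluster-management:nchugh-import:b2vtk` user. That triple is what a role bound to that user would have to grant.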

Comment 4 Mike Ng 2020-07-27 13:37:30 UTC
G2Bsync 664078477 comment 
 skeeey Mon, 27 Jul 2020 01:54:52 UTC 
 G2Bsync Tried to connect to the cluster, but the token has expired: `error: You must be logged in to the server (Unauthorized)`

Comment 5 Mike Ng 2020-07-27 13:37:31 UTC
G2Bsync 664391776 comment 
 juliana-hsu Mon, 27 Jul 2020 13:19:43 UTC 
 G2Bsync What build snapshot was being used?

Comment 7 Neha Chugh 2020-08-10 17:04:04 UTC
Created attachment 1710991
latest hive logs which is showing successful installation

Comment 8 Neha Chugh 2020-08-10 17:08:57 UTC
Created attachment 1710994
Showing pending import state though the installation is success

Comment 9 Neha Chugh 2020-08-12 16:45:26 UTC
Hello Team, 

After checking the pod status on the GKE cluster created via the ACM console, I found the two pods below in the CrashLoopBackOff state, which could be the reason for the cluster's Pending Import status:

- klusterlet-work-agent-674dd7f9f8-4x659 (namespace: open-cluster-management-agent; ReplicaSet: klusterlet-work-agent-674dd7f9f8; node: nchugh-jkdk8-w-b-6tc7v.c.openshift-gce-devel-ci.internal): CrashLoopBackOff, ContainersNotReady
- klusterlet-work-agent-674dd7f9f8-hrd64 (namespace: open-cluster-management-agent; ReplicaSet: klusterlet-work-agent-674dd7f9f8; node: nchugh-jkdk8-w-a-7nzr8.c.openshift-gce-devel-ci.internal): CrashLoopBackOff, ContainersNotReady
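
Crash-looping pods like these can also be found without the console by filtering the STATUS column of `kubectl get pods -n open-cluster-management-agent` (or the equivalent `oc get pods`). Below is a minimal Python sketch, assuming the default table output of that command. `crashlooping_pods` is a hypothetical helper, and the listing above was copied from the web console rather than produced by this command.

```python
from typing import List


def crashlooping_pods(kubectl_output: str) -> List[str]:
    """Return names of pods whose STATUS column reads CrashLoopBackOff.

    Assumes the default `kubectl get pods` table layout, whose header starts
    with NAME READY STATUS. Raises ValueError if there is no STATUS column.
    """
    lines = kubectl_output.strip().splitlines()
    if not lines:
        return []
    # Locate STATUS by header name instead of hard-coding its position.
    status_idx = lines[0].split().index("STATUS")
    names = []
    for line in lines[1:]:
        cols = line.split()
        if len(cols) > status_idx and cols[status_idx] == "CrashLoopBackOff":
            names.append(cols[0])
    return names
```

Looking up STATUS by header name rather than fixing its position is a small hedge against layout differences. The columns before it (NAME, READY) never contain spaces, so whitespace splitting stays aligned up to that column.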

The logs of these pods show the following failure:

W0812 16:38:34.654493       1 builder.go:94] graceful termination failed, controllers failed with error: stat /spoke/hub-kubeconfig/kubeconfig: no such file or directory
I0812 16:38:34.654459       1 event.go:278] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"open-cluster-management-agent", Name:"work-agent-lock", UID:"9bf57c74-e4a0-4889-a41e-0c35d739aeb6", APIVersion:"v1", ResourceVersion:"846153", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' e3d89866-6a83-4c6f-9c02-1ee661c4c2d3 became leader


It seems the agent cannot find the kubeconfig file at /spoke/hub-kubeconfig/kubeconfig, which leads to the CrashLoopBackOff state. I am not sure how to fix this.

I understand the issue is specific to my test environment, but it would be great if you could suggest a way to fix it.

Regards,
Neha Chugh

Comment 11 Bradley Scalio 2020-09-03 13:33:56 UTC
Note: KB article referencing this issue: https://access.redhat.com/solutions/5355821

A record needs to be added to the public DNS that points the hub cluster's kube-apiserver address (in this example, api.ocp.example.com) to the cluster's load balancer.
After this, the klusterlet-registration pod should be able to resolve the host and continue the cluster import.
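
The DNS record matters because the klusterlet on the managed cluster must be able to resolve the hub API hostname. A quick check before retrying the import is to take the `server:` URL from the hub kubeconfig, extract the host, and see whether it resolves. Below is a minimal Python sketch of that check, meant to be run from a machine on the managed cluster's network. `api_host` and `resolve` are hypothetical helper names, and port 6443 in the usage example is an assumption (the usual OpenShift API port) that this bug does not state.

```python
import socket
from typing import List
from urllib.parse import urlparse


def api_host(server_url: str) -> str:
    """Extract the hostname from a kubeconfig `server:` URL."""
    host = urlparse(server_url).hostname
    if not host:
        raise ValueError(f"no hostname in {server_url!r}")
    return host


def resolve(host: str) -> List[str]:
    """Return the sorted, de-duplicated addresses host resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # info[4] is the sockaddr tuple; its first element is the IP address
    # for both IPv4 and IPv6 results.
    return sorted({info[4][0] for info in infos})
```

For example, if `resolve(api_host("https://api.ocp.example.com:6443"))` returns an empty list, the record is probably still missing from the resolver the managed cluster uses.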

Comment 13 Mike Ng 2020-10-15 17:34:41 UTC
G2Bsync 708869131 comment 
 juliana-hsu Thu, 15 Oct 2020 03:15:58 UTC 
 G2Bsync nchugh, is this still an issue, or can it be closed?

Comment 14 Red Hat Bugzilla 2023-09-15 00:34:37 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days