Bug 2108888 - Hypershift on AWS - control plane not running
Summary: Hypershift on AWS - control plane not running
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Hypershift
Version: rhacm-2.6
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: rhacm-2.6
Assignee: Roke Jung
QA Contact: txue
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-19 22:36 UTC by Thuy Nguyen
Modified: 2022-09-06 22:33 UTC
CC: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-06 22:33:43 UTC
Target Upstream Version:
Embargoed:
cbynum: rhacm-2.6+
cbynum: rhacm-2.6.z+


Links
System ID Private Priority Status Summary Last Updated
Github stolostron backlog issues 24409 0 None None None 2022-07-20 02:33:38 UTC
Red Hat Product Errata RHSA-2022:6370 0 None None None 2022-09-06 22:33:53 UTC

Description Thuy Nguyen 2022-07-19 22:36:38 UTC
Description of problem: Hypershift on AWS - control plane install not completed


Version-Release number of selected component (if applicable):
ACM 2.6.0-DOWNSTREAM-2022-07-18-22-05-41
2.1.0-DOWNANDBACK-2022-07-19-01-34-01


How reproducible:


Steps to Reproduce:
1. Enable hypershift
2. Install the hypershift addon on local-cluster
3. Create a HypershiftDeployment by applying the following YAML:
```
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: HypershiftDeployment
metadata:
  name: hs0
  namespace: default
spec:
  hostingCluster: local-cluster     # the hypershift management cluster name.
  hostingNamespace: hs0-ns     # the namespace on the hypershift management cluster to which the hostedcluster and nodepools belong.
  infrastructure:
    cloudProvider:
      name: aws-creds
    configure: True
    platform:
      aws:
        region: us-east-1
  hostedClusterSpec:
    etcd:
      managed:
        storage:
          persistentVolume:
            size: 4Gi
          type: PersistentVolume
      managementType: Managed
    controllerAvailabilityPolicy: SingleReplica
    #controllerAvailabilityPolicy: HighlyAvailable
    release:
      image: quay.io/openshift-release-dev/ocp-release:4.11.0-rc.3-x86_64
    networking:
      networkType: OpenShiftSDN
      machineCIDR: ""           # Can be left empty when configure: true
      podCIDR: ""               # Can be left empty when configure: true
      serviceCIDR: ""           # Can be left empty when configure: true
    platform:
      type: AWS
    pullSecret: {}  # Can be left empty when configure: true
    sshKey: {}      # Can be left empty when configure: true
    services: []    # Can be left empty when configure: true
  nodePools:
  - name: hs0
    spec:
      clusterName: hs0
      management:
        autoRepair: false
        replace:
          rollingUpdate:
            maxSurge: 1
            maxUnavailable: 0
          strategy: RollingUpdate
        upgradeType: Replace
      nodeCount: 2
      platform:
        aws:
          instanceType: t3.large
          rootVolume:
            size: 35
            type: gp3
        type: AWS
      release:
        image: quay.io/openshift-release-dev/ocp-release:4.11.0-rc.3-x86_64
```
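For reference, step 3 can be driven from the hub with `oc` roughly as follows. This is a sketch only: the file name `hs0.yaml` is hypothetical, and it assumes the addon from step 2 is already installed. The `hs0-ns-hs0` namespace name comes from the output further below.

```shell
# Apply the HypershiftDeployment above (saved locally as hs0.yaml; name hypothetical).
oc apply -f hs0.yaml

# Watch the deployment, then check the hosted control plane pods on the hub.
oc get hypershiftdeployment hs0 -n default -w
oc get pods -n hs0-ns-hs0
```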

Actual results:
Control plane install does not complete; the kube-apiserver, capi-provider, cluster-autoscaler, and machine-approver pods remain stuck in Init (see Additional info).

Expected results:


Additional info:

```
# oc get po -n hs0-ns-hs0
NAME                                      READY   STATUS     RESTARTS   AGE
capi-provider-87fcb4765-k2zz2             0/2     Init:0/1   0          16m
cluster-api-84749bf6fd-x8smb              1/1     Running    0          16m
cluster-autoscaler-74dcd7fc7f-v6vpn       0/1     Init:0/1   0          15m
cluster-autoscaler-b6dc98b7c-8mz5w        0/1     Init:0/1   0          15m
control-plane-operator-6d48ff775b-w2nvf   2/2     Running    0          16m
etcd-0                                    1/1     Running    0          15m
ignition-server-646549b976-mt4wf          1/1     Running    0          15m
konnectivity-agent-cc59444f4-tmbds        1/1     Running    0          15m
konnectivity-server-699bfb7d94-g4bvw      1/1     Running    0          15m
kube-apiserver-64b99b657f-s6rtk           0/5     Init:0/2   0          15m
machine-approver-5b676fffff-jjmkm         0/1     Init:0/1   0          15m
machine-approver-6979f4f57d-8drlg         0/1     Init:0/1   0          15m


# oc describe po kube-apiserver-64b99b657f-s6rtk -n hs0-ns-hs0
Events:
  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Normal   Scheduled    16m                 default-scheduler  Successfully assigned hs0-ns-hs0/kube-apiserver-64b99b657f-s6rtk to ocp4-az-dr-2-vnm5z-worker-centralus3-pm4j7 by ocp4-az-dr-2-vnm5z-master-0
  Warning  FailedMount  14m                 kubelet            Unable to attach or mount volumes: unmounted volumes=[cloud-creds], unattached volumes=[cloud-creds bootstrap-manifests oauth-metadata kubelet-client-crt aggregator-ca egress-selector-config etcd-client-crt konnectivity-client aws-pod-identity-webhook-kubeconfig cloud-token aws-pod-identity-webhook-serving-certs kubelet-client-ca localhost-kubeconfig root-ca kas-secret-encryption-config kas-config cloud-config logs auth-token-webhook-config client-ca kubeconfig aggregator-crt audit-config server-crt svcacct-key]: timed out waiting for the condition
  Warning  FailedMount  12m                 kubelet            Unable to attach or mount volumes: unmounted volumes=[cloud-creds], unattached volumes=[kubeconfig kubelet-client-ca server-crt client-ca cloud-token cloud-config oauth-metadata bootstrap-manifests kas-secret-encryption-config etcd-client-crt svcacct-key aws-pod-identity-webhook-kubeconfig localhost-kubeconfig root-ca audit-config logs auth-token-webhook-config kas-config konnectivity-client cloud-creds egress-selector-config aws-pod-identity-webhook-serving-certs aggregator-crt aggregator-ca kubelet-client-crt]: timed out waiting for the condition
  Warning  FailedMount  10m                 kubelet            Unable to attach or mount volumes: unmounted volumes=[cloud-creds], unattached volumes=[kubelet-client-crt client-ca oauth-metadata etcd-client-crt kas-config aggregator-crt logs cloud-token kubeconfig aggregator-ca auth-token-webhook-config kas-secret-encryption-config server-crt bootstrap-manifests egress-selector-config root-ca svcacct-key aws-pod-identity-webhook-serving-certs cloud-creds aws-pod-identity-webhook-kubeconfig localhost-kubeconfig kubelet-client-ca konnectivity-client audit-config cloud-config]: timed out waiting for the condition
  Warning  FailedMount  7m59s               kubelet            Unable to attach or mount volumes: unmounted volumes=[cloud-creds], unattached volumes=[svcacct-key aws-pod-identity-webhook-kubeconfig localhost-kubeconfig auth-token-webhook-config kubelet-client-crt aws-pod-identity-webhook-serving-certs client-ca oauth-metadata audit-config logs cloud-creds kubeconfig bootstrap-manifests kas-secret-encryption-config cloud-config cloud-token kubelet-client-ca aggregator-ca konnectivity-client kas-config root-ca egress-selector-config server-crt aggregator-crt etcd-client-crt]: timed out waiting for the condition
  Warning  FailedMount  5m45s               kubelet            Unable to attach or mount volumes: unmounted volumes=[cloud-creds], unattached volumes=[etcd-client-crt aws-pod-identity-webhook-kubeconfig cloud-config kas-config logs cloud-creds kas-secret-encryption-config aws-pod-identity-webhook-serving-certs oauth-metadata audit-config konnectivity-client kubeconfig svcacct-key kubelet-client-ca cloud-token bootstrap-manifests aggregator-ca kubelet-client-crt server-crt root-ca aggregator-crt auth-token-webhook-config egress-selector-config client-ca localhost-kubeconfig]: timed out waiting for the condition
  Warning  FailedMount  3m27s               kubelet            Unable to attach or mount volumes: unmounted volumes=[cloud-creds], unattached volumes=[kas-config server-crt cloud-config etcd-client-crt localhost-kubeconfig cloud-creds kubeconfig egress-selector-config kubelet-client-ca logs aws-pod-identity-webhook-kubeconfig kubelet-client-crt audit-config cloud-token aws-pod-identity-webhook-serving-certs aggregator-crt oauth-metadata aggregator-ca auth-token-webhook-config bootstrap-manifests client-ca svcacct-key kas-secret-encryption-config konnectivity-client root-ca]: timed out waiting for the condition
  Warning  FailedMount  72s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[cloud-creds], unattached volumes=[kubelet-client-crt bootstrap-manifests etcd-client-crt server-crt aggregator-ca aws-pod-identity-webhook-serving-certs konnectivity-client kas-secret-encryption-config egress-selector-config client-ca cloud-creds logs oauth-metadata aws-pod-identity-webhook-kubeconfig cloud-config auth-token-webhook-config svcacct-key localhost-kubeconfig kubelet-client-ca root-ca aggregator-crt kubeconfig cloud-token kas-config audit-config]: timed out waiting for the condition
  Warning  FailedMount  28s (x16 over 16m)  kubelet            MountVolume.SetUp failed for volume "cloud-creds" : secret "cloud-controller-creds" not found


# oc logs konnectivity-agent-cc59444f4-tmbds -n hs0-ns-hs0
I0719 22:16:38.892536       1 options.go:102] AgentCert set to "/etc/konnectivity/agent/tls.crt".
I0719 22:16:38.892621       1 options.go:103] AgentKey set to "/etc/konnectivity/agent/tls.key".
I0719 22:16:38.892626       1 options.go:104] CACert set to "/etc/konnectivity/agent/ca.crt".
I0719 22:16:38.892630       1 options.go:105] ProxyServerHost set to "konnectivity-server".
I0719 22:16:38.892634       1 options.go:106] ProxyServerPort set to 8091.
I0719 22:16:38.892638       1 options.go:107] ALPNProtos set to [].
I0719 22:16:38.892643       1 options.go:108] HealthServerHost set to
I0719 22:16:38.892648       1 options.go:109] HealthServerPort set to 2041.
I0719 22:16:38.892653       1 options.go:110] AdminServerPort set to 8094.
I0719 22:16:38.892657       1 options.go:111] EnableProfiling set to false.
I0719 22:16:38.892665       1 options.go:112] EnableContentionProfiling set to false.
I0719 22:16:38.892674       1 options.go:113] AgentID set to 6a2c6edd-18ed-41aa-9a9d-71d12900fecc.
I0719 22:16:38.892680       1 options.go:114] SyncInterval set to 1m0s.
I0719 22:16:38.892688       1 options.go:115] ProbeInterval set to 30s.
I0719 22:16:38.892694       1 options.go:116] SyncIntervalCap set to 5m0s.
I0719 22:16:38.892700       1 options.go:117] Keepalive time set to 30s.
I0719 22:16:38.892706       1 options.go:118] ServiceAccountTokenPath set to "".
I0719 22:16:38.892716       1 options.go:119] AgentIdentifiers set to ipv4=172.30.154.9,ipv4=172.30.23.228,ipv4=172.30.88.67.
I0719 22:16:38.892723       1 options.go:120] WarnOnChannelLimit set to false.
I0719 22:16:38.892729       1 options.go:121] SyncForever set to false.
E0719 22:16:58.893673       1 clientset.go:183] "cannot connect once" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.30.251.165:8091: i/o timeout\""
E0719 22:18:02.528719       1 clientset.go:183] "cannot connect once" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: remote error: tls: handshake failure\""
E0719 22:19:40.996887       1 clientset.go:183] "cannot connect once" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: remote error: tls: handshake failure\""
E0719 22:22:04.972031       1 clientset.go:183] "cannot connect once" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: remote error: tls: handshake failure\""
E0719 22:25:36.342959       1 clientset.go:183] "cannot connect once" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: remote error: tls: handshake failure\""
E0719 22:30:49.086945       1 clientset.go:183] "cannot connect once" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: remote error: tls: handshake failure\""


# oc get svc -n hs0-ns-hs0
NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)             AGE
etcd-client                 ClusterIP      None             <none>          2379/TCP,2381/TCP   18m
etcd-discovery              ClusterIP      None             <none>          2380/TCP,2379/TCP   18m
ignition-server             ClusterIP      172.30.208.171   <none>          443/TCP             19m
konnectivity-server         ClusterIP      172.30.251.165   <none>          8091/TCP            19m
konnectivity-server-local   ClusterIP      172.30.159.100   <none>          8090/TCP            18m
kube-apiserver              LoadBalancer   172.30.218.131   13.89.107.183   7443:30254/TCP      19m
machine-config-server       ClusterIP      None             <none>          8443/TCP            19m
oauth-openshift             ClusterIP      172.30.250.153   <none>          6443/TCP            19m
openshift-apiserver         ClusterIP      172.30.154.9     <none>          443/TCP             19m
openshift-oauth-apiserver   ClusterIP      172.30.23.228    <none>          443/TCP             19m
packageserver               ClusterIP      172.30.88.67     <none>          443/TCP             19m
```
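The last FailedMount event above points at a concrete root cause: the `cloud-controller-creds` secret was never created in the hosted control plane namespace, so kube-apiserver cannot mount its `cloud-creds` volume and the dependent pods stay in Init. A quick way to confirm this while the bug reproduces, sketched against the `hs0-ns-hs0` namespace from the output above:

```shell
# Confirm the secret referenced by the cloud-creds volume is missing.
# While the bug reproduces, this fails with "Error from server (NotFound)".
oc get secret cloud-controller-creds -n hs0-ns-hs0

# List the secrets that do exist in the namespace for comparison.
oc get secrets -n hs0-ns-hs0
```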

Comment 1 bot-tracker-sync 2022-08-18 00:23:55 UTC
G2Bsync 1218542340 comment 
 thuyn-581 Wed, 17 Aug 2022 22:14:26 UTC 
 G2BSync -
Retested on 2.6.0-FC6; the issue is no longer observed.

Comment 4 errata-xmlrpc 2022-09-06 22:33:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.6.0 security updates and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6370

