Bug 2015793 - [hypershift] The collect-profiles job's pods should run on the control-plane node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Alexander Greene
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-10-20 06:26 UTC by Jian Zhang
Modified: 2022-03-10 16:20 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:20:39 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift/hypershift pull 583 (Draft): Bug 2015793: Run OLM's collect profiles job on the management cluster (last updated 2021-10-21 19:26:35 UTC)
Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-10 16:20:52 UTC)

Description Jian Zhang 2021-10-20 06:26:11 UTC
Description of problem:
The collect-profiles pods are running on the guest cluster nodes.
[cloud-user@preserve-olm-env jian]$ oc get nodes
NAME                           STATUS   ROLES           AGE   VERSION
ip-10-0-136-239.ec2.internal   Ready    master,worker   19h   v1.22.0-rc.0+894a78b
ip-10-0-137-69.ec2.internal    Ready    master,worker   19h   v1.22.0-rc.0+894a78b

[cloud-user@preserve-olm-env jian]$ oc get pods -n openshift-operator-lifecycle-manager -o wide
NAME                                 READY   STATUS      RESTARTS   AGE    IP            NODE                          NOMINATED NODE   READINESS GATES
collect-profiles-27245130--1-6p9fw   0/1     Completed   0          36m    10.133.0.49   ip-10-0-137-69.ec2.internal   <none>           <none>
collect-profiles-27245145--1-j7p2q   0/1     Completed   0          21m    10.133.0.50   ip-10-0-137-69.ec2.internal   <none>           <none>
collect-profiles-27245160--1-n4ztc   0/1     Completed   0          6m5s   10.133.0.51   ip-10-0-137-69.ec2.internal   <none>           <none>

[cloud-user@preserve-olm-env jian]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0     True        False         22h     Cluster version is 4.9.0
[cloud-user@preserve-olm-env jian]$ oc project
Using project "default" from context named "clusters-example" on server "https://a5e3f9c3568ef4d8783a8dc1d501e4b1-1330874932.us-east-2.elb.amazonaws.com:6443".

Version-Release number of selected component (if applicable):
OCP: Cluster version is 4.10.0-0.nightly-2021-10-16-173656
Hypershift: Cluster version is 4.9.0

How reproducible:
always

Steps to Reproduce:
1. Create OCP 4.10.
[cloud-user@preserve-olm-env jian]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-16-173656   True        False         27h     Cluster version is 4.10.0-0.nightly-2021-10-16-173656

2. Install the `hypershift` binary based on https://github.com/openshift/hypershift#how-to-install-the-hypershift-cli

3. Create a hosted cluster
[cloud-user@preserve-olm-env jian]$ hypershift install
...
applied ServiceMonitor hypershift/operator
applied PrometheusRule hypershift/metrics

[cloud-user@preserve-olm-env hypershift]$ hypershift create cluster --pull-secret .dockerconfigjson --aws-creds ./aws/credentials --name example --base-domain qe.devcluster.openshift.com
INFO[0000] Creating infrastructure                       id=example-rlm59
INFO[0000] Using zone                                    zone=us-east-1a
...

4. Create the kubeconfig of this cluster and log in to it.
[cloud-user@preserve-olm-env hypershift]$ hypershift create kubeconfig
2021/10/19 03:14:36 selected 1 of 1 hostedclusters for the kubeconfig
..

[cloud-user@preserve-olm-env hypershift]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0     True        False         23h     Cluster version is 4.9.0

[cloud-user@preserve-olm-env hypershift]$ oc get nodes
NAME                           STATUS   ROLES           AGE   VERSION
ip-10-0-136-239.ec2.internal   Ready    master,worker   23h   v1.22.0-rc.0+894a78b
ip-10-0-137-69.ec2.internal    Ready    master,worker   23h   v1.22.0-rc.0+894a78b

5. Check whether the OLM pods are running on the control-plane nodes.


Actual results:
The collect-profiles job's pods are not running on the control plane nodes.
[cloud-user@preserve-olm-env jian]$ oc get pods -n openshift-operator-lifecycle-manager -o wide
NAME                                 READY   STATUS      RESTARTS   AGE    IP            NODE                          NOMINATED NODE   READINESS GATES
collect-profiles-27245130--1-6p9fw   0/1     Completed   0          36m    10.133.0.49   ip-10-0-137-69.ec2.internal   <none>           <none>
collect-profiles-27245145--1-j7p2q   0/1     Completed   0          21m    10.133.0.50   ip-10-0-137-69.ec2.internal   <none>           <none>
collect-profiles-27245160--1-n4ztc   0/1     Completed   0          6m5s   10.133.0.51   ip-10-0-137-69.ec2.internal   <none>           <none>


Expected results:
The collect-profiles job's pods should run on the control-plane node.

Additional info:
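For context, once the job moves to the management cluster, the placement check would look roughly like the sketch below (clusters-<cluster-name> follows the hosted control plane namespace convention seen elsewhere in this report; substitute the real hosted cluster name):

# On the management cluster: the job and its pods should appear in the hosted control plane namespace.
oc get job,pods -n clusters-<cluster-name> | grep collect-profiles

# On the guest cluster: nothing should be scheduled in openshift-operator-lifecycle-manager.
oc get pods -n openshift-operator-lifecycle-manager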

Comment 1 Jian Zhang 2021-10-26 02:38:55 UTC
1. Build the hypershift binary that contains the fix from the PR, as follows:

[cloud-user@preserve-olm-env hypershift]$ git remote add alex git@github.com:awgreene/hypershift.git
[cloud-user@preserve-olm-env hypershift]$ 
[cloud-user@preserve-olm-env hypershift]$ 
[cloud-user@preserve-olm-env hypershift]$ git fetch alex run-collect-profiles-on-management-cluster:bz-2015793
remote: Enumerating objects: 5856, done.
remote: Counting objects: 100% (5856/5856), done.
remote: Compressing objects: 100% (4047/4047), done.
remote: Total 5816 (delta 1730), reused 5000 (delta 1420), pack-reused 0
Receiving objects: 100% (5816/5816), 9.70 MiB | 17.37 MiB/s, done.
Resolving deltas: 100% (1730/1730), completed with 28 local objects.
From github.com:awgreene/hypershift
 * [new branch]        run-collect-profiles-on-management-cluster -> bz-2015793
 * [new branch]        run-collect-profiles-on-management-cluster -> alex/run-collect-profiles-on-management-cluster

[cloud-user@preserve-olm-env hypershift]$ git checkout bz-2015793
Switched to branch 'bz-2015793'
[cloud-user@preserve-olm-env hypershift]$ git log
commit a909db1672d23503c5e3bd332021f9a347195e16 (HEAD -> bz-2015793, alex/run-collect-profiles-on-management-cluster)
Author: Alexander Greene <greene.al1991>
Date:   Wed Oct 20 13:41:15 2021 -0700

    Run olm's collect profiles job in control plane
    
    Problem: In 4.9, OLM introduced a job that collects
    the data from the pprof endpoint every 15 minutes.
    This job currently runs on the guest cluster but
    should run on the control plane.
    
    Solution: Run the profile collection job on the
    control plane.
...

[cloud-user@preserve-olm-env hypershift]$ make build
CGO_ENABLED=0 GO111MODULE=on GOFLAGS=-mod=vendor go build -gcflags=all='-N -l' -o bin/ignition-server ./ignition-server
CGO_ENABLED=0 GO111MODULE=on GOFLAGS=-mod=vendor go build -gcflags=all='-N -l' -o bin/hypershift-operator ./hypershift-operator
CGO_ENABLED=0 GO111MODULE=on GOFLAGS=-mod=vendor go build -gcflags=all='-N -l' -o bin/control-plane-operator ./control-plane-operator
CGO_ENABLED=0 GO111MODULE=on GOFLAGS=-mod=vendor go build -gcflags=all='-N -l' -o bin/hosted-cluster-config-operator ./hosted-cluster-config-operator
CGO_ENABLED=0 GO111MODULE=on GOFLAGS=-mod=vendor go build -gcflags=all='-N -l' -o bin/konnectivity-socks5-proxy ./konnectivity-socks5-proxy
CGO_ENABLED=0 GO111MODULE=on GOFLAGS=-mod=vendor go build -gcflags=all='-N -l' -o bin/hypershift .
CGO_ENABLED=0 GO111MODULE=on GOFLAGS=-mod=vendor go build -gcflags=all='-N -l' -o bin/availability-prober ./availability-prober

[cloud-user@preserve-olm-env hypershift]$ ls -l ./bin/hypershift 
-rwxrwxr-x. 1 cloud-user cloud-user 78096065 Oct 25 21:47 ./bin/hypershift

2. Create an OCP 4.10 cluster, as follows:
[cloud-user@preserve-olm-env hypershift]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-25-125739   True        False         23m     Cluster version is 4.10.0-0.nightly-2021-10-25-125739

3. Create a hosted cluster.
[cloud-user@preserve-olm-env hypershift]$ ./bin/hypershift install
applied CustomResourceDefinition /clusterresourcesetbindings.addons.cluster.x-k8s.io
...

[cloud-user@preserve-olm-env hypershift]$ ./bin/hypershift create cluster --pull-secret .dockerconfigjson --aws-creds ./aws/credentials --name jian26 --base-domain qe.devcluster.openshift.com 
INFO[0000] Creating infrastructure                       id=jian26-8j2vp
INFO[0000] Using zone                                    zone=us-east-1a
...

[cloud-user@preserve-olm-env hypershift]$ oc get hostedclusters -n clusters
NAME     VERSION   KUBECONFIG   PROGRESS   AVAILABLE   REASON
jian26                          Partial    False       UnhealthyControlPlaneComponents

[cloud-user@preserve-olm-env hypershift]$ oc get pods -n clusters-jian26 
NAME                                      READY   STATUS    RESTARTS   AGE
capa-controller-manager-67f466565-bd2dg   1/1     Running   0          11m
cluster-api-54f588bd99-lgx28              1/1     Running   0          11m
control-plane-operator-848b899586-dt8bk   1/1     Running   0          11m
ignition-server-698888978-zwxtj           1/1     Running   0          11m

[cloud-user@preserve-olm-env hypershift]$ ./bin/hypershift create kubeconfig
2021/10/25 22:35:29 skipping hostedcluster clusters/jian26 which reports no kubeconfig
2021/10/25 22:35:29 selected 0 of 1 hostedclusters for the kubeconfig
2021/10/25 22:35:29 created kubeconfig with 0 contexts
apiVersion: v1
clusters: []
contexts: []
current-context: ""
kind: Config
preferences: {}
users: []

It seems like something is wrong with this new hypershift binary; the whole log is listed here:
[cloud-user@preserve-olm-env hypershift]$ ./bin/hypershift create cluster --pull-secret .dockerconfigjson --aws-creds ./aws/credentials --name jian26 --base-domain qe.devcluster.openshift.com 
INFO[0000] Creating infrastructure                       id=jian26-8j2vp
INFO[0000] Using zone                                    zone=us-east-1a
INFO[0001] Created VPC                                   id=vpc-02a9e892ea26849a9
INFO[0001] Enabled DNS support on VPC                    id=vpc-02a9e892ea26849a9
INFO[0001] Enabled DNS hostnames on VPC                  id=vpc-02a9e892ea26849a9
INFO[0001] Created DHCP options                          id=dopt-0fdfb8c6e25d5f2eb
INFO[0001] Associated DHCP options with VPC              dhcp options=dopt-0fdfb8c6e25d5f2eb vpc=vpc-02a9e892ea26849a9
INFO[0002] Created subnet                                id=subnet-05ae2def8968b2dfc name=jian26-8j2vp-private-us-east-1a
INFO[0002] Created subnet                                id=subnet-0a04c03e704c10a8f name=jian26-8j2vp-public-us-east-1a
INFO[0002] Created internet gateway                      id=igw-064616d3efcd77314
INFO[0003] Attached internet gateway to VPC              internet gateway=igw-064616d3efcd77314 vpc=vpc-02a9e892ea26849a9
INFO[0003] Created elastic IP for NAT gateway            id=eipalloc-0d6d6d57d438809a6
INFO[0003] Created NAT gateway                           id=nat-0c870e25b3ef8425d
INFO[0007] Created security group                        id=sg-068817e12ad73802b name=jian26-8j2vp-worker-sg
INFO[0007] Authorized ingress rules on security group    id=sg-068817e12ad73802b
INFO[0008] Created route table                           id=rtb-0ae7eb6093ef74fa3 name=jian26-8j2vp-private-us-east-1a
INFO[0008] Created route to NAT gateway                  nat gateway=nat-0c870e25b3ef8425d route table=rtb-0ae7eb6093ef74fa3
INFO[0008] Associated subnet with route table            route table=rtb-0ae7eb6093ef74fa3 subnet=subnet-05ae2def8968b2dfc
INFO[0009] Created route table                           id=rtb-0b0d2e919424327ea name=jian26-8j2vp-public-us-east-1a
INFO[0009] Set main VPC route table                      route table=rtb-0b0d2e919424327ea vpc=vpc-02a9e892ea26849a9
INFO[0009] Created route to internet gateway             internet gateway=igw-064616d3efcd77314 route table=rtb-0b0d2e919424327ea
INFO[0009] Associated route table with subnet            route table=rtb-0b0d2e919424327ea subnet=subnet-0a04c03e704c10a8f
INFO[0010] Created s3 VPC endpoint                       id=vpce-06d82b38e7bde3b88
INFO[0010] Found existing public zone                    id=Z3B3KOVA3TRCWP name=qe.devcluster.openshift.com
INFO[0011] Created private zone                          id=Z08662363M99QOR6YNY7V name=jian26.qe.devcluster.openshift.com
INFO[0011] Detected Issuer URL                           issuer="https://oidc-jian26-8j2vp.apps.jiazha26.qe.devcluster.openshift.com"
INFO[0011] OIDC CA thumbprint discovered                 thumbprint=07f30b8d5196c0aa29cd6ea4820cd34cde8b88aa
INFO[0011] Created OIDC provider                         provider="arn:aws:iam::301721915996:oidc-provider/oidc-jian26-8j2vp.apps.jiazha26.qe.devcluster.openshift.com"
INFO[0011] Created role                                  name=jian26-8j2vp-openshift-ingress
INFO[0011] Created role policy                           name=jian26-8j2vp-openshift-ingress
INFO[0011] Created role                                  name=jian26-8j2vp-openshift-image-registry
INFO[0011] Created role policy                           name=jian26-8j2vp-openshift-image-registry
INFO[0012] Created role                                  name=jian26-8j2vp-aws-ebs-csi-driver-operator
INFO[0012] Created role policy                           name=jian26-8j2vp-aws-ebs-csi-driver-operator
INFO[0012] Created role                                  name=jian26-8j2vp-worker-role
INFO[0012] Created instance profile                      name=jian26-8j2vp-worker
INFO[0012] Added role to instance profile                profile=jian26-8j2vp-worker role=jian26-8j2vp-worker-role
INFO[0012] Created role policy                           name=jian26-8j2vp-worker-policy
INFO[0012] Created IAM profile                           name=jian26-8j2vp-worker region=us-east-1
INFO[0012] Created user                                  user=jian26-8j2vp-cloud-controller
INFO[0012] Created user policy                           user=jian26-8j2vp-cloud-controller
INFO[0013] Created access key                            user=jian26-8j2vp-cloud-controller
INFO[0013] Created user                                  user=jian26-8j2vp-node-pool
INFO[0013] Created user policy                           user=jian26-8j2vp-node-pool
INFO[0013] Created access key                            user=jian26-8j2vp-node-pool
INFO[0013] Applied Kube resource                         kind=Namespace name=clusters namespace=
INFO[0013] Applied Kube resource                         kind=Secret name=jian26-pull-secret namespace=clusters
INFO[0013] Applied Kube resource                         kind=Secret name=jian26-cloud-ctrl-creds namespace=clusters
INFO[0013] Applied Kube resource                         kind=Secret name=jian26-node-mgmt-creds namespace=clusters
INFO[0013] Applied Kube resource                         kind=HostedCluster name=jian26 namespace=clusters
INFO[0013] Applied Kube resource                         kind=NodePool name=jian26 namespace=clusters

Comment 2 Alexander Greene 2021-11-03 22:49:05 UTC
@jiazha I do not believe that your test will work because the image used by the control plane operator does not include the changes introduced in the unmerged PR. You should build the hypershift image and set the control plane operator's image with the `--control-plane-operator-image` flag.

Comment 3 Alexander Greene 2021-11-04 13:13:47 UTC
Tested locally and the collect-profiles job was running in the hosted cluster namespace, but it failed because the OLM image did not include this PR: https://github.com/openshift/operator-framework-olm/pull/212

```
$ hypershift create cluster \
  --pull-secret ~/dev/openshift/installer/pull-secret.json \
  --aws-creds ~/.aws/credentials \
  --name agreeneguest \
  --base-domain devcluster.openshift.com \
  --control-plane-operator-image quay.io/agreene/hypershift:cp
INFO[0004] Creating infrastructure                       id=agreeneguest-t82km
INFO[0005] Using zone                                    zone=us-east-1a
INFO[0005] Created VPC                                   id=vpc-057d76f2f156dd433
INFO[0005] Enabled DNS support on VPC                    id=vpc-057d76f2f156dd433
INFO[0006] Enabled DNS hostnames on VPC                  id=vpc-057d76f2f156dd433
INFO[0006] Created DHCP options                          id=dopt-03f8ac7a38357c912
INFO[0006] Associated DHCP options with VPC              dhcp options=dopt-03f8ac7a38357c912 vpc=vpc-057d76f2f156dd433
INFO[0007] Created subnet                                id=subnet-003f6f92d7a483fe1 name=agreeneguest-t82km-private-us-east-1a
INFO[0008] Created subnet                                id=subnet-09cc9b5be8eefe70a name=agreeneguest-t82km-public-us-east-1a
INFO[0008] Created internet gateway                      id=igw-0ca3ae0eef74aacb5
INFO[0008] Attached internet gateway to VPC              internet gateway=igw-0ca3ae0eef74aacb5 vpc=vpc-057d76f2f156dd433
INFO[0009] Created elastic IP for NAT gateway            id=eipalloc-02e0678016a8f105e
INFO[0010] Created NAT gateway                           id=nat-0bf7cd056d4370d18
INFO[0014] Created security group                        id=sg-01e85c450fe3b60aa name=agreeneguest-t82km-worker-sg
INFO[0015] Authorized ingress rules on security group    id=sg-01e85c450fe3b60aa
INFO[0015] Created route table                           id=rtb-0fcfefba83df8f61c name=agreeneguest-t82km-private-us-east-1a
INFO[0016] Created route to NAT gateway                  nat gateway=nat-0bf7cd056d4370d18 route table=rtb-0fcfefba83df8f61c
INFO[0016] Associated subnet with route table            route table=rtb-0fcfefba83df8f61c subnet=subnet-003f6f92d7a483fe1
INFO[0016] Created route table                           id=rtb-01c5c951fca87c8fc name=agreeneguest-t82km-public-us-east-1a
INFO[0017] Set main VPC route table                      route table=rtb-01c5c951fca87c8fc vpc=vpc-057d76f2f156dd433
INFO[0017] Created route to internet gateway             internet gateway=igw-0ca3ae0eef74aacb5 route table=rtb-01c5c951fca87c8fc
INFO[0017] Associated route table with subnet            route table=rtb-01c5c951fca87c8fc subnet=subnet-09cc9b5be8eefe70a
INFO[0018] Created s3 VPC endpoint                       id=vpce-021f20a2884b89502
INFO[0019] Found existing public zone                    id=Z3URY6TWQ91KVV name=devcluster.openshift.com
INFO[0019] Created private zone                          id=Z05411723CLEQ9YXE6ZSJ name=agreeneguest.devcluster.openshift.com
INFO[0019] Detected Issuer URL                           issuer="https://oidc-agreeneguest-t82km.apps.ci-ln-vxyp1ck-72292.origin-ci-int-gce.dev.rhcloud.com"
INFO[0020] OIDC CA thumbprint discovered                 thumbprint=795ffce649c3a385205759bf91bbceed5f64655a
INFO[0020] Created OIDC provider                         provider="arn:aws:iam::269733383066:oidc-provider/oidc-agreeneguest-t82km.apps.ci-ln-vxyp1ck-72292.origin-ci-int-gce.dev.rhcloud.com"
INFO[0020] Created role                                  name=agreeneguest-t82km-openshift-ingress
INFO[0021] Created role policy                           name=agreeneguest-t82km-openshift-ingress
INFO[0021] Created role                                  name=agreeneguest-t82km-openshift-image-registry
INFO[0021] Created role policy                           name=agreeneguest-t82km-openshift-image-registry
INFO[0021] Created role                                  name=agreeneguest-t82km-aws-ebs-csi-driver-operator
INFO[0022] Created role policy                           name=agreeneguest-t82km-aws-ebs-csi-driver-operator
INFO[0022] Created role                                  name=agreeneguest-t82km-worker-role
INFO[0022] Created instance profile                      name=agreeneguest-t82km-worker
INFO[0022] Added role to instance profile                profile=agreeneguest-t82km-worker role=agreeneguest-t82km-worker-role
INFO[0022] Created role policy                           name=agreeneguest-t82km-worker-policy
INFO[0022] Created IAM profile                           name=agreeneguest-t82km-worker region=us-east-1
INFO[0023] Created user                                  user=agreeneguest-t82km-cloud-controller
INFO[0023] Created user policy                           user=agreeneguest-t82km-cloud-controller
INFO[0023] Created access key                            user=agreeneguest-t82km-cloud-controller
INFO[0023] Created user                                  user=agreeneguest-t82km-node-pool
INFO[0023] Created user policy                           user=agreeneguest-t82km-node-pool
INFO[0023] Created access key                            user=agreeneguest-t82km-node-pool
INFO[0024] Applied Kube resource                         kind=Namespace name=clusters namespace=
INFO[0024] Applied Kube resource                         kind=Secret name=agreeneguest-pull-secret namespace=clusters
INFO[0024] Applied Kube resource                         kind=Secret name=agreeneguest-cloud-ctrl-creds namespace=clusters
INFO[0024] Applied Kube resource                         kind=Secret name=agreeneguest-node-mgmt-creds namespace=clusters
INFO[0024] Applied Kube resource                         kind=HostedCluster name=agreeneguest namespace=clusters
INFO[0024] Applied Kube resource                         kind=NodePool name=agreeneguest namespace=clusters

$ k get hostedclusters -n clusters
NAME           VERSION   KUBECONFIG                      PROGRESS   AVAILABLE   REASON
agreeneguest             agreeneguest-admin-kubeconfig   Partial    True        HostedClusterAsExpected


$ k get pods -n clusters-agreeneguest
NAME                                              READY   STATUS      RESTARTS      AGE
capa-controller-manager-95cf86bdf-qn7rb           1/1     Running     2 (12m ago)   15m
catalog-operator-66d967cbd8-bchcj                 2/2     Running     0             14m
certified-operators-catalog-77b4585ffb-kcsr9      1/1     Running     0             14m
cluster-api-c9c44699-pbl4c                        1/1     Running     0             15m
cluster-autoscaler-7698c4b9b7-qp68c               1/1     Running     0             14m
cluster-policy-controller-958fbb8c9-8qvxt         1/1     Running     0             14m
cluster-version-operator-86c75f88cc-7zpxl         1/1     Running     0             14m
community-operators-catalog-74c7b599d4-fxwfc      1/1     Running     0             14m
control-plane-operator-6587d9c698-hsjh7           1/1     Running     0             15m
etcd-0                                            1/1     Running     0             14m
hosted-cluster-config-operator-57fbfc4658-59n9q   1/1     Running     0             14m
ignition-server-86b89b6ff4-6vkwk                  1/1     Running     0             15m
konnectivity-agent-85d768878b-kqlbh               1/1     Running     0             14m
konnectivity-server-788749c6d6-9qg5t              1/1     Running     0             14m
kube-apiserver-8549d4d5bd-77wcc                   2/2     Running     0             14m
kube-controller-manager-74955fd9b8-ppq7j          1/1     Running     0             6m21s
kube-scheduler-565c955cc7-d75h7                   1/1     Running     0             14m
machine-approver-5d99c49d44-279ts                 1/1     Running     0             14m
manifests-bootstrapper--1-jwvlf                   0/1     Completed   4             14m
oauth-openshift-fb88dfcb4-x5bjt                   1/1     Running     0             11m
olm-collect-profiles-27266370--1-2sr7h            0/1     Error       0             7m19s
olm-collect-profiles-27266370--1-5htgq            0/1     Error       0             6m30s
olm-collect-profiles-27266370--1-bxrzl            0/1     Error       0             6m50s
olm-collect-profiles-27266370--1-hbl5k            0/1     Error       0             16s
olm-collect-profiles-27266370--1-lzw5d            0/1     Error       0             3m10s
olm-collect-profiles-27266370--1-qg4rj            0/1     Error       0             7m
olm-collect-profiles-27266370--1-z6szg            0/1     Error       0             5m50s
olm-operator-5f5bf87759-ddgqg                     2/2     Running     0             14m
openshift-apiserver-559d6dcb94-7wtxz              2/2     Running     0             6m21s
openshift-controller-manager-7cbdf56f54-g2pr2     1/1     Running     0             14m
openshift-oauth-apiserver-67b68f9cfb-7zlx9        1/1     Running     0             14m
packageserver-6cb5cc97f6-rsfjs                    2/2     Running     1 (11m ago)   14m
redhat-marketplace-catalog-5f66d74764-vhxgr       1/1     Running     0             14m
redhat-operators-catalog-7c67b89867-fv9cz         1/1     Running     0             14m
```
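
To confirm why the pods errored, the logs of one of the failing pods can be checked directly in the hosted control plane namespace (pod name taken from the listing above; the job name is inferred from the pod-name prefix):

```
oc logs -n clusters-agreeneguest olm-collect-profiles-27266370--1-2sr7h
oc describe job -n clusters-agreeneguest olm-collect-profiles-27266370
```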

Comment 4 Cesar Wong 2021-11-04 15:19:10 UTC
To test the code from the PR, build an image that contains it:

make build
make RUNTIME=podman IMG=quay.io/yourname/hypershift:bz-2015793 docker-build docker-push

Then install hypershift with the created image:
./bin/hypershift install --hypershift-image quay.io/yourname/hypershift:bz-2015793 

Then create a cluster as usual.
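
"As usual" would be something like the following sketch, with flags and values mirroring the ones used earlier in this bug and <cluster-name> as a placeholder:

./bin/hypershift create cluster --pull-secret .dockerconfigjson --aws-creds ./aws/credentials --name <cluster-name> --base-domain qe.devcluster.openshift.com

(The verification in comment 6 uses the newer `create cluster aws` subcommand form.)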

Comment 6 Jian Zhang 2021-11-22 06:13:26 UTC
Hi Alex and Cesar,

Thanks for your suggestions! I tested it with the merged PR, as follows:

1. Create OCP 4.10.
[cloud-user@preserve-olm-env hypershift]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-21-005535   True        False         48m     Cluster version is 4.10.0-0.nightly-2021-11-21-005535

2. Install the `hypershift` binary

[cloud-user@preserve-olm-env hypershift]$ make build
CGO_ENABLED=0 GO111MODULE=on GOFLAGS=-mod=vendor go build -gcflags=all='-N -l' -o bin/ignition-server ./ignition-server
...
[cloud-user@preserve-olm-env hypershift]$ ls -l ./bin/hypershift
-rwxrwxr-x. 1 cloud-user cloud-user 82164765 Nov 21 22:06 ./bin/hypershift

3. Create a hosted cluster
[cloud-user@preserve-olm-env hypershift]$ ./bin/hypershift install --oidc-storage-provider-s3-bucket-name="whuaws" --oidc-storage-provider-s3-region="us-east-2" --oidc-storage-provider-s3-credentials=aws/credentials
applied CustomResourceDefinition /clusterresourcesetbindings.addons.cluster.x-k8s.io
...

[cloud-user@preserve-olm-env hypershift]$ ./bin/hypershift create cluster aws --pull-secret .dockerconfigjson --aws-creds ./aws/credentials --name example2 --base-domain qe.devcluster.openshift.com
INFO[0002] Creating infrastructure                       id=example2-vds88
INFO[0002] Using zone                                    zone=us-east-1a
INFO[0002] Created VPC                                   id=vpc-043aabbd517173e3d
...

[cloud-user@preserve-olm-env jian]$ oc get hostedclusters -n clusters
NAME       VERSION   KUBECONFIG                  PROGRESS   AVAILABLE   REASON
example2             example2-admin-kubeconfig   Partial    True        HostedClusterAsExpected

4. Check whether OLM's collect-profiles job pods are running in the clusters-example2 project. Looks good to me.
[cloud-user@preserve-olm-env jian]$ oc get job -n clusters-example2
NAME                            COMPLETIONS   DURATION   AGE
manifests-bootstrapper          1/1           3m15s      115m
olm-collect-profiles-27292650   1/1           5s         32m
olm-collect-profiles-27292665   1/1           6s         17m
olm-collect-profiles-27292680   1/1           8s         2m26s

[cloud-user@preserve-olm-env jian]$ oc get pods -n clusters-example2 |grep olm
olm-collect-profiles-27292650--1-j9nbs            0/1     Completed   0          31m
olm-collect-profiles-27292665--1-69g6c            0/1     Completed   0          16m
olm-collect-profiles-27292680--1-ptpmw            0/1     Completed   0          75s
olm-operator-7b585b5d6f-rxjvq                     2/2     Running     0          114m

5. Create the kubeconfig of this cluster and log in to it.
[cloud-user@preserve-olm-env hypershift]$ ./bin/hypershift create kubeconfig
2021/11/22 01:01:26 selected 1 of 1 hostedclusters for the kubeconfig
2021/11/22 01:01:26 adding clusters/example2 to kubeconfig
2021/11/22 01:01:26 added clusters-example2 to kubeconfig
...

6. Check whether any OLM collect-profiles pods are still running in the guest cluster's openshift-operator-lifecycle-manager namespace. No job found, LGTM.
[cloud-user@preserve-olm-env hypershift]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          116m    Unable to apply 4.9.8: some cluster operators have not yet rolled out
[cloud-user@preserve-olm-env hypershift]$ oc get pods -n openshift-operator-lifecycle-manager 
No resources found in openshift-operator-lifecycle-manager namespace.
[cloud-user@preserve-olm-env hypershift]$ oc get job -n openshift-operator-lifecycle-manager 
No resources found in openshift-operator-lifecycle-manager namespace.

Verified.

Comment 9 errata-xmlrpc 2022-03-10 16:20:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

