Created attachment 1742642 [details]
log-bundle-20201224121756.tar.gz

Tried to create a cluster on C2S; the bootstrap process failed. (Please see the attached log-bundle-20201224121756.tar.gz.)

Per story https://issues.redhat.com/browse/CORS-1584, the trust CA chain should be stored in openshift-config-managed/kube-cloud-config, but that configmap does not exist:
```
./oc get configmap -n openshift-config-managed kube-cloud-config
Error from server (NotFound): configmaps "kube-cloud-config" not found
```

Instead, the trust CA was stored in:
```
./oc get configmap -n openshift-config-managed kube-root-ca.crt -oyaml
apiVersion: v1
data:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <--snip-->
```

All pods are in Pending status:
```
NAMESPACE                                          NAME                                                      READY   STATUS    RESTARTS   AGE
openshift-apiserver-operator                       openshift-apiserver-operator-fcd64cdd5-zd89z              0/1     Pending   0          14h
openshift-authentication-operator                  authentication-operator-547b4c57dd-zh727                  0/1     Pending   0          14h
openshift-cloud-credential-operator                cloud-credential-operator-55ff8976c9-db5wj                0/2     Pending   0          14h
openshift-cluster-machine-approver                 machine-approver-845d5b794b-lv87q                         0/2     Pending   0          14h
openshift-cluster-node-tuning-operator             cluster-node-tuning-operator-bdb7f64c9-l84nl              0/1     Pending   0          14h
openshift-cluster-storage-operator                 cluster-storage-operator-74d7f47bf6-wwkvw                 0/1     Pending   0          14h
openshift-cluster-storage-operator                 csi-snapshot-controller-operator-9fdd68d74-jr7w8          0/1     Pending   0          14h
openshift-cluster-version                          cluster-version-operator-59b5c54d75-nbx47                 0/1     Pending   0          14h
openshift-config-operator                          openshift-config-operator-9c555879f-jt2vd                 0/1     Pending   0          14h
openshift-controller-manager-operator              openshift-controller-manager-operator-6cbffdc798-twngw    0/1     Pending   0          14h
openshift-dns-operator                             dns-operator-5fd98db9cd-wh68h                             0/2     Pending   0          14h
openshift-etcd-operator                            etcd-operator-756b494c88-kls64                            0/1     Pending   0          14h
openshift-image-registry                           cluster-image-registry-operator-5dbd8889fd-4d8cc          0/1     Pending   0          14h
openshift-ingress-operator                         ingress-operator-7869bc8864-6vxnj                         0/2     Pending   0          14h
openshift-insights                                 insights-operator-56797c4f9b-kff7n                        0/1     Pending   0          14h
openshift-kube-apiserver-operator                  kube-apiserver-operator-5c77c9c859-q6njj                  0/1     Pending   0          14h
openshift-kube-controller-manager-operator         kube-controller-manager-operator-669859c4fd-5kqjh         0/1     Pending   0          14h
openshift-kube-scheduler-operator                  openshift-kube-scheduler-operator-6fd6b9997-xz9s9         0/1     Pending   0          14h
openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-7b778676c7-dq9cv   0/1     Pending   0          14h
openshift-machine-api                              cluster-autoscaler-operator-5d48f5ff8f-xglb6              0/2     Pending   0          14h
openshift-machine-api                              cluster-baremetal-operator-584b9999fc-6trt7               0/1     Pending   0          14h
openshift-machine-api                              machine-api-operator-796744bff-6tcmp                      0/2     Pending   0          14h
openshift-machine-config-operator                  machine-config-operator-5c8b66cc4f-wf5f4                  0/1     Pending   0          14h
openshift-marketplace                              marketplace-operator-684f669d55-c7w72                     0/1     Pending   0          14h
openshift-monitoring                               cluster-monitoring-operator-5555868585-2gtdv              0/2     Pending   0          14h
openshift-network-operator                         network-operator-8558975454-mgglx                         0/1     Pending   0          14h
openshift-operator-lifecycle-manager               catalog-operator-7c9b666d58-7b6cz                         0/1     Pending   0          14h
openshift-operator-lifecycle-manager               olm-operator-77748d5dcb-kj9md                             0/1     Pending   0          14h
openshift-service-ca-operator                      service-ca-operator-6fb466b7cf-rjvzm                      0/1     Pending   0          14h
```

>> install-config.yaml:
```
apiVersion: v1
baseDomain: govcloudemu.devcluster.openshift.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: yunjiang-m6
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.119.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-iso-east-1
    amiID: ami-0996bc3576195a2f0
    subnets:
    - subnet-04018b54f216345c5
publish: Internal
credentialsMode: Manual
additionalTrustBundle: <SHIFT-CERT-CA and IMAGE-REGISTRY-CERT-CA>
imageContentSources:
- mirrors:
  - ip-10-119-0-121.ec2.internal:5000/ocp/release
  source: registry.svc.ci.openshift.org/ocp/release
- mirrors:
  - ip-10-119-0-121.ec2.internal:5000/ocp/release
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
pullSecret: <HIDDEN>
sshKey: <HIDDEN>
```

Version-Release number of the following components:
4.7.0-0.nightly-2020-12-20-031835

How reproducible:
always

Steps to Reproduce:
1. Mirror the OCP image to a local image registry server
2. Create and edit install-config.yaml
3. Configure CCO in manual mode [1]
4. (SHIFT console) Turn off `Allow Internet`
5. Create the cluster

Actual results:
Bootstrap process failed.

Expected results:
Bootstrap process completes successfully.

Additional info:
[1] https://docs.openshift.com/container-platform/4.6/installing/installing_aws/manually-creating-iam.html#manually-create-iam_manually-creating-iam-aws
Found a kubelet service error on one master machine:
```
Dec 29 08:08:49 ip-10-119-1-234 hyperkube[1461]: W1229 08:08:49.648314    1461 plugins.go:105] WARNING: aws built-in cloud provider is now deprecated. The AWS provider is deprecated and will be removed in a future release
Dec 29 08:08:49 ip-10-119-1-234 hyperkube[1461]: I1229 08:08:49.649512    1461 aws.go:1251] Building AWS cloudprovider
Dec 29 08:08:49 ip-10-119-1-234 hyperkube[1461]: I1229 08:08:49.649588    1461 aws.go:1211] Zone not specified in configuration file; querying AWS metadata service
Dec 29 08:10:19 ip-10-119-1-234 systemd[1]: kubelet.service: start operation timed out. Terminating.
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]: F1229 08:10:50.027080    1461 server.go:269] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-0ac535ea30c00904c: "error listing AWS instances: \"RequestError: send request failed\\ncaused by: Post \\\"https://ec2.us-east-1.amazonaws.com/\\\": dial tcp: i/o timeout\""
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]: goroutine 1 [running]:
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0xc000012001, 0xc00059c300, 0x130, 0x2fe)
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]:         /builddir/build/BUILD/openshift-git-97012.0616638/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9
<--SNIP-->
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]: created by internal/singleflight.(*Group).DoChan
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]:         /usr/lib/golang/src/internal/singleflight/singleflight.go:88 +0x2cc
Dec 29 08:10:50 ip-10-119-1-234 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Dec 29 08:10:50 ip-10-119-1-234 systemd[1]: kubelet.service: Failed with result 'timeout'.
Dec 29 08:10:50 ip-10-119-1-234 systemd[1]: Failed to start Kubernetes Kubelet.
```
The kubelet log is attached, named m11_kubelet.service.log.
Created attachment 1742888 [details] m11_kubelet.service.log
I have traced this down to a deficiency in the C2S simulator. Kubelet relies on the AWS instance metadata to determine the region that the instance is in. However, the C2S simulator does not adjust the instance metadata. Consequently, kubelet thinks that the instance is in us-east-1 instead of us-iso-east-1. In a real C2S environment, the instance metadata would indicate that the instance is in us-iso-east-1, and kubelet would direct its subsequent AWS calls to the correct us-iso-east-1 endpoints. I am working on seeing what can be done to unblock testing.
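A minimal sketch (not the actual kubelet code) of why the misreported region matters: the legacy AWS cloud provider derives its EC2 endpoint from the region it believes it is in, so a wrong region means a wrong (and here unreachable) endpoint. The partition-suffix rule below is an illustrative assumption, not the real endpoint-resolution logic.

```python
def ec2_endpoint(region: str) -> str:
    """Illustrative endpoint derivation: region determines partition and host."""
    if region.startswith("us-iso-"):
        # C2S (aws-iso) partition uses the c2s.ic.gov domain
        return f"https://ec2.{region}.c2s.ic.gov"
    # Standard AWS partition
    return f"https://ec2.{region}.amazonaws.com"

# Instance metadata (unadjusted by the simulator) reports us-east-1,
# so kubelet dials the public endpoint, which is unreachable from C2S:
print(ec2_endpoint("us-east-1"))       # https://ec2.us-east-1.amazonaws.com
# With correct metadata it would use the C2S endpoint instead:
print(ec2_endpoint("us-iso-east-1"))   # https://ec2.us-iso-east-1.c2s.ic.gov
```

The first endpoint matches the `dial tcp: i/o timeout` target seen in the kubelet log above; the second is the endpoint the ServiceOverride workaround forces.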
The workaround for the incorrect instance metadata is to add the following to the data of the manifests/cloud-provider-config.yaml manifest before creating the cluster:
```
config: |
  [ServiceOverride "0"]
  Service = ec2
  Region = us-east-1
  URL = https://ec2.us-iso-east-1.c2s.ic.gov
  SigningRegion = us-iso-east-1
```
This tricks kubelet into using the us-iso-east-1 endpoint even though the instance metadata told it to use the us-east-1 endpoint. Again, this is only a workaround for the C2S simulated environment. In a real environment it is not necessary.
After getting past the kubelet issue, I found another issue with the machine-api-operator. I would like to track that in a separate bug (https://bugzilla.redhat.com/show_bug.cgi?id=1915114).
Thanks Matthew, I'll change the setting in my environment.
@yunjiang Are you satisfied with the service endpoints being a resolution for this bug? Can we close this bug?

Also, in addition to the override for the ec2 endpoint, it is necessary to add an override for the elb endpoint for load balancing:
```
[ServiceOverride "0"]
Service = ec2
Region = us-east-1
URL = https://ec2.us-iso-east-1.c2s.ic.gov
SigningRegion = us-iso-east-1
[ServiceOverride "1"]
Service = elasticloadbalancing
Region = us-east-1
URL = https://elasticloadbalancing.us-iso-east-1.c2s.ic.gov
SigningRegion = us-iso-east-1
```
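As a sanity check, the two ServiceOverride stanzas are INI-style and can be parsed with Python's stdlib configparser to confirm that each service maps the region kubelet believes it is in (us-east-1) to the real C2S endpoint and signing region. This is only an illustrative check, not part of the cluster configuration; the real config is consumed by the AWS cloud provider's gcfg parser.

```python
import configparser

# The two ServiceOverride stanzas from the workaround, verbatim.
CLOUD_CONF = """\
[ServiceOverride "0"]
Service = ec2
Region = us-east-1
URL = https://ec2.us-iso-east-1.c2s.ic.gov
SigningRegion = us-iso-east-1

[ServiceOverride "1"]
Service = elasticloadbalancing
Region = us-east-1
URL = https://elasticloadbalancing.us-iso-east-1.c2s.ic.gov
SigningRegion = us-iso-east-1
"""

cfg = configparser.ConfigParser()
cfg.read_string(CLOUD_CONF)

# Each override redirects a service from the (wrong) metadata region
# to the C2S endpoint and signing region.
for section in cfg.sections():
    svc = cfg[section]["Service"]
    print(f'{svc}: {cfg[section]["Region"]} -> {cfg[section]["URL"]}')
```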
PASS. Verified on: 4.7.0-0.nightly-2021-01-19-051335

>> NOTE: the trust-ca could not be found in kube-cloud-config; this issue will be tracked in Bug 1915500.

>> Steps:
1. Create manifests.
2. Inject cloud-provider-config per comment 6 and comment 12:
```
cat << EOF > manifests/cloud-provider-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-provider-config
  namespace: openshift-config
data:
  config: |
    [ServiceOverride "0"]
    Service = ec2
    Region = us-east-1
    URL = https://ec2.us-iso-east-1.c2s.ic.gov
    SigningRegion = us-iso-east-1
    [ServiceOverride "1"]
    Service = elasticloadbalancing
    Region = us-east-1
    URL = https://elasticloadbalancing.us-iso-east-1.c2s.ic.gov
    SigningRegion = us-iso-east-1
EOF
```
3. The kube-cloud-config configmap is created:
```
./oc get configmap -n openshift-config-managed kube-cloud-config -o yaml
apiVersion: v1
data:
  cloud.conf: |
    [ServiceOverride "0"]
    Service = ec2
    Region = us-east-1
    URL = https://ec2.us-iso-east-1.c2s.ic.gov
    SigningRegion = us-iso-east-1
    [ServiceOverride "1"]
    Service = elasticloadbalancing
    Region = us-east-1
    URL = https://elasticloadbalancing.us-iso-east-1.c2s.ic.gov
    SigningRegion = us-iso-east-1
kind: ConfigMap
metadata:
  creationTimestamp: "2021-01-19T09:01:43Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:cloud.conf: {}
    manager: cluster-config-operator
    operation: Update
    time: "2021-01-19T09:01:43Z"
  name: kube-cloud-config
  namespace: openshift-config-managed
  resourceVersion: "4666"
  selfLink: /api/v1/namespaces/openshift-config-managed/configmaps/kube-cloud-config
  uid: 1c2eeed9-7c17-46fb-af27-81e6a8275fc1
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633