Bug 1911257
| Field | Value |
|---|---|
| Summary | [aws-c2s] failed to create cluster, kube-cloud-config was not created |
| Product | OpenShift Container Platform |
| Component | Installer |
| Installer sub component | openshift-installer |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Version | 4.7 |
| Target Release | 4.7.0 |
| Keywords | TestBlocker |
| Reporter | Yunfei Jiang <yunjiang> |
| Assignee | Matthew Staebler <mstaeble> |
| QA Contact | Yunfei Jiang <yunjiang> |
| CC | awestbro, gpei, jialiu, mstaeble |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | No Doc Update |
| Doc Text | This is a bug in new functionality added to 4.7. |
| Type | Bug |
| Last Closed | 2021-02-24 15:49:10 UTC |
Description

Yunfei Jiang 2020-12-28 11:58:13 UTC

Found a kubelet service error on one master machine:

```
Dec 29 08:08:49 ip-10-119-1-234 hyperkube[1461]: W1229 08:08:49.648314 1461 plugins.go:105] WARNING: aws built-in cloud provider is now deprecated. The AWS provider is deprecated and will be removed in a future release
Dec 29 08:08:49 ip-10-119-1-234 hyperkube[1461]: I1229 08:08:49.649512 1461 aws.go:1251] Building AWS cloudprovider
Dec 29 08:08:49 ip-10-119-1-234 hyperkube[1461]: I1229 08:08:49.649588 1461 aws.go:1211] Zone not specified in configuration file; querying AWS metadata service
Dec 29 08:10:19 ip-10-119-1-234 systemd[1]: kubelet.service: start operation timed out. Terminating.
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]: F1229 08:10:50.027080 1461 server.go:269] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-0ac535ea30c00904c: "error listing AWS instances: \"RequestError: send request failed\\ncaused by: Post \\\"https://ec2.us-east-1.amazonaws.com/\\\": dial tcp: i/o timeout\""
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]: goroutine 1 [running]:
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0xc000012001, 0xc00059c300, 0x130, 0x2fe)
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]: /builddir/build/BUILD/openshift-git-97012.0616638/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9
<--SNIP-->
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]: created by internal/singleflight.(*Group).DoChan
Dec 29 08:10:50 ip-10-119-1-234 hyperkube[1461]: /usr/lib/golang/src/internal/singleflight/singleflight.go:88 +0x2cc
Dec 29 08:10:50 ip-10-119-1-234 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Dec 29 08:10:50 ip-10-119-1-234 systemd[1]: kubelet.service: Failed with result 'timeout'.
Dec 29 08:10:50 ip-10-119-1-234 systemd[1]: Failed to start Kubernetes Kubelet.
```

The kubelet log is attached, named m11_kubelet.service.log.

Created attachment 1742888 [details]: m11_kubelet.service.log
I have traced this down to a deficiency in the C2S simulator. Kubelet relies on the AWS instance metadata to determine the region that the instance is in. However, the C2S simulator does not adjust the instance metadata. Consequently, kubelet thinks that the instance is in us-east-1 instead of us-iso-east-1. In a real C2S environment, the instance metadata would indicate that the instance is in us-iso-east-1, and kubelet would direct its subsequent AWS calls to the correct us-iso-east-1 endpoints.

I am working on seeing what can be done to unblock testing. The workaround for the incorrect instance metadata is to add the following to the data of the manifests/cloud-provider-config.yaml manifest before creating the cluster:
```
config: |
  [ServiceOverride "0"]
  Service = ec2
  Region = us-east-1
  URL = https://ec2.us-iso-east-1.c2s.ic.gov
  SigningRegion = us-iso-east-1
```
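The root cause is the region-detection step: kubelet derives the region from the availability zone reported by the EC2 instance metadata service (IMDS). The following is an illustrative sketch of that derivation, not kubelet's actual code; the IMDS path and the "region plus trailing zone letter" convention are standard AWS behavior, but the function names here are invented for illustration.

```python
# Sketch (assumed names, not kubelet's implementation): derive the region
# from the EC2 instance metadata service. In the C2S simulator the metadata
# still reports a us-east-1 zone, so the derived region is wrong.
import urllib.request

# Standard IMDS path for the instance's availability zone.
IMDS_AZ_URL = "http://169.254.169.254/latest/meta-data/placement/availability-zone"

def region_from_az(az: str) -> str:
    """An availability zone is the region name plus a trailing zone letter,
    e.g. 'us-east-1a' -> 'us-east-1'."""
    return az.rstrip("abcdefghijklmnopqrstuvwxyz")

def detect_region(timeout: float = 2.0) -> str:
    """Query IMDS for the AZ and strip the zone letter (only works on EC2)."""
    with urllib.request.urlopen(IMDS_AZ_URL, timeout=timeout) as resp:
        return region_from_az(resp.read().decode().strip())
```

In the simulated environment `region_from_az` yields "us-east-1", so kubelet builds the public `ec2.us-east-1.amazonaws.com` endpoint, which is unreachable from the isolated network, hence the dial timeout in the log above.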
This will trick kubelet into using the us-iso-east-1 endpoint even though the instance metadata told it to use the us-east-1 endpoint. Again, this is only a workaround for the C2S simulated environment. In a real environment this is not necessary.
After getting past the kubelet issue, I found another issue with the machine-api-operator. I would like to track that in a separate bug (https://bugzilla.redhat.com/show_bug.cgi?id=1915114).

Thanks Matthew, I'll change the setting in my environment.

@yunjiang Are you satisfied with the service endpoints being a resolution for this bug? Can we close this bug?
In addition to the override for the ec2 endpoint, it is also necessary to add the elasticloadbalancing (elb) endpoint to the service endpoints for load balancing:
```
[ServiceOverride "0"]
Service = ec2
Region = us-east-1
URL = https://ec2.us-iso-east-1.c2s.ic.gov
SigningRegion = us-iso-east-1

[ServiceOverride "1"]
Service = elasticloadbalancing
Region = us-east-1
URL = https://elasticloadbalancing.us-iso-east-1.c2s.ic.gov
SigningRegion = us-iso-east-1
```
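The override semantics can be sketched as follows: a request for a (service, region) pair that matches a ServiceOverride section is sent to that section's URL and signed for its SigningRegion; everything else falls back to the standard endpoint. This is a Python illustration of the behavior described in this bug, under assumed fallback naming, not the actual Go implementation in the AWS cloud provider.

```python
# Sketch of ServiceOverride resolution as described above. The real code
# lives in the (Go) AWS legacy cloud provider; this models the semantics.
import configparser

CLOUD_CONF = """\
[ServiceOverride "0"]
Service = ec2
Region = us-east-1
URL = https://ec2.us-iso-east-1.c2s.ic.gov
SigningRegion = us-iso-east-1
[ServiceOverride "1"]
Service = elasticloadbalancing
Region = us-east-1
URL = https://elasticloadbalancing.us-iso-east-1.c2s.ic.gov
SigningRegion = us-iso-east-1
"""

def load_overrides(conf_text: str) -> dict:
    """Parse ServiceOverride sections into {(service, region): (url, signing_region)}."""
    cp = configparser.ConfigParser()
    cp.read_string(conf_text)
    overrides = {}
    for name in cp.sections():
        if name.startswith("ServiceOverride"):
            s = cp[name]
            overrides[(s["Service"], s["Region"])] = (s["URL"], s["SigningRegion"])
    return overrides

def resolve_endpoint(overrides: dict, service: str, region: str) -> tuple:
    """Return (url, signing_region); unmatched pairs use the standard
    endpoint pattern (an assumption for this sketch)."""
    default = (f"https://{service}.{region}.amazonaws.com", region)
    return overrides.get((service, region), default)
```

With this config, a kubelet call to ec2 in the metadata-reported region us-east-1 resolves to the us-iso-east-1 C2S endpoint, which is exactly the trick the workaround relies on.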
PASS. Verified on: 4.7.0-0.nightly-2021-01-19-051335

> NOTE: the trust-ca could not be found in kube-cloud-config; this issue will be tracked in Bug 1915500.

Steps:

1. Create manifests.
2. Inject cloud-provider-config per comment 6 and comment 12:

```
cat << EOF > manifests/cloud-provider-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-provider-config
  namespace: openshift-config
data:
  config: |
    [ServiceOverride "0"]
    Service = ec2
    Region = us-east-1
    URL = https://ec2.us-iso-east-1.c2s.ic.gov
    SigningRegion = us-iso-east-1
    [ServiceOverride "1"]
    Service = elasticloadbalancing
    Region = us-east-1
    URL = https://elasticloadbalancing.us-iso-east-1.c2s.ic.gov
    SigningRegion = us-iso-east-1
EOF
```

3. Verify that the kube-cloud-config is created:

```
./oc get configmap -n openshift-config-managed kube-cloud-config -o yaml
apiVersion: v1
data:
  cloud.conf: |
    [ServiceOverride "0"]
    Service = ec2
    Region = us-east-1
    URL = https://ec2.us-iso-east-1.c2s.ic.gov
    SigningRegion = us-iso-east-1
    [ServiceOverride "1"]
    Service = elasticloadbalancing
    Region = us-east-1
    URL = https://elasticloadbalancing.us-iso-east-1.c2s.ic.gov
    SigningRegion = us-iso-east-1
kind: ConfigMap
metadata:
  creationTimestamp: "2021-01-19T09:01:43Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:cloud.conf: {}
    manager: cluster-config-operator
    operation: Update
    time: "2021-01-19T09:01:43Z"
  name: kube-cloud-config
  namespace: openshift-config-managed
  resourceVersion: "4666"
  selfLink: /api/v1/namespaces/openshift-config-managed/configmaps/kube-cloud-config
  uid: 1c2eeed9-7c17-46fb-af27-81e6a8275fc1
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633