Description of problem:
Currently the AWS DNS suffix (amazonaws.com) is hardcoded [1] into aws-sdk-go, which the AWS autoscaler feature for OCP 3.11 uses.

Version-Release number of selected component (if applicable):
OCP 3.11

How reproducible:
Any disconnected OCP deployment on an AWS cloud that is not hosted on amazonaws.com fails when deployed with the autoscaler.

Steps to Reproduce:
1. Configure vars for AWS auto-scaling
2. Deploy on normal (commercial) AWS; the deployment succeeds
3. Deploy the same configuration on a disconnected (non-amazonaws.com) domain; the deployer fails

Actual results:
Failed to start kubelet

Expected results:
Successful deployment

Additional info:
Before the move to aws-sdk-go, this was easily overridden with a boto.cfg option using a custom endpoints.json file. Now the domain is hardcoded into aws-sdk-go.

[1] https://github.com/aws/aws-sdk-go/blob/master/aws/endpoints/defaults.go#L211
Unable to add logs etc.; this is an air-gapped environment.
Folks, this is a bit bigger than initially reported. The kubelet is calling out to the statically defined AWS endpoint, which causes a deployment failure for any OpenShift 3.11 deployment on a non-public AWS cloud.
There are a few differences in AWS air-gapped regions from the public regions:
https://digitalageexperts.com/how-are-aws-private-air-gapped-regions-different/

We have identified the spot in the cluster-autoscaler where additional options will be needed to allow configuring for private regions:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L97

In addition, there are multiple spots in the upstream Kubernetes cloudprovider where these options would be needed:
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L645-L745

The AWS SDK provides facilities for using custom CAs and endpoints:
https://docs.aws.amazon.com/sdk-for-go/api/aws/endpoints/ + environment variable AWS_CA_BUNDLE

These mechanisms need to be leveraged. We will raise the needed upstream issues next.
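To illustrate what a configurable endpoint would need to do, here is a minimal sketch using only the Go standard library. It is not aws-sdk-go code: the EndpointConfig type, its fields, and ResolveEndpoint are hypothetical names, and the hostname layout (service.region.suffix) is an assumption based on the default pattern in the SDK's endpoint definitions. The idea is simply that the DNS suffix falls back to amazonaws.com only when nothing else is configured, and that a per-service URL override takes precedence.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultDNSSuffix is the value that is currently fixed for commercial AWS.
const defaultDNSSuffix = "amazonaws.com"

// EndpointConfig is a hypothetical configuration type (not part of aws-sdk-go).
type EndpointConfig struct {
	// DNSSuffix replaces the default suffix when set, e.g. for a private region.
	DNSSuffix string
	// Overrides maps a service name (e.g. "ec2") to a complete endpoint URL.
	Overrides map[string]string
}

// ResolveEndpoint returns the endpoint URL for a service in a region.
// A per-service override wins; otherwise the URL is built from the
// configured DNS suffix, falling back to the default suffix.
func (c EndpointConfig) ResolveEndpoint(service, region string) (string, error) {
	if service == "" || region == "" {
		return "", fmt.Errorf("service and region are required")
	}
	if url, ok := c.Overrides[service]; ok {
		return url, nil
	}
	suffix := strings.Trim(c.DNSSuffix, ".")
	if suffix == "" {
		suffix = defaultDNSSuffix
	}
	return fmt.Sprintf("https://%s.%s.%s", service, region, suffix), nil
}
```

With DNSSuffix left empty the function builds the usual amazonaws.com hostname; setting it to a private domain (a made-up example would be "example.internal") changes only the suffix, which is the behavior the disconnected deployments need.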
Upstream issue for kubernetes/kubernetes: https://github.com/kubernetes/kubernetes/issues/70588. It is a quick summary of the current issue. I will extend the description after a more detailed analysis of the underlying code.
There is also a spot in the AWS service broker: https://github.com/awslabs/aws-servicebroker/blob/master/pkg/broker/aws_sdk.go#L24-L38
Here are the specifics on the custom CA in the AWS SDK for Go: https://github.com/aws/aws-sdk-go/blob/master/aws/session/session.go#L222-L225
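For anyone who wants to see the shape of the custom CA handling without reading the SDK source, here is a rough standard-library sketch of the general technique: read a PEM bundle from the path in AWS_CA_BUNDLE and use it as the trusted root set for an HTTP client. The function names newHTTPClientWithCA and clientFromEnv are hypothetical, and this is an approximation of the approach, not the SDK's actual implementation.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

// newHTTPClientWithCA (hypothetical helper) returns an HTTP client that
// trusts only the certificates in the PEM bundle at bundlePath. With an
// empty path it returns a client that uses the system trust store.
func newHTTPClientWithCA(bundlePath string) (*http.Client, error) {
	if bundlePath == "" {
		return &http.Client{}, nil
	}
	pem, err := os.ReadFile(bundlePath)
	if err != nil {
		return nil, fmt.Errorf("reading CA bundle %q: %w", bundlePath, err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pem) {
		return nil, fmt.Errorf("no certificates found in %q", bundlePath)
	}
	transport := &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}
	return &http.Client{Transport: transport}, nil
}

// clientFromEnv (hypothetical helper) builds the client from the
// AWS_CA_BUNDLE environment variable.
func clientFromEnv() (*http.Client, error) {
	return newHTTPClientWithCA(os.Getenv("AWS_CA_BUNDLE"))
}
```

In an air-gapped region the private CA would be exported in AWS_CA_BUNDLE before the kubelet or autoscaler starts, so that TLS connections to the private endpoints validate.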
AWS GovCloud is specifically affected.
Done:

(1.14) Upstream Kubernetes PR: https://github.com/kubernetes/kubernetes/pull/72245

In-flight:

(1.15) Upstream autoscaler PR: https://github.com/kubernetes/autoscaler/pull/1717
(3.11) Origin PR: https://github.com/openshift/origin/pull/22138

Todo:

(3.11) Ansible support for AWS_CA_BUNDLE environment variable: https://github.com/openshift/openshift-ansible
(3.11) Origin autoscaler PR: https://github.com/openshift/kubernetes-autoscaler
Ansible PR now in-flight: https://github.com/openshift/openshift-ansible/pull/11277/
Update:

Done:

(1.14) Upstream Kubernetes PR: https://github.com/kubernetes/kubernetes/pull/72245
(3.11) Origin PR: https://github.com/openshift/origin/pull/22138
(3.11) Ansible support for AWS_CA_BUNDLE environment variable: https://github.com/openshift/openshift-ansible/pull/11277

In-flight:

(1.15) Upstream autoscaler PR: https://github.com/kubernetes/autoscaler/pull/1717

Todo:

(3.11) Origin autoscaler PR: https://github.com/openshift/kubernetes-autoscaler
(In reply to Stephen Cuppett from comment #23)
> Update:
>
> Done:
>
> (1.14) Upstream Kubernetes PR: https://github.com/kubernetes/kubernetes/pull/72245
> (3.11) Origin PR: https://github.com/openshift/origin/pull/22138
> (3.11) Ansible support for AWS_CA_BUNDLE environment variable: https://github.com/openshift/openshift-ansible/pull/11277
>
> In-flight:
>
> (1.15) Upstream autoscaler PR: https://github.com/kubernetes/autoscaler/pull/1717
>
> Todo:
>
> (3.11) Origin autoscaler PR: https://github.com/openshift/kubernetes-autoscaler

Looks like the reviewer didn't like the code quality and closed it unmerged.
(1.15) Upstream autoscaler PR: https://github.com/kubernetes/autoscaler/pull/1745
Update:

Done:

(1.14) Upstream Kubernetes PR: https://github.com/kubernetes/kubernetes/pull/72245
(3.11) Origin PR: https://github.com/openshift/origin/pull/22138
(3.11) Ansible support for AWS_CA_BUNDLE environment variable: https://github.com/openshift/openshift-ansible/pull/11277
(1.15) Upstream autoscaler PR: https://github.com/kubernetes/autoscaler/pull/1745
(3.11) Origin autoscaler PR: https://github.com/openshift/kubernetes-autoscaler/pull/56
jhou: What info are we looking for here? I see I'm tagged on the needinfo flag, but not seeing what info is being requested.
Verified. We ran regression tests for the playbook and the autoscaler features. No problems found. "atomic-openshift version: v3.11.98"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0636