Created attachment 1534301 [details]
openshift installer log

Description of problem:
Openshift installer fails to create Load Balancers with type 'network'

Version-Release number of the following components:
0.12 installer

How reproducible:

Steps to Reproduce:
1. Run ./openshift-install create install-config --dir ocp4_$(date +%Y%m%d) --log-level debug
2. ./openshift-install create --dir ./ocp4_20190212/ cluster --log-level debug
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

DEBUG aws_route53_record.etcd_cluster: Creation complete after 52s (ID: Z1EVXLZ14W24D4__etcd-server-ssl._tcp.unknown_SRV)
ERROR
ERROR Error: Error applying plan:
ERROR
ERROR 3 errors occurred:
ERROR * module.vpc.aws_lb.api_external: 1 error occurred:
ERROR * aws_lb.api_external: Error creating Application Load Balancer: AvailabilityZoneNotSupported: Load balancers with type 'network' are not supported in sa-east-1b
ERROR status code: 400, request id: b8e1f404-2f47-11e9-aabe-2dc4890b1bd6
ERROR
ERROR
ERROR * module.vpc.aws_lb.api_internal: 1 error occurred:
ERROR * aws_lb.api_internal: Error creating Application Load Balancer: AvailabilityZoneNotSupported: Load balancers with type 'network' are not supported in sa-east-1b
ERROR status code: 400, request id: b9eb74d7-2f47-11e9-9aea-49efdc2c3cf6
ERROR
ERROR
ERROR * module.vpc.aws_nat_gateway.nat_gw[1]: 1 error occurred:
ERROR * aws_nat_gateway.nat_gw.1: Error creating NAT Gateway: NotAvailableInZone: Nat Gateway is not available in this availability zone
ERROR status code: 400, request id: d1c3163c-0fe6-44d9-893b-f567ae3425b5
ERROR
ERROR
ERROR
ERROR
ERROR
ERROR Terraform does not automatically rollback in the face of errors.
ERROR Instead, your Terraform state file has been partially updated with
ERROR any resources that successfully completed. Please address the error
ERROR above and apply again to incrementally change your infrastructure.
ERROR
ERROR
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply using Terraform

Expected results:
Openshift cluster should be up and running in sa-east-1 region

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
I wasn't able to reproduce this. It's especially strange that AWS claims it doesn't support NLBs and NAT gateways... I don't believe that claim. Could you try this again?
> ERROR * aws_lb.api_external: Error creating Application Load Balancer: AvailabilityZoneNotSupported: Load balancers with type 'network' are not supported in sa-east-1b

I thought there was an off chance that this was some AWS region-specific limitation, but checking with the openshift-dev account:

  $ aws --region sa-east-1 ec2 create-vpc --cidr-block 10.0.0.0/16
  {
      "Vpc": {
          ...
          "VpcId": "vpc-02085a93004fc0a14",
          ...
      }
  }
  $ aws --region sa-east-1 ec2 create-subnet --vpc-id vpc-02085a93004fc0a14 --cidr-block 10.0.0.0/20
  {
      "Subnet": {
          ...
          "SubnetId": "subnet-0dd714725ed9bb31e",
          ...
      }
  }
  $ aws --region sa-east-1 ec2 create-internet-gateway
  {
      "InternetGateway": {
          "InternetGatewayId": "igw-02a52f9cbb14a5c28",
          "Attachments": [],
          "Tags": []
      }
  }
  $ aws --region sa-east-1 ec2 attach-internet-gateway --internet-gateway-id igw-02a52f9cbb14a5c28 --vpc-id vpc-02085a93004fc0a14
  $ aws --region sa-east-1 elbv2 create-load-balancer --subnets subnet-0dd714725ed9bb31e --type network --name testing
  {
      "LoadBalancers": [
          {
              "Type": "network",
              ...
          }
      ]
  }

So that seems fine.
Hi Alex/Trevor,

Thanks for looking into it. I retried and am able to reproduce the issue. I suspect this could be a permission issue with my account, since I have not been able to get even a single OCP4.0 cluster up and running. I have tried multiple regions, and every region gives different errors. I am attaching the install folder relating to this issue. Please let me know if you want to use my AWS credentials; I can share them privately to reproduce the issue.

Thanks,
Dixit
Created attachment 1535908 [details] Install Directory Zip
QE can not reproduce this bug with v0.12.0 installer in sa-east-1.

time="2019-02-21T04:46:38-05:00" level=debug msg="module.vpc.aws_lb.api_external: Creation complete after 2m41s (ID: arn:aws:elasticloadbalancing:sa-east-1:...er/net/qe-jialiu5-ext/5f74a270310ef759)"
time="2019-02-21T04:46:39-05:00" level=debug msg="module.vpc.aws_lb.api_internal: Creation complete after 2m40s (ID: arn:aws:elasticloadbalancing:sa-east-1:...er/net/qe-jialiu5-int/01c47c913331a00f)"
time="2019-02-21T04:45:42-05:00" level=debug msg="module.vpc.aws_nat_gateway.nat_gw[0]: Creation complete after 1m45s (ID: nat-08975382b55af09e6)"
time="2019-02-21T04:45:42-05:00" level=debug msg="module.vpc.aws_nat_gateway.nat_gw[1]: Creation complete after 1m45s (ID: nat-0fdaa5fbb58451fee)"
(In reply to dgangaia from comment #3)
> I am suspecting this could be some permission issue with my account...

[1] added some install-host credential checking, which should make these sorts of issues more obvious. Can you reproduce the "Load balancers with type 'network' are not supported" error in v0.13.0? Or does the installer exit with an error about you not giving it sufficient permissions?

[1]: https://github.com/openshift/installer/pull/1156 v0.13.0
I'm going to close this INSUFFICIENT_DATA, although I suspect #1156 fixed this for you (as discussed in comment 6).
Recent CI run hit this:

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-api-actuator-pkg/43/pull-ci-openshift-cluster-api-actuator-pkg-master-e2e-aws-operator/87/build-log.txt
https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-api-actuator-pkg/43/pull-ci-openshift-cluster-api-actuator-pkg-master-e2e-aws-operator/87/artifacts/e2e-aws-operator/installer/.openshift_install.log
From the logs linked above:

time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: 2019/03/07 15:28:19 [DEBUG] [aws-sdk-go] DEBUG: Request elasticloadbalancing/CreateLoadBalancer Details:"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: ---[ REQUEST POST-SIGN ]-----------------------------"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: POST / HTTP/1.1"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Host: elasticloadbalancing.us-east-1.amazonaws.com"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: User-Agent: aws-sdk-go/1.16.6 (go1.10.3; linux; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.11.10"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Content-Length: 611"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Authorization: AWS4-HMAC-SHA256 Credential=.../us-east-1/elasticloadbalancing/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-date, Signature=..."
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Content-Type: application/x-www-form-urlencoded; charset=utf-8"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: X-Amz-Date: 20190307T152819Z"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Accept-Encoding: gzip"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: "
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Action=CreateLoadBalancer&Name=ci-op-pps6jwwb-79b09-lpxjh-int&Scheme=internal&Subnets.member.1=subnet-03f6b4ebd82744560&Subnets.member.2=subnet-0b83c28a203763734&Subnets.member.3=subnet-0f0380dc1f05a720b&Subnets.member.4=subnet-0838522a91264d191&Subnets.member.5=subnet-0075056025b73376e&Subnets.member.6=subnet-09e9dd996168a6d4a&Tags.member.1.Key=Name&Tags.member.1.Value=ci-op-pps6jwwb-79b09-lpxjh-int&Tags.member.2.Key=expirationDate&Tags.member.2.Value=2019-03-07T19%3A24%2B0000Tags.member.3.Key=kubernetes.io%2Fcluster%2Fci-op-pps6jwwb-79b09-lpxjh&Tags.member.3.Value=owned&Type=network&Version=2015-12-01"
...
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: 2019/03/07 15:28:20 [DEBUG] [aws-sdk-go] DEBUG: Response elasticloadbalancing/CreateLoadBalancer Details:"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: ---[ RESPONSE ]--------------------------------------"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: HTTP/1.1 400 Bad Request"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: Connection: close"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: Content-Length: 342"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: Content-Type: text/xml"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: Date: Thu, 07 Mar 2019 15:28:19 GMT"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: X-Amzn-Requestid: a309b561-40ed-11e9-9c06-a3fa2a5e26a1"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: "
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: "
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: -----------------------------------------------------"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: 2019/03/07 15:28:20 [DEBUG] [aws-sdk-go] <ErrorResponse xmlns=\"http://elasticloadbalancing.amazonaws.com/doc/2015-12-01/\">"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: <Error>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: <Type>Sender</Type>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: <Code>AvailabilityZoneNotSupported</Code>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: <Message>Load balancers with type 'network' are not supported in us-east-1e</Message>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: </Error>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: <RequestId>a309b561-40ed-11e9-9c06-a3fa2a5e26a1</RequestId>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: </ErrorResponse>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: 2019/03/07 15:28:20 [DEBUG] [aws-sdk-go] DEBUG: Validate Response elasticloadbalancing/CreateLoadBalancer failed, not retrying, error AvailabilityZoneNotSupported: Load balancers with type 'network' are not supported in us-east-1e"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: \tstatus code: 400, request id: a309b561-40ed-11e9-9c06-a3fa2a5e26a1"
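For anyone triaging similar logs, the error body in the response above can be pulled apart with a few lines of Python. This is only a rough sketch: the function name parse_elb_error and the SAMPLE string are made up for illustration, and SAMPLE is reassembled by hand from the log lines above (log prefixes stripped, escaped quotes restored), not taken from any tool's output.

```python
import xml.etree.ElementTree as ET

# XML namespace copied from the <ErrorResponse> element in the log above.
ELB_NS = {"elb": "http://elasticloadbalancing.amazonaws.com/doc/2015-12-01/"}

# Hand-assembled from the logged response body; illustrative only.
SAMPLE = """<ErrorResponse xmlns="http://elasticloadbalancing.amazonaws.com/doc/2015-12-01/">
  <Error>
    <Type>Sender</Type>
    <Code>AvailabilityZoneNotSupported</Code>
    <Message>Load balancers with type 'network' are not supported in us-east-1e</Message>
  </Error>
  <RequestId>a309b561-40ed-11e9-9c06-a3fa2a5e26a1</RequestId>
</ErrorResponse>"""


def parse_elb_error(body: str) -> dict:
    """Return the type, code, message and request id from an ErrorResponse body.

    Hypothetical helper; fields missing from the body come back as "".
    """
    root = ET.fromstring(body)

    def text(path: str) -> str:
        # findtext returns the default when the element is absent.
        return root.findtext(path, default="", namespaces=ELB_NS)

    return {
        "type": text("elb:Error/elb:Type"),
        "code": text("elb:Error/elb:Code"),
        "message": text("elb:Error/elb:Message"),
        "request_id": text("elb:RequestId"),
    }
```

Calling parse_elb_error(SAMPLE) gives a dict whose "code" entry is the AvailabilityZoneNotSupported string seen in the log, which is the part worth grepping for.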
I'm still not sure what's going on here, but we have enough to go on without needing more information from dgangaia.
I followed up with AWS, and they pointed me at [1,2]. They also explained that this was likely a temporary issue with capacity in the target zone, and that each zone may consist of multiple datacenters and that the capacity issues might only affect a subset of those datacenters. They also said that "constrained" is independent of the zone being in "information" or "impaired" states [3].

So when you hit this error, you could use the output Terraform variables to try again to finish off creating cluster infrastructure. Or (with better support in our command line API), delete the broken cluster and try creating a new one. From a usability perspective, we may want to catch this error and translate it to something a bit less opaque.

[1]: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#availability-zones
[2]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-availability-zones
[3]: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AvailabilityZone.html
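To make the "catch this error and translate" idea concrete, here is a minimal sketch of what a less opaque message could look like. It is not installer code (the installer is written in Go), and explain_zone_errors, the hint wording, and the simplified zone-name pattern are all assumptions for illustration. The two error codes and the sample lines are the ones reported earlier in this bug.

```python
import re
from typing import List

# The two AWS error codes that appear in this bug's logs.
ZONE_ERROR = re.compile(r"\b(AvailabilityZoneNotSupported|NotAvailableInZone):\s*(.+)")
# Simplified zone-name pattern (e.g. sa-east-1b); an assumption, not exhaustive.
ZONE_NAME = re.compile(r"\b[a-z]{2}-[a-z]+-\d[a-z]\b")

# Two error lines copied from the original report.
SAMPLE_LOG = """\
ERROR * aws_lb.api_external: Error creating Application Load Balancer: AvailabilityZoneNotSupported: Load balancers with type 'network' are not supported in sa-east-1b
ERROR * aws_nat_gateway.nat_gw.1: Error creating NAT Gateway: NotAvailableInZone: Nat Gateway is not available in this availability zone
"""


def explain_zone_errors(log_text: str) -> List[str]:
    """Return one plain-language hint per zone-capacity error found in the log."""
    hints = []
    for line in log_text.splitlines():
        match = ZONE_ERROR.search(line)
        if match is None:
            continue
        code, message = match.group(1), match.group(2).strip()
        zone = ZONE_NAME.search(message)
        # The NAT gateway message does not name the zone, so fall back.
        where = zone.group(0) if zone else "one of the selected zones"
        hints.append(
            f"{code}: AWS could not place the resource in {where}. "
            "This is likely a temporary capacity constraint on the AWS side; "
            f"retrying later may succeed. AWS said: {message}"
        )
    return hints
```

Running explain_zone_errors(SAMPLE_LOG) yields two hints, one naming sa-east-1b and one using the fallback wording for the NAT gateway error.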
> I followed up with AWS, and they pointed me at [1,2]. They also explained that this was likely a temporary issue with capacity in the target zone, and that each zone may consist of multiple datacenters and that the capacity issues might only affect a subset of those datacenters. They also said that "constrained" is independent of the zone being in "information" or "impaired" states [3]. So hitting this error, you could use the output Terraform variables to try again to finish off creating cluster infrastructure. Or (with better support in our command line API), delete the broken cluster and try creating a new one.

I agree with Trevor that transient failures on the AWS end are something we cannot avoid. Destroying the cluster and then running create again fixes this problem once AWS is back up. So closing this bug.
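The destroy-and-retry workaround could be scripted roughly as follows. This is a sketch under assumptions, not a supported workflow: create_with_retry, its parameters, and the default wait time are hypothetical, and it assumes the asset directory can be reused for another `create cluster` run after `destroy cluster` (if it cannot, the caller would need to regenerate the install config between attempts).

```python
import subprocess
import time
from typing import Callable, Sequence


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def create_with_retry(
    asset_dir: str,
    attempts: int = 3,
    wait_seconds: float = 600.0,  # arbitrary pause; AWS gave no recovery time
    run: Callable[[Sequence[str]], int] = run_command,
    sleep: Callable[[float], None] = time.sleep,
    installer: str = "./openshift-install",
) -> bool:
    """Try to create the cluster, destroying and retrying after each failure."""
    for attempt in range(1, attempts + 1):
        if run([installer, "create", "cluster", "--dir", asset_dir]) == 0:
            return True
        # Tear down whatever was partially created before the next attempt.
        run([installer, "destroy", "cluster", "--dir", asset_dir])
        if attempt < attempts:
            sleep(wait_seconds)
    return False
```

The run and sleep parameters exist only so the loop can be exercised without touching AWS; in real use the defaults would shell out to the installer binary.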
I think this needs to be documented somewhere, so that anyone who hits this bug at least knows it is not an issue on the OpenShift side and that retrying may fix it.