Bug 1676737 - Openshift Installer fails with " Load balancers with type 'network' are not supported "
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Abhinav Dahiya
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks: 1664187
Reported: 2019-02-13 04:47 UTC by dgangaia
Modified: 2019-03-21 22:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-21 22:12:42 UTC
Target Upstream Version:
Embargoed:


Attachments
openshift installer log (359.40 KB, text/plain)
2019-02-13 04:47 UTC, dgangaia
Install Directory Zip (573.49 KB, application/zip)
2019-02-18 12:10 UTC, dgangaia

Description dgangaia 2019-02-13 04:47:33 UTC
Created attachment 1534301 [details]
openshift installer log

Description of problem:
Openshift installer fails to create Load Balancers with type 'network'

Version-Release number of the following components:
0.12 installer

How reproducible:

Steps to Reproduce:
1. Run ./openshift-install create install-config --dir ocp4_$(date +%Y%m%d) --log-level debug
2. Run ./openshift-install create --dir ./ocp4_20190212/ cluster --log-level debug

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

DEBUG aws_route53_record.etcd_cluster: Creation complete after 52s (ID: Z1EVXLZ14W24D4__etcd-server-ssl._tcp.unknown_SRV) 
ERROR                                              
ERROR Error: Error applying plan:                  
ERROR                                              
ERROR 3 errors occurred:                           
ERROR 	* module.vpc.aws_lb.api_external: 1 error occurred: 
ERROR 	* aws_lb.api_external: Error creating Application Load Balancer: AvailabilityZoneNotSupported: Load balancers with type 'network' are not supported in sa-east-1b 
ERROR 	status code: 400, request id: b8e1f404-2f47-11e9-aabe-2dc4890b1bd6 
ERROR                                              
ERROR                                              
ERROR 	* module.vpc.aws_lb.api_internal: 1 error occurred: 
ERROR 	* aws_lb.api_internal: Error creating Application Load Balancer: AvailabilityZoneNotSupported: Load balancers with type 'network' are not supported in sa-east-1b 
ERROR 	status code: 400, request id: b9eb74d7-2f47-11e9-9aea-49efdc2c3cf6 
ERROR                                              
ERROR                                              
ERROR 	* module.vpc.aws_nat_gateway.nat_gw[1]: 1 error occurred: 
ERROR 	* aws_nat_gateway.nat_gw.1: Error creating NAT Gateway: NotAvailableInZone: Nat Gateway is not available in this availability zone 
ERROR 	status code: 400, request id: d1c3163c-0fe6-44d9-893b-f567ae3425b5 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Terraform does not automatically rollback in the face of errors. 
ERROR Instead, your Terraform state file has been partially updated with 
ERROR any resources that successfully completed. Please address the error 
ERROR above and apply again to incrementally change your infrastructure. 
ERROR                                              
ERROR                                              
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply using Terraform 

Expected results:
Openshift cluster should be up and running in sa-east-1 region

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Alex Crawford 2019-02-14 01:06:34 UTC
I wasn't able to reproduce this. It's especially strange that AWS claims it doesn't support NLBs and NAT gateways... I don't believe that claim. Could you try this again?

Comment 2 W. Trevor King 2019-02-14 07:51:22 UTC
> ERROR 	* aws_lb.api_external: Error creating Application Load Balancer: AvailabilityZoneNotSupported: Load balancers with type 'network' are not supported in sa-east-1b 

I thought there was an off chance that this was some AWS region-specific limitation, but checking with the openshift-dev account:

$ aws --region sa-east-1 ec2 create-vpc --cidr-block 10.0.0.0/16
{
    "Vpc": {
        ...
        "VpcId": "vpc-02085a93004fc0a14",
        ...
    }
}
$ aws --region sa-east-1 ec2 create-subnet --vpc-id vpc-02085a93004fc0a14 --cidr-block 10.0.0.0/20
{
    "Subnet": {
        ...
        "SubnetId": "subnet-0dd714725ed9bb31e",
        ...
    }
}
$ aws --region sa-east-1 ec2 create-internet-gateway
{
    "InternetGateway": {
        "InternetGatewayId": "igw-02a52f9cbb14a5c28",
        "Attachments": [],
        "Tags": []
    }
}
$ aws --region sa-east-1 ec2 attach-internet-gateway --internet-gateway-id igw-02a52f9cbb14a5c28 --vpc-id vpc-02085a93004fc0a14
$ aws --region sa-east-1 elbv2 create-load-balancer --subnets subnet-0dd714725ed9bb31e --type network --name testing
{
    "LoadBalancers": [
        {
            "Type": "network",
            ...
        }
    ]
}

So that seems fine.

Comment 3 dgangaia 2019-02-18 12:09:29 UTC
Hi Alex/Trevor,

 Thanks for looking into it. I retried and am still able to reproduce the issue.

 I suspect this could be a permission issue with my account, since I have not been able to get even a single OCP 4.0 cluster up and running. I have tried multiple regions, and every region gives a different error.

 I am attaching the install folder for this issue. Please let me know if you want to use my AWS credentials to reproduce it; I can share them privately.


Thanks,
Dixit

Comment 4 dgangaia 2019-02-18 12:10:18 UTC
Created attachment 1535908 [details]
Install Directory Zip

Comment 5 Johnny Liu 2019-02-21 10:26:36 UTC
QE cannot reproduce this bug with the v0.12.0 installer in sa-east-1.

time="2019-02-21T04:46:38-05:00" level=debug msg="module.vpc.aws_lb.api_external: Creation complete after 2m41s (ID: arn:aws:elasticloadbalancing:sa-east-1:...er/net/qe-jialiu5-ext/5f74a270310ef759)"
time="2019-02-21T04:46:39-05:00" level=debug msg="module.vpc.aws_lb.api_internal: Creation complete after 2m40s (ID: arn:aws:elasticloadbalancing:sa-east-1:...er/net/qe-jialiu5-int/01c47c913331a00f)"
time="2019-02-21T04:45:42-05:00" level=debug msg="module.vpc.aws_nat_gateway.nat_gw[0]: Creation complete after 1m45s (ID: nat-08975382b55af09e6)"
time="2019-02-21T04:45:42-05:00" level=debug msg="module.vpc.aws_nat_gateway.nat_gw[1]: Creation complete after 1m45s (ID: nat-0fdaa5fbb58451fee)"

Comment 6 W. Trevor King 2019-02-27 06:08:31 UTC
(In reply to dgangaia from comment #3)
>  I am suspecting this could be some permission issue with my account...

[1] added some install-host credential checking, which should make these sorts of issues more obvious.  Can you reproduce the "Load balancers with type 'network' are not supported" error in v0.13.0?  Or does the installer exit with an error about you not giving it sufficient permissions?

[1]: https://github.com/openshift/installer/pull/1156 v0.13.0

Comment 7 W. Trevor King 2019-03-05 19:36:04 UTC
I'm going to close this INSUFFICIENT_DATA, although I suspect #1156 fixed this for you (as discussed in comment 6).

Comment 9 W. Trevor King 2019-03-07 19:40:10 UTC
From the logs linked above:

time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: 2019/03/07 15:28:19 [DEBUG] [aws-sdk-go] DEBUG: Request elasticloadbalancing/CreateLoadBalancer Details:"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: ---[ REQUEST POST-SIGN ]-----------------------------"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: POST / HTTP/1.1"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Host: elasticloadbalancing.us-east-1.amazonaws.com"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: User-Agent: aws-sdk-go/1.16.6 (go1.10.3; linux; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.11.10"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Content-Length: 611"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Authorization: AWS4-HMAC-SHA256 Credential=.../us-east-1/elasticloadbalancing/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-date, Signature=..."
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Content-Type: application/x-www-form-urlencoded; charset=utf-8"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: X-Amz-Date: 20190307T152819Z"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Accept-Encoding: gzip"
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: "
time="2019-03-07T15:28:19Z" level=debug msg="2019-03-07T15:28:19.771Z [DEBUG] plugin.terraform-provider-aws: Action=CreateLoadBalancer&Name=ci-op-pps6jwwb-79b09-lpxjh-int&Scheme=internal&Subnets.member.1=subnet-03f6b4ebd82744560&Subnets.member.2=subnet-0b83c28a203763734&Subnets.member.3=subnet-0f0380dc1f05a720b&Subnets.member.4=subnet-0838522a91264d191&Subnets.member.5=subnet-0\
075056025b73376e&Subnets.member.6=subnet-09e9dd996168a6d4a&Tags.member.1.Key=Name&Tags.member.1.Value=ci-op-pps6jwwb-79b09-lpxjh-int&Tags.member.2.Key=expirationDate&Tags.member.2.Value=2019-03-07T19%3A24%2B0000Tags.member.3.Key=kubernetes.io%2Fcluster%2Fci-op-pps6jwwb-79b09-lpxjh&Tags.member.3.Value=owned&Type=network&Version=2015-12-01"
...
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: 2019/03/07 15:28:20 [DEBUG] [aws-sdk-go] DEBUG: Response elasticloadbalancing/CreateLoadBalancer Details:"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: ---[ RESPONSE ]--------------------------------------"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: HTTP/1.1 400 Bad Request"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: Connection: close"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: Content-Length: 342"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: Content-Type: text/xml"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: Date: Thu, 07 Mar 2019 15:28:19 GMT"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: X-Amzn-Requestid: a309b561-40ed-11e9-9c06-a3fa2a5e26a1"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: "
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: "
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: -----------------------------------------------------"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: 2019/03/07 15:28:20 [DEBUG] [aws-sdk-go] <ErrorResponse xmlns=\"http://elasticloadbalancing.amazonaws.com/doc/2015-12-01/\">"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws:   <Error>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws:     <Type>Sender</Type>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws:     <Code>AvailabilityZoneNotSupported</Code>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws:     <Message>Load balancers with type 'network' are not supported in us-east-1e</Message>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws:   </Error>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws:   <RequestId>a309b561-40ed-11e9-9c06-a3fa2a5e26a1</RequestId>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: </ErrorResponse>"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: 2019/03/07 15:28:20 [DEBUG] [aws-sdk-go] DEBUG: Validate Response elasticloadbalancing/CreateLoadBalancer failed, not retrying, error AvailabilityZoneNotSupported: Load balancers with type 'network' are not supported in us-east-1e"
time="2019-03-07T15:28:20Z" level=debug msg="2019-03-07T15:28:20.588Z [DEBUG] plugin.terraform-provider-aws: \tstatus code: 400, request id: a309b561-40ed-11e9-9c06-a3fa2a5e26a1"
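
For anyone digging through a similar debug log, here is a minimal sketch of pulling the AWS error code and message out of the aws-sdk-go response dump. The function name aws_errors_from_log is made up for illustration and is not part of the installer; it assumes each <Code> line is followed by its <Message> line, as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not part of openshift-install): read an installer
# debug log on stdin and print "Code:Message" for each AWS error response.
# Assumes every <Code> element is followed by a matching <Message> element.
aws_errors_from_log() {
    sed -n \
        -e 's/.*<Code>\([^<]*\)<\/Code>.*/\1/p' \
        -e 's/.*<Message>\([^<]*\)<\/Message>.*/\1/p' \
        | paste -d: - -
}
```

It could be run as `aws_errors_from_log < installer.log`, where installer.log stands in for wherever the debug output was saved.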

Comment 11 W. Trevor King 2019-03-07 20:52:27 UTC
I'm still not sure what's going on here, but we have enough to go on without needing more information from dgangaia.

Comment 14 W. Trevor King 2019-03-15 00:22:24 UTC
I followed up with AWS, and they pointed me at [1,2].  They also explained that this was likely a temporary issue with capacity in the target zone, and that each zone may consist of multiple datacenters and that the capacity issues might only affect a subset of those datacenters.  They also said that "constrained" is independent of the zone being in "information" or "impaired" states [3].  So hitting this error, you could use the output Terraform variables to try again to finish off creating cluster infrastructure.  Or (with better support in our command line API), delete the broken cluster and try creating a new one.  From a usability perspective, we may want to catch this error and translate to something a bit less opaque.

[1]: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#availability-zones
[2]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-availability-zones
[3]: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AvailabilityZone.html
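
As a rough illustration of the "less opaque" translation suggested above, the sketch below scans installer output for this specific error and prints one plainer hint per affected zone. It is a standalone filter, not anything the installer does today; the function name explain_az_errors and the wording of the hint are assumptions.

```shell
#!/bin/sh
# Hypothetical filter (not part of openshift-install): read installer output
# on stdin and print a plainer hint for each zone that rejected the NLB.
explain_az_errors() {
    # Match the error text, keep only the trailing zone name, and
    # report each zone once even if several load balancers failed there.
    grep -o "Load balancers with type 'network' are not supported in [a-z0-9-]*" \
        | awk '{ print $NF }' \
        | sort -u \
        | while read -r zone; do
            echo "AWS rejected the network load balancer in ${zone}; this is likely a temporary capacity issue in that zone, so retrying later may succeed."
        done
}
```

For example, `explain_az_errors < installer.log` would print one line per zone, with installer.log again being a placeholder for the saved output.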

Comment 15 Abhinav Dahiya 2019-03-21 22:12:42 UTC
> I followed up with AWS, and they pointed me at [1,2].  They also explained that this was likely a temporary issue with capacity in the target zone, and that each zone may consist of multiple datacenters and that the capacity issues might only affect a subset of those datacenters.  They also said that "constrained" is independent of the zone being in "information" or "impaired" states [3].  So hitting this error, you could use the output Terraform variables to try again to finish off creating cluster infrastructure.  Or (with better support in our command line API), delete the broken cluster and try creating a new one. 

I agree with Trevor that transient failures on the AWS end are something we cannot avoid. Destroying the cluster and running create again fixes this problem once AWS is back up.

So closing this bug.
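
A minimal sketch of the destroy-and-retry workaround, for anyone who wants to script it. The retry_create function, the INSTALL_DIR variable, and the attempt and wait values are illustrative assumptions; only the create cluster and destroy cluster subcommands come from the installer itself. Depending on the installer version, the install directory may also need a fresh install-config before each create, which this sketch does not handle.

```shell
#!/bin/sh
# Sketch only: run a create command, and on failure run a cleanup command,
# wait, and try again, up to max_attempts times.
# Arguments: max_attempts wait_seconds create_cmd destroy_cmd
retry_create() {
    local max_attempts="$1"
    local wait_seconds="$2"
    local create_cmd="$3"
    local destroy_cmd="$4"
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$create_cmd"; then
            return 0
        fi
        echo "attempt ${attempt} failed; cleaning up" >&2
        # Give up if cleanup itself fails, so leftovers can be inspected.
        "$destroy_cmd" || return 1
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$wait_seconds"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Illustrative wrappers; INSTALL_DIR is a placeholder path.
INSTALL_DIR="${INSTALL_DIR:-./ocp4_install}"
create_cluster()  { ./openshift-install create cluster --dir "$INSTALL_DIR" --log-level debug; }
destroy_cluster() { ./openshift-install destroy cluster --dir "$INSTALL_DIR" --log-level debug; }
```

Something like `retry_create 3 600 create_cluster destroy_cluster` would then make up to three attempts, ten minutes apart. The wait matters because the capacity problem is on the AWS side and an immediate retry may hit the same zone in the same state.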

Comment 16 dgangaia 2019-03-21 22:29:50 UTC
I think this needs to be documented somewhere, so that if someone hits this bug they at least know it's not an issue on the OpenShift side and that retrying may fix it.

