Bug 1613546 - Azure load balancer cannot be created
Summary: Azure load balancer cannot be created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.0
Assignee: Ryan Cook
QA Contact: Wenkai Shi
URL:
Whiteboard:
Depends On:
Blocks: 1615903
 
Reported: 2018-08-07 20:02 UTC by Ryan Cook
Modified: 2018-10-11 07:24 UTC
6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Without a fully defined azure.conf file, a load balancer service requested through OpenShift would never fully register and provide an external IP address. Now an azure.conf containing all the required variables allows the load balancer to be deployed and to provide the external IP address.
Clone Of:
: 1615903 (view as bug list)
Environment:
Last Closed: 2018-10-11 07:24:08 UTC
Target Upstream Version:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2652 0 None None None 2018-10-11 07:24:26 UTC

Description Ryan Cook 2018-08-07 20:02:09 UTC
Description of problem: In Azure, the load balancer svc cannot be created due to missing configuration parameters.


Version-Release number of selected component (if applicable):
atomic-openshift-excluder-3.10.14-1.git.0.ba8ae6d.el7.noarch
atomic-openshift-hyperkube-3.10.14-1.git.0.ba8ae6d.el7.x86_64
atomic-openshift-node-3.10.14-1.git.0.ba8ae6d.el7.x86_64
atomic-openshift-clients-3.10.14-1.git.0.ba8ae6d.el7.x86_64
atomic-openshift-docker-excluder-3.10.14-1.git.0.ba8ae6d.el7.noarch
atomic-openshift-3.10.14-1.git.0.ba8ae6d.el7.x86_64



How reproducible:
Perform the installation with the following parameters in the inventory, adjusted for your environment:
#310
openshift_cloudprovider_kind=azure
openshift_cloudprovider_azure_client_id=ID
openshift_cloudprovider_azure_client_secret=SECRET
openshift_cloudprovider_azure_tenant_id=TENANT
openshift_cloudprovider_azure_subscription_id=SUB
openshift_cloudprovider_azure_resource_group=refarch-azr
openshift_cloudprovider_azure_location=eastus
openshift_release=v3.10
#310end



Steps to Reproduce:
1. deploy 3.10 with the above values
2. create a load balancer svc
vi load.yaml
apiVersion: v1
kind: Service
metadata:
  name: egress-21
spec:
  ports:
  - name: web
    port: 8080 
  type: LoadBalancer 
  selector:
    deploymentconfig: app

oc create -f load.yaml

Actual results: load balancer stays in pending state
egress-21   LoadBalancer   172.30.245.15   <pending>   8080:32566/TCP   7m


Expected results:
egress-21   LoadBalancer   172.30.245.15   137.117.45.245   8080:32566/TCP   7m
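When the EXTERNAL-IP stays in `<pending>`, the service's events usually carry the cloud-provider error explaining why. A quick way to check (a sketch; `egress-21` is the service from the reproducer above, and the `command -v` guard just keeps the snippet safe to run without a cluster):

```shell
SVC=egress-21   # service name from the reproducer above
# The Events section of 'oc describe' usually shows the cloud-provider
# error that keeps EXTERNAL-IP stuck in <pending>.
if command -v oc >/dev/null 2>&1; then
    oc describe svc "$SVC"
    oc get svc "$SVC" -o wide
fi
```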


Additional info:
The required values are actually located here https://docs.openshift.com/container-platform/3.10/install_config/configuring_azure.html#azure-configuration-file
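For reference, a fully populated azure.conf looks roughly like the following. All values are placeholders and the key names follow the Kubernetes Azure cloud provider config, so treat this as a sketch and consult the linked doc for the authoritative list:

```yaml
# azure.conf (sketch; placeholder values)
tenantId: <tenant-uuid>
subscriptionId: <subscription-uuid>
aadClientId: <client-uuid>
aadClientSecret: <client-secret>
resourceGroup: refarch-azr
location: eastus
cloud: AzurePublicCloud
securityGroupName: <node-nsg-name>
primaryAvailabilitySetName: <node-availability-set>
```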


Comment 1 Ryan Cook 2018-08-07 20:23:14 UTC
TESTING https://github.com/openshift/openshift-ansible/pull/9473

Comment 2 Ryan Cook 2018-08-09 14:07:27 UTC
The PR does the needful

Comment 3 Scott Dodson 2018-08-14 21:25:01 UTC
Should be in openshift-ansible-3.11.0-0.15.0

Comment 4 Wenkai Shi 2018-08-23 05:59:09 UTC
Without those added parameters, OCP on Azure also works well. QE doesn't think those parameters are a "must have".
Could you please make them optional parameters, as the doc[1] says?

[1]. https://docs.openshift.com/container-platform/3.10/install_config/configuring_azure.html#azure-configuration-file

Comment 5 Wenkai Shi 2018-08-23 06:15:12 UTC
Another doc to explain those parameters.

https://github.com/kubernetes/cloud-provider-azure/blob/master/docs/cloud-provider-config.md#cluster-config

Comment 6 Wenkai Shi 2018-08-23 06:17:37 UTC
Could you please help me understand primaryAvailabilitySetName? I can't understand it from here[1]; would you mind giving some examples?

[1]. https://github.com/kubernetes/cloud-provider-azure/blob/master/docs/cloud-provider-config.md#primaryavailabilitysetname

Comment 8 Aleksandar Kostadinov 2018-08-23 11:19:30 UTC
Also, the parameters are optional for Azure itself. When I create VMs in Azure, there may be no security groups or availability sets, and the machines start and work just fine.

wrt cloud name, I didn't know such a thing even existed. I assume we can default to whatever Azure uses as its default cloud; then only users hooked to other clouds will need to know what should be put there.

Comment 9 Ryan Cook 2018-08-23 13:25:35 UTC
@Wenkai, with primaryAvailabilitySetName you define which set of nodes the load balancer should be assigned to. With the current limitations of the Azure cloud provider, if a primaryAvailabilitySetName is not defined and any other load balancers are used for instances in the cluster, the load balancer will error out, because there is a limit on the number of internal and external load balancers that can be assigned to a set of instances.

Comment 10 Ryan Cook 2018-08-23 13:29:10 UTC
@Aleksandar, I believe they are required in raw Kubernetes. I do agree that you can get away with them being optional, with no security groups and availability sets, but then the functionality isn't complete.

In regards to cloudname, there are a few options, such as govcloud, etc.

Comment 11 Aleksandar Kostadinov 2018-08-23 15:10:16 UTC
Ryan, could you clarify how exactly machines should be put into availability sets? Create one set and use it always? One set for each cluster? Something else?

Another question: why is a security group needed?

In any case we need documentation about how exactly to create availability sets and security groups to have complete functionality of the cluster.

Comment 12 Ryan Cook 2018-08-23 18:38:04 UTC
So machines have to be added to the availability set at launch. When talking with Harold from Microsoft, we agreed on the following for the architectures we were suggesting.

One availability set per machine type:
1 for masters
1 for infra
1 for apps

So 3 per cluster
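Following that layout, the three sets could be created like this (a sketch; the resource group and naming scheme are borrowed from the examples elsewhere in this bug, and the loop only prints the az commands so they can be reviewed before running):

```shell
RESOURCE_GROUP=refarch-azr

# One availability set per machine type: masters, infra, apps.
# Drop the leading "echo" to actually create the sets.
for type in master infra app; do
    echo az vm availability-set create \
        --resource-group "$RESOURCE_GROUP" \
        --name "ocp-${type}-instances"
done
```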

I believe the security group is just assigned to the load balancer and used to update any rules required for the load balancer to access those nodes.

Sadly, machines cannot be added to availability sets after they are created, but security groups can be added and removed as needed.

Comment 13 Aleksandar Kostadinov 2018-08-23 18:47:49 UTC
Then having only one availability set parameter doesn't make sense if we create 3 new availability sets per cluster. If we create 3 sets, then OpenShift needs to know about all of them so that the cluster can later be scaled up. I know this is not an immediate feature, but given existing plans this appears to be something we will have to support at some point.

My suggestion would be to at least rename `primaryAvailabilitySetName` to `lbAvailabilitySetName` or `infraAvailabilitySetName`, so that we don't need to rename the setting later on and so it is clearer what `primary` means. Excuse me if I'm missing the original point of this.

wrt security groups, "required rules for the load balancer to access those nodes" is not clear here. We need more specific instructions about exactly which rules will need to be set.

Comment 14 Ryan Cook 2018-08-24 01:10:07 UTC
So this is a Kubernetes feature rather than an OpenShift one. If an availability set isn't specified, everything is in one set, and a load balancer already exists, then a Kubernetes svc load balancer cannot be created, due to this error:

E0807 19:23:42.070640       1 service_controller.go:219] error processing service test/egress-2 (will retry): failed to ensure load balancer for service test/egress-2: [ensure(test/egress-2): backendPoolID(/subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/loadBalancers/kubernetes/backendAddressPools/kubernetes) - failed to ensure host in pool: "network.InterfacesClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code=\"NetworkInterfaceUsesMultipleLoadBalancersOfSameType\" Message=\"Network interface /subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/networkInterfaces/ocp-infra-1VMNic references more than one load balancer of the same type (internal or public): /subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/loadBalancers/OcpRouterLB, /subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/loadBalancers/kubernetes. Only one internal and one public load balancer are allowed per availability set.\" Details=[]", ensure(test/egress-2): backendPoolID(/subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/loadBalancers/kubernetes/backendAddressPools/kubernetes) - failed to ensure host in pool: "network.InterfacesClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. 
Status=400 Code=\"NetworkInterfaceUsesMultipleLoadBalancersOfSameType\" Message=\"Network interface /subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/networkInterfaces/ocp-infra-3VMNic references more than one load balancer of the same type (internal or public): /subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/loadBalancers/OcpRouterLB, /subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/loadBalancers/kubernetes. Only one internal and one public load balancer are allowed per availability set.\" Details=[]", ensure(test/egress-2): backendPoolID(/subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/loadBalancers/kubernetes/backendAddressPools/kubernetes) - failed to ensure host in pool: "network.InterfacesClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code=\"NetworkInterfaceUsesMultipleLoadBalancersOfSameType\" Message=\"Network interface /subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/networkInterfaces/ocp-infra-2VMNic references more than one load balancer of the same type (internal or public): /subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/loadBalancers/OcpRouterLB, /subscriptions/8117c1c9-d10d-4366-86cc-e3ccaacaae2d/resourceGroups/refarch-azr/providers/Microsoft.Network/loadBalancers/kubernetes. Only one internal and one public load balancer are allowed per availability set.\" Details=[]"]

I am basing these variable names specifically on what Kubernetes expects. I think staying as close to Kubernetes as possible lets other engineers who pick this up know what we are actually doing.

I can verify the security group items tomorrow if you need me to investigate. Like I said, I am just basing this on my experience. If all these conditions are not met, then a load balancer svc cannot be created. I agree that to an extent these variables could be considered optional, but you pretty much have to define them anyway when you do a deployment, so we might as well make them mandatory. The only reason I am pushing for this is that I found out the hard way, when trying to use an external load balancer for the summit demo, that these items were required.

Here are the Kubernetes variables: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/azure/azure.go#L98

Comment 19 Wenkai Shi 2018-08-28 07:42:31 UTC
I've created a PR to make those parameters optional: https://github.com/openshift/openshift-ansible/pull/9789

Comment 20 Ryan Cook 2018-08-28 13:38:21 UTC
(In reply to Wenkai Shi from comment #6)
> Could you please help understand primaryAvailabilitySetName? I can't
> understand from here[1], would you mind give some examples?
> 
> [1].
> https://github.com/kubernetes/cloud-provider-azure/blob/master/docs/cloud-
> provider-config.md#primaryavailabilitysetname

# Availability set create
az vm availability-set create \
    --resource-group refarch-azr \
    --name ocp-master-instances

It's just a value that is placed on a set of launched instances. The above is an example of creating an availability set for masters. The master instances are then launched into the availability set like this:

az vm create \
    --resource-group refarch-azr \
    --name ocp-master-$i \
    --availability-set ocp-master-instances \
    --size Standard_E2S_v3 \
    --image RedHat:RHEL:7-RAW:latest \
    --admin-user cloud-user \
    --ssh-key /var/lib/jenkins/.ssh/id_rsa.pub \
    --data-disk-sizes-gb 32 32 32 \
    --no-wait \
    --nics ocp-master-${i}VMNic

Comment 21 Scott Dodson 2018-08-29 15:47:03 UTC
The PR from comment 19 has merged.

Comment 22 Scott Dodson 2018-08-29 15:47:38 UTC
In openshift-ansible-3.11.0-0.25.0

Comment 23 Wenkai Shi 2018-08-30 03:24:01 UTC
Verified with version openshift-ansible-3.11.0-0.25.0.git.0.7497e69.el7; it works well.

Comment 25 errata-xmlrpc 2018-10-11 07:24:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

