Bug 1987083

Summary: excludeMastersFromLB in Azure Cloud Config prevents service controller from adding masters
Product: OpenShift Container Platform Reporter: Patrick Dillon <padillon>
Component: Installer Assignee: Patrick Dillon <padillon>
Installer sub component: openshift-installer QA Contact: Shu Wang <shwan>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: jialiu
Version: 4.9   
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-10-18 17:43:14 UTC Type: Bug

Description Patrick Dillon 2021-07-28 20:42:02 UTC
With the move to out-of-tree providers in Azure (4.10) and Azure Stack Hub (4.9), the excludeMastersFromLB: true value in the cloud provider config causes a problem: if a master node restarts, the service controller will not add it back to the load balancer.

This value should be set to false.

Comment 2 Shu Wang 2021-08-02 07:40:29 UTC
Verified fixed.
Verified with 4.9 nightly build 4.9.0-0.nightly-2021-08-01-223336: after restarting the master, the service controller added it back to the load balancer.
Created a related test case: 
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-4317

Comment 3 Shu Wang 2021-08-02 08:59:32 UTC
Updated the test case link:
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-43176

Comment 4 Johnny Liu 2021-08-27 07:41:27 UTC
Adding some more verification steps (using build 4.9.0-0.nightly-2021-08-26-040328), based on comment 2.

[root@preserve-jialiu-ansible ~]# oc debug node/qeci-26032-h5ngk-master-0
Starting pod/qeci-26032-h5ngk-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.7
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ps -ef|grep kubelet
root        1988       1 19 04:20 ?        00:37:05 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip= --minimum-container-ttl-duration=6m0s --cloud-provider=azure --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-config=/etc/kubernetes/cloud.conf --hostname-override= --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:347702e4f91395e1f3d4cbae92248fd164e58f577da8c453a9d0b225f867426b --system-reserved=cpu=500m,memory=1Gi --v=2
sh-4.4# cat /etc/kubernetes/cloud.conf
{
	"cloud": "AzurePublicCloud",
	"tenantId": "6047c7e9-b2ad-488d-a54e-dc3f6be6a7ee",
	"aadClientId": "",
	"aadClientSecret": "",
	"aadClientCertPath": "",
	"aadClientCertPassword": "",
	"useManagedIdentityExtension": true,
	"userAssignedIdentityID": "",
	"subscriptionId": "53b8f551-f0fc-4bea-8cba-6d1fefd54c8a",
	"resourceGroup": "qeci-26032-h5ngk-rg",
	"location": "centralus",
	"vnetName": "qeci-26032-h5ngk-vnet",
	"vnetResourceGroup": "qeci-26032-h5ngk-rg",
	"subnetName": "qeci-26032-h5ngk-worker-subnet",
	"securityGroupName": "qeci-26032-h5ngk-nsg",
	"routeTableName": "qeci-26032-h5ngk-node-routetable",
	"primaryAvailabilitySetName": "",
	"vmType": "",
	"primaryScaleSetName": "",
	"cloudProviderBackoff": true,
	"cloudProviderBackoffRetries": 0,
	"cloudProviderBackoffExponent": 0,
	"cloudProviderBackoffDuration": 6,
	"cloudProviderBackoffJitter": 0,
	"cloudProviderRateLimit": false,
	"cloudProviderRateLimitQPS": 0,
	"cloudProviderRateLimitBucket": 0,
	"cloudProviderRateLimitQPSWrite": 0,
	"cloudProviderRateLimitBucketWrite": 0,
	"useInstanceMetadata": true,
	"loadBalancerSku": "standard",
	"excludeMasterFromStandardLB": false,
	"disableOutboundSNAT": null,
	"maximumLoadBalancerRuleCount": 0
}sh-4.4# cat /etc/kubernetes/cloud.conf|grep excludeMasterFromStandardLB
	"excludeMasterFromStandardLB": false,
sh-4.4# exit
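
The grep check above could also be scripted. The following is a minimal sketch, not part of the original verification: it assumes the cloud config is valid JSON as shown in the transcript and that the key is spelled excludeMasterFromStandardLB as it appears there. The function name read_exclude_master_flag is hypothetical. When the key is absent the sketch returns None and does not guess the provider's default.

```python
import json
from typing import Optional

# Key name as it appears in the /etc/kubernetes/cloud.conf output above.
KEY = "excludeMasterFromStandardLB"


def read_exclude_master_flag(cloud_conf_text: str) -> Optional[bool]:
    """Return the value of excludeMasterFromStandardLB from a JSON cloud config.

    Returns None when the key is missing or null; this sketch deliberately
    does not assume what default the cloud provider applies in that case.
    """
    conf = json.loads(cloud_conf_text)
    value = conf.get(KEY)
    if value is None:
        return None
    if not isinstance(value, bool):
        raise ValueError(f"{KEY} should be a JSON boolean, got {value!r}")
    return value


if __name__ == "__main__":
    # Hypothetical trimmed-down config for illustration only.
    sample = '{"loadBalancerSku": "standard", "excludeMasterFromStandardLB": false}'
    flag = read_exclude_master_flag(sample)
    if flag is False:
        print("OK: masters are not excluded from the standard load balancer")
    else:
        print(f"Check config: {KEY} = {flag!r}")
```

On a node, the contents of /etc/kubernetes/cloud.conf (read from the chroot /host debug shell, as in the transcript) would be passed in as cloud_conf_text.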

Comment 7 errata-xmlrpc 2021-10-18 17:43:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759