Bug 1959479 - Machines don't support dual-stack load balancers on Azure
Summary: Machines don't support dual-stack load balancers on Azure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Antonio Ojea
QA Contact: Milind Yadav
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-11 15:46 UTC by Antonio Ojea
Modified: 2021-07-27 23:08 UTC (History)
3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:07:53 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-api-provider-azure pull 208 0 None open Bug 1959479: UPSTREAM: <carry>: openshift: Fix dual stack support for machines with Load Balancers associated 2021-05-11 15:46:54 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:08:11 UTC

Description Antonio Ojea 2021-05-11 15:46:22 UTC
Description of problem:

Current dual-stack support fails when load balancers are associated with the VMs, because Azure load balancers implement a separate backend pool per IP family.

As a result, when a load balancer with multiple frontend IPs is associated with a dual-stack VM, the VM's network interface is added to a backend pool of the wrong IP family, and the VM cannot be created:

I0317 22:42:50.679720       1 networkinterfaces.go:188] Found IPv6 address space. Adding IPv6 configuration to nic: aojeadual-wzzqr-worker-centralus2-dzlvq-nic
I0317 22:42:51.230128       1 machine_scope.go:160] aojeadual-wzzqr-worker-centralus2-dzlvq: patching machine
E0317 22:42:51.295170       1 actuator.go:78] Machine error: failed to reconcile machine "aojeadual-wzzqr-worker-centralus2-dzlvq": network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="AllNicIpConfigurationsOfLbBackendPoolMustHaveSamePrivateIpAddressVersion" Message="Network interface ipConfiguration '/subscriptions/5970b0fe-21de-4e1a-a192-0a785017e3b7/resourceGroups/aojeadual-wzzqr-rg/providers/Microsoft.Network/networkInterfaces/aojeadual-wzzqr-worker-centralus2-dzlvq-nic/ipConfigurations/pipConfig' with privateIpAddressVersion 'IPv4' does not match with privateIpAddressVersion 'IPv6' of network interface ipConfiguration '/subscriptions/5970b0fe-21de-4e1a-a192-0a785017e3b7/resourceGroups/aojeadual-wzzqr-rg/providers/Microsoft.Network/networkInterfaces/aojeadual-wzzqr-master0-nic/ipConfigurations/pipConfig-v6' in the load balancer backend address pool '/subscriptions/5970b0fe-21de-4e1a-a192-0a785017e3b7/resourceGroups/aojeadual-wzzqr-rg/providers/Microsoft.Network/loadBalancers/aojeadual-wzzqr/backendAddressPools/aojeadual-wzzqr-IPv6'." Details=[]
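
The error above means an IPv4 ipConfiguration was placed in the cluster's IPv6 backend pool. Any fix along the lines of the linked PR has to match each NIC ipConfiguration only to backend pools of the same IP family. A minimal Go sketch of that matching logic, using simplified stand-in types rather than the real Azure SDK structures (all names here are hypothetical, for illustration only):

```go
package main

import (
	"fmt"
	"strings"
)

// ipConfig and backendPool are simplified stand-ins for the Azure SDK's
// NIC ipConfiguration and load balancer backend pool types.
type ipConfig struct {
	Name      string
	IPVersion string // "IPv4" or "IPv6"
}

type backendPool struct {
	ID        string
	IPVersion string
}

// poolsForConfig returns only the backend pools whose IP family matches the
// ipConfiguration, so an IPv4 config is never added to an IPv6 pool (which is
// what triggers AllNicIpConfigurationsOfLbBackendPoolMustHaveSamePrivateIpAddressVersion).
func poolsForConfig(cfg ipConfig, pools []backendPool) []backendPool {
	var matched []backendPool
	for _, p := range pools {
		if p.IPVersion == cfg.IPVersion {
			matched = append(matched, p)
		}
	}
	return matched
}

func main() {
	pools := []backendPool{
		{ID: ".../backendAddressPools/cluster", IPVersion: "IPv4"},
		{ID: ".../backendAddressPools/cluster-IPv6", IPVersion: "IPv6"},
	}
	for _, cfg := range []ipConfig{
		{Name: "pipConfig", IPVersion: "IPv4"},
		{Name: "pipConfig-v6", IPVersion: "IPv6"},
	} {
		ids := []string{}
		for _, p := range poolsForConfig(cfg, pools) {
			ids = append(ids, p.ID)
		}
		fmt.Printf("%s -> %s\n", cfg.Name, strings.Join(ids, ","))
	}
}
```

With this filtering in place, a dual-stack NIC ends up with its IPv4 ipConfiguration in the IPv4 pool and its IPv6 ipConfiguration in the IPv6 pool, instead of mixing families within one pool.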

Comment 7 Milind Yadav 2021-05-26 05:07:43 UTC
Thanks @Mike.

I repeated the steps as well and could validate that the machines and nodes are running properly. Adding machine-controller logs and other details below:

Build used: 4.8.0-0.nightly-2021-05-25-223219

Step 1.
Create an IPI Azure cluster as mentioned in #c6.

Make sure all nodes are in Ready status and all machines are Running.

Monitor the machine-controller logs:
...
I0526 03:51:24.508410       1 networkinterfaces.go:92] Found IPv6 address space. Adding IPv6 configuration to nic: miyadav-26-dkq4q-worker-centralus2-4wb44-nic
I0526 03:51:25.267687       1 reflector.go:255] Listing and watching *v1beta1.MachineSet from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:241
I0526 03:51:25.280782       1 controller.go:73] controllers/MachineSet "msg"="Reconciling" "machineset"="miyadav-26-dkq4q-worker-centralus1" "namespace"="openshift-machine-api" 
I0526 03:51:25.337441       1 reflector.go:255] Listing and watching *v1.Secret from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:241
I0526 03:51:25.407382       1 controller.go:73] controllers/MachineSet "msg"="Reconciling" "machineset"="miyadav-26-dkq4q-worker-centralus2" "namespace"="openshift-machine-api" 
I0526 03:51:25.421295       1 controller.go:73] controllers/MachineSet "msg"="Reconciling" "machineset"="miyadav-26-dkq4q-worker-centralus3" "namespace"="openshift-machine-api" 
I0526 03:51:25.509234       1 reflector.go:255] Listing and watching *v1beta1.Machine from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:241
I0526 03:51:25.747936       1 reflector.go:255] Listing and watching *v1.Infrastructure from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:241
I0526 03:51:35.116375       1 networkinterfaces.go:289] successfully created network interface miyadav-26-dkq4q-worker-centralus2-4wb44-nic
...

Described the machine to confirm it has both IPv4 and IPv6 addresses:
...
Status:
  Addresses:
    Address:  miyadav-26-dkq4q-worker-centralus2-4wb44
    Type:     Hostname
    Address:  miyadav-26-dkq4q-worker-centralus2-4wb44
    Type:     InternalDNS
    Address:  miyadav-26-dkq4q-worker-centralus2-4wb44.ywamirewfskernh4d35byhn1mg.gx.internal.cloudapp.net
    Type:     InternalDNS
    Address:  10.0.32.4
    Type:     InternalIP
    Address:  fd00:0:0:1::4
    Type:     InternalIP
...
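
The check above boils down to the machine status listing one InternalIP per family. A small Go sketch of that check (the `machineAddress` type here is a simplified, hypothetical stand-in for the Machine API's address type):

```go
package main

import (
	"fmt"
	"net"
)

// machineAddress mirrors the Type/Address pairs shown in the machine
// status above (simplified stand-in for the real Machine API type).
type machineAddress struct {
	Type    string
	Address string
}

// hasDualStackInternalIPs reports whether the address list contains at
// least one IPv4 and one IPv6 InternalIP, as expected on a dual-stack machine.
func hasDualStackInternalIPs(addrs []machineAddress) bool {
	var v4, v6 bool
	for _, a := range addrs {
		if a.Type != "InternalIP" {
			continue
		}
		ip := net.ParseIP(a.Address)
		if ip == nil {
			continue // skip unparseable entries
		}
		if ip.To4() != nil {
			v4 = true
		} else {
			v6 = true
		}
	}
	return v4 && v6
}

func main() {
	// The addresses from the status block above.
	addrs := []machineAddress{
		{Type: "Hostname", Address: "miyadav-26-dkq4q-worker-centralus2-4wb44"},
		{Type: "InternalIP", Address: "10.0.32.4"},
		{Type: "InternalIP", Address: "fd00:0:0:1::4"},
	}
	fmt.Println(hasDualStackInternalIPs(addrs)) // prints "true"
}
```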

[miyadav@miyadav ~]$ oc get machines
NAME                                       PHASE     TYPE              REGION      ZONE   AGE
miyadav-26-dkq4q-master-0                  Running   Standard_D8s_v3   centralus   2      75m
miyadav-26-dkq4q-master-1                  Running   Standard_D8s_v3   centralus   3      75m
miyadav-26-dkq4q-master-2                  Running   Standard_D8s_v3   centralus   1      75m
miyadav-26-dkq4q-worker-centralus2-4wb44   Running   Standard_D2s_v3   centralus   2      68m
miyadav-26-dkq4q-worker-centralus3-fq8lm   Running   Standard_D2s_v3   centralus   3      68m
[miyadav@miyadav ~]$ 


Additional info:
The cluster installation had initially failed with the machine-config operator degraded.

I could see the messages below in the machine-config-operator pod logs. I am not sure whether they are related to this change or are entirely different, expected behavior; please take a look. Based on that, we can move this to VERIFIED or wait for another PR if needed.
oc logs -f machine-config-operator-cb988447b-gcd7s

...
E0526 05:03:33.998741       1 sync.go:598] Error syncing Required MachineConfigPools: "error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)"
E0526 05:03:35.001260       1 sync.go:598] Error syncing Required MachineConfigPools: "error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)"
E0526 05:03:35.999896       1 sync.go:598] Error syncing Required MachineConfigPools: "error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)"
E0526 05:03:37.000291       1 sync.go:598] Error syncing Required MachineConfigPools: "error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)"
E0526 05:03:37.997342       1 sync.go:598] Error syncing Required MachineConfigPools: "error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)"
E0526 05:03:39.002106       1 sync.go:598] Error syncing Required MachineConfigPools: "error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)"
E0526 05:03:40.005047       1 sync.go:598] Error syncing Required MachineConfigPools: "error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)"
E0526 05:03:41.001380       1 sync.go:598] Error syncing Required MachineConfigPools: "error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)"
...

Comment 14 errata-xmlrpc 2021-07-27 23:07:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

