Bug 1803844

Summary: Single-stack IPv6 Azure clusters on 4.3 assign only IPv4 addresses to worker nodes
Product: OpenShift Container Platform
Reporter: Dan Winship <danw>
Component: Cloud Compute
Assignee: Alberto <agarcial>
Sub component: Other Providers
QA Contact: Jianwei Hou <jhou>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: urgent
CC: agarcial, mgugino
Version: 4.3.z
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Type: Bug
Last Closed: 2020-05-19 11:57:42 UTC
Bug Blocks: 1805704, 1805705    

Description Dan Winship 2020-02-17 15:12:17 UTC
When creating a single-stack IPv6 (or presumably dual-stack) Azure 4.3 cluster using the as-yet-unmerged installer PR https://github.com/openshift/installer/pull/3029 and a hacked 4.3-ipv6 release image (e.g. as described in https://github.com/openshift/installer/pull/3029#issuecomment-586975568), the cluster comes up to a certain point but then fails because the worker nodes have only IPv4 host IPs. That is, "ip -6 addr" shows only a link-local address on the primary network interface, and the Azure console shows that the node has been assigned an IPv4 address but no IPv6 address. This is in contrast to the master nodes, which get both IPv4 and IPv6 addresses.
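
For reference, here's a minimal sketch (assuming KUBECONFIG access and client-go; the program itself is illustrative, not part of any component) of checking which address families each node reports in its status; it should line up with what "ip -6 addr" shows on the host:

  // Minimal sketch, assuming KUBECONFIG points at the affected cluster.
  // Lists every node and reports whether any of its status addresses is IPv6.
  package main

  import (
      "context"
      "fmt"
      "net"
      "os"

      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/client-go/kubernetes"
      "k8s.io/client-go/tools/clientcmd"
  )

  func main() {
      cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
      if err != nil {
          panic(err)
      }
      client := kubernetes.NewForConfigOrDie(cfg)

      nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
      if err != nil {
          panic(err)
      }
      for _, node := range nodes.Items {
          hasV6 := false
          for _, addr := range node.Status.Addresses {
              // Hostname entries fail to parse and are skipped.
              if ip := net.ParseIP(addr.Address); ip != nil && ip.To4() == nil {
                  hasV6 = true
              }
          }
          fmt.Printf("%s: has IPv6 address: %v\n", node.Name, hasV6)
      }
  }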

Clayton had seen similar behavior in 4.4/master in December, but then it went away.

This blocks 4.3 IPv6 CI.

Comment 1 Dan Winship 2020-02-17 18:04:43 UTC
> Clayton had seen similar behavior in 4.4/master in December, but then it went away.

Actually... it's possible that the "went away" part isn't true. At some point other 4.4 components regressed in a way that made the 4.4 IPv6 install not get far enough to create worker nodes. So maybe it's still broken in master too...

Comment 2 Brad Ison 2020-02-20 23:45:54 UTC
So, I don't think this has ever worked. The machine-api doesn't actually create the master nodes in OpenShift: the installer uses Terraform to create them and then crafts matching Machine objects to "adopt" those instances. That explains why the master nodes had IPv6 addresses while the workers didn't.
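
One way to see what the machine-api has recorded for the adopted masters versus the actuator-created workers is to dump the addresses in each Machine's status. This is a hypothetical sketch, assuming the standard machine.openshift.io/v1beta1 API and client-go's dynamic client:

  // Hypothetical sketch: list the Machine objects in openshift-machine-api
  // (both adopted masters and actuator-created workers) and print the
  // addresses recorded in their status.
  package main

  import (
      "context"
      "fmt"
      "os"

      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
      "k8s.io/apimachinery/pkg/runtime/schema"
      "k8s.io/client-go/dynamic"
      "k8s.io/client-go/tools/clientcmd"
  )

  func main() {
      cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
      if err != nil {
          panic(err)
      }
      dyn := dynamic.NewForConfigOrDie(cfg)

      gvr := schema.GroupVersionResource{
          Group: "machine.openshift.io", Version: "v1beta1", Resource: "machines",
      }
      machines, err := dyn.Resource(gvr).Namespace("openshift-machine-api").
          List(context.TODO(), metav1.ListOptions{})
      if err != nil {
          panic(err)
      }
      for _, m := range machines.Items {
          // status.addresses is whatever the provider has reported for this Machine.
          addrs, _, _ := unstructured.NestedSlice(m.Object, "status", "addresses")
          fmt.Printf("%s: %v\n", m.GetName(), addrs)
      }
  }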

The Azure actuator doesn't support IPv6 configurations. This is actually an open issue upstream:

  https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/318
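
For context, fixing this in the actuator would roughly mean attaching an IPv6 IP configuration to the NIC it creates for each worker. Here's a hypothetical sketch with the Azure SDK for Go (the SDK API version, names, and standalone shape are assumptions, not the actuator's actual code):

  // Hypothetical sketch: add a secondary, dynamically-allocated IPv6
  // configuration to an existing NIC, alongside its primary IPv4 config.
  // Assumes the NIC already exists and the subnet has an IPv6 prefix.
  package main

  import (
      "context"

      "github.com/Azure/azure-sdk-for-go/services/network/mgmt/2019-06-01/network"
      "github.com/Azure/go-autorest/autorest/azure/auth"
      "github.com/Azure/go-autorest/autorest/to"
  )

  func addIPv6Config(ctx context.Context, subscriptionID, resourceGroup, nicName, subnetID string) error {
      authorizer, err := auth.NewAuthorizerFromEnvironment()
      if err != nil {
          return err
      }
      nicClient := network.NewInterfacesClient(subscriptionID)
      nicClient.Authorizer = authorizer

      nic, err := nicClient.Get(ctx, resourceGroup, nicName, "")
      if err != nil {
          return err
      }

      // Append an IPv6 configuration; the primary IPv4 configuration stays as-is.
      ipConfigs := append(*nic.IPConfigurations, network.InterfaceIPConfiguration{
          Name: to.StringPtr("ipconfig-v6"),
          InterfaceIPConfigurationPropertiesFormat: &network.InterfaceIPConfigurationPropertiesFormat{
              Primary:                   to.BoolPtr(false),
              PrivateIPAddressVersion:   network.IPv6,
              PrivateIPAllocationMethod: network.Dynamic,
              Subnet:                    &network.Subnet{ID: to.StringPtr(subnetID)},
          },
      })
      nic.IPConfigurations = &ipConfigs

      future, err := nicClient.CreateOrUpdate(ctx, resourceGroup, nicName, nic)
      if err != nil {
          return err
      }
      return future.WaitForCompletionRef(ctx, nicClient.Client)
  }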

I unfortunately have not been able to get an IPv6 cluster running with the changes in the PR linked originally. The infrastructure is created, but the master instances fail to finish provisioning. I haven't been able to figure out exactly why.

Anyway, I tried to get a start on this by creating an IPv4 cluster and manually configuring IPv6 address space on the worker subnet in a way that I think is similar to what the installer PR is doing. Here's the result:

       master: https://github.com/openshift/cluster-api-provider-azure/pull/107
  release-4.3: https://github.com/openshift/cluster-api-provider-azure/pull/108

Definitely a work in progress, but it would be great to get some help testing this.
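
For anyone trying to reproduce that manual setup, here's a hypothetical sketch (again Azure SDK for Go; the resource names, prefix value, and API version are assumptions) of adding an IPv6 address prefix to an existing worker subnet:

  // Hypothetical sketch: add an IPv6 prefix to an existing subnet so NICs
  // in it can carry IPv6 configurations. Names and prefixes are made up.
  package main

  import (
      "context"

      "github.com/Azure/azure-sdk-for-go/services/network/mgmt/2019-06-01/network"
      "github.com/Azure/go-autorest/autorest/azure/auth"
  )

  func addIPv6Prefix(ctx context.Context, subscriptionID, resourceGroup, vnetName, subnetName, v6Prefix string) error {
      authorizer, err := auth.NewAuthorizerFromEnvironment()
      if err != nil {
          return err
      }
      subnetClient := network.NewSubnetsClient(subscriptionID)
      subnetClient.Authorizer = authorizer

      subnet, err := subnetClient.Get(ctx, resourceGroup, vnetName, subnetName, "")
      if err != nil {
          return err
      }

      // Switch to the plural AddressPrefixes field, keeping the existing IPv4 prefix.
      prefixes := []string{v6Prefix}
      if subnet.AddressPrefixes != nil {
          prefixes = append(*subnet.AddressPrefixes, v6Prefix)
      } else if subnet.AddressPrefix != nil {
          prefixes = []string{*subnet.AddressPrefix, v6Prefix}
      }
      subnet.AddressPrefix = nil
      subnet.AddressPrefixes = &prefixes

      future, err := subnetClient.CreateOrUpdate(ctx, resourceGroup, vnetName, subnetName, subnet)
      if err != nil {
          return err
      }
      return future.WaitForCompletionRef(ctx, subnetClient.Client)
  }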

Comment 5 Michael Gugino 2020-05-18 21:00:24 UTC
Bumping; this bug has been in QA for some time now.

Comment 6 Dan Winship 2020-05-19 11:57:42 UTC
This was fixed and is working in 4.4, but since IPv6 on Azure as a whole never became fully functional, it can't really be QA'ed.