Bug 1806067
| Summary: | Ingress not supported on Azure IPv6 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Winship <danw> |
| Component: | Networking | Assignee: | Andrew McDermott <amcdermo> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | medium | | |
| Priority: | low | CC: | amcdermo, aos-bugs, bbennett, ccoleman, erich, xtian |
| Version: | 4.3.z | Keywords: | Reopened, TestBlocker |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-09-04 15:20:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Dan Winship 2020-02-21 22:31:05 UTC
OK, ingress clusteroperator says:

    - type: Degraded
      status: "True"
      reason: IngressControllersDegraded
      message: 'Some ingresscontrollers are degraded: default'

default ingresscontroller says:

    - type: DNSReady
      status: "False"
      reason: FailedZones
      message: 'The record failed to provision in some zones: [{/subscriptions/5970b0fe-21de-4e1a-a192-0a785017e3b7/resourceGroups/dwinship-ipv6-43-tvkrk-rg/providers/Microsoft.Network/privateDnsZones/dwinship-ipv6-43.sdn.azure.devcluster.openshift.com map[]} {/subscriptions/5970b0fe-21de-4e1a-a192-0a785017e3b7/resourceGroups/os4-common/providers/Microsoft.Network/dnszones/sdn.azure.devcluster.openshift.com map[]}]'

default-wildcard dnsrecord says:

    - type: Failed
      status: "True"
      reason: ProviderError
      message: 'The DNS provider failed to ensure the record: failed to update dns a record: *.apps.dwinship-ipv6-43.sdn.azure.devcluster.openshift.com: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/5970b0fe-21de-4e1a-a192-0a785017e3b7/resourceGroups/dwinship-ipv6-43-tvkrk-rg/providers/Microsoft.Network/privateDnsZones/dwinship-ipv6-43.sdn.azure.devcluster.openshift.com/A/*.apps?api-version=2018-09-01: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = ''Post https://login.microsoftonline.com/6047c7e9-b2ad-488d-a54e-dc3f6be6a7ee/oauth2/token?api-version=1.0: dial tcp 20.190.134.9:443: connect: network is unreachable'''

and now I see that due to a bad rebase, ingress-operator had gotten dropped from the list of pods that need hacked-up external IPv4 access.

OK, reopening... even with working DNS, ingress does not work.

The installer seems to create a ${CLUSTER_NAME}-public-lb load balancer that is used for the apiserver, which is dual-stack:

    danw@p50:installer (release-4.3 $)> host api.dwinship-ipv6-43.sdn.azure.devcluster.openshift.com
    api.dwinship-ipv6-43.sdn.azure.devcluster.openshift.com is an alias for dwinship-ipv6-43-nzxlv.centralus.cloudapp.azure.com.
    dwinship-ipv6-43-nzxlv.centralus.cloudapp.azure.com has address 52.154.163.125
    dwinship-ipv6-43-nzxlv.centralus.cloudapp.azure.com has IPv6 address 2603:1030:b:3::48

but we get a ${CLUSTER_NAME} load balancer that is used for router-default that is single-stack IPv4:

    danw@p50:installer (release-4.3 $)> host oauth-openshift.apps.dwinship-ipv6-43.sdn.azure.devcluster.openshift.com
    oauth-openshift.apps.dwinship-ipv6-43.sdn.azure.devcluster.openshift.com has address 13.86.5.132

Connections to this LB do not succeed, causing, e.g.:

    danw@p50:installer (release-4.3 $)> oc get clusteroperator authentication -o yaml
    ...
    status:
      conditions:
      - lastTransitionTime: "2020-02-23T23:57:05Z"
        message: 'RouteHealthDegraded: failed to GET route: dial tcp: i/o timeout'
        reason: RouteHealthDegradedFailedGet
        status: "True"
        type: Degraded

I'm not sure if that load balancer is initially created by kube-controller-manager or the installer, but kube-controller-manager at least eventually takes over maintenance of it.

The Azure kube CloudProvider does not appear to support single-stack IPv6; all of the code for handling IPv6 is inside checks for dual-stack being enabled, and in several places there are explicit comments about not supporting single-stack IPv6. (E.g., https://github.com/openshift/origin/blob/20075b26/vendor/k8s.io/kubernetes/staging/src/k8s.io/legacy-cloud-providers/azure/azure_loadbalancer.go#L564. There are no relevant differences between kube 1.16 and kube master.)
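To make that gating concrete, here is a minimal, self-contained sketch. The function and flag names are hypothetical and this is not the upstream cloud-provider code; it only mirrors the behaviour described above, where IPv6 handling sits behind a dual-stack switch, so a single-stack IPv6 service never gets an IPv6 frontend.

```go
// Illustrative sketch (hypothetical names, not the upstream
// legacy-cloud-providers code) of the gating described above: IPv6 handling
// only runs when a dual-stack switch is on, so a single-stack IPv6 service
// never reaches the IPv6 code paths.
package main

import (
	"errors"
	"fmt"
	"net"
)

// isIPv6 reports whether addr parses as an IPv6 (and not an IPv4) address.
func isIPv6(addr string) bool {
	ip := net.ParseIP(addr)
	return ip != nil && ip.To4() == nil
}

// frontendVersionFor picks the frontend IP version for a service IP, refusing
// IPv6 unless the dual-stack paths are enabled.
func frontendVersionFor(serviceIP string, dualStackEnabled bool) (string, error) {
	if isIPv6(serviceIP) {
		if !dualStackEnabled {
			return "", errors.New("single-stack IPv6 load balancers are not supported")
		}
		return "IPv6", nil
	}
	return "IPv4", nil
}

func main() {
	// fd02::a8eb is the router-default cluster IP that shows up below.
	fmt.Println(frontendVersionFor("fd02::a8eb", false))
}
```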
I played around with just unconditionally enabling all the dual-stack code... I'm not sure it actually works even for dual-stack as-is, though, because it doesn't seem to take into account the fact that you can't create a single-stack IPv6 load balancer; you have to create a dual-stack load balancer even if you only want IPv6 backends.

https://github.com/openshift-kni/origin/commit/56931373 is what I came up with, which does not yet work. Although the Azure console now shows that the ${CLUSTER_NAME} load balancer has both IPv4 and IPv6 frontend IPs, kube-controller-manager repeatedly complains that:

    I0223 23:56:28.357260       1 azure_backoff.go:287] LoadBalancerClient.CreateOrUpdate(dwinship-ipv6-hacked-8bnw2): end
    E0223 23:56:28.357296       1 azure_backoff.go:749] processHTTPRetryResponse: backoff failure, will retry, err=Code="AtleastOneIpV4RequiredForIpV6LbFrontendIpConfiguration" Message="At least one IPv4 frontend ipConfiguration is required for an IPv6 frontend ipConfiguration on the load balancer '/subscriptions/5970b0fe-21de-4e1a-a192-0a785017e3b7/resourceGroups/dwinship-ipv6-hacked-8bnw2-rg/providers/Microsoft.Network/loadBalancers/dwinship-ipv6-hacked-8bnw2'" Details=[]

which in turn leads to:

    danw@p50:installer (release-4.3 $)> oc get services -n openshift-ingress router-default
    NAME             TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
    router-default   LoadBalancer   fd02::a8eb   <pending>     80:30564/TCP,443:31817/TCP   35m

Google isn't turning up anything useful about the error message.

You can't create a partial IPv6-only LB. I suspect your change is missing the second configuration when you create it (you need ip_address_version set to IPv4 for one and IPv6 for the other; see the sketch below).

I think the cloud provider code needs some large refactoring. The frontend config setup code makes foundational assumptions about a single frontend config, and the IPv6 config is bolted on in such a way that no frontend->backend rules are generated for the IPv4 config. It'll take some work to get them generated, but I think it's fixable.

Moving to 4.5 because we won't block the release, but I may change it back to 4.4[.z] if the fix comes more quickly than expected.

The work is becoming much more involved than anticipated. Moving out to 4.6.

I'm adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

I'm adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

I'm adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Target reset to 4.7 while investigation is either ongoing or not yet started. Will be considered for earlier release versions when diagnosed and resolved.
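For reference, the AtleastOneIpV4RequiredForIpV6LbFrontendIpConfiguration error above is asking for a pair of frontend ipConfigurations on the same load balancer, one IPv4 and one IPv6. Below is a minimal sketch of that shape, assuming the legacy Azure SDK for Go (the network management package the in-tree cloud provider vendors); the API version in the import path, the public IP resource IDs, and the helper names are illustrative assumptions, not code from an actual fix.

```go
// A minimal sketch, assuming the legacy Azure SDK for Go; the API version,
// public IP IDs, and names below are illustrative assumptions.
package main

import (
	"fmt"

	"github.com/Azure/azure-sdk-for-go/services/network/mgmt/2019-06-01/network"
	"github.com/Azure/go-autorest/autorest/to"
)

// dualStackFrontends builds the pair of frontend IP configurations Azure
// requires: an IPv6 frontend is only accepted alongside an IPv4 one.
func dualStackFrontends(v4PublicIPID, v6PublicIPID string) []network.FrontendIPConfiguration {
	return []network.FrontendIPConfiguration{
		{
			Name: to.StringPtr("frontend-v4"),
			FrontendIPConfigurationPropertiesFormat: &network.FrontendIPConfigurationPropertiesFormat{
				PrivateIPAddressVersion: network.IPv4,
				PublicIPAddress:         &network.PublicIPAddress{ID: to.StringPtr(v4PublicIPID)},
			},
		},
		{
			Name: to.StringPtr("frontend-v6"),
			FrontendIPConfigurationPropertiesFormat: &network.FrontendIPConfigurationPropertiesFormat{
				PrivateIPAddressVersion: network.IPv6,
				PublicIPAddress:         &network.PublicIPAddress{ID: to.StringPtr(v6PublicIPID)},
			},
		},
	}
}

func main() {
	// Hypothetical public IP resource IDs; in the cloud provider these would
	// come from the public IPs it manages for the service.
	frontends := dualStackFrontends(
		"/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/publicIPAddresses/router-v4",
		"/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/publicIPAddresses/router-v6",
	)

	lb := network.LoadBalancer{
		LoadBalancerPropertiesFormat: &network.LoadBalancerPropertiesFormat{
			FrontendIPConfigurations: &frontends,
		},
	}
	fmt.Println(len(*lb.LoadBalancerPropertiesFormat.FrontendIPConfigurations), "frontend configurations")
}
```

Whatever the exact API shape, the design point matches the comment above: the IPv6 frontend only validates when an IPv4 sibling is submitted in the same CreateOrUpdate call, which is why a single-stack IPv6 service still ends up with a dual-stack load balancer on Azure.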