Bug 1805936
Summary: | installer crashes while destroying bootstrap host on Azure IPv6 cluster | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Dan Winship <danw> | |
Component: | Installer | Assignee: | John Hixson <jhixson> | |
Installer sub component: | openshift-installer | QA Contact: | Gaoyun Pei <gpei> | |
Status: | CLOSED DUPLICATE | Docs Contact: | ||
Severity: | urgent | |||
Priority: | urgent | CC: | adahiya, ccoleman, dmace, jhixson, sdodson, wking | |
Version: | 4.4 | |||
Target Milestone: | --- | |||
Target Release: | 4.5.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1808969 1808973 | Environment: | ||
Last Closed: | 2020-03-17 01:10:01 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: |
Description
Dan Winship
2020-02-21 17:51:41 UTC
lol, ok, it happens in 4.4 too, I just hadn't noticed because apparently I had never gotten a 4.4 IPv6 install to the point where the installer tears down the bootstrap host before...

Target Release: --- → 4.5.0

Note that if you're going to try to debug this against git master, you'll have to fix bug 1805251 first...

I can confirm that I receive the same errors when following the instructions. I have found the problem and have a patch for it. I will write up the details tomorrow.

The problem seems to be with the address_prefix field when terraform is refreshing state for the subnets within the virtual network. The "address_prefix" field is null, while "address_prefixes" gets populated. The crash occurs in this section of the terraform code, in terraform-provider-azurerm/azurerm/internal/services/network/resource_arm_virtual_network.go:

```go
func resourceAzureSubnetHash(v interface{}) int {
	var buf bytes.Buffer

	if m, ok := v.(map[string]interface{}); ok {
		buf.WriteString(m["name"].(string))
		// This is causing the crash
		buf.WriteString(m["address_prefix"].(string))

		if v, ok := m["security_group"]; ok {
			buf.WriteString(v.(string))
		}
	}

	return hashcode.String(buf.String())
}
```

The fix I have tested with is:

```go
func resourceAzureSubnetHash(v interface{}) int {
	var buf bytes.Buffer

	if m, ok := v.(map[string]interface{}); ok {
		buf.WriteString(m["name"].(string))

		if v, ok := m["address_prefix"]; ok {
			buf.WriteString(v.(string))
		}

		if v, ok := m["security_group"]; ok {
			buf.WriteString(v.(string))
		}
	}

	return hashcode.String(buf.String())
}
```

There are still problems even with this fix, though, and I don't know if it is masking another problem. Was this known to work before this problem was found? I can submit a patch for this, but it looks like more work is required.

> Was this known to work before this problem was found?
It works in 4.4. Presumably this function changed in the rebase?
Probably it needs to look at both address_prefix and address_prefixes, and use whichever is set.
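Following that suggestion, here is a minimal sketch of a subnet hash that uses `address_prefix` when it holds a non-empty string and otherwise falls back to `address_prefixes`. Assumptions, not taken from the provider: `subnetHash` and `hashString` are hypothetical names (`hashString` stands in for terraform's `hashcode.String`, which as far as I know is a non-negative CRC-32 of the string), and `address_prefixes` is assumed to arrive as a `[]interface{}` of strings, as a `TypeList` would (a `TypeSet` would arrive as `*schema.Set` instead). It also uses the comma-ok type assertion throughout, because the two-value map lookup in the tested fix (`v, ok := m["address_prefix"]`) only reports that the key exists: if the refreshed state carries the key with a nil value, `v.(string)` still panics.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
)

// hashString stands in for terraform's hashcode.String (assumption):
// a CRC-32 (IEEE) checksum of s, returned as a non-negative int.
func hashString(s string) int {
	v := int(crc32.ChecksumIEEE([]byte(s)))
	if v >= 0 {
		return v
	}
	if -v >= 0 {
		return -v
	}
	return 0
}

// subnetHash is a hypothetical variant of resourceAzureSubnetHash.
// Every type assertion uses the comma-ok form, so a missing key, a nil
// value, or an unexpected type is skipped instead of causing a panic.
func subnetHash(v interface{}) int {
	var buf bytes.Buffer

	if m, ok := v.(map[string]interface{}); ok {
		if name, ok := m["name"].(string); ok {
			buf.WriteString(name)
		}

		// Prefer the single-prefix field; fall back to the list
		// (assumed to be a []interface{} of strings) when it is unset.
		if p, ok := m["address_prefix"].(string); ok && p != "" {
			buf.WriteString(p)
		} else if ps, ok := m["address_prefixes"].([]interface{}); ok {
			for _, item := range ps {
				if s, ok := item.(string); ok {
					buf.WriteString(s)
				}
			}
		}

		if sg, ok := m["security_group"].(string); ok {
			buf.WriteString(sg)
		}
	}

	return hashString(buf.String())
}

func main() {
	// Hypothetical subnets: one uses address_prefix, the other has it
	// null and carries the same prefix in address_prefixes instead.
	single := map[string]interface{}{"name": "master", "address_prefix": "10.0.0.0/24"}
	viaList := map[string]interface{}{
		"name":             "master",
		"address_prefix":   nil,
		"address_prefixes": []interface{}{"10.0.0.0/24"},
	}
	fmt.Println(subnetHash(single) == subnetHash(viaList)) // true
}
```

With a single prefix, the hash comes out the same whichever field carries it, which may help keep the subnet set stable across refreshes. As in the original, the fields are concatenated without a separator.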
I was not able to get a 4.4 IPv6 install to work. I am trying with image release:4.3.0-0.nightly-2020-02-17-205936-ipv6.1d3. Is there a 4.4 image known to work? I was following the instructions as per the document in this ticket.

For 4.3, with the following diff I get an almost complete install:

```diff
diff --git a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_loadbalancer.go b/pkg/terraform/exec/plugins/ven
index 4950399..ad92244 100644
--- a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_loadbalancer.go
+++ b/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_loadbalancer.go
@@ -68,10 +68,10 @@ func resourceArmLoadBalancer() *schema.Resource {
         },
 
         "private_ip_address": {
-            Type:         schema.TypeString,
-            Optional:     true,
-            Computed:     true,
-            ValidateFunc: validate.IPv4AddressOrEmpty,
+            Type:     schema.TypeString,
+            Optional: true,
+            //Computed: true,
+            //ValidateFunc: validate.IPv4AddressOrEmpty,
         },
 
         "private_ip_address_version": {
diff --git a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_virtual_network.go b/pkg/terraform/exec/plugins/
index 968fc70..762449b 100644
--- a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_virtual_network.go
+++ b/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_virtual_network.go
@@ -416,7 +416,10 @@ func resourceAzureSubnetHash(v interface{}) int {
     if m, ok := v.(map[string]interface{}); ok {
         buf.WriteString(m["name"].(string))
-        buf.WriteString(m["address_prefix"].(string))
+        //buf.WriteString(m["address_prefix"].(string))
+        if a, ok := m["address_prefix"]; ok {
+            buf.WriteString(a.(string))
+        }
         if v, ok := m["security_group"]; ok {
             buf.WriteString(v.(string))
```

The install does bomb out waiting for operators to become stable. With image release:4.3.0-0.nightly-2020-02-17-205936-ipv6.1d3, there are several such operators; with image release:4.3.0-0.nightly-2020-02-21-091838-ipv6.2d9, it is just authentication and console.

As for the address_prefix, nothing changed in 4.4, except for some dependency clauses in the terraform and the removal of an unused route table:

```diff
diff --git a/dns/dns.tf b/dns/dns.tf
index 69c0431..5816448 100644
--- a/dns/dns.tf
+++ b/dns/dns.tf
@@ -6,6 +6,8 @@ locals {
 resource "azureprivatedns_zone" "private" {
   name                = var.cluster_domain
   resource_group_name = var.resource_group_name
+
+  depends_on = [azurerm_dns_cname_record.api_external_v4, azurerm_dns_cname_record.api_external_v6]
 }
 
 resource "azureprivatedns_zone_virtual_network_link" "network" {
diff --git a/vnet/vnet.tf b/vnet/vnet.tf
index 7328fba..ddbd632 100644
--- a/vnet/vnet.tf
+++ b/vnet/vnet.tf
@@ -7,12 +7,6 @@ resource "azurerm_virtual_network" "cluster_vnet" {
   address_space       = concat(var.vnet_v4_cidrs, var.vnet_v6_cidrs)
 }
 
-resource "azurerm_route_table" "route_table" {
-  name                = "${var.cluster_id}-node-routetable"
-  location            = var.region
-  resource_group_name = var.resource_group_name
-}
-
 resource "azurerm_subnet" "master_subnet" {
   count = var.preexisting_network ? 0 : 1
```

Is anything here making any sense? Should a pull request be opened with the proposed fixes? Any pointers?

(cc: Clayton since it relates to your original Azure IPv6 modifications and your pending external PR. cc: Dan Mace because this may be part of the cause of bug 1806067.)

(In reply to John Hixson from comment #14)
> I was not able to get a 4.4 IPv6 install to work. I am trying with image
> release:4.3.0-0.nightly-2020-02-17-205936-ipv6.1d3. Is there a 4.4 image
> known to work?
There is not yet any image that will get you a complete successful install on Azure, but I think at this point the latest official 4.3 and 4.4 images will get you to the point where the installer destroys the bootstrap resources.

> For 4.3, with the following diff I get an almost complete install:
>   "private_ip_address": {
> -     Type:         schema.TypeString,
> -     Optional:     true,
> -     Computed:     true,
> -     ValidateFunc: validate.IPv4AddressOrEmpty,
> +     Type:     schema.TypeString,
> +     Optional: true,
> +     //Computed: true,
> +     //ValidateFunc: validate.IPv4AddressOrEmpty,

This is one of the files that Clayton modified when adding Azure IPv6 support (https://github.com/openshift/installer/commit/3d00b6c#diff-f5330fc6f6380bb4171ec763f15d803a / https://github.com/terraform-providers/terraform-provider-azurerm/pull/5590). Presumably it's failing because "private_ip_address" actually contains an IPv6 address here, and you need to add an appropriate validator for that to pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/helpers/validate/network.go.

> -     buf.WriteString(m["address_prefix"].(string))
> +     //buf.WriteString(m["address_prefix"].(string))
> +     if a, ok := m["address_prefix"]; ok {
> +         buf.WriteString(a.(string))
> +     }

I think you need to use "address_prefixes" here if "address_prefix" isn't set.

> The install does bomb out waiting for operators to become stable.

Yup. Expected. Ingress isn't working yet. (Although maybe that's partly because of the problems here?)

> As for the address_prefix, nothing changed in 4.4, except for some
> dependency clauses in the terraform and the removal of an unused route table:

Yeah, it seems like I was just confused about it being 4.3-only.

(In reply to Dan Winship from comment #8)
> lol, ok, it happens in 4.4 too, I just hadn't noticed because apparently I
> had never gotten a 4.4 IPv6 install to the point where the installer tears
> down the bootstrap host before...
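One way to act on the validator suggestion, rather than commenting out `Computed` and `ValidateFunc`, is a validator that accepts either address family. A minimal sketch follows. `ipAddressOrEmpty` is a hypothetical name (not necessarily something the vendored `validate` package already provides), and the signature assumes the terraform plugin SDK's `SchemaValidateFunc` shape, `func(interface{}, string) ([]string, []error)`. It relies only on the standard library's `net.ParseIP`, which returns nil for anything that is not a valid IPv4 or IPv6 address.

```go
package main

import (
	"fmt"
	"net"
)

// ipAddressOrEmpty is a hypothetical validator in the shape of a
// SchemaValidateFunc. It accepts "", any IPv4 address, or any IPv6
// address, and reports an error for anything else.
func ipAddressOrEmpty(i interface{}, k string) (warnings []string, errs []error) {
	v, ok := i.(string)
	if !ok {
		errs = append(errs, fmt.Errorf("expected type of %q to be string", k))
		return warnings, errs
	}
	if v == "" {
		return warnings, errs
	}
	// net.ParseIP handles both dotted IPv4 and IPv6 text forms.
	if net.ParseIP(v) == nil {
		errs = append(errs, fmt.Errorf("%q is not a valid IPv4 or IPv6 address: %q", k, v))
	}
	return warnings, errs
}

func main() {
	for _, addr := range []string{"", "10.0.0.4", "fd00::4", "not-an-ip"} {
		_, errs := ipAddressOrEmpty(addr, "private_ip_address")
		fmt.Printf("%q valid=%v\n", addr, len(errs) == 0)
	}
}
```

It could then be referenced as `ValidateFunc: ipAddressOrEmpty` with `Computed: true` left in place. Whether the value should instead be checked against `private_ip_address_version` is not something this sketch attempts.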
I haven't had any luck with 4.4. It always fails with an OSProvisioningTimedOut error. For now, I think I will just work on the 4.3 problem and open a PR for it. I will handle 4.4 separately.

I just tried 4.4.0-0.nightly-2020-03-03-110909 (with an appropriate install-config and OPENSHIFT_INSTALL_AZURE_EMULATE_SINGLESTACK_IPV6=true) and got the terraform errors.

Can this be reproduced without a complete and successful IPv6 install? I.e.:

1) `openshift-install create cluster`, wait for failure
2) `openshift-install destroy cluster`, confirm bootstrap host destroy failure
3) fix `openshift-install`
4) `openshift-install destroy cluster`, successful removal of bootstrap host

*** This bug has been marked as a duplicate of bug 1805251 ***