Bug 1805936
| Summary: | installer crashes while destroying bootstrap host on Azure IPv6 cluster | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Winship <danw> | |
| Component: | Installer | Assignee: | John Hixson <jhixson> | |
| Installer sub component: | openshift-installer | QA Contact: | Gaoyun Pei <gpei> | |
| Status: | CLOSED DUPLICATE | Docs Contact: | ||
| Severity: | urgent | |||
| Priority: | urgent | CC: | adahiya, ccoleman, dmace, jhixson, sdodson, wking | |
| Version: | 4.4 | |||
| Target Milestone: | --- | |||
| Target Release: | 4.5.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1808969 1808973 (view as bug list) | Environment: | ||
| Last Closed: | 2020-03-17 01:10:01 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
Description
Dan Winship
2020-02-21 17:51:41 UTC
lol, ok, it happens in 4.4 too, I just hadn't noticed because apparently I had never gotten a 4.4 IPv6 install to the point where the installer tears down the bootstrap host before...
> Target Release: --- → 4.5.0
note that if you're going to try to debug this against git master, you'll have to fix bug 1805251 first...
I can confirm that I receive the same errors when following the instructions.
I have found the problem and have a patch for it. I will write up the details tomorrow.
The problem seems to be with the address_prefix field when terraform is refreshing state for the subnets within the virtual network. The "address_prefix" field is null, while "address_prefixes" gets populated. The crash occurs in this section of the terraform code:
terraform-provider-azurerm/azurerm/internal/services/network/resource_arm_virtual_network.go:
func resourceAzureSubnetHash(v interface{}) int {
    var buf bytes.Buffer
    if m, ok := v.(map[string]interface{}); ok {
        buf.WriteString(m["name"].(string))
        // This is causing the crash
        buf.WriteString(m["address_prefix"].(string))
        if v, ok := m["security_group"]; ok {
            buf.WriteString(v.(string))
        }
    }
    return hashcode.String(buf.String())
}
The fix I have tested with is:
func resourceAzureSubnetHash(v interface{}) int {
    var buf bytes.Buffer
    if m, ok := v.(map[string]interface{}); ok {
        buf.WriteString(m["name"].(string))
        if v, ok := m["address_prefix"]; ok {
            buf.WriteString(v.(string))
        }
        if v, ok := m["security_group"]; ok {
            buf.WriteString(v.(string))
        }
    }
    return hashcode.String(buf.String())
}
There are still problems with this fix though. I don't know if this is masking another problem. Was this known to work before this problem was found? I can submit a patch for this, but it looks like more work is required.
> Was this known to work before this problem was found?
It works in 4.4. Presumably this function changed in the rebase?
Probably it needs to look at both address_prefix and address_prefixes, and use whichever is set.
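To make the "use whichever is set" idea concrete, here is a minimal sketch. It is not the upstream fix and has not been run against the provider. Assumptions: `subnetHash` is a hypothetical name, `hashString` is a stand-in for the provider's `hashcode.String`, and "address_prefixes" is assumed to arrive as a `[]interface{}` of strings. The type assertions use the comma-ok form, because a key that is present with a nil value passes a plain map lookup check but still panics on an unchecked `.(string)`.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
)

// hashString is a stand-in for the provider's hashcode.String helper
// (assumption: any stable string hash is enough for this illustration).
func hashString(s string) int {
	return int(crc32.ChecksumIEEE([]byte(s)))
}

// subnetHash is a hypothetical variant of resourceAzureSubnetHash that
// prefers "address_prefix" and falls back to "address_prefixes".
func subnetHash(v interface{}) int {
	var buf bytes.Buffer
	m, ok := v.(map[string]interface{})
	if !ok {
		return hashString("")
	}
	// Checked assertions: a missing key or a nil value yields ok == false
	// instead of a panic.
	if name, ok := m["name"].(string); ok {
		buf.WriteString(name)
	}
	if prefix, ok := m["address_prefix"].(string); ok && prefix != "" {
		buf.WriteString(prefix)
	} else if prefixes, ok := m["address_prefixes"].([]interface{}); ok {
		// Assumed representation: a list of string CIDRs.
		for _, p := range prefixes {
			if s, ok := p.(string); ok {
				buf.WriteString(s)
			}
		}
	}
	if sg, ok := m["security_group"].(string); ok {
		buf.WriteString(sg)
	}
	return hashString(buf.String())
}

func main() {
	singleStack := map[string]interface{}{
		"name":           "master-subnet",
		"address_prefix": "10.0.0.0/24",
	}
	dualStack := map[string]interface{}{
		"name":             "master-subnet",
		"address_prefix":   nil, // the null value that triggers the crash
		"address_prefixes": []interface{}{"10.0.0.0/24", "fd00::/64"},
	}
	fmt.Println(subnetHash(singleStack), subnetHash(dualStack))
}
```

Whether hashing every entry of "address_prefixes" is the right choice for the provider's set semantics is an open question; the sketch only shows that the nil case can be handled without a panic.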
I was not able to get a 4.4 IPv6 install to work. I am trying with image release:4.3.0-0.nightly-2020-02-17-205936-ipv6.1d3. Is there a 4.4 image known to work? I was following the instructions as per the document in this ticket. For 4.3, with the following diff I get an almost complete install:
diff --git a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_loadbalancer.go b/pkg/terraform/exec/plugins/ven
index 4950399..ad92244 100644
--- a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_loadbalancer.go
+++ b/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_loadbalancer.go
@@ -68,10 +68,10 @@ func resourceArmLoadBalancer() *schema.Resource {
             },
             "private_ip_address": {
-                Type:         schema.TypeString,
-                Optional:     true,
-                Computed:     true,
-                ValidateFunc: validate.IPv4AddressOrEmpty,
+                Type:     schema.TypeString,
+                Optional: true,
+                //Computed:     true,
+                //ValidateFunc: validate.IPv4AddressOrEmpty,
             },
             "private_ip_address_version": {
diff --git a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_virtual_network.go b/pkg/terraform/exec/plugins/
index 968fc70..762449b 100644
--- a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_virtual_network.go
+++ b/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_virtual_network.go
@@ -416,7 +416,10 @@ func resourceAzureSubnetHash(v interface{}) int {
     if m, ok := v.(map[string]interface{}); ok {
         buf.WriteString(m["name"].(string))
-        buf.WriteString(m["address_prefix"].(string))
+        //buf.WriteString(m["address_prefix"].(string))
+        if a, ok := m["address_prefix"]; ok {
+            buf.WriteString(a.(string))
+        }
         if v, ok := m["security_group"]; ok {
             buf.WriteString(v.(string))
The install does bomb out waiting for operators to become stable. With image release:4.3.0-0.nightly-2020-02-17-205936-ipv6.1d3, several operators fail to stabilize. With image release:4.3.0-0.nightly-2020-02-21-091838-ipv6.2d9 it is just authentication and console.
As for the address_prefix, nothing changed in 4.4, except for some dependency clauses in the terraform and the removal of an unused route table:
diff --git a/dns/dns.tf b/dns/dns.tf
index 69c0431..5816448 100644
--- a/dns/dns.tf
+++ b/dns/dns.tf
@@ -6,6 +6,8 @@ locals {
 resource "azureprivatedns_zone" "private" {
   name                = var.cluster_domain
   resource_group_name = var.resource_group_name
+
+  depends_on = [azurerm_dns_cname_record.api_external_v4, azurerm_dns_cname_record.api_external_v6]
 }
 resource "azureprivatedns_zone_virtual_network_link" "network" {
diff --git a/vnet/vnet.tf b/vnet/vnet.tf
index 7328fba..ddbd632 100644
--- a/vnet/vnet.tf
+++ b/vnet/vnet.tf
@@ -7,12 +7,6 @@ resource "azurerm_virtual_network" "cluster_vnet" {
   address_space       = concat(var.vnet_v4_cidrs, var.vnet_v6_cidrs)
 }
-resource "azurerm_route_table" "route_table" {
-  name                = "${var.cluster_id}-node-routetable"
-  location            = var.region
-  resource_group_name = var.resource_group_name
-}
-
 resource "azurerm_subnet" "master_subnet" {
   count = var.preexisting_network ? 0 : 1
Is anything here making any sense? Should a pull request be opened with the proposed fixes? Any pointers?
(cc: Clayton since it relates to your original Azure IPv6 modifications and your pending external PR. cc: Dan Mace because this may be part of the cause of bug 1806067.)
(In reply to John Hixson from comment #14)
> I was not able to get a 4.4 IPv6 install to work. I am trying with image
> release:4.3.0-0.nightly-2020-02-17-205936-ipv6.1d3. Is there a 4.4 image
> known to work?
There is not yet any image that will get you a complete successful install on Azure, but I think at this point the latest official 4.3 and 4.4 images will get you to the point where the installer destroys the bootstrap resources.
> For 4.3, with the following diff I get an almost complete install:
> "private_ip_address": {
> -    Type:         schema.TypeString,
> -    Optional:     true,
> -    Computed:     true,
> -    ValidateFunc: validate.IPv4AddressOrEmpty,
> +    Type:     schema.TypeString,
> +    Optional: true,
> +    //Computed:     true,
> +    //ValidateFunc: validate.IPv4AddressOrEmpty,
This is one of the files that Clayton modified when adding Azure IPv6 support (https://github.com/openshift/installer/commit/3d00b6c#diff-f5330fc6f6380bb4171ec763f15d803a / https://github.com/terraform-providers/terraform-provider-azurerm/pull/5590). Presumably it's failing because "private_ip_address" actually contains an IPv6 address here and you need to add an appropriate validator for that to pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/helpers/validate/network.go.
> -    buf.WriteString(m["address_prefix"].(string))
> +    //buf.WriteString(m["address_prefix"].(string))
> +    if a, ok := m["address_prefix"]; ok {
> +        buf.WriteString(a.(string))
> +    }
I think you need to use "address_prefixes" here if "address_prefix" isn't set.
> The install does bomb out waiting for operators to become stable.
Yup. Expected. Ingress isn't working yet. (Although maybe that's partly because of the problems here?)
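On the validator point for "private_ip_address": a minimal sketch of what an IPv4-or-IPv6 check could look like, as an alternative to commenting the validator out. The name `ipAddressOrEmpty` is hypothetical, and the signature assumes the usual Terraform validate-function shape, `func(interface{}, string) ([]string, []error)`; whether the provider already ships something equivalent is not confirmed in this ticket.

```go
package main

import (
	"fmt"
	"net"
)

// ipAddressOrEmpty is a hypothetical validator that accepts an empty string,
// an IPv4 address, or an IPv6 address. The signature is assumed to match the
// shape used by validate.IPv4AddressOrEmpty.
func ipAddressOrEmpty(i interface{}, k string) (warnings []string, errs []error) {
	v, ok := i.(string)
	if !ok {
		errs = append(errs, fmt.Errorf("expected %q to be a string", k))
		return
	}
	if v == "" {
		return
	}
	// net.ParseIP returns nil for anything that is not a valid
	// textual IPv4 or IPv6 address.
	if net.ParseIP(v) == nil {
		errs = append(errs, fmt.Errorf("expected %q to be a valid IPv4 or IPv6 address, got %q", k, v))
	}
	return
}

func main() {
	for _, addr := range []string{"", "10.0.0.5", "fd00::5", "not-an-ip"} {
		_, errs := ipAddressOrEmpty(addr, "private_ip_address")
		fmt.Printf("%q: %d error(s)\n", addr, len(errs))
	}
}
```

Keeping a validator in place, rather than removing it, would preserve the early rejection of malformed addresses while still letting an IPv6 frontend address through.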
> As for the address_prefix, nothing changed in 4.4, except for some
> dependency clauses in the terraform and the removal of an unused route table:
Yeah, it seems like I was just confused about it being 4.3-only.
(In reply to Dan Winship from comment #8)
> lol, ok, it happens in 4.4 too, I just hadn't noticed because apparently I
> had never gotten a 4.4 IPv6 install to the point where the installer tears
> down the bootstrap host before...
I haven't had any luck with 4.4. It always fails with an OSProvisioningTimedOut error. For now, I think I will just work on the 4.3 problem and open up a PR for it. I will handle 4.4 separately.
I just tried 4.4.0-0.nightly-2020-03-03-110909 (with an appropriate install-config and OPENSHIFT_INSTALL_AZURE_EMULATE_SINGLESTACK_IPV6=true) and got the terraform errors.
Can this be synthesized without a complete and successful IPv6 install? i.e.:
1) `openshift-install create cluster`, wait for failure
2) `openshift-install destroy cluster`, confirm bootstrap host destroy failure
3) fix `openshift-install`
4) `openshift-install destroy cluster`, successful removal of bootstrap host
*** This bug has been marked as a duplicate of bug 1805251 ***