Bug 1805936 - installer crashes while destroying bootstrap host on Azure IPv6 cluster
Summary: installer crashes while destroying bootstrap host on Azure IPv6 cluster
Keywords:
Status: CLOSED DUPLICATE of bug 1805251
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: John Hixson
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-02-21 17:51 UTC by Dan Winship
Modified: 2020-04-15 02:33 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1808969 1808973
Environment:
Last Closed: 2020-03-17 01:10:01 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 3247 | 0 | None | closed | bug 1805251: Master azure terraform address prefixes | 2021-01-01 04:51:30 UTC

Description Dan Winship 2020-02-21 17:51:41 UTC
When installing an Azure IPv6 cluster with the 4.3 installer, the installer errors out while destroying the bootstrap resources:

ERROR                                              
ERROR Error: rpc error: code = Unavailable desc = transport is closing 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: rpc error: code = Unavailable desc = transport is closing 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: rpc error: code = Unavailable desc = transport is closing 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: rpc error: code = Unavailable desc = transport is closing 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: rpc error: code = Unavailable desc = transport is closing 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: rpc error: code = Unavailable desc = transport is closing 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: rpc error: code = Unavailable desc = transport is closing 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: rpc error: code = Unavailable desc = transport is closing 
ERROR                                              
ERROR                                              
FATAL Terraform destroy: failed to destroy using Terraform 


This does not seem to happen with IPv4 installs, and does not seem to happen with the 4.4 installer.

Comment 8 Dan Winship 2020-02-25 02:21:26 UTC
lol, ok, it happens in 4.4 too, I just hadn't noticed because apparently I had never gotten a 4.4 IPv6 install to the point where the installer tears down the bootstrap host before...

Comment 9 Dan Winship 2020-02-25 20:22:04 UTC
> Target Release: --- → 4.5.0

note that if you're going to try to debug this against git master, you'll have to fix bug 1805251 first...

Comment 10 John Hixson 2020-02-25 23:36:24 UTC
I can confirm that I receive the same errors when following the instructions.

Comment 11 John Hixson 2020-02-27 08:25:57 UTC
I have found the problem and have a patch for it. I will write up the details tomorrow.

Comment 12 John Hixson 2020-02-28 01:41:14 UTC
The problem seems to be with the address_prefix field when terraform is refreshing state for the subnets within the virtual network. The "address_prefix" field is null, while "address_prefixes" gets populated. The crash occurs in this section of the terraform code: 

terraform-provider-azurerm/azurerm/internal/services/network/resource_arm_virtual_network.go:

func resourceAzureSubnetHash(v interface{}) int {
        var buf bytes.Buffer

        if m, ok := v.(map[string]interface{}); ok {
                buf.WriteString(m["name"].(string))

                // This is causing the crash
                buf.WriteString(m["address_prefix"].(string))

                if v, ok := m["security_group"]; ok {
                        buf.WriteString(v.(string))
                }
        }

        return hashcode.String(buf.String())
}
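
For readers less familiar with Go, here is a minimal, self-contained sketch of why a single-value type assertion on a null field crashes, and how the comma-ok form avoids it. The helper names `unsafePrefix` and `prefixOrEmpty` are hypothetical and are not part of the provider. The map literal is only an assumption about what the refreshed subnet state looks like; this thread does not establish whether the provider leaves the key absent or sets it to nil.

```go
package main

import "fmt"

// unsafePrefix mirrors the single-value assertion used in the hash function.
// It panics when the value is nil or not a string; the panic is recovered
// here only so the demo can report it as an error.
func unsafePrefix(m map[string]interface{}) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return m["address_prefix"].(string), nil
}

// prefixOrEmpty uses the comma-ok assertion, which yields "" instead of
// panicking when the key is absent, nil, or holds a non-string value.
func prefixOrEmpty(m map[string]interface{}) string {
	s, _ := m["address_prefix"].(string)
	return s
}

func main() {
	// Assumed shape of a subnet entry whose address_prefix is null.
	m := map[string]interface{}{"name": "master-subnet", "address_prefix": nil}

	_, err := unsafePrefix(m)
	fmt.Println("single-value assertion:", err)
	fmt.Printf("comma-ok assertion: %q\n", prefixOrEmpty(m))
}
```

Note that a plain key-presence check (`if v, ok := m["address_prefix"]; ok`) followed by `v.(string)` would still panic if the key were present with a nil value, whereas the comma-ok assertion covers both cases.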

The fix I have tested with is:


func resourceAzureSubnetHash(v interface{}) int {
        var buf bytes.Buffer

        if m, ok := v.(map[string]interface{}); ok {
                buf.WriteString(m["name"].(string))

                if v, ok := m["address_prefix"]; ok {
                        buf.WriteString(v.(string))
                }

                if v, ok := m["security_group"]; ok {
                        buf.WriteString(v.(string))
                }
        }

        return hashcode.String(buf.String())
}

There are still problems with this fix, though. I don't know if this is masking another problem. Was this known to work before this problem was found? I can submit a patch for this, but it looks like more work is required.

Comment 13 Dan Winship 2020-02-28 02:30:05 UTC
> Was this known to work before this problem was found?

It works in 4.4. Presumably this function changed in the rebase?

Probably it needs to look at both address_prefix and address_prefixes, and use whichever is set.
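
A rough sketch of what "use whichever is set" could look like, written as a standalone program rather than a patch to the provider. Several things here are assumptions: `subnetPrefixes` and `subnetHash` are hypothetical names, `address_prefixes` is assumed to arrive as a `[]interface{}` of strings, and `crc32.ChecksumIEEE` is only a stand-in for the provider's `hashcode.String` so that the example has no external dependencies.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
)

// subnetPrefixes returns address_prefix when it is a non-empty string,
// and otherwise falls back to the string entries of address_prefixes.
func subnetPrefixes(m map[string]interface{}) []string {
	if s, ok := m["address_prefix"].(string); ok && s != "" {
		return []string{s}
	}
	var out []string
	// Assumption: the list form is a []interface{} holding strings.
	if list, ok := m["address_prefixes"].([]interface{}); ok {
		for _, p := range list {
			if s, ok := p.(string); ok {
				out = append(out, s)
			}
		}
	}
	return out
}

// subnetHash follows the structure of resourceAzureSubnetHash but never
// performs a single-value type assertion, so null fields cannot panic.
func subnetHash(v interface{}) int {
	var buf bytes.Buffer
	if m, ok := v.(map[string]interface{}); ok {
		if name, ok := m["name"].(string); ok {
			buf.WriteString(name)
		}
		for _, p := range subnetPrefixes(m) {
			buf.WriteString(p)
		}
		if sg, ok := m["security_group"].(string); ok {
			buf.WriteString(sg)
		}
	}
	// Stand-in for hashcode.String(buf.String()).
	return int(crc32.ChecksumIEEE(buf.Bytes()))
}

func main() {
	single := map[string]interface{}{"name": "master-subnet", "address_prefix": "fd00::/64"}
	list := map[string]interface{}{
		"name":             "master-subnet",
		"address_prefix":   nil,
		"address_prefixes": []interface{}{"fd00::/64"},
	}
	fmt.Println(subnetHash(single) == subnetHash(list))
}
```

With this shape, a subnet described by a single prefix hashes the same whether the prefix arrives in `address_prefix` or as the only entry of `address_prefixes`. Whether that equivalence is desirable for the provider's set semantics is a design question this sketch does not settle.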

Comment 14 John Hixson 2020-02-29 03:28:23 UTC
I was not able to get a 4.4 IPv6 install to work. I am trying with image release:4.3.0-0.nightly-2020-02-17-205936-ipv6.1d3. Is there a 4.4 image known to work? I was following the instructions as per the document in this ticket. For 4.3, with the following diff I get an almost complete install:

diff --git a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_loadbalancer.go b/pkg/terraform/exec/plugins/ven
index 4950399..ad92244 100644
--- a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_loadbalancer.go
+++ b/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_loadbalancer.go
@@ -68,10 +68,10 @@ func resourceArmLoadBalancer() *schema.Resource {
                                                },
 
                                                "private_ip_address": {
-                                                       Type:         schema.TypeString,
-                                                       Optional:     true,
-                                                       Computed:     true,
-                                                       ValidateFunc: validate.IPv4AddressOrEmpty,
+                                                       Type:     schema.TypeString,
+                                                       Optional: true,
+                                                       //Computed: true,
+                                                       //ValidateFunc: validate.IPv4AddressOrEmpty,
                                                },
 
                                                "private_ip_address_version": {
diff --git a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_virtual_network.go b/pkg/terraform/exec/plugins/
index 968fc70..762449b 100644
--- a/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_virtual_network.go
+++ b/pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/resource_arm_virtual_network.go
@@ -416,7 +416,10 @@ func resourceAzureSubnetHash(v interface{}) int {
 
        if m, ok := v.(map[string]interface{}); ok {
                buf.WriteString(m["name"].(string))
-               buf.WriteString(m["address_prefix"].(string))
+               //buf.WriteString(m["address_prefix"].(string))
+               if a, ok := m["address_prefix"]; ok {
+                       buf.WriteString(a.(string))
+               }
 
                if v, ok := m["security_group"]; ok {
                        buf.WriteString(v.(string))


The install does bomb out waiting for operators to become stable. With image release:4.3.0-0.nightly-2020-02-17-205936-ipv6.1d3, there are several. With image release:4.3.0-0.nightly-2020-02-21-091838-ipv6.2d9 it is just authentication and console.

As for the address_prefix, nothing changed in 4.4, except for some dependency clauses in the terraform and removal of an unused route table:

diff --git a/dns/dns.tf b/dns/dns.tf
index 69c0431..5816448 100644
--- a/dns/dns.tf
+++ b/dns/dns.tf
@@ -6,6 +6,8 @@ locals {
 resource "azureprivatedns_zone" "private" {
   name                = var.cluster_domain
   resource_group_name = var.resource_group_name
+
+  depends_on = [azurerm_dns_cname_record.api_external_v4, azurerm_dns_cname_record.api_external_v6]
 }
 
 resource "azureprivatedns_zone_virtual_network_link" "network" {
diff --git a/vnet/vnet.tf b/vnet/vnet.tf
index 7328fba..ddbd632 100644
--- a/vnet/vnet.tf
+++ b/vnet/vnet.tf
@@ -7,12 +7,6 @@ resource "azurerm_virtual_network" "cluster_vnet" {
   address_space       = concat(var.vnet_v4_cidrs, var.vnet_v6_cidrs)
 }
 
-resource "azurerm_route_table" "route_table" {
-  name                = "${var.cluster_id}-node-routetable"
-  location            = var.region
-  resource_group_name = var.resource_group_name
-}
-
 resource "azurerm_subnet" "master_subnet" {
   count = var.preexisting_network ? 0 : 1


Is anything here making any sense? Should a pull request be opened with the proposed fixes? Any pointers?

Comment 15 Dan Winship 2020-02-29 14:27:57 UTC
(cc: Clayton since it relates to your original Azure IPv6 modifications and your pending external PR.
cc: Dan Mace because this may be part of the cause of bug 1806067.)

(In reply to John Hixson from comment #14)
> I was not able to get a 4.4 IPv6 install to work. I am trying with image
> release:4.3.0-0.nightly-2020-02-17-205936-ipv6.1d3. Is there a 4.4 image
> known to work?

There is not yet any image that will get you a complete successful install on Azure, but I think at this point the latest official 4.3 and 4.4 images will get you to the point where the installer destroys the bootstrap resources.

> For 4.3, with the following diff I get an almost complete install: 

>                                                 "private_ip_address": {
> -                                                       Type:         schema.TypeString,
> -                                                       Optional:     true,
> -                                                       Computed:     true,
> -                                                       ValidateFunc: validate.IPv4AddressOrEmpty,
> +                                                       Type:     schema.TypeString,
> +                                                       Optional: true,
> +                                                       //Computed: true,
> +                                                       //ValidateFunc: validate.IPv4AddressOrEmpty,

This is one of the files that Clayton modified when adding Azure IPv6 support (https://github.com/openshift/installer/commit/3d00b6c#diff-f5330fc6f6380bb4171ec763f15d803a / https://github.com/terraform-providers/terraform-provider-azurerm/pull/5590). Presumably it's failing because "private_ip_address" actually contains an IPv6 address here and you need to add an appropriate validator for that to pkg/terraform/exec/plugins/vendor/github.com/terraform-providers/terraform-provider-azurerm/azurerm/helpers/validate/network.go.
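
As a rough illustration of the kind of validator meant here, the sketch below accepts an empty string, an IPv4 address, or an IPv6 address. The name `ipAddressOrEmpty` is hypothetical and does not exist in the provider. The function signature follows the `func(interface{}, string) ([]string, []error)` shape that Terraform's `schema.SchemaValidateFunc` uses, but the example is standalone and relies only on the standard library's `net.ParseIP`.

```go
package main

import (
	"fmt"
	"net"
)

// ipAddressOrEmpty is a hypothetical validator: it accepts "", any IPv4
// address, or any IPv6 address, and reports an error for anything else.
func ipAddressOrEmpty(i interface{}, k string) (warnings []string, errs []error) {
	v, ok := i.(string)
	if !ok {
		return nil, []error{fmt.Errorf("expected %q to be a string", k)}
	}
	if v == "" {
		return nil, nil
	}
	// net.ParseIP returns nil for text that is neither IPv4 nor IPv6.
	if net.ParseIP(v) == nil {
		errs = append(errs, fmt.Errorf("%q is not a valid IPv4 or IPv6 address: %q", k, v))
	}
	return nil, errs
}

func main() {
	for _, addr := range []string{"", "10.0.0.4", "fd00::4", "not-an-ip"} {
		_, errs := ipAddressOrEmpty(addr, "private_ip_address")
		fmt.Printf("%q: %d error(s)\n", addr, len(errs))
	}
}
```

A stricter variant could reject IPv4 when the cluster is IPv6-only, but this thread does not say whether that distinction is needed for the load balancer field.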


> -               buf.WriteString(m["address_prefix"].(string))
> +               //buf.WriteString(m["address_prefix"].(string))
> +               if a, ok := m["address_prefix"]; ok {
> +                       buf.WriteString(a.(string))
> +               }

I think you need to use "address_prefixes" here if "address_prefix" isn't set.


> The install does bomb out waiting for operators to become stable.

Yup. Expected. Ingress isn't working yet. (Although maybe that's partly because of the problems here?)

> As for the address_prefix, nothing changed in 4.4, except for some
> dependency clauses in the terraform and removal of an unused route table:

Yeah, it seems like I was just confused about it being 4.3-only.

Comment 16 John Hixson 2020-03-03 00:19:33 UTC
(In reply to Dan Winship from comment #8)
> lol, ok, it happens in 4.4 too, I just hadn't noticed because apparently I
> had never gotten a 4.4 IPv6 install to the point where the installer tears
> down the bootstrap host before...

I haven't had any luck with 4.4. It always fails with an OSProvisioningTimedOut error. For now, I think I will just work on the 4.3 problem and open up a PR for it. I will handle 4.4 separately.

Comment 17 Dan Winship 2020-03-03 14:21:51 UTC
I just tried 4.4.0-0.nightly-2020-03-03-110909 (with an appropriate install-config and OPENSHIFT_INSTALL_AZURE_EMULATE_SINGLESTACK_IPV6=true) and got the terraform errors.

Comment 18 Scott Dodson 2020-03-03 16:49:44 UTC
Can this be synthesized without a complete and successful IPv6 install?

ie:
1) `openshift-install create cluster`, wait for failure
2) `openshift-install destroy cluster`, confirm bootstrap host destroy failure
3) fix `openshift-install`
4) `openshift-install destroy cluster`, successful removal of bootstrap host

Comment 19 John Hixson 2020-03-17 01:10:01 UTC

*** This bug has been marked as a duplicate of bug 1805251 ***

