Bug 1805251
| Summary: | vendor rebase in master breaks IPv6 on Azure | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Winship <danw> |
| Component: | Installer | Assignee: | John Hixson <jhixson> |
| Installer sub component: | openshift-installer | QA Contact: | Gaoyun Pei <gpei> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | adahiya, dmace, gpei, jhixson, sdodson, wking |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-13 17:16:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1803321, 1808969 | | |
Description
Dan Winship
2020-02-20 15:05:25 UTC
The vendor matches the forked copy at openshift/terraform-providers-azurerm.1-openshift-2, so I think there is a bug in the CARRY patch.

I didn't mean to imply the bug was necessarily in the vendored code itself. I was guessing it was a bug in https://github.com/openshift/installer/pull/2745/commits/290df6a4

I can confirm that https://github.com/openshift/terraform-provider-azurerm/pull/1 fixes this issue and mostly completes the install. I've tested this against the master branch of openshift-install. Here is some of the output:

```
INFO Waiting up to 30m0s for the cluster at https://api.jh456.installer.azure.devcluster.openshift.com:6443 to initialize...
DEBUG Still waiting for the cluster to initialize: Working towards 4.5.0-0.nightly-2020-03-12-013015: 99% complete
DEBUG Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, csi-snapshot-controller, image-registry, ingress, kube-storage-version-migrator, monitoring
ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::RouteStatus_FailedHost: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server RouteStatusDegraded: route is not available at canonical host oauth-openshift.apps.jh456.installer.azure.devcluster.openshift.com: []
INFO Cluster operator authentication Progressing is Unknown with NoData:
INFO Cluster operator authentication Available is Unknown with NoData:
INFO Cluster operator console Progressing is True with RouteSync_FailedHost: RouteSyncProgressing: route is not available at canonical host []
INFO Cluster operator console Available is Unknown with NoData:
INFO Cluster operator image-registry Available is False with NoReplicasAvailable: The deployment does not have available replicas
INFO Cluster operator image-registry Progressing is True with DeploymentNotCompleted: The deployment has not completed
ERROR Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: default
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available. Moving to release version "4.5.0-0.nightly-2020-03-12-013015". Moving to ingress-controller image version "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:559e38c1171467ee375f2ea873495624920accf3ae0ff4b99cae98964e708897".
INFO Cluster operator ingress Available is False with IngressUnavailable: Not all ingress controllers are available.
INFO Cluster operator insights Disabled is False with :
INFO Cluster operator kube-storage-version-migrator Available is False with _NoMigratorPod: Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available
INFO Cluster operator monitoring Available is False with :
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
ERROR Cluster operator monitoring Degraded is True with UpdatingUserWorkloadThanosRulerFailed: Failed to rollout the stack. Error: running task Updating User Workload Thanos Ruler failed: failed to retrieve Grafana datasources config: secrets "grafana-datasources" not found
ERROR Cluster operator network Degraded is True with RolloutHung: DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2020-03-13T00:49:46Z
INFO Cluster operator network Progressing is True with Deploying: DaemonSet "openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 3 nodes)
FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, console, csi-snapshot-controller, image-registry, ingress, kube-storage-version-migrator, monitoring
```

```
[jhixson@redlap TestIPv6-openshift-master]$ oc get node
NAME                                  STATUS     ROLES    AGE   VERSION
jh456-dttmv-master-0                  Ready      master   42m   v1.17.1
jh456-dttmv-master-1                  Ready      master   41m   v1.17.1
jh456-dttmv-master-2                  Ready      master   43m   v1.17.1
jh456-dttmv-worker-centralus1-pl26n   NotReady   worker   26m   v1.17.1
jh456-dttmv-worker-centralus2-8nnct   NotReady   worker   26m   v1.17.1
jh456-dttmv-worker-centralus3-6zxtp   NotReady   worker   26m   v1.17.1

[jhixson@redlap TestIPv6-openshift-master]$ oc get clusteroperators
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                                 Unknown     Unknown       True       41m
cloud-credential                           4.5.0-0.nightly-2020-03-12-013015   True        False         False      46m
cluster-autoscaler                         4.5.0-0.nightly-2020-03-12-013015   True        False         False      34m
console                                    4.5.0-0.nightly-2020-03-12-013015   Unknown     True          False      35m
dns                                        4.5.0-0.nightly-2020-03-12-013015   True        False         False      40m
etcd                                       4.5.0-0.nightly-2020-03-12-013015   True        False         False      38m
image-registry                                                                 False       True          False      35m
ingress                                    unknown                             False       True          True       35m
insights                                   4.5.0-0.nightly-2020-03-12-013015   True        False         False      35m
kube-apiserver                             4.5.0-0.nightly-2020-03-12-013015   True        False         False      38m
kube-controller-manager                    4.5.0-0.nightly-2020-03-12-013015   True        False         False      39m
kube-scheduler                             4.5.0-0.nightly-2020-03-12-013015   True        False         False      39m
kube-storage-version-migrator              4.5.0-0.nightly-2020-03-12-013015   False       False         False      41m
machine-api                                4.5.0-0.nightly-2020-03-12-013015   True        False         False      35m
machine-config                             4.5.0-0.nightly-2020-03-12-013015   True        False         False      40m
marketplace                                4.5.0-0.nightly-2020-03-12-013015   True        False         False      35m
monitoring                                                                     False       True          True       34m
network                                    4.5.0-0.nightly-2020-03-12-013015   True        True          True       42m
node-tuning                                4.5.0-0.nightly-2020-03-12-013015   True        False         False      41m
openshift-apiserver                        4.5.0-0.nightly-2020-03-12-013015   True        False         False      34m
openshift-controller-manager               4.5.0-0.nightly-2020-03-12-013015   True        False         False      35m
openshift-samples                          4.5.0-0.nightly-2020-03-12-013015   True        False         False      34m
operator-lifecycle-manager                 4.5.0-0.nightly-2020-03-12-013015   True        False         False      40m
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-03-12-013015   True        False         False      40m
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-03-12-013015   True        False         False      34m
service-ca                                 4.5.0-0.nightly-2020-03-12-013015   True        False         False      41m
service-catalog-apiserver                  4.5.0-0.nightly-2020-03-12-013015   True        False         False      41m
service-catalog-controller-manager         4.5.0-0.nightly-2020-03-12-013015   True        False         False      41m
storage                                    4.5.0-0.nightly-2020-03-12-013015   True        False         False      35m
```

Tested with OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-03-12-013015

*** Bug 1805936 has been marked as a duplicate of this bug. ***

*** Bug 1808973 has been marked as a duplicate of this bug. ***

*** Bug 1808969 has been marked as a duplicate of this bug. ***

Verified this bug using 4.5.0-0.nightly-2020-03-17-232152. Masters and workers are created successfully.
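Reading the operator list above, the installer is blocked on any operator whose AVAILABLE column is not "True" or whose DEGRADED column is "True". A small sketch of picking those out (my own helper, not part of the installer; it assumes the `oc get clusteroperators` column layout shown above, counting columns from the right because the VERSION column can be empty):

```shell
#!/bin/sh
# blocking_operators: print the NAME of every operator row whose AVAILABLE
# field is not "True" or whose DEGRADED field is "True". Fields are counted
# from the right (SINCE=NF, DEGRADED=NF-1, AVAILABLE=NF-3) because VERSION
# may be empty, as with image-registry and monitoring above.
blocking_operators() {
  awk '$(NF-3) != "True" || $(NF-1) == "True" { print $1 }'
}

# Fed a few captured rows from the output above for illustration; on a live
# cluster you would run: oc get clusteroperators --no-headers | blocking_operators
blocking_operators <<'EOF'
authentication Unknown Unknown True 41m
etcd 4.5.0-0.nightly-2020-03-12-013015 True False False 38m
image-registry False True False 35m
network 4.5.0-0.nightly-2020-03-12-013015 True True True 42m
EOF
# Prints: authentication, image-registry, network (etcd is healthy).
```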
```
# oc get node -o wide
NAME                                     STATUS   ROLES    AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                CONTAINER-RUNTIME
gpei-45t-8c6dd-master-0                  Ready    master   150m   v1.17.1   fd00::4         <none>        Red Hat Enterprise Linux CoreOS 45.81.202003172018-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
gpei-45t-8c6dd-master-1                  Ready    master   146m   v1.17.1   fd00::5         <none>        Red Hat Enterprise Linux CoreOS 45.81.202003172018-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
gpei-45t-8c6dd-master-2                  Ready    master   149m   v1.17.1   fd00::7         <none>        Red Hat Enterprise Linux CoreOS 45.81.202003172018-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
gpei-45t-8c6dd-worker-centralus2-gcg9x   Ready    worker   126m   v1.17.1   fd00:0:0:1::4   <none>        Red Hat Enterprise Linux CoreOS 45.81.202003172018-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
gpei-45t-8c6dd-worker-centralus3-6dndj   Ready    worker   127m   v1.17.1   fd00:0:0:1::5   <none>        Red Hat Enterprise Linux CoreOS 45.81.202003172018-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64   cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
```

Bootstrap resources were destroyed.

```
level=debug msg="Bootstrap status: complete"
level=info msg="Destroying the bootstrap resources..."
...
level=debug msg="Destroy complete! Resources: 11 destroyed."
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
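The verification above hinges on every node coming up with an IPv6 INTERNAL-IP (fd00::/8 here). A small sketch of checking that automatically (my own helper, not part of the report; it assumes the `oc get node -o wide` column layout shown above, where INTERNAL-IP is the 6th column):

```shell
#!/bin/sh
# non_ipv6_nodes: print the name of any node whose INTERNAL-IP (6th column
# of `oc get node -o wide --no-headers` output) contains no ":", i.e. is
# not an IPv6 address. Columns before INTERNAL-IP never contain spaces, so
# positional awk fields are safe here.
non_ipv6_nodes() {
  awk '$6 !~ /:/ { print $1 }'
}

# Fed two captured rows from the verification above (trailing columns
# trimmed); on a live cluster: oc get node -o wide --no-headers | non_ipv6_nodes
non_ipv6_nodes <<'EOF'
gpei-45t-8c6dd-master-0 Ready master 150m v1.17.1 fd00::4 <none>
gpei-45t-8c6dd-worker-centralus2-gcg9x Ready worker 126m v1.17.1 fd00:0:0:1::4 <none>
EOF
# Prints nothing: both internal IPs are IPv6.
```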