Bug 1837564

Summary: Generic error when installer fails to create resources using terraform
Product: OpenShift Container Platform Reporter: Abhinav Dahiya <adahiya>
Component: Installer Assignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer QA Contact: Mike Gahagan <mgahagan>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium    
Version: 4.5   
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-07-13 17:40:02 UTC Type: Bug

Description Abhinav Dahiya 2020-05-19 16:32:08 UTC
Description of problem:

Whenever the installer fails to create resources using Terraform, it outputs the errors from Terraform as-is, but the actual FATAL error is completely generic and doesn't provide any insight into what could have caused the failure.

see https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-api-provider-azure/130/pull-ci-openshift-cluster-api-provider-azure-master-e2e-azure/435/artifacts/e2e-azure/container-logs/setup.log

```
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"
```

and the Terraform error:
```
level=error
level=error msg="Error: Error Creating/Updating Subnet \"ci-op-641yvqj0-64576-lx9tt-worker-subnet\" (Virtual Network \"ci-op-641yvqj0-64576-lx9tt-vnet\" / Resource Group \"ci-op-641yvqj0-64576-lx9tt-rg\"): network.SubnetsClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code=\"AnotherOperationInProgress\" Message=\"Another operation on this or dependent resource is in progress. To retrieve status of the operation use uri: https://management.azure.com/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/providers/Microsoft.Network/locations/eastus2/operations/ad2d125e-f7c7-4da3-97f5-8df128e7e8e5?api-version=2019-09-01.\" Details=[]"
level=error
level=error msg="  on ../tmp/openshift-install-786806570/vnet/vnet.tf line 22, in resource \"azurerm_subnet\" \"worker_subnet\":"
level=error msg="  22: resource \"azurerm_subnet\" \"worker_subnet\" {"
level=error
```

The installer should provide more context in the error reported to the user, for example by summarizing the actual error and giving the user an action item.
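
As a rough illustration of what "summarizing the actual error" could look like, here is a minimal sketch. It is not the installer's code: `summarizeTerraformError` is a hypothetical helper, and the sample string in `main` is an abbreviated copy of the subnet error above. It assumes the Azure error code always appears as `Code="..."` in the Terraform output, which holds for the logs in this report but may not hold in general.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// azureCodeRe matches the Code="..." field of an Azure error once any
// log escaping of the quotes has been removed.
var azureCodeRe = regexp.MustCompile(`Code="([^"]+)"`)

// summarizeTerraformError is a hypothetical helper. It returns the generic
// message extended with the first Azure error code found in the raw
// Terraform output, or the generic message alone if no code is found.
func summarizeTerraformError(output string) string {
	// Log lines escape quotes as \"; undo that so one pattern covers both forms.
	unescaped := strings.ReplaceAll(output, `\"`, `"`)
	m := azureCodeRe.FindStringSubmatch(unescaped)
	if m == nil {
		return "failed to apply using Terraform"
	}
	return fmt.Sprintf("failed to apply using Terraform: Azure returned %s", m[1])
}

func main() {
	// Abbreviated from the subnet error quoted in this report.
	out := `Error: Error Creating/Updating Subnet: Service returned an error. Status=<nil> Code=\"AnotherOperationInProgress\" Message=\"Another operation on this or dependent resource is in progress.\"`
	fmt.Println(summarizeTerraformError(out))
}
```

With something along these lines, the final FATAL line would at least name the provider error code instead of stopping at "failed to apply using Terraform".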


Steps to Reproduce:

An easy way is to create an Azure cluster with an invalid release image. This causes the bootstrap to fail, so the master nodes cannot get their Ignition configs.

This manifests as a Terraform error, `OSProvisioningTimedOut`, which doesn't really explain anything.
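
One way to turn an opaque code like this into an action item is a small lookup table from error code to hint. The sketch below is only an assumption about how that might be done, not the fix that shipped: `hints` and `actionFor` are hypothetical names, the `OSProvisioningTimedOut` hint restates the reproduction scenario above, and the `AnotherOperationInProgress` hint is a guess based on the Azure message text.

```go
package main

import "fmt"

// fallbackAction is returned when no hint is known for an error code.
const fallbackAction = "inspect the Terraform errors logged above"

// hints maps a provider error code to a suggested next step.
// The entries are illustrative assumptions, not an official list.
var hints = map[string]string{
	"OSProvisioningTimedOut":     "machines may not have received their Ignition configs; check that the bootstrap host came up and that the release image is valid",
	"AnotherOperationInProgress": "another operation on the same or a dependent resource was still running; retrying the install may succeed",
}

// actionFor returns the hint for a known error code, or the fallback.
func actionFor(code string) string {
	if h, ok := hints[code]; ok {
		return h
	}
	return fallbackAction
}

func main() {
	fmt.Println(actionFor("OSProvisioningTimedOut"))
}
```

A table like this only helps for codes someone has already diagnosed, so the raw Terraform output would still need to be logged for everything else.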

Comment 3 Mike Gahagan 2020-06-02 14:41:45 UTC
Looks like error messages are better now. Here is a failure caused by referencing a vnet that is in a different region than the region that was specified for the cluster:

INFO Credentials loaded from file "/home/m/.azure/osServicePrincipal.json" 
INFO Consuming Install Config from target directory 
INFO Creating infrastructure resources...         
ERROR                                              
ERROR Error: network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidResourceReference" Message="Resource /subscriptions/REDACTED/resourceGroups/aro-v4-eastus/providers/Microsoft.Network/virtualNetworks/aro-vnet/subnets/master-subnet referenced by resource /subscriptions/REDACTED/resourceGroups/mgahagan-100206-hjj7f-rg/providers/Microsoft.Network/networkInterfaces/mgahagan-100206-hjj7f-bootstrap-nic was not found. Please make sure that the referenced resource exists, and that both resources are in the same region." Details=[] 
ERROR                                              
ERROR   on ../../../../tmp/openshift-install-280793528/bootstrap/main.tf line 100, in resource "azurerm_network_interface" "bootstrap": 
ERROR  100: resource "azurerm_network_interface" "bootstrap" { 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidResourceReference" Message="Resource /subscriptions/REDACTED/resourceGroups/aro-v4-eastus/providers/Microsoft.Network/virtualNetworks/aro-vnet/subnets/master-subnet referenced by resource /subscriptions/REDACTED/resourceGroups/mgahagan-100206-hjj7f-rg/providers/Microsoft.Network/networkInterfaces/mgahagan-100206-hjj7f-master1-nic was not found. Please make sure that the referenced resource exists, and that both resources are in the same region." Details=[] 
ERROR                                              
ERROR   on ../../../../tmp/openshift-install-280793528/master/master.tf line 9, in resource "azurerm_network_interface" "master": 
ERROR    9: resource "azurerm_network_interface" "master" { 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidResourceReference" Message="Resource /subscriptions/REDACTED/resourceGroups/aro-v4-eastus/providers/Microsoft.Network/virtualNetworks/aro-vnet/subnets/master-subnet referenced by resource /subscriptions/REDACTED/resourceGroups/mgahagan-100206-hjj7f-rg/providers/Microsoft.Network/networkInterfaces/mgahagan-100206-hjj7f-master0-nic was not found. Please make sure that the referenced resource exists, and that both resources are in the same region." Details=[] 
ERROR                                              
ERROR   on ../../../../tmp/openshift-install-280793528/master/master.tf line 9, in resource "azurerm_network_interface" "master": 
ERROR    9: resource "azurerm_network_interface" "master" { 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidResourceReference" Message="Resource /subscriptions/REDACTED/resourceGroups/aro-v4-eastus/providers/Microsoft.Network/virtualNetworks/aro-vnet/subnets/master-subnet referenced by resource /subscriptions/REDACTED/resourceGroups/mgahagan-100206-hjj7f-rg/providers/Microsoft.Network/networkInterfaces/mgahagan-100206-hjj7f-master2-nic was not found. Please make sure that the referenced resource exists, and that both resources are in the same region." Details=[] 
ERROR                                              
ERROR   on ../../../../tmp/openshift-install-280793528/master/master.tf line 9, in resource "azurerm_network_interface" "master": 
ERROR    9: resource "azurerm_network_interface" "master" { 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: Error Creating/Updating Load Balancer "mgahagan-100206-hjj7f-internal" (Resource Group "mgahagan-100206-hjj7f-rg"): network.LoadBalancersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidResourceReference" Message="Resource /subscriptions/REDACTED/resourceGroups/ARO-V4-EASTUS/providers/Microsoft.Network/virtualNetworks/ARO-VNET referenced by resource /subscriptions/REDACTED/resourceGroups/mgahagan-100206-hjj7f-rg/providers/Microsoft.Network/loadBalancers/mgahagan-100206-hjj7f-internal was not found. Please make sure that the referenced resource exists, and that both resources are in the same region." Details=[{"code":"NotFound","message":"Resource /subscriptions/REDACTED/resourceGroups/ARO-V4-EASTUS/providers/Microsoft.Network/virtualNetworks/ARO-VNET not found."}] 
ERROR                                              
ERROR   on ../../../../tmp/openshift-install-280793528/vnet/internal-lb.tf line 6, in resource "azurerm_lb" "internal": 
ERROR    6: resource "azurerm_lb" "internal" {     
ERROR                                              
ERROR                                              
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

The "Please make sure that the referenced resource exists, and that both resources are in the same region." comes directly from Azure.
Verified with 4.5.0-0.nightly-2020-06-01-165039

Comment 4 errata-xmlrpc 2020-07-13 17:40:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409