Description of problem:
During our ACM automated system test, we are seeing issues with provisioning Azure managed clusters about 20 percent of the time. When this error occurs, the ClusterDeployment never completes. We see the following in the provision log:
```
2020-10-22T05:53:39.781554721Z time="2020-10-22T05:53:39Z" level=debug msg="checking for SSH private key" installID=s7svrt5l
2020-10-22T05:53:39.781832546Z time="2020-10-22T05:53:39Z" level=info msg="initializing ssh agent with 1 keys" installID=s7svrt5l
2020-10-22T05:53:39.781832546Z time="2020-10-22T05:53:39Z" level=debug msg="no SSH_AUTH_SOCK defined. starting ssh-agent" installID=s7svrt5l
2020-10-22T05:53:39.833016860Z Identity added: /tmp/ssh-privatekey (/tmp/ssh-privatekey)
2020-10-22T05:53:39.833466318Z time="2020-10-22T05:53:39Z" level=info msg="added ssh private key to agent" installID=s7svrt5l key=/tmp/ssh-privatekey
2020-10-22T05:53:39.833495019Z time="2020-10-22T05:53:39Z" level=info msg="waiting for files to be available: [/output/openshift-install /output/oc]" installID=s7svrt5l
2020-10-22T05:53:39.833506784Z time="2020-10-22T05:53:39Z" level=info msg="found file" installID=s7svrt5l path=/output/openshift-install
2020-10-22T05:53:39.833517508Z time="2020-10-22T05:53:39Z" level=info msg="found file" installID=s7svrt5l path=/output/oc
2020-10-22T05:53:39.833525722Z time="2020-10-22T05:53:39Z" level=info msg="all files found, ready to proceed" installID=s7svrt5l
2020-10-22T05:53:40.025785576Z time="2020-10-22T05:53:40Z" level=info msg="copied /output/openshift-install to /home/hive/openshift-install" installID=s7svrt5l
2020-10-22T05:53:40.078305590Z time="2020-10-22T05:53:40Z" level=info msg="copied /output/oc to /home/hive/oc" installID=s7svrt5l
2020-10-22T05:53:40.078305590Z time="2020-10-22T05:53:40Z" level=info msg="copying install-config.yaml" installID=s7svrt5l
2020-10-22T05:53:40.078444536Z time="2020-10-22T05:53:40Z" level=info msg="waiting for files to be available: 
[/output/.openshift_install.log]" installID=s7svrt5l 2020-10-22T05:53:40.079673098Z time="2020-10-22T05:53:40Z" level=info msg="copied /installconfig/install-config.yaml to /output/install-config.yaml" installID=s7svrt5l 2020-10-22T05:53:40.079673098Z time="2020-10-22T05:53:40Z" level=info msg="cleaning up from past install attempts" installID=s7svrt5l 2020-10-22T05:53:40.085505956Z time="2020-10-22T05:53:40Z" level=debug msg="object does not exist" installID=s7svrt5l object=console-ui-test-cluster-azure-191574659/console-ui-test-cluster-azure-191574659-1-knxkh-admin-kubeconfig 2020-10-22T05:53:40.090080580Z time="2020-10-22T05:53:40Z" level=debug msg="object does not exist" installID=s7svrt5l object=console-ui-test-cluster-azure-191574659/console-ui-test-cluster-azure-191574659-1-knxkh-admin-password 2020-10-22T05:53:40.090097055Z time="2020-10-22T05:53:40Z" level=info msg="InfraID set from failed install, running deprovison" installID=s7svrt5l 2020-10-22T05:53:40.090232640Z time="2020-10-22T05:53:40Z" level=info msg="Credentials loaded from file \"/.azure/osServicePrincipal.json\"" 2020-10-22T05:53:40.090330191Z time="2020-10-22T05:53:40Z" level=debug msg="deleting public records" installID=s7svrt5l 2020-10-22T05:53:40.546215786Z time="2020-10-22T05:53:40Z" level=debug msg="[failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. 
Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\"]" installID=s7svrt5l 2020-10-22T05:53:41.546366376Z time="2020-10-22T05:53:41Z" level=debug msg="deleting public records" installID=s7svrt5l 2020-10-22T05:53:41.588157528Z time="2020-10-22T05:53:41Z" level=debug msg="[failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\"]" installID=s7svrt5l 2020-10-22T05:53:42.588443026Z time="2020-10-22T05:53:42Z" level=debug msg="deleting public records" installID=s7svrt5l 2020-10-22T05:53:42.629777826Z time="2020-10-22T05:53:42Z" level=debug msg="[failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. 
Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\"]" installID=s7svrt5l 2020-10-22T05:53:43.630030648Z time="2020-10-22T05:53:43Z" level=debug msg="deleting public records" installID=s7svrt5l 2020-10-22T05:53:43.671064510Z time="2020-10-22T05:53:43Z" level=debug msg="[failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\"]" installID=s7svrt5l 2020-10-22T05:53:44.671281919Z time="2020-10-22T05:53:44Z" level=debug msg="deleting public records" installID=s7svrt5l 2020-10-22T05:53:44.732483391Z time="2020-10-22T05:53:44Z" level=debug msg="[failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. 
Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\"]" installID=s7svrt5l 2020-10-22T05:53:45.732693535Z time="2020-10-22T05:53:45Z" level=debug msg="deleting public records" installID=s7svrt5l 2020-10-22T05:53:45.771719792Z time="2020-10-22T05:53:45Z" level=debug msg="[failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\"]" installID=s7svrt5l 2020-10-22T05:53:46.771946401Z time="2020-10-22T05:53:46Z" level=debug msg="deleting public records" installID=s7svrt5l 2020-10-22T05:53:46.815633673Z time="2020-10-22T05:53:46Z" level=debug msg="[failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. 
Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\"]" installID=s7svrt5l 2020-10-22T05:53:47.815858860Z time="2020-10-22T05:53:47Z" level=debug msg="deleting public records" installID=s7svrt5l 2020-10-22T05:53:47.855856731Z time="2020-10-22T05:53:47Z" level=debug msg="[failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\"]" installID=s7svrt5l 2020-10-22T05:53:48.856102215Z time="2020-10-22T05:53:48Z" level=debug msg="deleting public records" installID=s7svrt5l 2020-10-22T05:53:48.896645566Z time="2020-10-22T05:53:48Z" level=debug msg="[failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. 
Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\"]" installID=s7svrt5l
2020-10-22T05:53:49.896792353Z time="2020-10-22T05:53:49Z" level=debug msg="deleting public records" installID=s7svrt5l
2020-10-22T05:53:49.937463131Z time="2020-10-22T05:53:49Z" level=debug msg="[failed to list dns zone: dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\", failed to list private dns zone: privatedns.PrivateZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"ResourceGroupNotFound\" Message=\"Resource group 'console-ui-test-clust-smzfs-rg' could not be found.\"]" installID=s7svrt5l
```
There are hundreds of these messages.

Version-Release number of selected component (if applicable):
hive repo, branch ocm-2.1, version 1.0.11. (Part of ACM 2.1)

How reproducible:
I have not been able to reproduce manually. We have only seen this during automated testing which is run multiple times a day. We are wondering if this could be caused by a prior deployment not completely cleaning up resources. Even if this is true, the hive provision should not get caught in this seemingly infinite loop.

Actual results:
Provision runs forever

Expected results:
Provision either succeeds or fails.

Additional info:
The installer bug responsible for this error has been fixed, and the changes have been vendored into the required ACM branches.
The bug has been fixed.

Test hive image: 0d339227e0f8338f04122018961e0c33622eb016

Test steps:
1. Create an Azure cluster using hive and wait for the installation to complete.
2. Manually delete the related resource group in the Azure console.
3. Delete the ClusterDeployment. The deprovision pod completes and does not get stuck in a loop.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.2 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0749