Bug 1673251 - Openshift installer create cluster returns "FATAL failed to fetch Cluster: failed to load asset "Cluster": "terraform.tfstate" already exists. There may already be a running cluster"
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.4.0
Assignee: Alex Crawford
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks: 1664187
Reported: 2019-02-07 06:34 UTC by dgangaia
Modified: 2023-09-07 19:43 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-27 19:30:37 UTC
Target Upstream Version:
Embargoed:


Attachments:
Openshift destroy logs (115.07 KB, text/plain)
2019-02-07 06:34 UTC, dgangaia
openshift 0.13 installer log with tfstate left (693.02 KB, text/plain)
2019-02-28 00:11 UTC, William Markito

Description dgangaia 2019-02-07 06:34:02 UTC
Created attachment 1527756 [details]
Openshift destroy logs

Description of problem:
The OpenShift installer's "create cluster" command fails, reporting that terraform.tfstate is present, even though "destroy cluster" was run beforehand. After a failed installation, the destroy command does not clean up the terraform.tfstate file.

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:

Steps to Reproduce:
1. Create a role with the same name as the cluster name you plan to provide for the install.
2. Create a cluster using the OpenShift installer. The installation fails, complaining that the same role already exists.
3. Since the installer has partially created resources, try destroying the cluster.
4. The destroy does not remove the terraform.tfstate file.
5. Try running openshift delete cluster; the terraform.tfstate file still is not removed.
6. Try creating a new cluster. It complains: "FATAL failed to fetch Cluster: failed to load asset "Cluster": "terraform.tfstate" already exists.  There may already be a running cluster"


Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

./openshift-install create cluster --log-level=debug 
DEBUG Fetching "Terraform Variables"...            
DEBUG Loading "Terraform Variables"...             
DEBUG   Loading "Cluster ID"...                    
DEBUG   Using "Cluster ID" loaded from state file  
DEBUG   Loading "Install Config"...                
DEBUG     Loading "SSH Key"...                     
DEBUG     Using "SSH Key" loaded from state file   
DEBUG     Loading "Base Domain"...                 
DEBUG       Loading "Platform"...                  
DEBUG       Using "Platform" loaded from state file 
DEBUG     Using "Base Domain" loaded from state file 
DEBUG     Loading "Cluster Name"...                
DEBUG     Using "Cluster Name" loaded from state file 
DEBUG     Loading "Pull Secret"...                 
DEBUG     Using "Pull Secret" loaded from state file 
DEBUG     Loading "Platform"...                    
DEBUG   Using "Install Config" loaded from state file 
DEBUG   Loading "Image"...                         
DEBUG     Loading "Install Config"...              
DEBUG   Using "Image" loaded from state file       
DEBUG   Loading "Bootstrap Ignition Config"...     
DEBUG     Loading "Install Config"...              
DEBUG     Loading "Root CA"...                     
DEBUG     Using "Root CA" loaded from state file   
DEBUG     Loading "Certificate (etcd)"...          
DEBUG       Loading "Root CA"...                   
DEBUG     Using "Certificate (etcd)" loaded from state file 
DEBUG     Loading "Certificate (kube-ca)"...       
DEBUG       Loading "Root CA"...                   
DEBUG     Using "Certificate (kube-ca)" loaded from state file 
DEBUG     Loading "Certificate (aggregator)"...    
DEBUG       Loading "Root CA"...                   
DEBUG     Using "Certificate (aggregator)" loaded from state file 
DEBUG     Loading "Certificate (service-serving)"... 
DEBUG       Loading "Root CA"...                   
DEBUG     Using "Certificate (service-serving)" loaded from state file 
DEBUG     Loading "Certificate (etcd)"...          
DEBUG       Loading "Certificate (etcd)"...        
DEBUG     Using "Certificate (etcd)" loaded from state file 
DEBUG     Loading "Certificate (kube-apiaserver)"... 
DEBUG       Loading "Certificate (kube-ca)"...     
DEBUG       Loading "Install Config"...            
DEBUG     Using "Certificate (kube-apiaserver)" loaded from state file 
DEBUG     Loading "Certificate (system:kube-apiserver-proxy)"... 
DEBUG       Loading "Certificate (aggregator)"...  
DEBUG     Using "Certificate (system:kube-apiserver-proxy)" loaded from state file 
DEBUG     Loading "Certificate (system:admin)"...  
DEBUG       Loading "Certificate (kube-ca)"...     
DEBUG     Using "Certificate (system:admin)" loaded from state file 
DEBUG     Loading "Certificate (system:serviceaccount:kube-system:default)"... 
DEBUG       Loading "Certificate (kube-ca)"...     
DEBUG     Using "Certificate (system:serviceaccount:kube-system:default)" loaded from state file 
DEBUG     Loading "Certificate (mcs)"...           
DEBUG       Loading "Root CA"...                   
DEBUG       Loading "Install Config"...            
DEBUG     Using "Certificate (mcs)" loaded from state file 
DEBUG     Loading "Key Pair (service-account.pub)"... 
DEBUG     Using "Key Pair (service-account.pub)" loaded from state file 
DEBUG     Loading "Certificate (journal-gatewayd)"... 
DEBUG       Loading "Root CA"...                   
DEBUG     Loading "Kubeconfig Admin"...            
DEBUG       Loading "Root CA"...                   
DEBUG       Loading "Certificate (system:admin)"... 
DEBUG       Loading "Install Config"...            
DEBUG     Loading "Kubeconfig Kubelet"...          
DEBUG       Loading "Root CA"...                   
DEBUG       Loading "Certificate (system:serviceaccount:kube-system:default)"... 
DEBUG       Loading "Install Config"...            
DEBUG     Using "Kubeconfig Kubelet" loaded from state file 
DEBUG     Loading "Common Manifests"...            
DEBUG       Loading "Cluster ID"...                
DEBUG       Loading "Install Config"...            
DEBUG       Loading "Ingress Config"...            
DEBUG         Loading "Install Config"...          
DEBUG       Using "Ingress Config" loaded from state file 
DEBUG       Loading "DNS Config"...                
DEBUG         Loading "Install Config"...          
DEBUG       Using "DNS Config" loaded from state file 
DEBUG       Loading "Infrastructure Config"...     
DEBUG         Loading "Install Config"...          
DEBUG         Loading "Infrastructure"...          
DEBUG         Using "Infrastructure" loaded from state file 
DEBUG       Using "Infrastructure Config" loaded from state file 
DEBUG       Loading "Network Config"...            
DEBUG         Loading "Install Config"...          
DEBUG       Using "Network Config" loaded from state file 
DEBUG       Loading "Root CA"...                   
DEBUG       Loading "Certificate (etcd)"...        
DEBUG       Loading "Certificate (ingress)"...     
DEBUG         Loading "Certificate (kube-ca)"...   
DEBUG         Loading "Install Config"...          
DEBUG       Using "Certificate (ingress)" loaded from state file 
DEBUG       Loading "Certificate (kube-ca)"...     
DEBUG       Loading "Certificate (service-serving)"... 
DEBUG       Loading "Certificate (etcd)"...        
DEBUG       Loading "Certificate (mcs)"...         
DEBUG       Loading "Certificate (system:serviceaccount:kube-system:default)"... 
DEBUG       Loading "KubeCloudConfig"...           
DEBUG       Using "KubeCloudConfig" loaded from state file 
DEBUG       Loading "MachineConfigServerTLSSecret"... 
DEBUG       Using "MachineConfigServerTLSSecret" loaded from state file 
DEBUG       Loading "OpenshiftServiceCertSignerSecret"... 
DEBUG       Using "OpenshiftServiceCertSignerSecret" loaded from state file 
DEBUG       Loading "Pull"...                      
DEBUG       Using "Pull" loaded from state file    
DEBUG       Loading "CVOOverrides"...              
DEBUG       Using "CVOOverrides" loaded from state file 
DEBUG       Loading "HostEtcdServiceEndpointsKubeSystem"... 
DEBUG       Using "HostEtcdServiceEndpointsKubeSystem" loaded from state file 
DEBUG       Loading "KubeSystemConfigmapEtcdServingCA"... 
DEBUG       Using "KubeSystemConfigmapEtcdServingCA" loaded from state file 
DEBUG       Loading "KubeSystemConfigmapRootCA"... 
DEBUG       Using "KubeSystemConfigmapRootCA" loaded from state file 
DEBUG       Loading "KubeSystemSecretEtcdClient"... 
DEBUG       Using "KubeSystemSecretEtcdClient" loaded from state file 
DEBUG       Loading "OpenshiftMachineConfigOperator"... 
DEBUG       Using "OpenshiftMachineConfigOperator" loaded from state file 
DEBUG       Loading "OpenshiftClusterAPINamespace"... 
DEBUG       Using "OpenshiftClusterAPINamespace" loaded from state file 
DEBUG       Loading "OpenshiftServiceCertSignerNamespace"... 
DEBUG       Using "OpenshiftServiceCertSignerNamespace" loaded from state file 
DEBUG       Loading "EtcdServiceKubeSystem"...     
DEBUG       Using "EtcdServiceKubeSystem" loaded from state file 
DEBUG       Loading "HostEtcdServiceKubeSystem"... 
DEBUG       Using "HostEtcdServiceKubeSystem" loaded from state file 
DEBUG     Using "Common Manifests" loaded from state file 
DEBUG     Loading "Openshift Manifests"...         
DEBUG       Loading "Install Config"...            
DEBUG       Loading "Cluster.cluster.k8s.io/v1alpha1"... 
DEBUG         Loading "Install Config"...          
DEBUG         Loading "Network Config"...          
DEBUG       Using "Cluster.cluster.k8s.io/v1alpha1" loaded from state file 
DEBUG       Loading "Worker Machines"...           
DEBUG         Loading "Cluster ID"...              
DEBUG         Loading "Install Config"...          
DEBUG         Loading "Image"...                   
DEBUG         Loading "Worker Ignition Config"...  
DEBUG           Loading "Install Config"...        
DEBUG           Loading "Root CA"...               
DEBUG         Using "Worker Ignition Config" loaded from state file 
DEBUG       Using "Worker Machines" loaded from state file 
DEBUG       Loading "Master Machines"...           
DEBUG         Loading "Cluster ID"...              
DEBUG         Loading "Install Config"...          
DEBUG         Loading "Image"...                   
DEBUG         Loading "Master Ignition Config"...  
DEBUG           Loading "Install Config"...        
DEBUG           Loading "Root CA"...               
DEBUG         Using "Master Ignition Config" loaded from state file 
DEBUG       Using "Master Machines" loaded from state file 
DEBUG       Loading "Kubeadmin Password"...        
DEBUG       Using "Kubeadmin Password" loaded from state file 
DEBUG       Loading "BindingDiscovery"...          
DEBUG       Using "BindingDiscovery" loaded from state file 
DEBUG       Loading "CloudCredsSecret"...          
DEBUG       Using "CloudCredsSecret" loaded from state file 
DEBUG       Loading "KubeadminPasswordSecret"...   
DEBUG       Using "KubeadminPasswordSecret" loaded from state file 
DEBUG       Loading "RoleCloudCredsSecretReader"... 
DEBUG       Using "RoleCloudCredsSecretReader" loaded from state file 
DEBUG     Using "Openshift Manifests" loaded from state file 
DEBUG   Using "Bootstrap Ignition Config" loaded from state file 
DEBUG   Loading "Master Ignition Config"...        
DEBUG   Fetching "Cluster ID"...                   
DEBUG   Reusing previously-fetched "Cluster ID"    
DEBUG   Fetching "Install Config"...               
DEBUG   Reusing previously-fetched "Install Config" 
DEBUG   Fetching "Image"...                        
DEBUG   Reusing previously-fetched "Image"         
DEBUG   Fetching "Bootstrap Ignition Config"...    
DEBUG   Reusing previously-fetched "Bootstrap Ignition Config" 
DEBUG   Fetching "Master Ignition Config"...       
DEBUG   Reusing previously-fetched "Master Ignition Config" 
DEBUG Generating "Terraform Variables"...          
DEBUG Fetching "Kubeconfig Admin"...               
DEBUG   Fetching "Root CA"...                      
DEBUG   Reusing previously-fetched "Root CA"       
DEBUG   Fetching "Certificate (system:admin)"...   
DEBUG   Reusing previously-fetched "Certificate (system:admin)" 
DEBUG   Fetching "Install Config"...               
DEBUG   Reusing previously-fetched "Install Config" 
DEBUG Generating "Kubeconfig Admin"...             
DEBUG Fetching "Certificate (journal-gatewayd)"... 
DEBUG   Fetching "Root CA"...                      
DEBUG   Reusing previously-fetched "Root CA"       
DEBUG Generating "Certificate (journal-gatewayd)"... 
DEBUG Fetching "Cluster"...                        
DEBUG Loading "Cluster"...                         
DEBUG   Loading "Cluster ID"...                    
DEBUG   Loading "Install Config"...                
DEBUG   Loading "Terraform Variables"...           
DEBUG   Loading "Kubeadmin Password"...            
FATAL failed to fetch Cluster: failed to load asset "Cluster": "terraform.tfstate" already exists.  There may already be a running cluster

Expected results:
The openshift destroy command should have deleted all the resources, removed the terraform.tfstate file, and allowed creation of a new cluster.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 dgangaia 2019-02-07 23:13:19 UTC
To clarify the IAM role: for instance, if my cluster name is test, create an IAM role named test-master-role and try the installation. It is bound to fail. After that, try destroying the cluster; the terraform.tfstate file never gets deleted.

Comment 2 Alex Crawford 2019-02-13 22:56:44 UTC
We leave the Terraform state around in order to help with debugging. We expect every cluster to be installed from a new asset directory and we've added a note [1] to our docs. Do you disagree with this behavior?

[1]: https://github.com/openshift/installer#cleanup
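
As a rough illustration of the "new asset directory per cluster" expectation, the sketch below creates a fresh, empty directory for each install attempt. The helper name `new_asset_dir`, the parent path `$HOME/clusters`, and the name prefix are assumptions made for this example and are not part of the installer; only the installer's `--dir` flag is taken as given.

```shell
#!/bin/sh
# Hypothetical helper (not part of openshift-install): create a new, empty
# asset directory so each install attempt starts from a clean slate.
new_asset_dir() {
    base="${1:-$HOME/clusters}"   # parent directory; an assumption for this sketch
    name="${2:-cluster}"          # prefix for the new directory name
    mkdir -p "$base" || return 1
    # mktemp -d creates a directory that did not exist before and prints its path
    mktemp -d "$base/$name-XXXXXX"
}

# Usage (not run here):
#   dir="$(new_asset_dir "$HOME/clusters" test)"
#   openshift-install create cluster --dir "$dir" --log-level=debug
```

Keeping the failed directory untouched also preserves the Terraform state for debugging, which is the reason given above for leaving it around.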

Comment 3 dgangaia 2019-02-18 12:35:39 UTC
Hi Alex,

  I agree that every installation should be carried out from a new assets directory. However, here I am trying to reuse a directory that had a failed installation. Even after running "destroy cluster" on the directory, the Terraform files are not cleared, which leaves the directory unusable and causes future installs to fail.

  I just want to know why destroy cluster didn't clean up the Terraform file. Please let me know if any other information is required.

 Thanks,
Dixit

Comment 4 W. Trevor King 2019-02-27 05:40:26 UTC
Since [1], the installer has been more aggressive about clearing the asset directory during a successful 'destroy cluster', although I'd have expected a successful 'destroy cluster' to remove the Terraform state since [2].  Which installer version are you using?

[1]: http://github.com/openshift/installer/pull/1086 v0.10.1
[2]: https://github.com/openshift/installer/pull/547 v0.4.0

Comment 5 William Markito 2019-02-28 00:07:37 UTC
Using the 0.13 installer, after hitting an Elastic IP limit:

ERROR module.vpc.aws_eip.nat_eip[0]: 1 error occurred:
ERROR aws_eip.nat_eip.0: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached.
ERROR status code: 400, request id: b754746f-e87b-4633-a3e6-4d0fbc92bb9f

I ran into tfstate problems:

```FATAL failed to fetch Cluster: failed to load asset "Cluster": "terraform.tfstate" already exists.  There may already be a running cluster```

Log file attached.

Comment 6 William Markito 2019-02-28 00:11:37 UTC
Created attachment 1539318 [details]
openshift 0.13 installer log with tfstate left

Comment 7 Alex Crawford 2019-03-08 17:49:57 UTC
William, this is a separate issue. Your cluster failed to install because your AWS account doesn't have the resources available. Your subsequent installation failed immediately with the terraform.tfstate error because your first attempt failed halfway through. You need to destroy that first cluster, increase your resource limits, and then try recreating the cluster.

Comment 8 Alex Crawford 2019-03-08 17:50:28 UTC
I'm closing the original issue due to inactivity.

Comment 9 pbajjuri 2019-11-08 14:19:39 UTC
FATAL failed to fetch Cluster: failed to load asset "Cluster": "terraform.tfstate" already exists.  There may already be a running cluster 

I am facing the same issue. Please let me know the solution for this.

Comment 10 Keith Fryklund 2020-01-16 18:24:47 UTC
I'm also facing this issue after having "destroyed" a cluster after a failed deployment

Comment 11 Keith Fryklund 2020-01-16 19:29:38 UTC
For anyone who also faces this, I was able to work around the issue quite easily by manually deleting "terraform.tfstate" from my deployment directory.  As far as I can tell, that's all that was left behind.  Why isn't it cleaned up? I'm not sure.

Deployments are working again.
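
The manual workaround can be sketched as a small shell function. The name `remove_stale_tfstate` is hypothetical, and the sketch assumes the cloud resources from the failed attempt have already been destroyed. Since the state file is kept to help with debugging, copy it elsewhere first if it may still be needed.

```shell
#!/bin/sh
# Hypothetical helper (not part of openshift-install): remove a leftover
# terraform.tfstate from an asset directory.
# Assumption: the resources from the failed install are already gone.
remove_stale_tfstate() {
    dir="${1:-.}"                     # asset directory; defaults to the current one
    state="$dir/terraform.tfstate"
    if [ -f "$state" ]; then
        rm -- "$state" && echo "removed $state"
    else
        echo "no terraform.tfstate in $dir"
    fi
}

# Usage (not run here):
#   remove_stale_tfstate ./mycluster
```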

Comment 12 John Coleman 2020-01-17 19:47:59 UTC
>We expect every cluster to be installed from a new asset directory and we've added a note [1] to our docs.

If this is the expectation, could we encode the installer to enforce this?  

Also, I noticed that you added a note to the GitHub documentation. Is this apparent in the product documentation or the destroy command's help output?
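
Until something like that exists in the installer itself, one way to approximate the enforcement is a pre-flight check in a wrapper script. This is only a sketch: the function name `asset_dir_is_fresh` and the policy it encodes (a directory counts as fresh if it is absent, empty, or holds only install-config.yaml) are assumptions for illustration, not installer behavior.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of openshift-install): succeed only
# if the asset directory is absent, empty, or contains nothing except
# install-config.yaml. The policy itself is an assumption for this sketch.
asset_dir_is_fresh() {
    dir="$1"
    [ -d "$dir" ] || return 0
    # List every entry, dotfiles included, other than install-config.yaml
    extra="$(ls -A "$dir" | grep -v -x 'install-config.yaml' || true)"
    [ -z "$extra" ]
}

# Usage (not run here):
#   asset_dir_is_fresh ./mycluster \
#       && openshift-install create cluster --dir ./mycluster \
#       || echo "refusing to reuse ./mycluster" >&2
```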

Comment 13 Alex Crawford 2020-01-27 19:30:37 UTC
The installer will automatically remove the terraform.tfstate file once it has destroyed the cluster. If the cluster destruction is interrupted or otherwise fails, the state file will be left behind. It is safe to attempt the destruction again if you encounter this.

If you are still seeing a failure, please open a new issue with more detail (e.g. the installer version).
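
Retrying the destruction can be scripted. The sketch below re-runs a destroy command until terraform.tfstate is gone or an attempt limit is reached. The wrapper name `destroy_until_clean` and the attempt limit are assumptions for illustration; the command to run is passed in by the caller.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of openshift-install): re-run a destroy
# command until terraform.tfstate no longer exists in the asset directory.
# Usage: destroy_until_clean DIR MAX_ATTEMPTS COMMAND [ARGS...]
destroy_until_clean() {
    dir="$1"; max="$2"; shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # Run the caller-supplied command; a failure is reported, then retried
        "$@" || echo "attempt $attempt: destroy command failed" >&2
        if [ ! -e "$dir/terraform.tfstate" ]; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "terraform.tfstate still present after $max attempts" >&2
    return 1
}

# Example (not run here):
#   destroy_until_clean ./mycluster 3 \
#       openshift-install destroy cluster --dir ./mycluster
```

If the state file is still present after the last attempt, that is the point at which a new issue with the installer version and logs would be useful.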

Comment 14 W. Trevor King 2020-01-27 20:31:26 UTC
> The installer will automatically remove the terraform.tfstate file once it has destroyed the cluster.

Not in all cases, but there's an existing bug 1791400 with an open installer PR for that.

