Description of problem:
cns-deploy should NOT try to re-configure a working setup and end up aborting the deployment when executed a second time.

Version-Release number of selected component (if applicable):
cns-deploy-3.1.0-10.el7rhgs.x86_64
openshift v3.4.0.38
kubernetes v1.4.0+776c994

How reproducible:
100%

Steps to Reproduce:
1. Install the cns-deploy tool.
2. Create a namespace called "storage-project".
3. Set up a router as described in our official documentation.
4. Create a topology file (a minimal sample is sketched under Additional info below).
5. Execute '# cns-deploy -g' to set up CNS in the "storage-project" namespace.
6. Once the deployment completes, ensure that the setup is healthy.
7. Re-run the cns-deploy tool without aborting the current deployment.

Actual results:
Running the same tool a second time on the already deployed setup ended up deleting all of the resources. See below:

##############
# cns-deploy -n storage-project -g -c oc -y -l /var/log/1-cns-deploy.log -v
Using OpenShift CLI.
Using namespace "storage-project".
Error from server: error when creating "/usr/share/heketi/templates/deploy-heketi-template.yaml": templates "deploy-heketi" already exists
Error from server: error when creating "/usr/share/heketi/templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
Error from server: error when creating "/usr/share/heketi/templates/heketi-template.yaml": templates "heketi" already exists
Error from server: error when creating "/usr/share/heketi/templates/glusterfs-template.yaml": templates "glusterfs" already exists
Marking 'dhcp47-138.lab.eng.blr.redhat.com' as a GlusterFS node.
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
Marking 'dhcp46-59.lab.eng.blr.redhat.com' as a GlusterFS node.
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
Marking 'dhcp46-33.lab.eng.blr.redhat.com' as a GlusterFS node.
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
Deploying GlusterFS pods.
Error from server: daemonsets.extensions "glusterfs" already exists
Waiting for GlusterFS pods to start ... Checking status of pods matching 'glusterfs-node=pod':
glusterfs-3w641   1/1       Running   0         1h
glusterfs-7rpwq   1/1       Running   0         1h
glusterfs-h9a6e   1/1       Running   0         1h
OK
Found secret 'heketi-service-account-token-zics6' in namespace 'storage-project' for heketi-service-account.
service "deploy-heketi" created
route "deploy-heketi" created
deploymentconfig "deploy-heketi" created
Waiting for deploy-heketi pod to start ... Checking status of pods matching 'glusterfs=heketi-pod':
heketi-1-eh8gx   1/1       Running   0         1h
OK
Determining heketi service URL ... OK
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   107    0   107    0     0  21125      0 --:--:-- --:--:-- --:--:-- 26750
Failed to communicate with deploy-heketi service.
Please verify that a router has been properly configured.
deploymentconfig "deploy-heketi" deleted
route "deploy-heketi" deleted
service "deploy-heketi" deleted
No resources found
service "heketi-storage-endpoints" deleted
serviceaccount "heketi-service-account" deleted
template "deploy-heketi" deleted
template "heketi" deleted
Removing label from 'dhcp47-138.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp47-138.lab.eng.blr.redhat.com" labeled
Removing label from 'dhcp46-59.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-59.lab.eng.blr.redhat.com" labeled
Removing label from 'dhcp46-33.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-33.lab.eng.blr.redhat.com" labeled
deploymentconfig "heketi" deleted
route "heketi" deleted
service "heketi" deleted
pod "heketi-1-eh8gx" deleted
daemonset "glusterfs" deleted
template "glusterfs" deleted
##############

Expected results:
The tool should NOT abort an existing deployment automatically without proper checks or input from the user.

Additional info:
There is a flag called '--load' which starts the execution from the topology load. It would be appreciated if you could validate this report with respect to that flag.
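For reference, a hedged example of how that flag would be invoked; the exact syntax, and whether the topology file must be passed explicitly, are assumptions that may differ in this build:

# Assumption: --load resumes execution from the topology-load step,
# and the topology file path is given explicitly.
cns-deploy -n storage-project -g -c oc -y --load topology.json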
(In reply to Humble Chirammal from comment #1)
> There is a flag called '--load' which starts the execution from the
> topology load. It would be appreciated if you could validate this report
> with respect to that flag.

That flag would not help in this use case, and I was not trying to resume from the topology load. Rather, this was a fully configured setup on which resources were deleted by a second run. We should prevent this from happening automatically.
patch posted at https://github.com/gluster/gluster-kubernetes/pull/175
(In reply to Raghavendra Talur from comment #7)
> patch posted at https://github.com/gluster/gluster-kubernetes/pull/175

Merged.
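For context, a minimal sketch (not the exact patch) of the kind of pre-flight guard the merged change introduces: before deploying, check whether a heketi pod already exists in the target namespace and refuse to proceed instead of tearing the existing deployment down. The selector matches the label shown in the verification output below.

# Sketch only; the real check lives in the PR referenced above.
if oc get pod --namespace storage-project \
    --selector=glusterfs=heketi-pod --no-headers 2>/dev/null | grep -q .; then
  # A heketi pod is already running: bail out instead of re-deploying.
  echo "Found heketi pod running. Please destroy existing setup and try again."
  exit 1
fi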
Verified:

##########
# cns-deploy -n storage-project -g -c oc -y -l /var/log/1-cns-deploy.log -v
Using OpenShift CLI.
NAME              STATUS    AGE
storage-project   Active    5d
Using namespace "storage-project".
Checking that heketi pod is not running ... Checking status of pods matching 'glusterfs=heketi-pod':
heketi-1-lhqfh   1/1       Running   0         2d
Found heketi pod running. Please destroy existing setup and try again.

# cns-deploy -n storage-project -c oc -y -l /var/log/2-cns-deploy.log -v
Using OpenShift CLI.
NAME              STATUS    AGE
storage-project   Active    5d
Using namespace "storage-project".
Checking that heketi pod is not running ... Checking status of pods matching 'glusterfs=heketi-pod':
heketi-1-lhqfh   1/1       Running   0         2d
Found heketi pod running. Please destroy existing setup and try again.
##########
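If a redeployment is actually intended, the existing setup has to be destroyed first. A hedged example, assuming the --abort option in this build cleans up a previous deployment as documented, with an illustrative log path:

# Assumption: --abort tears down the existing CNS deployment in the namespace.
cns-deploy -n storage-project -g -c oc -y --abort
# A fresh deployment can then be run again.
cns-deploy -n storage-project -g -c oc -y -l /var/log/3-cns-deploy.log -v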
*** Bug 1437792 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1112
Marking qe-test-coverage as '-' since the preferred mode of deployment is using Ansible.