Description of problem:

On a 3-node CNS cluster that was already up and running, I appended 3 more nodes to the topology file, expecting that gluster pods would be created on the new nodes and added to the TSP. However, after creating gluster pods on the new nodes, the deployment tool tried to set up heketi once again. The worst part came when that heketi deployment failed: the whole cluster was cleaned up, including the gluster pods that were already set up.

So scaling up CNS with the cns-deploy tool ends up deleting the whole cluster, because the tool tries to deploy heketi again as part of the setup process.

Either the cns-deploy tool should not be suggested for scaling up and manual steps should be documented instead, or the cns-deploy tool has to handle scale-up scenarios. For now, as we are in the last leg of the release, proper documentation on scaling up a CNS cluster would help. We might also have to provide guidance for cases where a second TSP is created.

CLI output:

cns-deploy -n storage-project -g topology.json
Welcome to the deployment tool for GlusterFS on Kubernetes and OpenShift.

Before getting started, this script has some requirements of the execution
environment and of the container platform that you should verify.

The client machine that will run this script must have:
 * Administrative access to an existing Kubernetes or OpenShift cluster
 * Access to a python interpreter 'python'
 * Access to the heketi client 'heketi-cli'

Each of the nodes that will host GlusterFS must also have appropriate firewall
rules for the required GlusterFS ports:
 * 2222 - sshd (if running GlusterFS in a pod)
 * 24007 - GlusterFS Daemon
 * 24008 - GlusterFS Management
 * 49152 to 49251 - Each brick for every volume on the host requires its own
   port. For every new brick, one new port will be used starting at 49152. We
   recommend a default range of 49152-49251 on each host, though you can adjust
   this to fit your needs.

In addition, for an OpenShift deployment you must:
 * Have 'cluster_admin' role on the administrative account doing the deployment
 * Add the 'default' and 'router' Service Accounts to the 'privileged' SCC
 * Have a router deployed that is configured to allow apps to access services
   running in the cluster

Do you wish to proceed with deployment?

[Y]es, [N]o? [Default: Y]: y
Multiple CLI options detected. Please select a deployment option.
[O]penShift, [K]ubernetes? [O/o/K/k]: o
Using OpenShift CLI.
NAME              STATUS    AGE
storage-project   Active    9h
Using namespace "storage-project".
Error from server: error when creating "/usr/share/heketi/templates/deploy-heketi-template.yaml": templates "deploy-heketi" already exists
Error from server: error when creating "/usr/share/heketi/templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
Error from server: error when creating "/usr/share/heketi/templates/heketi-template.yaml": templates "heketi" already exists
Error from server: error when creating "/usr/share/heketi/templates/glusterfs-template.yaml": templates "glusterfs" already exists
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
node "dhcp47-37.lab.eng.blr.redhat.com" labeled
node "dhcp47-113.lab.eng.blr.redhat.com" labeled
node "dhcp46-35.lab.eng.blr.redhat.com" labeled
Error from server: daemonsets.extensions "glusterfs" already exists
Waiting for GlusterFS pods to start ... OK
service "deploy-heketi" created
route "deploy-heketi" created
deploymentconfig "deploy-heketi" created
Waiting for deploy-heketi pod to start ... OK
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:100   107    0   107    0     0  19640      0 --:--:-- --:--:-- --:--:-- 21400
Failed to communicate with deploy-heketi service.
Please verify that a router has been properly configured.
deploymentconfig "deploy-heketi" deleted
route "deploy-heketi" deleted
service "deploy-heketi" deleted
deploymentconfig "heketi" deleted
service "heketi" deleted
route "heketi" deleted
service "heketi-storage-endpoints" deleted
serviceaccount "heketi-service-account" deleted
template "deploy-heketi" deleted
template "heketi" deleted
node "dhcp46-53.lab.eng.blr.redhat.com" labeled
node "dhcp46-223.lab.eng.blr.redhat.com" labeled
node "dhcp47-145.lab.eng.blr.redhat.com" labeled
node "dhcp47-37.lab.eng.blr.redhat.com" labeled
node "dhcp47-113.lab.eng.blr.redhat.com" labeled
node "dhcp46-35.lab.eng.blr.redhat.com" labeled
daemonset "glusterfs" deleted
template "glusterfs" deleted

Version-Release number of selected component (if applicable):

[root@dhcp46-5 ~]# openshift version
openshift v3.4.0.39
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

[root@dhcp46-5 ~]# rpm -qa | grep 'heketi'
heketi-client-3.1.0-12.el7rhgs.x86_64

[root@dhcp46-5 ~]# rpm -qa | grep 'cns'
cns-deploy-3.1.0-12.el7rhgs.x86_64

How reproducible:

Tried once, but this should be 100%.

Steps to Reproduce:
1. Set up a 3-node CNS cluster.
2. Try to scale up CNS by adding nodes to the topology file and running cns-deploy again.

Actual results:

The whole cluster is torn down.

Expected results:

The CNS cluster is scaled up.

Additional info:
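To illustrate the behaviour I would expect on failure, here is a minimal sketch of a rollback that deletes only what the current run created and leaves pre-existing objects alone. This is not how cns-deploy is implemented; the `Deployer` and `FakeCluster` classes, the `AlreadyExists` exception and the in-memory object store are all hypothetical and stand in for the real create/delete calls.

```python
class AlreadyExists(Exception):
    """Raised by the (hypothetical) create call when the object is already present."""


class FakeCluster:
    """In-memory stand-in for the container platform, for illustration only."""

    def __init__(self, existing):
        self.objects = set(existing)

    def create(self, kind, name):
        if (kind, name) in self.objects:
            raise AlreadyExists(f"{kind} {name!r} already exists")
        self.objects.add((kind, name))

    def delete(self, kind, name):
        self.objects.discard((kind, name))


class Deployer:
    """Tracks which objects this run created so that abort() removes only those."""

    def __init__(self, create, delete):
        self._create = create
        self._delete = delete
        self.created = []  # objects created by THIS run, in creation order

    def ensure(self, kind, name):
        """Create the object if missing; return True only if this run created it."""
        try:
            self._create(kind, name)
        except AlreadyExists:
            return False  # pre-existing object: never registered for cleanup
        self.created.append((kind, name))
        return True

    def abort(self):
        """Roll back in reverse order, touching only objects from self.created."""
        while self.created:
            kind, name = self.created.pop()
            self._delete(kind, name)
```

With a rollback like this, a failed rerun on an existing cluster would remove the freshly created "deploy-heketi" objects but keep the "glusterfs" daemonset and the "heketi" objects that were there before the run.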
Changing this bug to an RFE to support scale-up of CNS clusters. cns-deploy should support the following:

1) Expansion of an existing TSP
2) Creation of a new TSP on an existing system
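For item 1, the tool would first need to work out which nodes in the topology file are new. A minimal sketch of that step follows, assuming the topology file uses the heketi topology layout (clusters, then nodes, then node.hostnames.manage). The function names and the example hostnames are hypothetical, and how the list of already-deployed hosts is obtained is left out.

```python
import json


def manage_hostnames(topology):
    """Return the management hostname of every node in a heketi-style topology dict."""
    names = []
    for cluster in topology.get("clusters", []):
        for entry in cluster.get("nodes", []):
            names.extend(entry["node"]["hostnames"]["manage"])
    return names


def nodes_to_add(topology, existing_hosts):
    """Return hosts listed in the topology that are not yet part of the pool."""
    existing = set(existing_hosts)
    return [host for host in manage_hostnames(topology) if host not in existing]


# Hypothetical topology: three nodes already deployed, one appended for scale-up.
EXAMPLE_TOPOLOGY = json.loads("""
{
  "clusters": [
    {
      "nodes": [
        {"node": {"hostnames": {"manage": ["node1.example.com"], "storage": ["10.0.0.1"]}, "zone": 1},
         "devices": ["/dev/sdb"]},
        {"node": {"hostnames": {"manage": ["node2.example.com"], "storage": ["10.0.0.2"]}, "zone": 2},
         "devices": ["/dev/sdb"]},
        {"node": {"hostnames": {"manage": ["node3.example.com"], "storage": ["10.0.0.3"]}, "zone": 3},
         "devices": ["/dev/sdb"]},
        {"node": {"hostnames": {"manage": ["node4.example.com"], "storage": ["10.0.0.4"]}, "zone": 1},
         "devices": ["/dev/sdb"]}
      ]
    }
  ]
}
""")

EXISTING = ["node1.example.com", "node2.example.com", "node3.example.com"]

if __name__ == "__main__":
    print(nodes_to_add(EXAMPLE_TOPOLOGY, EXISTING))  # ['node4.example.com']
```

Only the hosts returned by `nodes_to_add` would then need labeling and a gluster pod; the existing heketi deployment would be left untouched.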
Pardon, what is TSP?
(In reply to Jose A. Rivera from comment #3)
> Pardon, what is TSP?

Trusted Storage Pool.
This is fixed in CNS 3.5. Please test, and reopen the bug if the issue persists.