Bug 1411022 - [RFE] - cns-deploy should support scale-up of clusters
Summary: [RFE] - cns-deploy should support scale-up of clusters
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: cns-deploy-tool
Version: cns-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Humble Chirammal
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-07 15:00 UTC by krishnaram Karthick
Modified: 2017-04-27 11:27 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-27 11:27:00 UTC
Embargoed:



Description krishnaram Karthick 2017-01-07 15:00:52 UTC
Description of problem:

On a 3-node CNS cluster that was already up and running, I appended 3 more nodes to the topology file, expecting that gluster pods would be installed on the new nodes and added to the TSP. However, after creating gluster pods on the new nodes, the deployment tool tried to set up heketi once again. Worse, when the heketi deployment failed, the whole cluster was cleaned up, removing the already-running gluster pods as well.
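
For reference, each appended entry under "nodes" in topology.json had roughly the following shape (hostnames, IPs, and device paths here are illustrative placeholders, not the actual lab values):

{
  "node": {
    "hostnames": {
      "manage": ["new-node-4.example.com"],
      "storage": ["10.70.0.14"]
    },
    "zone": 1
  },
  "devices": ["/dev/sdd"]
}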

So, scaling up CNS using the cns-deploy tool ends up deleting the whole cluster, because the tool tries to deploy heketi once again as part of its setup process.

Either the cns-deploy tool shouldn't be recommended for scaling up and manual steps should be documented instead, or the cns-deploy tool has to handle scale-up scenarios.

For now, as we are in the last leg of the release, proper documentation about scaling up a CNS cluster would help.
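
Until then, the manual scale-up path would be roughly the following (a sketch, assuming the existing glusterfs daemonset and heketi service are healthy; node names, IDs, and the device path are placeholders):

# Label the new node so the glusterfs daemonset schedules a pod on it
oc label node new-node-4.example.com storagenode=glusterfs

# Once the new gluster pod is running, register the node and its devices with heketi
export HEKETI_CLI_SERVER=http://<heketi-route>
heketi-cli node add --zone=1 --cluster=<existing-cluster-id> \
    --management-host-name=new-node-4.example.com --storage-host-name=10.70.0.14
heketi-cli device add --name=/dev/sdd --node=<new-node-id>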

We might also have to provide guidance for cases where a second TSP is created.

cli output:

cns-deploy -n storage-project -g topology.json
Welcome to the deployment tool for GlusterFS on Kubernetes and OpenShift.

Before getting started, this script has some requirements of the execution
environment and of the container platform that you should verify.

The client machine that will run this script must have:
 * Administrative access to an existing Kubernetes or OpenShift cluster
 * Access to a python interpreter 'python'
 * Access to the heketi client 'heketi-cli'

Each of the nodes that will host GlusterFS must also have appropriate firewall
rules for the required GlusterFS ports:
 * 2222  - sshd (if running GlusterFS in a pod)
 * 24007 - GlusterFS Daemon
 * 24008 - GlusterFS Management
 * 49152 to 49251 - Each brick for every volume on the host requires its own
   port. For every new brick, one new port will be used starting at 49152. We
   recommend a default range of 49152-49251 on each host, though you can adjust
   this to fit your needs.

In addition, for an OpenShift deployment you must:
 * Have 'cluster_admin' role on the administrative account doing the deployment
 * Add the 'default' and 'router' Service Accounts to the 'privileged' SCC
 * Have a router deployed that is configured to allow apps to access services
   running in the cluster

Do you wish to proceed with deployment?

[Y]es, [N]o? [Default: Y]: y
Multiple CLI options detected. Please select a deployment option.
[O]penShift, [K]ubernetes? [O/o/K/k]: o
Using OpenShift CLI.
NAME              STATUS    AGE
storage-project   Active    9h
Using namespace "storage-project".
Error from server: error when creating "/usr/share/heketi/templates/deploy-heketi-template.yaml": templates "deploy-heketi" already exists
Error from server: error when creating "/usr/share/heketi/templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
Error from server: error when creating "/usr/share/heketi/templates/heketi-template.yaml": templates "heketi" already exists
Error from server: error when creating "/usr/share/heketi/templates/glusterfs-template.yaml": templates "glusterfs" already exists
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
error: 'storagenode' already has a value (glusterfs), and --overwrite is false
node "dhcp47-37.lab.eng.blr.redhat.com" labeled
node "dhcp47-113.lab.eng.blr.redhat.com" labeled
node "dhcp46-35.lab.eng.blr.redhat.com" labeled
Error from server: daemonsets.extensions "glusterfs" already exists
Waiting for GlusterFS pods to start ... OK
service "deploy-heketi" created
route "deploy-heketi" created
deploymentconfig "deploy-heketi" created
Waiting for deploy-heketi pod to start ... OK
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   107    0   107    0     0  19640      0 --:--:-- --:--:-- --:--:-- 21400
Failed to communicate with deploy-heketi service.
Please verify that a router has been properly configured.
deploymentconfig "deploy-heketi" deleted
route "deploy-heketi" deleted
service "deploy-heketi" deleted
deploymentconfig "heketi" deleted
service "heketi" deleted
route "heketi" deleted
service "heketi-storage-endpoints" deleted
serviceaccount "heketi-service-account" deleted
template "deploy-heketi" deleted
template "heketi" deleted
node "dhcp46-53.lab.eng.blr.redhat.com" labeled
node "dhcp46-223.lab.eng.blr.redhat.com" labeled
node "dhcp47-145.lab.eng.blr.redhat.com" labeled
node "dhcp47-37.lab.eng.blr.redhat.com" labeled
node "dhcp47-113.lab.eng.blr.redhat.com" labeled
node "dhcp46-35.lab.eng.blr.redhat.com" labeled
daemonset "glusterfs" deleted
template "glusterfs" deleted


Version-Release number of selected component (if applicable):
[root@dhcp46-5 ~]# openshift version
openshift v3.4.0.39
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
[root@dhcp46-5 ~]# rpm -qa | grep 'heketi'
heketi-client-3.1.0-12.el7rhgs.x86_64
[root@dhcp46-5 ~]# rpm -qa | grep 'cns'
cns-deploy-3.1.0-12.el7rhgs.x86_64


How reproducible:
Tried once, but this should be 100% reproducible.

Steps to Reproduce:
1. Set up a 3-node CNS cluster
2. Try to scale up CNS by adding nodes to the topology file and re-running cns-deploy

Actual results:
The whole cluster is torn down.

Expected results:
The CNS cluster is scaled up without disturbing the existing deployment.

Additional info:

Comment 2 krishnaram Karthick 2017-01-09 10:24:24 UTC
Changing this bug to an RFE to support scale-up of CNS clusters. cns-deploy should support the following:

1) cns-deploy tool should support expansion of an existing TSP
2) cns-deploy tool should support creation of a new TSP on an existing system (see the sketch after this list)
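
For item 2, the manual equivalent today would be roughly the following (a sketch; the cluster ID returned by heketi and the node details are placeholders):

# Create a new, empty trusted storage pool in heketi
heketi-cli cluster create

# Add nodes to the new pool using the cluster ID printed above
heketi-cli node add --zone=1 --cluster=<new-cluster-id> \
    --management-host-name=<node-fqdn> --storage-host-name=<node-ip>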

Comment 3 Jose A. Rivera 2017-01-09 17:46:19 UTC
Pardon, what is TSP?

Comment 4 Humble Chirammal 2017-01-23 11:24:40 UTC
(In reply to Jose A. Rivera from comment #3)
> Pardon, what is TSP?

Trusted Storage Pool.

Comment 5 Humble Chirammal 2017-04-27 11:27:00 UTC
This is fixed in CNS 3.5. Please test and reopen if the issue persists.

