Bug 1222505

Summary: 1.3.0: ceph-deploy with a custom cluster name fails because sysvinit hosts do not support this feature; add an error message
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Harish NV Rao <hnallurv>
Component: Ceph-Installer
Assignee: Travis Rhoden <trhoden>
Status: CLOSED ERRATA
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium
Priority: unspecified
Version: 1.3.0
CC: adeza, aschoen, ceph-eng-bugs, flucifre, kdreyer, nthomas, rnalakka, sankarshan, trhoden, vumrao
Target Milestone: rc
Keywords: Reopened
Target Release: 1.3.0
Hardware: x86_64
OS: Linux
Fixed In Version: ceph-deploy-1.5.24-1.el7cp
Doc Type: Bug Fix
Last Closed: 2015-09-29 05:42:46 UTC
Type: Bug
Bug Depends On: 1210038    
Bug Blocks:    
Attachments:
  logfiles (flags: none)
  .bash_history contents (flags: none)

Description Harish NV Rao 2015-05-18 11:40:22 UTC
Created attachment 1026669 [details]
logfiles

Description of problem:
-----------------------
In order to test multi-cluster Calamari UI support, I brought up the first cluster with the default cluster name "ceph". The cluster was in active+clean state. I then created a second cluster named "ceph2"; while doing so, I made some changes to the ceph-config directory. I was able to see both clusters on the Calamari UI as well.
As I had made some changes in ceph-config and was not sure whether they would impact the testing later, I decided to purge the second cluster and bring it up again. So I purged the second cluster (purged the ceph packages as well).
Once the purge was done and "ceph-deploy --cluster ceph2 new <mon>" was successful, I issued "ceph-deploy --cluster ceph2 install <node1> <node2>...". This resulted in the following error:
[ceph_deploy.install][INFO  ] Distro info: Red Hat Enterprise Linux Server 7.1 Maipo
[ceph_deploy.install][ERROR ] refusing to install on host: c2Mon, with custom cluster name: ceph2
[ceph_deploy.install][ERROR ] custom cluster names are not supported on sysvinit hosts


How reproducible: Always
-----------------

Steps to Reproduce:
-------------------
1. Install the first cluster with the default name and bring it to active+clean state.
2. Install the second cluster:
   2.1. From the ceph-config dir, run "ceph-deploy --cluster ceph2 new <mon node>".
   2.2. From the ceph-config dir, run "ceph-deploy install <node1> <node2>...".
        Note that I did not provide the second cluster name here, so this command did not complete.
   2.3. I created another dir, ceph2-config, and copied all the files related to ceph2 from the ceph-config dir into it.
   2.4. Then I ran "ceph-deploy install <node1> <node2>..." from the ceph2-config dir. It completed successfully. Then I ran "ceph-deploy --cluster ceph2 mon create-initial".
   2.5. Later I copied the ceph2-related files from ceph2-config to ceph-config.
   2.6. Applied the workaround mentioned in bug 1219344 for connecting Calamari to the new cluster nodes.
   2.7. Added OSDs and accepted the new hosts from the Calamari UI. After this I was able to view GUI details of both clusters.
   2.8. As steps 2.3, 2.4 and 2.5 were theoretically not correct, I decided to purge the second cluster and recreate it.
3. Purged the cluster using:
   ceph-deploy purge <ceph-node> [<ceph-node>]
   ceph-deploy purgedata <ceph-node> [<ceph-node>]
   ceph-deploy forgetkeys
4. From the ceph-config dir, ran "ceph-deploy --cluster ceph2 new <mon node>".
5. From the ceph-config dir, ran "ceph-deploy --cluster ceph2 install <node1> <node2>...".


Actual results:
---------------
"ceph-deploy --cluster ceph2 install <node1> <node2>..." failed with following message for all nodes:

[ceph_deploy.install][INFO  ] Distro info: Red Hat Enterprise Linux Server 7.1 Maipo
[ceph_deploy.install][ERROR ] refusing to install on host: c2Mon, with custom cluster name: ceph2
[ceph_deploy.install][ERROR ] custom cluster names are not supported on sysvinit hosts


Expected results:
----------------
Installation successful.


Additional info:
----------------
Please note that the error message I am seeing above was not seen the first time I tried to install with a custom cluster name. It looks like the purge has done something that is preventing installation with a custom cluster name.
Please look into this defect and provide a workaround, as I am blocked from executing Multi Cluster Calamari UI testing.

calamari admin: 10.8.128.6 (octo lab machines)
ssh username:cephuser, passwd:junk123
# Admin node:
10.8.128.6	Admin
#cluster 1:
10.8.128.33	Mon
10.8.128.40	osd0
10.8.128.86	osd1
10.8.128.29	osd2
#cluster2:
10.8.128.76	c2Mon
10.8.128.89	c2osd0
10.8.128.90	c2osd1
10.8.128.91	c2osd2

Comment 2 Travis Rhoden 2015-05-18 15:32:38 UTC
From my perspective, ceph-deploy is doing the right thing, and outputting the right info: "[ceph_deploy.install][ERROR ] custom cluster names are not supported on sysvinit hosts"

Ceph Hammer (and RHCS 1.3) does not include systemd scripts that I know of; thus we are using the sysvinit scripts, and thus custom cluster names are not supported.

But that's my perspective - I don't know if that's actually true or not.  :)  Are custom cluster names supposed to be a new feature in 1.3?  Were they supported on RHEL on 1.2.z?  I don't know the answers to those questions.

Comment 3 Harish NV Rao 2015-05-18 15:41:21 UTC
If custom names are not supported in 1.3, it should have failed the first time I attempted it. It worked the first time, and when I tried re-installing after the purge, it failed.

Comment 4 Ken Dreyer (Red Hat) 2015-05-18 15:52:08 UTC
Travis, can we check "if distro.init == 'sysvinit' and args.cluster != 'ceph'" in new.py as we're doing in install.py?

Comment 5 Travis Rhoden 2015-05-18 16:02:20 UTC
(In reply to Ken Dreyer (Red Hat) from comment #4)
> Travis, can we check "if distro.init == 'sysvinit' and args.cluster !=
> 'ceph'" in new.py as we're doing in install.py?

Yeah, we can do that.  It would have the effect of checking if any of the monitors given in the 'ceph-deploy new' command were sysvinit systems.  If any of them were, it would abort.

I am very curious about what Harish described, though, that it seemed to work the first time, but then didn't on a re-install.  That doesn't seem right to me.

But yes, it is definitely a problem that you can define a {cluster}.conf file with the 'new' command that we have the ability to detect is not going to work.  The "new" command does not actually install anything; it just creates a brand new {cluster}.conf file based on your input and the IP address info it collects from the monitors.  But since we are contacting the monitors, we *can* see if we are going to use sysvinit on them.  :)
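
For illustration, here is a minimal sketch of the kind of early check being discussed; the helper name and error type are hypothetical, not the actual ceph-deploy code (the real change is the upstream commit linked in the next comment):

# Hypothetical sketch only -- not the actual ceph-deploy implementation.
def ensure_custom_cluster_name_supported(init_system, cluster_name, hostname):
    """Abort early if a custom cluster name is requested on a sysvinit host."""
    if init_system == 'sysvinit' and cluster_name != 'ceph':
        raise RuntimeError(
            'host %s does not support custom cluster names (init: %s)'
            % (hostname, init_system)
        )

# Example: a RHEL 7.1 monitor still managed by sysvinit scripts would abort here.
try:
    ensure_custom_cluster_name_supported('sysvinit', 'ceph2', 'c2Mon')
except RuntimeError as err:
    print('[ceph_deploy.new][ERROR] %s' % err)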

Comment 6 Travis Rhoden 2015-05-18 19:40:47 UTC
Fix committed upstream: https://github.com/ceph/ceph-deploy/commit/63c4ff70ec1a09205a58c7514dbfcdb00b15d975

Comment 7 Harish NV Rao 2015-05-19 05:53:14 UTC
Hi Travis,

As per my understanding, the fix above now prevents users from using a custom cluster name for "ceph-deploy new <mon>" too.

Did you get a chance to look into why ceph-deploy allowed installing with a custom cluster name the first time but not when re-installing? I feel we need to know that before we say this defect is fixed.

Harish

Comment 8 Travis Rhoden 2015-05-19 13:19:22 UTC
(In reply to Harish NV Rao from comment #7)
> Hi Travis,
> 
> As per my understanding, the fix above now prevents users from using a custom
> cluster name for "ceph-deploy new <mon>" too.
> 
> Did you get a chance to look into why ceph-deploy allowed installing with a
> custom cluster name the first time but not when re-installing? I feel we need
> to know that before we say this defect is fixed.
> 
> Harish

Hi Harish,

Your understanding is correct.  I did not look into the other problem you described.  Would you happen to have the exact sequence of commands that you used with ceph-deploy to get into that scenario?

Comment 9 Harish NV Rao 2015-05-19 15:28:37 UTC
Created attachment 1027284 [details]
.bash_history contents.

.bash_history file

Comment 10 Harish NV Rao 2015-05-19 15:30:29 UTC
Here is the sequence of commands extracted from .bash_history on the Admin node. A few commands are omitted for clarity.

# go to ceph-config dir
cd ceph-config/
ceph-deploy --cluster ceph2 new c2Mon

# following two commands did not work in ceph-config directory
ceph-deploy install --cluster ceph2 c2Mon c2osd0 c2osd1 c2osd2 #incorrect syntax
ceph-deploy install c2Mon c2osd0 c2osd1 c2osd2 # hung for long time

# so created ceph2-config dir
cd ..
mkdir ceph2-config

# copied all the files related to ceph2 from ceph-config directory to ceph2-config directory
ll ceph-config/
cp ceph-config/ceph2* ceph2-config/.
cd ceph2-config/

# Ran install and mon create-initial commands in ceph2-config directory
ceph-deploy install c2Mon c2osd0 c2osd1 c2osd2
ceph-deploy mon create-initial
ceph-deploy mon create-initial
ceph-deploy --cluster ceph2 mon create-initial

sudo yum -y install salt-minion
sudo mkdir /etc/salt/minion.d

# copied ceph2 related files in ceph2-config dir to ceph-config dir
cp ceph2* ../ceph-config/

Please see the attached file "CmdSequence" for the complete history of the commands [Line #471 onwards]

Comment 12 Travis Rhoden 2015-05-19 18:38:55 UTC
(In reply to Harish NV Rao from comment #10)
> Here is the sequence of commands extracted from .bash_history on the Admin
> node. A few commands are omitted for clarity.
Hi Harish,

I think we are looking at a mix of commands that did and did not use a custom cluster name.  I don't think you ever achieved a working cluster with a custom name. I'll call out a few issues I see...

> 
> # go to ceph-config dir
> cd ceph-config/
> ceph-deploy --cluster ceph2 new c2Mon

When you did this, was this a new ceph-config directory?  Or did it have an existing ceph.conf already in it?  From the history you uploaded, I think you were reusing this.  This becomes important later.

> 
> # following two commands did not work in ceph-config directory
> ceph-deploy install --cluster ceph2 c2Mon c2osd0 c2osd1 c2osd2 #incorrect
> syntax

I don't think this is incorrect syntax. At this point, it should have errored and said that you can't use a custom cluster name on these nodes. The change we've made in this BZ is to catch this in the previous step, with "ceph-deploy new".

> ceph-deploy install c2Mon c2osd0 c2osd1 c2osd2 # hung for long time

I wish I knew why this had hung. If this was run from the ceph-config directory, and "ceph-deploy new" was run here without a custom name, this would have started to install ceph on the remote nodes just fine.

> 
> # so created ceph2-config dir
> cd ..
> mkdir ceph2-config
> 
> # copied all the files related to ceph2 from ceph-config directory to
> ceph2-config directory
> ll ceph-config/
> cp ceph-config/ceph2* ceph2-config/.
> cd ceph2-config/
> 
> # Ran install and mon create-initial commands in ceph2-config directory
> ceph-deploy install c2Mon c2osd0 c2osd1 c2osd2

Just as we don't allow custom cluster names on RHEL/CentOS machines, those machines also can't have more than one cluster on them (for the same reason). From your history, it seemed like there might have been a previous install on these nodes?

> ceph-deploy mon create-initial
> ceph-deploy mon create-initial
> ceph-deploy --cluster ceph2 mon create-initial

I worried about sequences like this, where --cluster was left out the first time.  If there were any ceph.conf files lying around, they would have been picked up and used by default (see the sketch at the end of this comment).

> 
> sudo yum -y install salt-minion
> sudo mkdir /etc/salt/minion.d
> 
> # copied ceph2 related files in ceph2-config dir to ceph-config dir
> cp ceph2* ../ceph-config/
> 
> Please see the attached file "CmdSequence" for complete history of the
> commands [Line #471 onwards]


If you are still concerned about this, I would start fresh and see if it happens again.  But since it's not a scenario we support, I'm not sure if it is worth your time.
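
To make the conf-file pickup point above concrete: when --cluster is omitted, ceph-deploy defaults the cluster name to "ceph" and reads {cluster}.conf from the working directory, so a stray ceph.conf can be picked up silently. A rough illustration (hypothetical helper, not ceph-deploy's actual code):

import os

# Hypothetical illustration of resolving the conf file from the cluster name;
# with the default name "ceph", a leftover ceph.conf in the working directory
# would be used instead of the intended ceph2.conf.
def resolve_conf_path(cluster='ceph', directory='.'):
    return os.path.join(directory, '%s.conf' % cluster)

print(resolve_conf_path())          # ./ceph.conf  -> what "ceph-deploy mon create-initial" reads
print(resolve_conf_path('ceph2'))   # ./ceph2.conf -> what "--cluster ceph2" would read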

Comment 14 Harish NV Rao 2015-05-22 10:51:04 UTC
Travis, I am sure that it got installed the first time with a custom cluster name. I am unable to reproduce this now. I am lowering the severity for now. I will retry it one more time with a new build on a new system next week. I will close it if the issue is not reproducible.

Comment 15 Harish NV Rao 2015-06-02 07:31:21 UTC
Verified that custom cluster names cannot be used while issuing "ceph-deploy new <mon node>" on a 7.1 system. This defect has been verified for the fix done in https://github.com/ceph/ceph-deploy/commit/63c4ff70ec1a09205a58c7514dbfcdb00b15d975. I don't have cycles now to complete the task as mentioned in Comment 14 above. Marking this defect as Verified for now.

Log:
====
[cephuser@Admin ceph-config]$ ceph-deploy --cluster Test1 new Mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephuser/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.25): /usr/bin/ceph-deploy --cluster Test1 new Mon
[ceph_deploy.new][DEBUG ] Creating new cluster named Test1
[ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
[Mon][DEBUG ] connected to host: Admin 
[Mon][INFO  ] Running command: ssh -CT -o BatchMode=yes Mon
[Mon][DEBUG ] connection detected need for sudo
[Mon][DEBUG ] connected to host: Mon 
[Mon][DEBUG ] detect platform information from remote host
[Mon][DEBUG ] detect machine type
[Mon][DEBUG ] find the location of an executable
[Mon][INFO  ] Running command: sudo /usr/sbin/ip link show
[Mon][INFO  ] Running command: sudo /usr/sbin/ip addr show
[Mon][DEBUG ] IP addresses found: ['10.12.27.14']

[ceph_deploy.new][ERROR ] custom cluster names are not supported on sysvinit hosts
[ceph_deploy][ERROR ] ClusterNameError: host Mon does not support custom cluster names

Comment 17 errata-xmlrpc 2015-06-24 15:53:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1183

Comment 23 Vikhyat Umrao 2015-09-29 06:50:34 UTC
RFE : https://bugzilla.redhat.com/show_bug.cgi?id=1267145
[RFE] Ceph support for custom cluster name for the sysvinit hosts