Bug 1391920

Summary: [ceph-ansible] Encrypted OSD creation fails with collocated journal and custom cluster name
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Vasishta <vashastr>
Component: Ceph-Disk    Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA QA Contact: Vasishta <vashastr>
Severity: medium Docs Contact: Erin Donnelly <edonnell>
Priority: unspecified    
Version: 2.1    CC: adeza, aschoen, ceph-eng-bugs, edonnell, flucifre, gmeno, hnallurv, kdreyer, nthomas, racpatel, sankarshan, seb, shan, tserlin, vashastr, wusui
Target Milestone: rc   
Target Release: 2.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.7-21.el7cp; Ubuntu: ceph_10.2.7-23redhat1    Doc Type: Bug Fix
Doc Text:
.Ansible and "ceph-disk" no longer fail to create encrypted OSDs if the cluster name is different than "ceph"
Previously, the `ceph-disk` utility did not support configuring the `dmcrypt` utility if the cluster name was different than "ceph". Consequently, it was not possible to use the `ceph-ansible` utility to create encrypted OSDs with a custom cluster name. This bug has been fixed, and custom cluster names can now be used.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-06-19 13:27:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1412948, 1437916    

Description Vasishta 2016-11-04 11:22:04 UTC
Description of problem:
Encrypted OSD creation fails with collocated journal and custom cluster name.

Version-Release number of selected component (if applicable):
ceph-ansible-1.0.5-39.el7scon.noarch

How reproducible:
always

Steps to Reproduce:
1. Install ceph-ansible
2. Change the following settings in the /usr/share/ceph-ansible/group_vars/osds file:

   dmcrypt_journal_collocation: true
   devices:
     - /dev/sdb
     - /dev/sdc
     - /dev/sdd
3. Run the playbook.
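
The failing task below invokes ceph-disk with "--cluster master", so a custom cluster name was also configured in group_vars. A minimal sketch of the combined settings (the placement of the cluster name in the all file is an assumption; only the osds settings above are quoted from this report):

   # group_vars/all -- assumed location; cluster name "master" taken from the failing command below
   cluster: master

   # group_vars/osds -- as in step 2 above
   dmcrypt_journal_collocation: true
   devices:
     - /dev/sdb
     - /dev/sdc
     - /dev/sdd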

Actual results:

TASK: [ceph-osd | manually prepare osd disk(s) (dmcrypt)] ********************* 
failed: [magna030] => (item=[{u'cmd': u"parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", u'end': u'2016-11-04 11:01:40.252042', 'failed': False, u'stdout': u'', u'changed': False, u'rc': 1, u'start': u'2016-11-04 11:01:40.157490', 'item': '/dev/sdb', u'warnings': [], u'delta': u'0:00:00.094552', 'invocation': {'module_name': u'shell', 'module_complex_args': {}, 'module_args': u"parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'"}, 'stdout_lines': [], 'failed_when_result': False, u'stderr': u''}, {u'cmd': u"echo '/dev/sdb' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", u'end': u'2016-11-04 11:01:38.796395', 'failed': False, u'stdout': u'', u'changed': False, u'rc': 1, u'start': u'2016-11-04 11:01:38.791004', 'item': '/dev/sdb', u'warnings': [], u'delta': u'0:00:00.005391', 'invocation': {'module_name': u'shell', 'module_complex_args': {}, 'module_args': u"echo '/dev/sdb' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'"}, 'stdout_lines': [], 'failed_when_result': False, u'stderr': u''}, '/dev/sdb']) => {"changed": true, "cmd": ["ceph-disk", "prepare", "--dmcrypt", "--cluster", "master", "/dev/sdb"], "delta": "0:00:00.162401", "end": "2016-11-04 11:01:41.703409", "item": [{"changed": false, "cmd": "parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", "delta": "0:00:00.094552", "end": "2016-11-04 11:01:40.252042", "failed": false, "failed_when_result": false, "invocation": {"module_args": "parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", "module_complex_args": {}, "module_name": "shell"}, "item": "/dev/sdb", "rc": 1, "start": "2016-11-04 11:01:40.157490", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}, {"changed": false, "cmd": "echo '/dev/sdb' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", "delta": "0:00:00.005391", "end": "2016-11-04 11:01:38.796395", "failed": false, "failed_when_result": false, "invocation": {"module_args": "echo '/dev/sdb' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", "module_complex_args": {}, "module_name": "shell"}, "item": "/dev/sdb", "rc": 1, "start": "2016-11-04 11:01:38.791004", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}, "/dev/sdb"], "rc": 1, "start": "2016-11-04 11:01:41.541008", "warnings": []}
stderr: ceph-disk: Error: Device is mounted: /dev/sdb3
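
Before retrying, the partition and mount state left behind on the OSD node can be checked with standard tools (these commands are not part of the original log):

   lsblk /dev/sdb          # list the partitions created on the device by ceph-disk
   mount | grep /dev/sdb   # show which partition is still mounted (here /dev/sdb3)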


Expected results:
Encrypted OSDs are created successfully with collocated journals when a custom cluster name is used.

Additional info:
The complete ansible-playbook log and the group_vars files have been copied to the home directory of the ubuntu user on magna111.ceph.redhat.com (/home/ubuntu/ansible_log and /home/ubuntu/group_vars).

Comment 4 seb 2016-11-04 14:48:13 UTC
I think the real issue here is that ceph-disk with dmcrypt doesn't support storing keys with a cluster name different than "ceph".

While trying to activate an OSD manually I noticed the keys couldn't get stored.

Alfredo? Am I right?
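
For reference, the commands Ansible runs can be pulled out of the failed task output above and retried by hand on an OSD node with the custom cluster name:

   # pre-check run by the playbook (from the task output above)
   parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'

   # prepare step that fails with the custom cluster name (from the task output above)
   ceph-disk prepare --dmcrypt --cluster master /dev/sdb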

Comment 5 seb 2016-11-04 15:06:11 UTC
Patch proposed upstream: https://github.com/ceph/ceph/pull/11786

Comment 10 seb 2016-11-08 11:23:36 UTC
LGTM Bara!

Comment 15 seb 2017-01-10 23:47:06 UTC
No, we cannot test it; we are still waiting for https://github.com/ceph/ceph/pull/11786 to be merged into ceph.

Comment 17 Andrew Schoen 2017-01-30 17:59:04 UTC
Moving to 2.3 to give more time for https://github.com/ceph/ceph/pull/11786 to be merged and tested.

Comment 19 Federico Lucifredi 2017-02-22 00:52:31 UTC
If this can be merged this week, we will test it — Gregory will have an update for us in the program call.

Comment 22 Federico Lucifredi 2017-02-22 16:29:35 UTC
Good customer insight from Gregory: the issue is in ceph-disk with encrypted OSDs, so it is highly unlikely that there are existing clusters out there with encrypted OSDs and a cluster name other than 'ceph'.

This should not block upgrades. Pushing the fix to 2.3 so we have time to sort out the ceph-disk fix that is churning upstream right now.

Comment 23 Harish NV Rao 2017-03-30 19:58:52 UTC
(In reply to Andrew Schoen from comment #17)
> Moving to 2.3 to give more time for https://github.com/ceph/ceph/pull/11786
> to be merged and tested.

@Seb, will this be fixed in 2.3? If not, could you please move it out of 2.3?

Comment 24 seb 2017-04-04 09:32:11 UTC
It depends. Ken, is https://github.com/ceph/ceph/pull/13573 part of 2.3?
However, this is still not in Jewel upstream, so we don't test it in our CI.

Comment 25 Ken Dreyer (Red Hat) 2017-04-04 17:42:19 UTC
Seb, the jewel backport PR 13496 lacks approval from Loic and a clean Teuthology run, so it will not be in the v10.2.7 upstream release.

Once v10.2.7 is tagged upstream, I'll rebase the internal ceph-2-rhel-patches branch to that, and then we'll need to cherry-pick a fix internally for this BZ.

If we do not yet have a stable fix for jewel that we can ship to customers with a high level of confidence, we'll need to re-target this BZ to a future RH Ceph Storage release.

How would you like to proceed on this?

Comment 26 seb 2017-04-05 09:52:17 UTC
Let's postpone this for a future release once we have the right backport for Jewel upstream.

I guess this means rhcs 3.0 right?

Comment 27 Ken Dreyer (Red Hat) 2017-04-05 17:39:55 UTC
Thanks, re-targeted

Comment 28 tserlin 2017-05-23 18:07:02 UTC
*** Bug 1452316 has been marked as a duplicate of this bug. ***

Comment 29 tserlin 2017-05-23 18:09:07 UTC
*** Bug 1451168 has been marked as a duplicate of this bug. ***

Comment 30 tserlin 2017-05-25 17:54:20 UTC
Just a clarification: PR 13496 was closed in favor of https://github.com/ceph/ceph/pull/14765

Comment 35 Warren 2017-05-27 00:53:07 UTC
On my test systems:

group_vars/all.yml has the following field set:

cluster: aard

group_vars/osds.yml has the following fields set:

devices:
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd

dmcrypt_journal_collocation: true

The ansible-playbook command finished with no errors.

Running the following command:
sudo docker exec ceph-mon-magna045 ceph --cluster aard -s

Shows:
    cluster b946af73-d4ca-4c60-b261-7cbe2c6ac104
     health HEALTH_WARN
            clock skew detected on mon.magna055, mon.magna060
            Monitor clock skew detected 
     monmap e2: 3 mons at {magna045=10.8.128.45:6789/0,magna055=10.8.128.55:6789/0,magna060=10.8.128.60:6789/0}
            election epoch 6, quorum 0,1,2 magna045,magna055,magna060
     osdmap e18: 9 osds: 9 up, 9 in
            flags sortbitwise,require_jewel_osds
      pgmap v44: 128 pgs, 1 pools, 0 bytes data, 0 objects
            302 MB used, 8378 GB / 8378 GB avail
                 128 active+clean

So the cluster is named aard and dmcrypt_journal_collocation is set.

I do not see the ceph-disk error reported in the original description.
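
An additional check that the data devices really ended up dm-crypt mapped (not part of the original verification; standard commands run on an OSD node):

   sudo ceph-disk list                        # data partitions should be reported as dmcrypt
   lsblk -o NAME,TYPE,MOUNTPOINT /dev/sdb     # dm-crypt mappings appear with TYPE "crypt"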

Comment 37 errata-xmlrpc 2017-06-19 13:27:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1497