Bug 1475820

Summary: allow multiple dedicated journals for container deployment
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: seb
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: high
Docs Contact:
Priority: high
Version: 2.4
CC: adeza, aschoen, bengland, ceph-eng-bugs, ddharwar, dwilson, flucifre, gfidente, gmeno, hnallurv, icolle, kdreyer, nthomas, sankarshan, seb, shan, vashastr
Target Milestone: rc
Target Release: 3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.0.0-0.1.rc14.el7cp; Ubuntu: ceph-ansible_3.0.0~rc14-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-05 23:38:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
- File contains contents of OSD journald log snippet, all.yml contents, ansible-playbook log and inventory file (flags: none)
- File contains contents of OSD journald log snippet, all.yml contents, ansible-playbook log and inventory file (flags: none)
- File contains contents of OSD journald log snippet (flags: none)
- Contents of /usr/share/osd-run.sh (flags: none)
- File contains contents of OSD journald log snippet (flags: none)
- File contains OSD journald log snippet (flags: none)

Description seb 2017-07-27 12:34:38 UTC
Description of problem:

We currently support only a single dedicated device to act as a journal when deploying Ceph in containers with ceph-ansible.
We need to lift this limitation.
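As a sketch of the requested behaviour, group_vars/osds.yml would accept several journal devices alongside the data devices. The device paths below are placeholders, not taken from the report:

```yaml
# Placeholder devices; one dedicated journal entry per data device,
# and entries may repeat to share a journal device between OSDs.
osd_scenario: non-collocated
devices:
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd
dedicated_devices:
  - /dev/sde
  - /dev/sde
  - /dev/sdf
```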


Comment 2 Harish NV Rao 2017-08-02 09:15:11 UTC
@Seb, can you please let us know the customer use case for this enhancement? What is the maximum number of dedicated journals supported?

Comment 3 seb 2017-08-02 09:21:48 UTC
Harish, the use case is the same as for a non-containerized deployment: users need to be able to use multiple dedicated devices to store their OSD journals on, not just one. This is currently a huge limitation.

Work in progress here.

Comment 4 seb 2017-08-25 11:25:03 UTC
*** Bug 1484466 has been marked as a duplicate of this bug. ***

Comment 5 Ken Dreyer (Red Hat) 2017-08-30 20:00:17 UTC
Would you please tag and announce a new release of ceph-ansible upstream with this change?

Comment 8 Vasishta 2017-09-12 12:52:15 UTC
Hi Sebastien,

I couldn't find information anywhere on how to set the variable 'dedicated_devices'. Can you please let me know how to set it? Do I need to set any other variable along with this one?

To my knowledge, it can be initialized as:

dedicated_devices:
- - journal_device1
  - journal_device2
- - journal_device3
  - journal_device4

Please let me know whether I'm right.

Thanks,
Vasishta

Comment 9 seb 2017-09-12 15:52:07 UTC
That is correct: you have to set 'devices' for the OSD data and 'dedicated_devices' for the journals. And yes, this works the same way as in the non-containerized scenario.
Does that help?
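For illustration, the two lists are paired by index, so each data device journals on the dedicated device at the same position; repeating an entry shares one journal device between OSDs. A small sketch with hypothetical device names:

```python
# Hypothetical device lists for the non-collocated scenario:
# devices[i] gets its journal on dedicated_devices[i].
devices = ["/dev/sdb", "/dev/sdc"]
dedicated_devices = ["/dev/sdd", "/dev/sdd"]

# Build the data-device -> journal-device mapping.
mapping = dict(zip(devices, dedicated_devices))
print(mapping)  # {'/dev/sdb': '/dev/sdd', '/dev/sdc': '/dev/sdd'}
```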

Comment 10 Vasishta 2017-09-13 10:03:05 UTC
Created attachment 1325304 [details]
File contains contents of OSD journald log snippet, all.yml contents, ansible-playbook log and inventory file

Hi Sebastien, 

Thanks a lot for the info.
I tried it today, but OSD activation failed; the OSD journald log had the lines below (the attachment above contains a larger log snippet):

raise Error('%s does not exist' % args.path)
ceph-osd-run.sh[10379]: ceph_disk.main.Error: Error: /dev/sdb1 does not exist

I think I have hit the same issue as in BZ 1489835.

Contents of osds.yml -

$ cat group_vars/osds.yml | egrep -v ^# | grep -v ^$
---
dummy:
devices:
  - /dev/sdb
dedicated_devices:
  - - /dev/sdc
    - /dev/sdd
ceph_osd_docker_prepare_env: -e CLUSTER={{ cluster }} -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_FORCE_ZAP=1 -e OSD_JOURNAL={{ dedicated_devices[0] }} -e OSD_FILESTORE=1
ceph_osd_docker_extra_env: -e CLUSTER={{ cluster }} -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_FILESTORE=1
------------------------------

Please let me know if I have missed anything.

Regards,
Vasishta

Comment 11 Vasishta 2017-09-13 10:43:15 UTC
Created attachment 1325308 [details]
File contains contents of OSD journald log snippet, all.yml contents, ansible-playbook log and inventory file

Comment 12 seb 2017-09-13 15:41:08 UTC
You need to set:

osd_scenario: non-collocated
devices:
  - /dev/sdb
  - /dev/sdc

dedicated_devices:
  - - /dev/sdd
    - /dev/sdd

Also leave ceph_osd_docker_extra_env empty and set ceph_osd_docker_prepare_env: -e OSD_JOURNAL_SIZE={{ journal_size }}
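Assembled into one file, the osds.yml this comment describes would look roughly like the following (a sketch built only from the values given in this comment):

```yaml
osd_scenario: non-collocated
devices:
  - /dev/sdb
  - /dev/sdc
dedicated_devices:
  - - /dev/sdd
    - /dev/sdd
# ceph_osd_docker_extra_env is intentionally left empty.
ceph_osd_docker_prepare_env: -e OSD_JOURNAL_SIZE={{ journal_size }}
```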

Thanks!

Comment 13 Vasishta 2017-09-14 15:36:23 UTC
Created attachment 1326116 [details]
File contains contents of OSD journald log snippet

Hi Sebastien,

Initially I was confused and thought it meant two journal devices dedicated to a single data disk.
Today I tried following your previous comment, using a single dedicated journal device for the two data disks of different OSDs. Please let me know if my inference is wrong.

It worked for non-dmcrypt scenario, but failed to activate OSD for dmcrypt scenario.

I have attached OSD journald logs as an attachment.

$ cat /etc/ansible/hosts |grep non-collocated
magna015 osd_scenario=non-collocated devices="['/dev/sdb','/dev/sdc']" dedicated_devices="['/dev/sdd','/dev/sdd']" ceph_osd_docker_prepare_env="-e OSD_JOURNAL_SIZE={{ journal_size }}"

magna020 osd_scenario=non-collocated dmcrypt=true devices="['/dev/sdb','/dev/sdc']" dedicated_devices="['/dev/sdd','/dev/sdd']" ceph_osd_docker_prepare_env="-e OSD_JOURNAL_SIZE={{ journal_size }}"


Regards,
Vasishta

Comment 14 seb 2017-09-14 17:27:37 UTC
Please show me your /usr/share/ceph-osd-run.sh for dmcrypt.

Comment 15 seb 2017-09-14 17:35:05 UTC
We should also get a new container image from https://bugzilla.redhat.com/show_bug.cgi?id=1491799

So please try with this new one.
Thanks!

Comment 16 Vasishta 2017-09-14 17:58:32 UTC
Created attachment 1326152 [details]
Contents of /usr/share/osd-run.sh

Comment 17 seb 2017-09-18 21:46:53 UTC
Do you have the same error with the latest container image?
Thanks!

Comment 18 Vasishta 2017-09-19 13:24:16 UTC
Created attachment 1327945 [details]
File contains contents of OSD journald log snippet

Hi Sebastien,

I was waiting for new container image from https://bugzilla.redhat.com/show_bug.cgi?id=1491799 as you had suggested in Comment 15 . As Fixed In Version was not updated in BZ 1491799 , I was waiting for the same.

I tried using the latest image we had for testing, ceph-3.0-rhel-7-docker-candidate-49954-20170915121930.

I replaced the image in /usr/share/ceph-osd-run.sh and restarted the daemon after reloading systemd. I am still facing the same issue.

I have added a journald log snippet as an attachment; please let me know if you need anything else.

Regards,
Vasishta

Comment 19 seb 2017-09-21 09:52:39 UTC
I can't reproduce this.

Here is an example osds.yml file:


osd_objectstore: filestore
osd_scenario: non-collocated
devices:
  - /dev/sda
  - /dev/sdb
dedicated_devices:
  - /dev/sdc
  - /dev/sdc
ceph_osd_docker_prepare_env: -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_FORCE_ZAP=1


Please make sure to test with the latest container image.

Comment 20 Vasishta 2017-09-26 14:16:00 UTC
Hi Sebastien, 

Today I tried again with the latest image [1]. As I mentioned in Comment 13, the issue is still there, but only in the dmcrypt scenario, with a journald log snippet similar to the one in Comment 18.
The non-dmcrypt scenario is working fine.

[1] - ceph-3.0-rhel-7-docker-candidate-79149-20170925173725

Regards,
Vasishta

Comment 21 Sébastien Han 2017-09-26 14:46:08 UTC
We are currently building a new image, sorry for the inconvenience. See: https://bugzilla.redhat.com/show_bug.cgi?id=1495979

Comment 22 Sébastien Han 2017-09-26 15:33:06 UTC
ceph-3.0-rhel-7-docker-candidate-37847-20170926144235 is ready, please retest with this one, thanks!

Comment 23 Giulio Fidente 2017-09-26 19:46:16 UTC
Seb, I think we need this feature with Jewel too, are there updated container images for Ceph 2.x as well?

Comment 24 Sébastien Han 2017-09-26 22:01:30 UTC
@Giulio, it's a ceph-ansible patch only, there is nothing to do in the Jewel container.

Comment 25 Sébastien Han 2017-09-27 17:03:06 UTC
To clarify, https://bugzilla.redhat.com/show_bug.cgi?id=1475820#c22 means that ceph-3.0-rhel-7-docker-candidate-37847-20170926144235 fixes all the non-dmcrypt scenarios.

Comment 26 Vasishta 2017-09-28 14:18:52 UTC
Created attachment 1332021 [details]
File contains OSD journald log snippet

Hi,

OSD initialization in the <dedicated + dmcrypt> scenario is still not working with the latest container image [1]. I have attached a journald log snippet of the OSD.

I'm moving the BZ back to the ASSIGNED state; please let me know if there are any concerns.

[1] ceph-3.0-rhel-7-docker-candidate-19625-20170928024408


Regards,
Vasishta

Comment 27 Sébastien Han 2017-09-28 21:53:04 UTC
Can I access this machine? I cannot reproduce your issue.
Thanks.

Comment 28 Sébastien Han 2017-09-28 22:51:06 UTC
I also pushed a new version based on https://github.com/ceph/ceph-docker/pull/791.
Please try with that new image.

Comment 34 Ken Dreyer (Red Hat) 2017-10-02 15:32:06 UTC
ceph-ansible PR 1971 is not in any tagged version upstream.

Comment 35 Sébastien Han 2017-10-02 15:45:06 UTC
It's in https://github.com/ceph/ceph-ansible/releases/tag/v3.0.0rc14

Comment 42 errata-xmlrpc 2017-12-05 23:38:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387