Bug 1475820 - allow multiple dedicated journals for container deployment
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 2.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 3.0
Assigned To: leseb
QA Contact: Vasishta
Duplicates: 1484466
Depends On:
Blocks:

Reported: 2017-07-27 08:34 EDT by seb
Modified: 2018-01-29 09:29 EST
CC List: 17 users

See Also:
Fixed In Version: RHEL: ceph-ansible-3.0.0-0.1.rc14.el7cp Ubuntu: ceph-ansible_3.0.0~rc14-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-05 18:38:03 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
File contains contents of OSD journald log snippet, all.yml contents, ansible-playbook log and inventory file (618.58 KB, text/plain)
2017-09-13 06:03 EDT, Vasishta
File contains contents of OSD journald log snippet, all.yml contents, ansible-playbook log and inventory file (635.65 KB, text/plain)
2017-09-13 06:43 EDT, Vasishta
File contains contents of OSD journald log snippet (21.69 KB, text/plain)
2017-09-14 11:36 EDT, Vasishta
Contents of /usr/share/osd-run.sh (2.19 KB, text/plain)
2017-09-14 13:58 EDT, Vasishta
File contains contents of OSD journald log snippet (17.67 KB, text/plain)
2017-09-19 09:24 EDT, Vasishta
File contains OSD journald log snippet (19.85 KB, text/plain)
2017-09-28 10:18 EDT, Vasishta


External Trackers
Github ceph/ceph-ansible/pull/1724 (last updated 2017-08-02 05:21 EDT)
Github ceph/ceph-ansible/pull/1971 (last updated 2017-09-29 14:32 EDT)

Description seb 2017-07-27 08:34:38 EDT
Description of problem:

We currently support only a single dedicated device acting as a journal when deploying Ceph in containers with ceph-ansible.
We need to lift this limitation.
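
For illustration, the desired end state is a configuration like the following (a sketch with example device paths, using the non-collocated syntax that later comments in this report converge on):

osd_scenario: non-collocated
devices:
  - /dev/sda        # data disk for one OSD
  - /dev/sdb        # data disk for another OSD
dedicated_devices:
  - /dev/sdc        # journal device for /dev/sda
  - /dev/sdd        # a second, distinct journal device for /dev/sdb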

Comment 2 Harish NV Rao 2017-08-02 05:15:11 EDT
@Seb, can you please let us know the customer use case for this enhancement? What is the maximum number of dedicated journals supported?
Comment 3 seb 2017-08-02 05:21:48 EDT
Harish, the use case is the same as for a non-containerized deployment: users need to be able to use multiple dedicated devices to store their OSD journals on, not just one. This is currently a major limitation.

Work in progress here.
Comment 4 seb 2017-08-25 07:25:03 EDT
*** Bug 1484466 has been marked as a duplicate of this bug. ***
Comment 5 Ken Dreyer (Red Hat) 2017-08-30 16:00:17 EDT
Would you please tag and announce a new release of ceph-ansible upstream with this change?
Comment 8 Vasishta 2017-09-12 08:52:15 EDT
Hi Sebastien,

I couldn't find information anywhere on how to set the variable 'dedicated_devices'. Can you please let me know how to set it? Do I need to set any other variables along with this one?

As far as I understand, it can be initialized as:

dedicated_devices:
- - journal_device1
  - journal_device2
- - journal_device3
  - journal_device4

Please let me know whether I'm right.

Thanks,
Vasishta
Comment 9 seb 2017-09-12 11:52:07 EDT
That is correct: you have to set 'devices' for the OSD data and 'dedicated_devices' for the journals. And yes, it works the same way as in the non-containerized scenario.
Does that help?
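
For example, a minimal group_vars/osds.yml sketch of that pairing (device paths are examples; devices[i] stores its journal on dedicated_devices[i], matching the flat layout shown later in Comment 19):

devices:
  - /dev/sdb        # OSD data disk 1
  - /dev/sdc        # OSD data disk 2
dedicated_devices:
  - /dev/sdd        # journal for /dev/sdb
  - /dev/sde        # journal for /dev/sdc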
Comment 10 Vasishta 2017-09-13 06:03 EDT
Created attachment 1325304 [details]
File contains contents of OSD journald log snippet, all.yml contents, ansible-playbook log and inventory file

Hi Sebastien, 

Thanks a lot for the info.
I tried it today, but OSD activation failed. The OSD journald log had the lines below (the attachment above contains a larger log snippet):

raise Error('%s does not exist' % args.path)
ceph-osd-run.sh[10379]: ceph_disk.main.Error: Error: /dev/sdb1 does not exist

I think I have hit the same issue as in BZ 1489835.

Contents of osds.yml -

$ cat group_vars/osds.yml | egrep -v ^# | grep -v ^$
---
dummy:
devices:
  - /dev/sdb
dedicated_devices:
  - - /dev/sdc
    - /dev/sdd
ceph_osd_docker_prepare_env: -e CLUSTER={{ cluster }} -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_FORCE_ZAP=1 -e OSD_JOURNAL={{ dedicated_devices[0] }} -e OSD_FILESTORE=1
ceph_osd_docker_extra_env: -e CLUSTER={{ cluster }} -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_FILESTORE=1
------------------------------

Please let me know if I have missed anything.

Regards,
Vasishta
Comment 11 Vasishta 2017-09-13 06:43 EDT
Created attachment 1325308 [details]
File contains contents of OSD journald log snippet, all.yml contents, ansible-playbook log and inventory file
Comment 12 seb 2017-09-13 11:41:08 EDT
You need to set:

osd_scenario: non-collocated
devices:
  - /dev/sdb
  - /dev/sdc

dedicated_devices:
  - - /dev/sdd
    - /dev/sdd

Also leave ceph_osd_docker_extra_env empty and set ceph_osd_docker_prepare_env: -e OSD_JOURNAL_SIZE={{ journal_size }}
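
Expressed as YAML in group_vars/osds.yml, those two settings would be (a sketch; assigning an empty string is one way to leave the extra-env variable effectively empty):

ceph_osd_docker_extra_env: ""
ceph_osd_docker_prepare_env: -e OSD_JOURNAL_SIZE={{ journal_size }}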

Thanks!
Comment 13 Vasishta 2017-09-14 11:36 EDT
Created attachment 1326116 [details]
File contains contents of OSD journald log snippet

Hi Sebastien,

Initially I was confused, thinking it meant two journal devices dedicated to a single data disk.
Today I tried following your previous comment: a single dedicated journal device shared by two data disks of different OSDs. Please let me know if my inference is wrong.

It worked for the non-dmcrypt scenario, but failed to activate the OSD for the dmcrypt scenario.

I have attached the OSD journald logs.

$ cat /etc/ansible/hosts |grep non-collocated
magna015 osd_scenario=non-collocated devices="['/dev/sdb','/dev/sdc']" dedicated_devices="['/dev/sdd','/dev/sdd']" ceph_osd_docker_prepare_env="-e OSD_JOURNAL_SIZE={{ journal_size }}"

magna020 osd_scenario=non-collocated dmcrypt=true devices="['/dev/sdb','/dev/sdc']" dedicated_devices="['/dev/sdd','/dev/sdd']" ceph_osd_docker_prepare_env="-e OSD_JOURNAL_SIZE={{ journal_size }}"


Regards,
Vasishta
Comment 14 seb 2017-09-14 13:27:37 EDT
Please show me your /usr/share/ceph-osd-run.sh for dmcrypt.
Comment 15 seb 2017-09-14 13:35:05 EDT
We should also get a new container image from https://bugzilla.redhat.com/show_bug.cgi?id=1491799

So please try with this new one.
Thanks!
Comment 16 Vasishta 2017-09-14 13:58 EDT
Created attachment 1326152 [details]
Contents of /usr/share/osd-run.sh
Comment 17 seb 2017-09-18 17:46:53 EDT
Do you have the same error with the latest container image?
Thanks!
Comment 18 Vasishta 2017-09-19 09:24 EDT
Created attachment 1327945 [details]
File contains contents of OSD journald log snippet

Hi Sebastien,

I was waiting for the new container image from https://bugzilla.redhat.com/show_bug.cgi?id=1491799, as you had suggested in Comment 15, but the Fixed In Version field of BZ 1491799 has not been updated yet.

In the meantime, I tried using the latest image we had for testing: ceph-3.0-rhel-7-docker-candidate-49954-20170915121930

I replaced the image reference in /usr/share/ceph-osd-run.sh, reloaded systemd, and restarted the daemon. I am still facing the same issue.
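
For reference, the reload-and-restart sequence was along these lines (a sketch; the unit name ceph-osd@sdb is an example of the per-device units ceph-ansible creates for containerized OSDs):

# pick up the edited /usr/share/ceph-osd-run.sh referenced by the unit
systemctl daemon-reload
# restart the containerized OSD for the example device sdb
systemctl restart ceph-osd@sdb.service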

I have attached a journald log snippet; please let me know if you need anything else.

Regards,
Vasishta
Comment 19 seb 2017-09-21 05:52:39 EDT
I can't reproduce this.

Here is an example osds.yml file:


osd_objectstore: filestore
osd_scenario: non-collocated
devices:
  - /dev/sda
  - /dev/sdb
dedicated_devices:
  - /dev/sdc
  - /dev/sdc
ceph_osd_docker_prepare_env: -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_FORCE_ZAP=1


Please make sure to test with the latest container image.
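
To confirm the journal mapping after a successful run, one option (a sketch; the container name is a placeholder and varies by ceph-ansible version and host) is to list the disks from inside the OSD container with the filestore-era ceph-disk tool:

# the container name below is a placeholder; each data partition
# listed should report the journal partition it points at
docker exec <osd-container-name> ceph-disk list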
Comment 20 Vasishta 2017-09-26 10:16:00 EDT
Hi Sebastien, 

Today I tried again with the latest image [1]. As I mentioned in Comment 13, the issue is still there, but only for the dmcrypt scenario, with a journald log snippet similar to the one in Comment 18.
The non-dmcrypt scenario is working fine.

[1] - ceph-3.0-rhel-7-docker-candidate-79149-20170925173725

Regards,
Vasishta
Comment 21 leseb 2017-09-26 10:46:08 EDT
We are currently building a new image; sorry for the inconvenience. See: https://bugzilla.redhat.com/show_bug.cgi?id=1495979
Comment 22 leseb 2017-09-26 11:33:06 EDT
ceph-3.0-rhel-7-docker-candidate-37847-20170926144235 is ready; please retest with this one, thanks!
Comment 23 Giulio Fidente 2017-09-26 15:46:16 EDT
Seb, I think we need this feature with Jewel too; are there updated container images for Ceph 2.x as well?
Comment 24 leseb 2017-09-26 18:01:30 EDT
@Giulio, it's a ceph-ansible patch only; there is nothing to do in the Jewel container.
Comment 25 leseb 2017-09-27 13:03:06 EDT
To clarify: https://bugzilla.redhat.com/show_bug.cgi?id=1475820#c22 means that ceph-3.0-rhel-7-docker-candidate-37847-20170926144235 fixes all the non-dmcrypt scenarios.
Comment 26 Vasishta 2017-09-28 10:18 EDT
Created attachment 1332021 [details]
File contains OSD journald log snippet

Hi,

Initialization of an OSD with the <dedicated + dmcrypt> scenario is still not working with the latest container image [1]. I have attached a journald log snippet of the OSD.

I'm moving the BZ back to the ASSIGNED state; please let me know if there are any concerns.

[1] ceph-3.0-rhel-7-docker-candidate-19625-20170928024408


Regards,
Vasishta
Comment 27 leseb 2017-09-28 17:53:04 EDT
Can I access this machine? I cannot reproduce your issue.
Thanks.
Comment 28 leseb 2017-09-28 18:51:06 EDT
I also pushed a new version based on https://github.com/ceph/ceph-docker/pull/791.
Please try with that new image.
Comment 34 Ken Dreyer (Red Hat) 2017-10-02 11:32:06 EDT
ceph-ansible PR 1971 is not in any tagged version upstream.
Comment 42 errata-xmlrpc 2017-12-05 18:38:03 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387
