Bug 1590526 - OSPd unable to deploy RHCS 3.0 (Bluestore). Error: bluestore mkfs fsck found fatal error: (5) Input/output error
Keywords:
Status: CLOSED DUPLICATE of bug 1608946
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 3.*
Assignee: Sébastien Han
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Duplicates: 1585482
Depends On: 1608946
Blocks: 1594251
 
Reported: 2018-06-12 19:09 UTC by karan singh
Modified: 2019-05-15 18:45 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-27 20:25:09 UTC
Embargoed:


Attachments:

Description karan singh 2018-06-12 19:09:44 UTC
Description of problem:

The previous bug I raised, BZ 1585482, was about the exact same error but with OSP-12 + RHCS 3.0. I was advised that RHCS 3.0 is not supported/tested with OSP-12.

So I repeated the experiment with OSP-13 and RHCS 3.0 (a supported/tested configuration). The OpenStack deployment via OSPd completed cleanly, but the Ceph cluster did not come up healthy:

## Ceph OSDs are DOWN

[heat-admin@controller-0 ~]$ ceph -s
  cluster:
    id:     ce7bd88c-6a9c-11e8-a882-2047478ccfaa
    health: HEALTH_WARN
            no active mgr

  services:
    mon: 1 daemons, quorum controller-0
    mgr: no daemons active
    osd: 60 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:


## OSDs are flapping

[heat-admin@ceph-storage-0 ~]$ sudo docker ps
CONTAINER ID        IMAGE                                                   COMMAND             CREATED             STATUS                  PORTS               NAMES
f259f333d2a9        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    1 second ago        Up Less than a second                       ceph-osd-ceph-storage-0-sdf
55f519615632        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    2 seconds ago       Up 2 seconds                                ceph-osd-ceph-storage-0-sdg
ac148f97e891        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    4 seconds ago       Up 4 seconds                                ceph-osd-ceph-storage-0-sdj
55ba302877ee        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    6 seconds ago       Up 6 seconds                                ceph-osd-ceph-storage-0-sdd
89c860a5b291        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    8 seconds ago       Up 7 seconds                                ceph-osd-ceph-storage-0-sdk
6be986fc3049        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    10 seconds ago      Up 9 seconds                                ceph-osd-ceph-storage-0-sdh
ff233ee5104a        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    15 seconds ago      Up 14 seconds                               ceph-osd-ceph-storage-0-sde
a8f8f9a97e3e        192.168.120.1:8787/rhosp13-beta/openstack-cron:latest   "kolla_start"       2 hours ago         Up 2 hours                                  logrotate_crond
[heat-admin@ceph-storage-0 ~]$
[heat-admin@ceph-storage-0 ~]$
[heat-admin@ceph-storage-0 ~]$ sudo docker ps
CONTAINER ID        IMAGE                                                   COMMAND             CREATED             STATUS                  PORTS               NAMES
c93cb93e6794        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    1 second ago        Up Less than a second                       ceph-osd-ceph-storage-0-sdj
1c4c8e94b7f4        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    3 seconds ago       Up 1 second                                 ceph-osd-ceph-storage-0-sdd
82e9ca7b314e        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    5 seconds ago       Up 3 seconds                                ceph-osd-ceph-storage-0-sdk
2299a9229ee7        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    7 seconds ago       Up 5 seconds                                ceph-osd-ceph-storage-0-sdh
7cb42e826804        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    12 seconds ago      Up 10 seconds                               ceph-osd-ceph-storage-0-sde
04ff29e7ec5c        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    13 seconds ago      Up 12 seconds                               ceph-osd-ceph-storage-0-sdm
6d400d81f198        192.168.120.1:8787/rhceph/rhceph-3-rhel7:latest         "/entrypoint.sh"    15 seconds ago      Up 14 seconds                               ceph-osd-ceph-storage-0-sdi
a8f8f9a97e3e        192.168.120.1:8787/rhosp13-beta/openstack-cron:latest   "kolla_start"       2 hours ago         Up 2 hours                                  logrotate_crond
[heat-admin@ceph-storage-0 ~]$

## Logs from journalctl -u ceph-osd@<HDD>

Jun 12 17:01:27 ceph-storage-0 systemd[1]: Started Ceph OSD.
Jun 12 17:01:28 ceph-storage-0 ceph-osd-run.sh[762534]: Error response from daemon: No such container: expose_partitions_sdm
Jun 12 17:01:32 ceph-storage-0 ceph-osd-run.sh[762534]: 2018-06-12 17:01:32  /entrypoint.sh: static: does not generate config
Jun 12 17:01:33 ceph-storage-0 ceph-osd-run.sh[762534]: main_activate: path = /dev/sdm1
Jun 12 17:01:34 ceph-storage-0 ceph-osd-run.sh[762534]: get_dm_uuid: get_dm_uuid /dev/sdm1 uuid path is /sys/dev/block/8:193/dm/uuid
Jun 12 17:01:34 ceph-storage-0 ceph-osd-run.sh[762534]: command: Running command: /usr/sbin/blkid -o udev -p /dev/sdm1
Jun 12 17:01:34 ceph-storage-0 ceph-osd-run.sh[762534]: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/sdm1
Jun 12 17:01:34 ceph-storage-0 ceph-osd-run.sh[762534]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: mount: Mounting /dev/sdm1 on /var/lib/ceph/tmp/mnt.81tYoO with options noatime,inode64
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: command_check_call: Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/sdm1 /var/lib/ceph/tmp/mnt.81tYoO
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: command: Running command: /usr/sbin/restorecon /var/lib/ceph/tmp/mnt.81tYoO
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: activate: Cluster uuid is ce7bd88c-6a9c-11e8-a882-2047478ccfaa
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: activate: Cluster name is ceph
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: activate: OSD uuid is fed4806b-b65c-4cbf-8b6a-9ae2399875b6
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: activate: OSD id is 42
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: activate: Initializing OSD...
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: command_check_call: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/tmp/mnt.81tYoO/activate.monmap
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: got monmap epoch 1
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: command_check_call: Running command: /usr/bin/ceph-osd --cluster ceph --mkfs -i 42 --monmap /var/lib/ceph/tmp/mnt.81tYoO/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.81tYoO --osd-uuid fed4806b-b65c-4cbf-8b6a-9ae2399875b6 --set
Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: 2018-06-12 17:01:35.994819 7fc24e71ed80 -1 bluestore(/var/lib/ceph/tmp/mnt.81tYoO/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.81tYoO/block fsid b8ceb78b-4766-4cc3-8496-7bed005d3769 does not match our fsid fed4806b-b
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: 2018-06-12 17:01:36.251825 7fc24e71ed80 -1 bluestore(/var/lib/ceph/tmp/mnt.81tYoO) mkfs fsck found fatal error: (5) Input/output error
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: 2018-06-12 17:01:36.251860 7fc24e71ed80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: 2018-06-12 17:01:36.251978 7fc24e71ed80 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.81tYoO: (5) Input/output error
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: mount_activate: Failed to activate
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: unmount: Unmounting /var/lib/ceph/tmp/mnt.81tYoO
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.81tYoO
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: Traceback (most recent call last):
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: File "/usr/sbin/ceph-disk", line 9, in <module>
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5735, in run
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: main(sys.argv[1:])
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5686, in main
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: args.func(args)
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3776, in main_activate
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: reactivate=args.reactivate,
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3539, in mount_activate
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: (osd_id, cluster) = activate(path, activate_key_template, init)
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3716, in activate
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: keyring=keyring,
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3168, in mkfs
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: '--setgroup', get_ceph_group(),
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 566, in command_check_call
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: return subprocess.check_call(arguments)
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: raise CalledProcessError(retcode, cmd)
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '-i', u'42', '--monmap', '/var/lib/ceph/tmp/mnt.81tYoO/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.81tYoO', '--osd-uuid', u'fe
Jun 12 17:01:36 ceph-storage-0 systemd[1]: ceph-osd: main process exited, code=exited, status=1/FAILURE
Jun 12 17:01:36 ceph-storage-0 docker[787214]: Error response from daemon: No such container: ceph-osd-ceph-storage-0-sdm
Jun 12 17:01:36 ceph-storage-0 systemd[1]: Unit ceph-osd entered failed state.
Jun 12 17:01:36 ceph-storage-0 systemd[1]: ceph-osd failed.


Version-Release number of selected component (if applicable):

(undercloud) [stack@refarch-r220-02 ~]$ rpm -qa | grep -i openstack
openstack-ironic-common-10.1.2-3.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-14.el7ost.noarch
openstack-mistral-common-6.0.2-1.el7ost.noarch
openstack-nova-api-17.0.3-0.20180420001138.el7ost.noarch
openstack-nova-scheduler-17.0.3-0.20180420001138.el7ost.noarch
puppet-openstack_extras-12.4.1-0.20180413042250.2634296.el7ost.noarch
openstack-nova-compute-17.0.3-0.20180420001138.el7ost.noarch
openstack-heat-api-10.0.1-0.20180411125639.825731d.el7ost.noarch
openstack-mistral-executor-6.0.2-1.el7ost.noarch
openstack-swift-object-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-tripleo-common-containers-8.6.1-7.el7ost.noarch
openstack-selinux-0.8.14-5.el7ost.noarch
openstack-keystone-13.0.1-0.20180420194847.7bd6454.el7ost.noarch
openstack-neutron-openvswitch-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
openstack-swift-proxy-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-heat-common-10.0.1-0.20180411125639.825731d.el7ost.noarch
openstack-ironic-conductor-10.1.2-3.el7ost.noarch
openstack-mistral-engine-6.0.2-1.el7ost.noarch
openstack-nova-placement-api-17.0.3-0.20180420001138.el7ost.noarch
openstack-nova-common-17.0.3-0.20180420001138.el7ost.noarch
openstack-swift-container-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-tripleo-common-8.6.1-7.el7ost.noarch
python-openstackclient-lang-3.14.1-1.el7ost.noarch
openstack-heat-engine-10.0.1-0.20180411125639.825731d.el7ost.noarch
openstack-ironic-api-10.1.2-3.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163359.2435d97.el7ost.noarch
openstack-tempest-18.0.0-2.el7ost.noarch
openstack-mistral-api-6.0.2-1.el7ost.noarch
openstack-tripleo-validations-8.4.1-4.el7ost.noarch
openstack-glance-16.0.1-2.el7ost.noarch
openstack-swift-account-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-neutron-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch
openstack-heat-api-cfn-10.0.1-0.20180411125639.825731d.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
python2-openstacksdk-0.11.3-1.el7ost.noarch
python2-openstackclient-3.14.1-1.el7ost.noarch
openstack-tripleo-ui-8.3.1-2.el7ost.noarch
openstack-zaqar-6.0.1-1.el7ost.noarch
puppet-openstacklib-12.4.0-0.20180329042555.4b30e6f.el7ost.noarch
openstack-neutron-common-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
openstack-neutron-ml2-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
openstack-nova-conductor-17.0.3-0.20180420001138.el7ost.noarch
(undercloud) [stack@refarch-r220-02 ~]$


(undercloud) [stack@refarch-r220-02 ~]$ rpm -qa | grep -i  ceph-ansible
ceph-ansible-3.1.0-0.1.beta8.el7cp.noarch
(undercloud) [stack@refarch-r220-02 ~]$

How reproducible:
Deploy OSP-13 with RHCS 3

Steps to Reproduce:
1. Create an OSP-13 undercloud
2. Deploy the OSP-13 overcloud using OSPd and let OSPd deploy RHCS 3.0 (see the sketch below)
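
For reference, a minimal sketch of the kind of deploy command involved, assuming the stock ceph-ansible environment file shipped with tripleo-heat-templates plus a local copy of the reporter's bluestore environment file (path is hypothetical; a real deployment passes additional network/role environments as well):

  source ~/stackrc
  openstack overcloud deploy --templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
    -e ~/templates/ceph-config-bluestore.yaml   # hypothetical local copy of the bluestore parameters linked under Additional info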


Actual results:

Jun 12 17:01:35 ceph-storage-0 ceph-osd-run.sh[762534]: 2018-06-12 17:01:35.994819 7fc24e71ed80 -1 bluestore(/var/lib/ceph/tmp/mnt.81tYoO/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.81tYoO/block fsid b8ceb78b-4766-4cc3-8496-7bed005d3769 does not match our fsid fed4806b-b
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: 2018-06-12 17:01:36.251825 7fc24e71ed80 -1 bluestore(/var/lib/ceph/tmp/mnt.81tYoO) mkfs fsck found fatal error: (5) Input/output error
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: 2018-06-12 17:01:36.251860 7fc24e71ed80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: 2018-06-12 17:01:36.251978 7fc24e71ed80 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.81tYoO: (5) Input/output error
Jun 12 17:01:36 ceph-storage-0 ceph-osd-run.sh[762534]: mount_activate: Failed to activate

Expected results:

Both the OSP and Ceph clusters should be up and running, and it should be possible to launch Nova VMs backed by Ceph.

Additional info:

Ceph parameters in use: https://github.com/ksingh7/OSP-12_RHCS_Deployment_Guide/blob/master/templates-part-2-test/ceph-config-bluestore.yaml

Comment 1 karan singh 2018-06-12 19:12:09 UTC
By the way, Ironic is configured to clean the nodes, so every time the nodes move from the Manageable to the Available state they are cleaned by Ironic.
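
For completeness, the "bdev ... does not match our fsid" line in the journal above can indicate stale BlueStore metadata on the device; a rough way to rule that out is to wipe the OSD disks by hand before redeploying (illustrative only; /dev/sdm is taken from the log above, and this destroys all data on the device):

  sudo ceph-disk zap /dev/sdm    # clears the partition table and any old Ceph metadata
  sudo wipefs --all /dev/sdm     # removes any remaining filesystem/BlueStore signatures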

Comment 11 John Fulton 2018-06-28 13:28:59 UTC
If we use an inventory like the following (which TripleO set up for me) with ceph-ansible alone, then can we reproduce this problem?

[root@undercloud ansible-mistral-actionA6bbkK]# cat inventory.yaml 
all:
  vars:
    admin_secret: AQBDCTNbAAAAABAA9FqF71dP9ASdoCkO4eipRA==
    ceph_conf_overrides:
      global: {bluestore block db size: 67108864, bluestore block size: 5368709120,
        bluestore block wal size: 134217728, bluestore fsck on mount: true, enable experimental unrecoverable data corrupting features: bluestore
          rocksdb, osd_pool_default_pg_num: 32, osd_pool_default_pgp_num: 32, osd_pool_default_size: 1,
        rgw_keystone_accepted_roles: 'Member, admin', rgw_keystone_admin_domain: default,
        rgw_keystone_admin_password: j2CwCGDHbWMw2NnWbasAFZkjR, rgw_keystone_admin_project: service,
        rgw_keystone_admin_user: swift, rgw_keystone_api_version: 3, rgw_keystone_implicit_tenants: 'true',
        rgw_keystone_revocation_interval: '0', rgw_keystone_url: 'http://192.168.24.14:5000',
        rgw_s3_auth_use_keystone: 'true'}
    ceph_docker_image: ceph/daemon
    ceph_docker_image_tag: v3.0.3-stable-3.0-luminous-centos-7-x86_64
    ceph_docker_registry: 192.168.24.1:8787
    ceph_mgr_docker_extra_env: -e MGR_DASHBOARD=0
    ceph_origin: distro
    ceph_osd_docker_cpu_limit: 1
    ceph_osd_docker_memory_limit: 5g
    ceph_stable: true
    cephfs: cephfs
    cephfs_data: manila_data
    cephfs_metadata: manila_metadata
    cephfs_pools:
    - {name: manila_data, pgs: 128}
    - {name: manila_metadata, pgs: 128}
    cluster: ceph
    cluster_network: 192.168.24.0/24
    containerized_deployment: true
    dedicated_devices: [/dev/vdd, /dev/vdd]
    devices: [/dev/vdb, /dev/vdc]
    docker: true
    fetch_directory: /tmp/file-mistral-actionYYpjBm
    fsid: 14c53142-79bd-11e8-9ec3-006063b643f8
    generate_fsid: false
    ip_version: ipv4
    ireallymeanit: 'yes'
    keys:
    - {key: AQBDCTNbAAAAABAAnD6DBTlqB3S/spEWZLqpkg==, mgr_cap: allow *, mode: '0600',
      mon_cap: allow r, name: client.openstack, osd_cap: 'allow class-read object_prefix
        rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms,
        allow rwx pool=images, allow rwx pool='}
    - {key: AQBDCTNbAAAAABAA0DBho/DrE/c4mhcESAaDsQ==, mds_cap: allow *, mgr_cap: allow
        *, mode: '0600', mon_cap: 'allow r, allow command \"auth del\", allow command
        \"auth caps\", allow command \"auth get\", allow command \"auth get-or-create\"',
      name: client.manila, osd_cap: allow rw}
    - {key: AQBDCTNbAAAAABAAYkwKO/QNv6venujH9OYheA==, mgr_cap: allow *, mode: '0600',
      mon_cap: allow rw, name: client.radosgw, osd_cap: allow rwx}
    monitor_address_block: 192.168.24.0/24
    monitor_secret: AQBDCTNbAAAAABAAMF8wCW5TBMEoZuViTecbuQ==
    ntp_service_enabled: false
    openstack_config: true
    openstack_keys:
    - {key: AQBDCTNbAAAAABAAnD6DBTlqB3S/spEWZLqpkg==, mgr_cap: allow *, mode: '0600',
      mon_cap: allow r, name: client.openstack, osd_cap: 'allow class-read object_prefix
        rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms,
        allow rwx pool=images, allow rwx pool='}
    - {key: AQBDCTNbAAAAABAA0DBho/DrE/c4mhcESAaDsQ==, mds_cap: allow *, mgr_cap: allow
        *, mode: '0600', mon_cap: 'allow r, allow command \"auth del\", allow command
        \"auth caps\", allow command \"auth get\", allow command \"auth get-or-create\"',
      name: client.manila, osd_cap: allow rw}
    - {key: AQBDCTNbAAAAABAAYkwKO/QNv6venujH9OYheA==, mgr_cap: allow *, mode: '0600',
      mon_cap: allow rw, name: client.radosgw, osd_cap: allow rwx}
    openstack_pools:
    - {application: rbd, name: images, pg_num: 32, rule_name: replicated_rule}
    - {application: rbd, name: backups, pg_num: 32, rule_name: replicated_rule}
    - {application: rbd, name: vms, pg_num: 32, rule_name: replicated_rule}
    - {application: rbd, name: volumes, pg_num: 32, rule_name: replicated_rule}
    osd_objectstore: bluestore
    osd_scenario: non-collocated
    pools: []
    public_network: 192.168.24.0/24
    user_config: true
clients:
  hosts:
    192.168.24.6: {}
mdss:
  hosts:
    192.168.24.11: {}
    192.168.24.12: {}
    192.168.24.8: {}
mgrs:
  hosts:
    192.168.24.11: {}
    192.168.24.12: {}
    192.168.24.8: {}
mons:
  hosts:
    192.168.24.11: {}
    192.168.24.12: {}
    192.168.24.8: {}
nfss:
  hosts: {}
osds:
  hosts:
    192.168.24.13: {}
    192.168.24.17: {}
    192.168.24.9: {}
rbdmirrors:
  hosts: {}
rgws:
  hosts: {}
[root@undercloud ansible-mistral-actionA6bbkK]#
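
For reference, a rough sketch of running ceph-ansible standalone against an inventory like the one above, assuming the stock site-docker.yml.sample shipped with ceph-ansible 3.x (this is a containerized deployment):

  cd /usr/share/ceph-ansible
  sudo cp site-docker.yml.sample site-docker.yml
  sudo ansible-playbook -i /path/to/inventory.yaml site-docker.yml   # inventory path is illustrative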

Comment 14 Adam Huffman 2018-07-11 09:14:27 UTC
In case it's useful, I'm seeing almost exactly the same problem.

Comment 15 Giulio Fidente 2018-07-18 08:21:48 UTC
*** Bug 1585482 has been marked as a duplicate of this bug. ***

Comment 16 John Fulton 2018-07-18 13:45:40 UTC
Bluestore is working in this ceph-ansible example: 

https://github.com/ceph/ceph-ansible/tree/master/tests/functional/centos/7/bs-osds-container

Use it to figure out what is being passed incorrectly.

Comment 17 John Fulton 2018-07-18 13:50:45 UTC
We should try this next. It should work with Luminous.

parameter_defaults:
  CephAnsiblePlaybookVerbosity: 1
  CephAnsibleEnvironmentVariables:
    ANSIBLE_SSH_RETRIES: '6'
  CephAnsibleDisksConfig:
    devices:
      - /dev/vdb
      - /dev/vdc
    dedicated_devices:
      - /dev/vdd
      - /dev/vdd
  CephAnsibleExtraConfig:
    osd_scenario: non-collocated
    osd_objectstore: bluestore
    ceph_osd_docker_memory_limit: 5g
    ceph_osd_docker_cpu_limit: 1
  CephConfigOverrides:
    bluestore block db size: 67108864
    bluestore block wal size: 134217728
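
One quick check on a deployed node is whether these overrides actually make it into the rendered ceph.conf inside an OSD container; a sketch, using a container name of the form shown in the docker ps output earlier in this report (adjust host and device to match):

  sudo docker exec ceph-osd-ceph-storage-0-sdf grep -i bluestore /etc/ceph/ceph.conf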

Comment 19 John Fulton 2018-07-18 15:22:53 UTC
(In reply to John Fulton from comment #17)
> We should try this next. It should work with Lumionus. 

 https://review.openstack.org/#/c/547682/

Comment 21 John Fulton 2018-07-27 20:25:09 UTC
Status update:

An OSP13 deployment which passes 4 additional THT parameters [0] to request bluestore and uses the ceph container rhceph-3-rhel7:3-9 [1] hits the race condition documented in bz 1608946 and the deployment fails. Changing to the rhceph-3-rhel7:3-11 [2] container results in a failed deployment but with 13 of 15 requested OSDs running. The other two OSDs failed because of the same race condition. I don't think 3-11 vs 3-9 was significant as they have the same version of ceph-disk. 

The issue seems to be the ceph-disk race condition (1608946). In theory you could deploy, get some percentage of OSDs, work around the race condition until you have 100% of your OSDs, and then update the deployment so that it finishes and you have OSP13 + bluestore. The workarounds would be the same as the ones documented for a different ceph-disk race condition; it comes down to repeating the attempt until you win the race. An example is in:

 https://bugzilla.redhat.com/show_bug.cgi?id=1494543
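
In practice that workaround amounts to re-running the failed per-device OSD units until ceph-disk wins the race, then re-checking the containers; a rough sketch on a storage node, using device names from the logs above (illustrative only):

  # restart the ceph-osd@<device> units that failed, then confirm the containers stay up
  sudo systemctl restart ceph-osd@sdm ceph-osd@sdi
  sudo docker ps | grep ceph-osd
  ceph -s    # on a controller/monitor node, to confirm the up/in OSD count increases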

The real fix is to avoid this race condition when using bluestore the same way the above was fixed to avoid the race using filestore, thus I'm marking this as a duplicate of 1608946 which focuses just on ceph-disk, not ceph-ansible's deployment of it. This bug also mentions OSP13, but OSP13 doesn't support bluestore.


[0] https://review.openstack.org/#/c/547682/3/ci/environments/scenario001-multinode-containers.yaml

[1] https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/rhceph-3-rhel7/images/3-9

[2] https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/rhceph-3-rhel7/images/3-11

*** This bug has been marked as a duplicate of bug 1608946 ***

Comment 22 John Fulton 2019-05-15 18:42:42 UTC
Do not incorrectly conclude from this bug that OSP13 will not work with Bluestore.
This bug is about RHCS 3.1 not supporting Bluestore.
If you hit this Ceph (not OpenStack) bug, use RHCS 3.2 instead.
You can use RHCS 3.2 with Bluestore, and the OpenStack documentation has been updated to recommend doing so:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/deploying_an_overcloud_with_containerized_red_hat_ceph/index#using_bluestore_in_ceph_3_2_and_later

Comment 23 John Fulton 2019-05-15 18:45:13 UTC
(In reply to John Fulton from comment #21)
> The real fix is to avoid this race condition when using bluestore the same
> way the above was fixed to avoid the race using filestore, thus I'm marking
> this as a duplicate of 1608946 which focuses just on ceph-disk, not
> ceph-ansible's deployment of it. This bug also mentions OSP13, but OSP13
> doesn't support bluestore.

This should say "OSP13 doesn't support bluestore WITH RHCS3.1". That was true in July 2018 because RHCS 3.2 had not yet been released.

