Description of problem:
An overcloud deployment ran and completed without any error even though Ceph's OSDs are not running on the Ceph storage nodes.

The error messages on the Ceph storage node are:

Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command: Running command: /usr/sbin/blkid -o udev -p /dev/vdb1
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/vdb1
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/vdb1
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: mount: Mounting /dev/vdb1 on /var/lib/ceph/tmp/mnt.ZlKvTe with options noatime,largeio,inode64,swalloc
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: mount: Mounting /dev/vdb1 on /var/lib/ceph/tmp/mnt.ZlKvTe with options noatime,largeio,inode64,swalloc
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command_check_call: Running command: /usr/bin/mount -t xfs -o noatime,largeio,inode64,swalloc -- /dev/vdb1 /var/lib/ceph/tmp/mnt.ZlKvTe
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: command_check_call: Running command: /usr/bin/mount -t xfs -o noatime,largeio,inode64,swalloc -- /dev/vdb1 /var/lib/ceph/tmp/mnt.ZlKvTe
Sep 11 08:53:10 ceph-0 kernel: XFS (vdb1): Mounting V5 Filesystem
Sep 11 08:53:10 ceph-0 kernel: XFS (vdb1): Ending clean mount
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: command: Running command: /usr/sbin/restorecon /var/lib/ceph/tmp/mnt.ZlKvTe
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command: Running command: /usr/sbin/restorecon /var/lib/ceph/tmp/mnt.ZlKvTe
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: activate: Cluster uuid is 2a57f5e2-94d2-11e7-897c-52540015ce25
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: activate: Cluster uuid is 2a57f5e2-94d2-11e7-897c-52540015ce25
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: activate: Cluster name is ceph
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: activate: Cluster name is ceph
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: activate: OSD uuid is 9aba92c8-102f-4c62-8dfa-9bc393b35f50
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: activate: OSD uuid is 9aba92c8-102f-4c62-8dfa-9bc393b35f50
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: activate: OSD id is 0
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: activate: OSD id is 0
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: command: Running command: /usr/bin/ceph-detect-init --default sysvinit
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command: Running command: /usr/bin/ceph-detect-init --default sysvinit
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: activate: Marking with init system systemd
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: activate: Marking with init system systemd
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.ZlKvTe/systemd
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.ZlKvTe/systemd
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.ZlKvTe/systemd
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.ZlKvTe/systemd
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: activate: ceph osd.0 data dir is ready at /var/lib/ceph/tmp/mnt.ZlKvTe
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: move_mount: Moving mount to final location...
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: command_check_call: Running command: /bin/mount -o noatime,largeio,inode64,swalloc -- /dev/vdb1 /var/lib/ceph/osd/ceph-0
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: activate: ceph osd.0 data dir is ready at /var/lib/ceph/tmp/mnt.ZlKvTe
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: move_mount: Moving mount to final location...
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command_check_call: Running command: /bin/mount -o noatime,largeio,inode64,swalloc -- /dev/vdb1 /var/lib/ceph/osd/ceph-0
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: command_check_call: Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.ZlKvTe
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: command_check_call: Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.ZlKvTe
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: 2017-09-11 08:53:10.671539 7f0563b82700 0 librados: osd.0 authentication error (1) Operation not permitted
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: 2017-09-11 08:53:10.671539 7f0563b82700 0 librados: osd.0 authentication error (1) Operation not permitted
Sep 11 08:53:10 ceph-0 ceph-osd-run.sh[58017]: Error connecting to cluster: PermissionError
Sep 11 08:53:10 ceph-0 dockerd-current[15732]: Error connecting to cluster: PermissionError

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.0-0.1.rc4.el7cp.noarch
openstack-tripleo-image-elements-7.0.0-0.20170830150703.526772d.el7ost.noarch
puppet-tripleo-7.3.1-0.20170831100515.0457aa1.el7ost.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170901051303.0rc1.el7ost.noarch
python-tripleoclient-7.2.1-0.20170831202445.0bd00bb.el7ost.noarch
openstack-tripleo-common-containers-7.5.1-0.20170831015949.2517e1e.el7ost.1.noarch
openstack-tripleo-puppet-elements-7.0.0-0.20170831100659.2094778.el7ost.noarch
openstack-tripleo-common-7.5.1-0.20170831015949.2517e1e.el7ost.1.noarch
openstack-tripleo-validations-7.3.1-0.20170831052729.67faa39.el7ost.noarch

How reproducible:
unknown

Steps to Reproduce:
1. Run a deployment of an overcloud with containerized Ceph

Actual results:
The Ceph OSDs are not running in the cluster

Expected results:
The Ceph OSDs are running inside containers

Additional info:
Yogev, what version of ceph-ansible are you using on the undercloud?

Are you sure the disks are cleared from previous deployments and there are no pre-existing Ceph partitions on them? You can zap the disks manually with sgdisk -Z or with Ironic.

If they are clean, can you see if adding 'osd_objectstore: filestore' and 'osd_scenario: collocated' to CephAnsibleDisksConfig helps?
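For example, an environment file along these lines should set those two variables. This is only a sketch: the device list and journal_size below mirror the defaults shown in docker/services/ceph-ansible/ceph-osd.yaml and need to match the actual hardware.

parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/vdb               # OSD data disk; example value, match your nodes
    journal_size: 512
    osd_scenario: collocated   # journal shares the data disk
    osd_objectstore: filestore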
(In reply to Yogev Rabl from comment #4)
> (In reply to Giulio Fidente from comment #2)
> > Yogev, what version of ceph-ansible are you using on the undercloud?
> >
> > Are you sure the disks are cleared from previous deployments and there are
> > no pre-existing ceph partitions on them? You can zap the disks manually with
> > sgdisk -Z or with ironic.
> >
> > If they are clean, can you see if adding 'osd_objectstore: filestore' and
> > 'osd_scenario: collocated' to CephAnsibleDisksConfig helps?
>
> Giulio, I think that the disks were not clean.

I removed the blocker flag from this bug. It is still a bug, but with a much lower priority.
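For the record, before the next attempt the disks can be cleared up front with a small ad-hoc play along these lines. Only a sketch: the 'ceph_storage' group name and /dev/vdb are placeholders, not values taken from this environment.

# Sketch only: wipe the partition table on the candidate OSD disks
# before redeploying. Host group and device path are assumptions.
- hosts: ceph_storage
  become: true
  tasks:
    - name: zap pre-existing partitions on the OSD disk
      command: sgdisk -Z /dev/vdb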
Yogev,

Would you please share your heat templates? Can you confirm that you intended to deploy with the disk /dev/vdb and with no separate journal disk, i.e. with a collocated journal?

FWIW: I reproduced an error like this in a quickstart env using:
- site-docker-tripleo.yml.sample (from today)
- docker/services/ceph-ansible/ceph-osd.yaml [1], which has the params from comment #2 (our docs need an update for this)

However, I was using a separate journal disk and not collocating, so the default in docker/services/ceph-ansible/ceph-osd.yaml was wrong. I then updated one of my env files to override the default with the following:

+ CephAnsibleExtraConfig:
+   osd_objectstore: filestore
+   osd_scenario: non-collocated

I then re-ran my openstack overcloud deploy command to update. The playbook re-ran and I got a HEALTH_OK ceph cluster. I agree, however, that the deploy should have failed in the first place if the OSDs were not running [2].

John

[1]
(undercloud) [stack@undercloud templates]$ grep -B 4 -A 4 osd_ docker/services/ceph-ansible/ceph-osd.yaml
      devices:
      - /dev/vdb
      journal_size: 512
      journal_collocation: true
      osd_scenario: collocated

resources:
  CephBase:
    type: ./ceph-base.yaml
--
        - tripleo.ceph_osd.firewall_rules:
            '111 ceph_osd':
              dport:
              - '6800-7300'
        - ceph_osd_ansible_vars:
            map_merge:
            - {get_attr: [CephBase, role_data, config_settings, ceph_common_ansible_vars]}
            - osd_objectstore: filestore
            - {get_param: CephAnsibleDisksConfig}
(undercloud) [stack@undercloud templates]$

[2]
[root@overcloud-cephstorage-0 ~]# docker ps -a
CONTAINER ID   IMAGE                                                    COMMAND                  CREATED        STATUS                     PORTS   NAMES
9a0f4d156e8e   tripleoupstream/centos-binary-cron:latest                "kolla_start"            42 hours ago   Up 42 hours                        logrotate_crond
fd2811f3020d   docker.io/ceph/daemon:tag-build-master-jewel-centos-7    "/usr/bin/ceph --vers"   43 hours ago   Exited (0) 43 hours ago            stoic_archimedes
[root@overcloud-cephstorage-0 ~]#
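For anyone hitting this with a dedicated journal disk, the full override would look roughly like the snippet below. This is only a sketch: /dev/vdb, /dev/vdc and the dedicated_devices list are my assumptions about the disk layout, not values taken from Yogev's templates.

parameter_defaults:
  CephAnsibleExtraConfig:
    osd_objectstore: filestore
    osd_scenario: non-collocated
  CephAnsibleDisksConfig:
    devices:
      - /dev/vdb             # OSD data disk (assumption)
    dedicated_devices:
      - /dev/vdc             # journal disk (assumption)
    journal_size: 512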
I am adjusting upstream docs to cover the params for different journal scenarios:

https://review.openstack.org/#/c/502557

Also, you need the following:

https://review.openstack.org/#/c/501983/

I am re-testing both scenarios (shared journal and separate journal).
Update on re-testing: 1. shared journal disk scenario using tht from doc: new deploy succeeds and ceph HEALTH_OK 2. separate journal disk scenario using tht from doc: new deploy failed on the following ceph-ansible task: https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-osd/tasks/check_devices.yml#L50-L60 2017-09-11 22:00:34,924 p=24485 u=mistral | TASK [ceph-osd : create gpt disk label of the journal device(s)] *************** 2017-09-11 22:00:35,215 p=24485 u=mistral | failed: [192.168.24.11] (item=[{'_ansible_parsed': True, 'stderr_lines': [], '_ansible_item_result': True, u'end': u'2017-09-11 22:00:33.913658', '_ansible_no_log': False, u'stdout': u'', u'cmd': u'parted --script /dev/vdb print > /dev/null 2>&1', u'rc': 1, 'item': [{'_ansible_parsed': True, 'stderr_lines': [], '_ansible_item_result': True, u'end': u'2017-09-11 22:00:33.333649', '_ansible_no_log': False, u'stdout': u'', u'cmd': u"readlink -f /dev/vdb | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}|fio[a-z]{1,2}[0-9]{1,2}$'", u'rc': 1, 'item': u'/dev/vdb', u'delta': u'0:00:00.004784', u'stderr': u'', u'change None, u'_uses_shell': True, u'_raw_params': u"readlink -f /dev/vdb | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}|fio[a-z]{1,2}[0-9]{1,2}$'", u'removes': None, u'creates': None, u'chdir': None}}, 'stdout_lines': [], 'failed_when_result': False, u'start': u'2017-09-11 22:00:33.328865', 'failed': False}, u'/dev/vdb'], u'delta': u'0:00:00.008560', u'stderr': u'', u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u'parted --script /dev/vdb print > /dev/null 2>&1', u'removes': None, u'creates': None, u'chdir': None}}, 'stdout_lines': [], 'failed_when_result': False, u'start': u'2017-09-11 22:00:33.905098', 'failed': False}, None]) => {"changed": false, "cmd": ["parted", "--script", "mklabel", "gpt"], "delta": "0:00:00.003180", "end": "2017-09-11 22:00:35.951058", "failed": true, "item": [{"_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "parted --script /dev/vdb print > /dev/null 2>&1", "delta": "0:00:00.008560", "end": "2017-09-11 22:00:33.913658", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "parted --script /dev/vdb print > /dev/null 2>&1", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}}, "item": [{"_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "readlink -f /dev/vdb | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}|fio[a-z]{1,2}[0-9]{1,2}$'", "delta": "0:00:00.004784", "end": "2017-09-11 22:00:33.333649", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "readlink -f /dev/vdb | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}|fio[a-z]{1,2}[0-9]{1,2}$'", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}}, "item": "/dev/vdb", "rc": 1, "start": "2017-09-11 22:00:33.328865", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, "/dev/vdb"], "rc": 1, "start": "2017-09-11 22:00:33.905098", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, null], "rc": 1, "start": "2017-09-11 22:00:35.947878", "stderr": "Error: Could not stat device mklabel - No such file or directory.", 
"stderr_lines": ["Error: Could not stat device mklabel - No such file or directory."], "stdout": "", "stdout_lines": []} ... 2017-09-11 22:00:35,524 p=24485 u=mistral | failed: [192.168.24.18] (item=[{'_ansible_parsed': True, 'stderr_lines': [], '_ansible_item_result': True, u'end': u'2017-09-11 22:00:34.313839', '_ansible_no_log': False, u'stdout': u'', u'cmd': u'parted --script /dev/vdc print > /dev/null 2>&1', u'rc': 1, 'item': [{'_ansible_parsed': True, 'stderr_lines': [], '_ansible_item_result': True, u'end': u'2017-09-11 22:00:33.696647', '_ansible_no_log': False, u'stdout': u'', u'cmd': u"readlink -f /dev/vdc | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}|fio[a-z]{1,2}[0-9]{1,2}$'", u'rc': 1, 'item': u'/dev/vdc', u'delta': u'0:00:00.005318', u'stderr': u'', u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"readlink -f /dev/vdc | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}|fio[a-z]{1,2}[0-9]{1,2}$'", u'removes': None, u'creates': None, u'chdir': None}}, 'stdout_lines': [], 'failed_when_result': False, u'start': u'2017-09-11 22:00:33.691329', 'failed': False}, u'/dev/vdc'], u'delta': u'0:00:00.008033', u'stderr': u'', u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u'parted --script /dev/vdc print > /dev/null 2>&1', u'removes': None, u'creates': None, u'chdir': None}}, 'stdout_lines': [], 'failed_when_result': False, u'start': u'2017-09-11 22:00:34.305806', 'failed': False}, None]) => {"changed": false, "cmd": ["parted", "--script", "mklabel", "gpt"], "delta": "0:00:00.004037", "end": "2017-09-11 22:00:36.341092", "failed": true, "item": [{"_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "parted --script /dev/vdc print > /dev/null 2>&1", "delta": "0:00:00.008033", "end": "2017-09-11 22:00:34.313839", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "parted --script /dev/vdc print > /dev/null 2>&1", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}}, "item": [{"_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "readlink -f /dev/vdc | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}|fio[a-z]{1,2}[0-9]{1,2}$'", "delta": "0:00:00.005318", "end": "2017-09-11 22:00:33.696647", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "readlink -f /dev/vdc | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}|fio[a-z]{1,2}[0-9]{1,2}$'", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}}, "item": "/dev/vdc", "rc": 1, "start": "2017-09-11 22:00:33.691329", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, "/dev/vdc"], "rc": 1, "start": "2017-09-11 22:00:34.305806", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, null], "rc": 1, "start": "2017-09-11 22:00:36.337055", "stderr": "Error: Could not stat device mklabel - No such file or directory.", "stderr_lines": ["Error: Could not stat device mklabel - No such file or directory."], "stdout": "", "stdout_lines": []} 2017-09-11 22:00:35,525 p=24485 u=mistral | RUNNING HANDLER [ceph-defaults : copy mon restart script] ********************** 
2017-09-11 22:00:35,525 p=24485 u=mistral | RUNNING HANDLER [ceph-defaults : restart ceph mon daemon(s)] *******************
2017-09-11 22:00:35,526 p=24485 u=mistral | RUNNING HANDLER [ceph-defaults : copy osd restart script] **********************
2017-09-11 22:00:35,526 p=24485 u=mistral | RUNNING HANDLER [ceph-defaults : restart containerized ceph osds daemon(s)] ****
2017-09-11 22:00:35,526 p=24485 u=mistral | RUNNING HANDLER [ceph-defaults : restart non-containerized ceph osds daemon(s)] ***
2017-09-11 22:00:35,526 p=24485 u=mistral | RUNNING HANDLER [ceph-defaults : copy mds restart script] **********************
2017-09-11 22:00:35,526 p=24485 u=mistral | RUNNING HANDLER [ceph-defaults : debug socket mds] *****************************
2017-09-11 22:00:35,527 p=24485 u=mistral | RUNNING HANDLER [ceph-defaults : restart ceph mds daemon(s)] *******************
2017-09-11 22:00:35,527 p=24485 u=mistral | RUNNING HANDLER [ceph-defaults : copy rgw restart script] **********************
2017-09-11 22:00:35,527 p=24485 u=mistral | RUNNING HANDLER [ceph-defaults : restart ceph rgw daemon(s)] *******************
2017-09-11 22:00:35,528 p=24485 u=mistral | PLAY RECAP *********************************************************************
2017-09-11 22:00:35,528 p=24485 u=mistral | 192.168.24.11 : ok=45 changed=4 unreachable=0 failed=1
2017-09-11 22:00:35,528 p=24485 u=mistral | 192.168.24.17 : ok=43 changed=3 unreachable=0 failed=1
2017-09-11 22:00:35,528 p=24485 u=mistral | 192.168.24.18 : ok=43 changed=3 unreachable=0 failed=1
2017-09-11 22:00:35,528 p=24485 u=mistral | 192.168.24.20 : ok=56 changed=8 unreachable=0 failed=0
2017-09-11 22:00:35,528 p=24485 u=mistral | 192.168.24.9 : ok=1 changed=0 unreachable=0 failed=0
The dedicated journal scenario has been fixed by the following: https://github.com/ceph/ceph-ansible/pull/1882
Environment: openstack-tripleo-heat-templates-7.0.0-0.20170913050524.0rc2.el7ost.noarch ceph-common-10.2.7-32.el7cp.x86_64 ceph-mon-10.2.7-32.el7cp.x86_64 libcephfs1-10.2.7-32.el7cp.x86_64 python-cephfs-10.2.7-32.el7cp.x86_64 ceph-base-10.2.7-32.el7cp.x86_64 ceph-radosgw-10.2.7-32.el7cp.x86_64 puppet-ceph-2.4.1-0.20170911230204.ebea4b7.el7ost.noarch ceph-mds-10.2.7-32.el7cp.x86_64 ceph-selinux-10.2.7-32.el7cp.x86_64 A deployment failed: Looking for errors in mistral, found these: ceph_disk.main.Error: Error: partition 2 for /dev/vdb does not appear to exist\", \"stderr_lines\": [\"command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid\", \"command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph\", \"command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph\", \"command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph\", \"get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid\", \"set_type: Will colocate journal with data on /dev/vdb\", \"command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size\", \"get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid\", \"get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid\", \"get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid\", \"command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_type\", \"command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_type\", \"command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs\", \"command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs\", \"command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs\", \"command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. 
--lookup osd_fs_mount_options_xfs\", \"get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid\", \"get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid\", \"ptype_tobe_for_name: name = journal\", \"get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid\", \"create_partition: Creating journal partition num 2 size 5120 on /dev/vdb\", \"command_check_call: Running command: /usr/sbin/sgdisk --new=2:0:+5120M --change-name=2:ceph journal --partition-guid=2:6644fd1a-3cf4-4768-b01e-20bab7ece783 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/vdb\", \"update_partition: Calling partprobe on created device /dev/vdb\", \"command_check_call: Running command: /usr/bin/udevadm settle --timeout=600\", \"command: Running command: /usr/bin/flock -s /dev/vdb /usr/sbin/partprobe /dev/vdb\", \"command_check_call: Running command: /usr/bin/udevadm settle --timeout=600\", \"get_dm_uuid: get_dm_uuid /dev/vdb uuid path is /sys/dev/block/252:16/dm/uuid\", \"Traceback (most recent call last):\", \" File \\\\\"/usr/sbin/ceph-disk\\\\\", line 9, in <module>\", \" load_entry_point(\\'ceph-disk==1.0.0\\', \\'console_scripts\\', \\'ceph-disk\\')()\", \" File \\\\\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\\\\\", line 5326, in run\", \" main(sys.argv[1:])\", \" File \\\\\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\\\\\", line 5277, in main\", \" args.func(args)\", \" File \\\\\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\\\\\", line 1879, in main\", \" Prepare.factory(args).prepare()\", \" File \\\\\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\\\\\", line 1868, in prepare\", \" self.prepare_locked()\", \" File \\\\\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\\\\\", line 1899, in prepare_locked\", \" self.data.prepare(self.journal)\", \" File \\\\\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\\\\\", line 2566, in prepare\", \" self.prepare_device(*to_prepare_list)\", \" File \\\\\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\\\\\", line 2728, in prepare_device\", \" to_prepare.prepare()\", \" File \\\\\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\\\\\", line 2070, in prepare\", \" self.prepare_device()\", \" File \\\\\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\\\\\", line 2164, in prepare_device\", \" partition = device.get_partition(num)\", \" File \\\\\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\\\\\", line 1644, in get_partition\", \" dev = get_partition_dev(self.path, num)\", \" File \\\\\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\\\\\", line 670, in get_partition_dev\", \" (pnum, dev))\", \"ceph_disk.main.Error: Error: partition 2 for /dev/vdb does not appear to exist\"], \"stdout\": \"2017-09-19 20:15:02 /entrypoint.sh: static: does not generate config\\\ \'Error: /dev/vdb: unrecognised disk label\\', \'Error ENOENT: failed to find client.manila in keyring\\', \'Error ENOENT: failed to find client.openstack in keyring\\'], \'Error ENOENT: failed to find client.radosgw in keyring\\', \'Error response from daemon: No such container: ceph-osd-overcloud-cephstorage-1-devvdb\\'], \'Error response from daemon: No such container: ceph-osd-overcloud-cephstorage-2-devvdb\\'],
(In reply to Alexander Chuzhoy from comment #10)
...
> 670, in get_partition_dev\", \" (pnum, dev))\", \"ceph_disk.main.Error:
> Error: partition 2 for /dev/vdb does not appear to exist\"], \"stdout\":
> \"2017-09-19 20:15:02 /entrypoint.sh: static: does not generate config\\\

This might be https://bugzilla.redhat.com/show_bug.cgi?id=1491780

http://tracker.ceph.com/issues/19428

"At the first iteration, the sdb2 is missing while at the second one (1 sec after) the sdb2 showed up."

> \'Error: /dev/vdb: unrecognised disk label\\',
> \'Error ENOENT: failed to find client.manila in keyring\\',
> \'Error ENOENT: failed to find client.openstack in keyring\\'],
> \'Error ENOENT: failed to find client.radosgw in keyring\\',
> \'Error response from daemon: No such container:
> ceph-osd-overcloud-cephstorage-1-devvdb\\'],
> \'Error response from daemon: No such container:
> ceph-osd-overcloud-cephstorage-2-devvdb\\'],

However, the "/dev/vdb: unrecognised disk label" seems to be the other issue under this bug.

I did further testing using ceph-ansible directly from the master branch on the same system (sealusa3). I didn't hit the unrecognised disk label this time, but I hit the race condition again...

Ansible command: http://ix.io/A2
Ansible run: http://ix.io/A2R

It failed on osd node .12 on task:

2017-09-19 23:09:40,888 p=25857 u=mistral | TASK [ceph-osd : prepare ceph containerized osd disk collocated] ***************

On: Error: partition 2 for /dev/vdb does not appear to exist
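The timing can be seen by hand right after the journal partition is created; a rough sketch follows. The 'ceph_storage' group and /dev/vdb2 are placeholders, and this only demonstrates the delay ceph-disk trips over (the partition node showing up a moment after partprobe) -- it is not the fix.

# Sketch: after partprobe, the new partition node can take a moment to
# appear, which is what ceph-disk's get_partition_dev races against.
# Group name and device path are assumptions.
- hosts: ceph_storage
  become: true
  tasks:
    - name: re-read the partition table on the OSD disk
      command: partprobe /dev/vdb
    - name: wait for the journal partition node to appear
      wait_for:
        path: /dev/vdb2
        timeout: 10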
(In reply to John Fulton from comment #12)
> I did further testing using ceph-ansible directly from the master branch on
> the same system (sealusa3). I didn't hit the unrecognised disk label this
> time, but I hit the race condition again...
>
> Ansible command: http://ix.io/A2
> Ansible run: http://ix.io/A2R

Correction, that command is at:

Ansible command: http://ix.io/A2Q
I see two different issues in this PR:

Error: Could not stat device mklabel - No such file or directory.

Which makes me believe the device passed doesn't exist.

Then I see:

ceph_disk.main.Error: Error: partition 2 for /dev/vdb does not appear to exist

Which appears to be the race condition.

So either it's a NOTABUG or DUP. Please update the status, thanks.
The root cause of this seems to be a ceph-disk race condition as described in the following bug, so I am closing this bug as a duplicate of it:

https://bugzilla.redhat.com/show_bug.cgi?id=1494543

*** This bug has been marked as a duplicate of bug 1494543 ***
*** Bug 1507823 has been marked as a duplicate of this bug. ***