Description of problem:
The task for adding an OSD ignores any error from ceph-disk and finishes as success, even when the OSD was not properly configured.

Version-Release number of selected component (if applicable):
ceph-installer-1.0.6-1.el7scon.noarch
ceph-ansible-1.0.5-5.el7scon.noarch

Actual results:

# ceph-installer task e6be2ac4-e1d3-4220-b2bd-c394eca8230a
--> endpoint: /api/osd/configure
--> succeeded: True
--> stdout:
PLAY [mons] *******************************************************************
<<truncated>>
TASK: [ceph-osd | prepare osd disk(s)] ****************************************
failed: [dhcp-126-125.lab.eng.brq.redhat.com] => (item=[{'changed': False, 'end': '2016-05-10 11:21:46.074846', 'failed': False, 'stdout': u'', 'cmd': "parted --script /dev/vdf print | egrep -sq '^ 1.*ceph'", 'rc': 1, 'start': '2016-05-10 11:21:46.058675', 'item': u'/dev/vdf', 'warnings': [], 'delta': '0:00:00.016171', 'invocation': {'module_name': u'shell', 'module_complex_args': {}, 'module_args': u"parted --script /dev/vdf print | egrep -sq '^ 1.*ceph'"}, 'stdout_lines': [], 'failed_when_result': False, 'stderr': u''}, {'changed': False, 'end': '2016-05-10 11:21:33.998193', 'failed': False, 'stdout': u'', 'cmd': "echo '/dev/vdf' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", 'rc': 1, 'start': '2016-05-10 11:21:33.988921', 'item': u'/dev/vdf', 'warnings': [], 'delta': '0:00:00.009272', 'invocation': {'module_name': u'shell', 'module_complex_args': {}, 'module_args': u"echo '/dev/vdf' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'"}, 'stdout_lines': [], 'failed_when_result': False, 'stderr': u''}, u'/dev/vdf', u'/dev/vde']) => {"changed": false, "cmd": ["ceph-disk", "prepare", "--cluster", "ceph", "/dev/vdf", "/dev/vde"], "delta": "0:05:03.077442", "end": "2016-05-10 11:26:51.403133", "item": [{"changed": false, "cmd": "parted --script /dev/vdf print | egrep -sq '^ 1.*ceph'", "delta": "0:00:00.016171", "end": "2016-05-10 11:21:46.074846", "failed": false, "failed_when_result": false, "invocation": {"module_args": "parted --script /dev/vdf print | egrep -sq '^ 1.*ceph'", "module_complex_args": {}, "module_name": "shell"}, "item": "/dev/vdf", "rc": 1, "start": "2016-05-10 11:21:46.058675", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}, {"changed": false, "cmd": "echo '/dev/vdf' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", "delta": "0:00:00.009272", "end": "2016-05-10 11:21:33.998193", "failed": false, "failed_when_result": false, "invocation": {"module_args": "echo '/dev/vdf' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", "module_complex_args": {}, "module_name": "shell"}, "item": "/dev/vdf", "rc": 1, "start": "2016-05-10 11:21:33.988921", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}, "/dev/vdf", "/dev/vde"], "rc": 1, "start": "2016-05-10 11:21:48.325691", "warnings": []}
stderr: prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
ceph-disk: Error: partprobe /dev/vde failed : Error: Error informing the kernel about modifications to partition /dev/vde1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/vde1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
stdout: The operation has completed successfully.
...ignoring
<<truncated>>
--> started: 2016-05-10 11:20:33.678338
--> exit_code: 0
--> ended: 2016-05-10 11:26:58.239219
--> command: /bin/ansible-playbook -v -u ceph-installer /usr/share/ceph-ansible/osd-configure.yml -i /tmp/e6be2ac4-e1d3-4220-b2bd-c394eca8230a_wwLnsE --extra-vars {"raw_journal_devices": ["/dev/vde"], "devices": ["/dev/vdf"], "cluster": "ceph", "ceph_stable_rh_storage_cdn_install": true, "public_network": "10.34.112.0/20", "fetch_directory": "/var/lib/ceph-installer/fetch", "cluster_network": "10.34.112.0/20", "journal_size": 4096, "raw_multi_journal": true, "fsid": "487bab3a-218e-495e-9171-f7de8fde8be1", "ceph_stable_rh_storage": true} --skip-tags package-install
--> stderr:
--> identifier: e6be2ac4-e1d3-4220-b2bd-c394eca8230a

Expected results:
ceph-ansible should stop with a failure
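A quick way to see the mismatch by hand (a sketch using only commands that already appear in the output above; the task id and device names are taken from this report):

    # The installer API reports the task as successful:
    ceph-installer task e6be2ac4-e1d3-4220-b2bd-c394eca8230a | grep -E 'succeeded|exit_code'

    # ...while on the OSD node the prepare actually failed and no ceph data
    # partition exists on the data disk (same check the playbook itself runs):
    parted --script /dev/vdf print | egrep -s '^ 1.*ceph' \
        || echo "/dev/vdf has no ceph OSD partition - prepare did not complete"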
This is most probably a timing issue in ceph-ansible, because I tried to periodically run `partprobe /dev/DEVICE` for all devices on the host and then launch the cluster creation. The result is this:

1) At the beginning, the `partprobe` command passes for all devices.
2) During the OSD creation, the `partprobe` command sometimes returns the error "Device or resource busy..." (it usually mentions the first journal partition /dev/vdb1).
3) When the OSD creation on a particular node finishes, the `partprobe` command passes again without any issue.
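A minimal sketch of the periodic check described above; the device list and interval are assumptions, not the exact script that was used:

    # Repeatedly probe each device and log when the kernel refuses to
    # re-read its partition table (typically "Device or resource busy").
    while true; do
        for dev in /dev/vdb /dev/vdc /dev/vdd /dev/vde /dev/vdf; do
            partprobe "$dev" > /dev/null 2>&1 \
                || echo "$(date '+%H:%M:%S') partprobe $dev failed (device busy?)"
        done
        sleep 5
    done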
I am not sure why you would call partprobe on all devices before creating OSDs; that is not something I would recommend. partprobe should not be called manually at all. It should only be run by ceph-disk itself, because a manual call can conflict with the asynchronous processing done by the udev rules.
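For illustration only (this is not part of the playbook or of ceph-disk): if partprobe is run by hand for debugging, waiting for pending udev events first narrows the window for the "Device or resource busy" race. The device name is taken from this report.

    # Drain pending udev events before re-reading the partition table by hand,
    # to avoid racing with the asynchronous udev rule processing.
    udevadm settle --timeout=30
    partprobe /dev/vde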
I was able to reproduce it with the standalone command `ceph-disk prepare --cluster ceph ${osd_dev} ${journal_dev}`, so it is a bug in ceph-disk. Can I just rename/reassign this one, or is it better to create a new one?
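For the record, the standalone reproducer boils down to the following sketch; `${osd_dev}` and `${journal_dev}` are placeholders for the data and journal devices (e.g. /dev/vdf and /dev/vde from this report):

    # Run the same prepare that the playbook runs and inspect the return code;
    # when partprobe fails inside ceph-disk, prepare exits non-zero even though
    # parts of its stdout still say "The operation has completed successfully."
    ceph-disk prepare --cluster ceph "${osd_dev}" "${journal_dev}"
    echo "ceph-disk prepare exit code: $?"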
Created a new bug for ceph-disk: Bug 1336756. Closing this bug as a duplicate of Bug 1335938.

*** This bug has been marked as a duplicate of bug 1335938 ***