Description of problem:
The task for adding an OSD ignores any error from ceph-disk and finishes as success, even when the OSD was not properly configured.

Version-Release number of selected component (if applicable):
ceph-installer-1.0.6-1.el7scon.noarch
ceph-ansible-1.0.5-5.el7scon.noarch

Actual results:

# ceph-installer task e6be2ac4-e1d3-4220-b2bd-c394eca8230a
--> endpoint: /api/osd/configure
--> succeeded: True
--> stdout:
PLAY [mons] *******************************************************************
<<truncated>>
TASK: [ceph-osd | prepare osd disk(s)] ****************************************
failed: [dhcp-126-125.lab.eng.brq.redhat.com] => (item=[{'changed': False, 'end': '2016-05-10 11:21:46.074846', 'failed': False, 'stdout': u'', 'cmd': "parted --script /dev/vdf print | egrep -sq '^ 1.*ceph'", 'rc': 1, 'start': '2016-05-10 11:21:46.058675', 'item': u'/dev/vdf', 'warnings': [], 'delta': '0:00:00.016171', 'invocation': {'module_name': u'shell', 'module_complex_args': {}, 'module_args': u"parted --script /dev/vdf print | egrep -sq '^ 1.*ceph'"}, 'stdout_lines': [], 'failed_when_result': False, 'stderr': u''}, {'changed': False, 'end': '2016-05-10 11:21:33.998193', 'failed': False, 'stdout': u'', 'cmd': "echo '/dev/vdf' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", 'rc': 1, 'start': '2016-05-10 11:21:33.988921', 'item': u'/dev/vdf', 'warnings': [], 'delta': '0:00:00.009272', 'invocation': {'module_name': u'shell', 'module_complex_args': {}, 'module_args': u"echo '/dev/vdf' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'"}, 'stdout_lines': [], 'failed_when_result': False, 'stderr': u''}, u'/dev/vdf', u'/dev/vde']) => {"changed": false, "cmd": ["ceph-disk", "prepare", "--cluster", "ceph", "/dev/vdf", "/dev/vde"], "delta": "0:05:03.077442", "end": "2016-05-10 11:26:51.403133", "item": [{"changed": false, "cmd": "parted --script /dev/vdf print | egrep -sq '^ 1.*ceph'", "delta": "0:00:00.016171", "end": "2016-05-10 11:21:46.074846", "failed": false, "failed_when_result": false, "invocation": {"module_args": "parted --script /dev/vdf print | egrep -sq '^ 1.*ceph'", "module_complex_args": {}, "module_name": "shell"}, "item": "/dev/vdf", "rc": 1, "start": "2016-05-10 11:21:46.058675", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}, {"changed": false, "cmd": "echo '/dev/vdf' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", "delta": "0:00:00.009272", "end": "2016-05-10 11:21:33.998193", "failed": false, "failed_when_result": false, "invocation": {"module_args": "echo '/dev/vdf' | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", "module_complex_args": {}, "module_name": "shell"}, "item": "/dev/vdf", "rc": 1, "start": "2016-05-10 11:21:33.988921", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}, "/dev/vdf", "/dev/vde"], "rc": 1, "start": "2016-05-10 11:21:48.325691", "warnings": []}
stderr: prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
ceph-disk: Error: partprobe /dev/vde failed : Error: Error informing the kernel about modifications to partition /dev/vde1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/vde1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
stdout: The operation has completed successfully.
...ignoring
<<truncated>>
--> started: 2016-05-10 11:20:33.678338
--> exit_code: 0
--> ended: 2016-05-10 11:26:58.239219
--> command: /bin/ansible-playbook -v -u ceph-installer /usr/share/ceph-ansible/osd-configure.yml -i /tmp/e6be2ac4-e1d3-4220-b2bd-c394eca8230a_wwLnsE --extra-vars {"raw_journal_devices": ["/dev/vde"], "devices": ["/dev/vdf"], "cluster": "ceph", "ceph_stable_rh_storage_cdn_install": true, "public_network": "10.34.112.0/20", "fetch_directory": "/var/lib/ceph-installer/fetch", "cluster_network": "10.34.112.0/20", "journal_size": 4096, "raw_multi_journal": true, "fsid": "487bab3a-218e-495e-9171-f7de8fde8be1", "ceph_stable_rh_storage": true} --skip-tags package-install
--> stderr:
--> identifier: e6be2ac4-e1d3-4220-b2bd-c394eca8230a

Expected results:
ceph-ansible should stop with a failure
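A quick way to see the mismatch by hand (a sketch using only commands that already appear in the output above; the task id and device names are taken from this report):

    # The installer API reports the task as successful:
    ceph-installer task e6be2ac4-e1d3-4220-b2bd-c394eca8230a | grep -E 'succeeded|exit_code'

    # ...while on the OSD node the prepare actually failed and no ceph data
    # partition exists on the data disk (same check the playbook itself runs):
    parted --script /dev/vdf print | egrep -s '^ 1.*ceph' \
        || echo "/dev/vdf has no ceph OSD partition - prepare did not complete"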
This is most probably a timing issue in ceph-ansible, because I tried to periodically run `partprobe /dev/DEVICE` for all devices on the host and then launch the cluster creation. The result is this:

1) At the beginning, the `partprobe` command passes for all devices.
2) During the OSD creation, the `partprobe` command sometimes returns the error "Device or resource busy..." (it usually mentions the first journal partition /dev/vdb1).
3) When the OSD creation on a particular node finishes, the `partprobe` command passes again without any issue.
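A minimal sketch of the periodic check described above; the device list and interval are assumptions, not the exact script that was used:

    # Repeatedly probe each device and log when the kernel refuses to
    # re-read its partition table (typically "Device or resource busy").
    while true; do
        for dev in /dev/vdb /dev/vdc /dev/vdd /dev/vde /dev/vdf; do
            partprobe "$dev" > /dev/null 2>&1 \
                || echo "$(date '+%H:%M:%S') partprobe $dev failed (device busy?)"
        done
        sleep 5
    done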
I am not sure why you would call partprobe on all devices before creating OSDs; that is not something I would recommend. partprobe should not be called manually at all. It should only be run by ceph-disk itself, because a manual call can conflict with the asynchronous processing done by the udev rules.
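For illustration only (this is not part of the playbook or of ceph-disk): if partprobe is run by hand for debugging, waiting for pending udev events first narrows the window for the "Device or resource busy" race. The device name is taken from this report.

    # Drain pending udev events before re-reading the partition table by hand,
    # to avoid racing with the asynchronous udev rule processing.
    udevadm settle --timeout=30
    partprobe /dev/vde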
I was able to reproduce it with the standalone command `ceph-disk prepare --cluster ceph ${osd_dev} ${journal_dev}`, so it is a bug in ceph-disk. Can I just rename/reassign this one, or is it better to create a new one?
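For the record, the standalone reproducer boils down to the following sketch; `${osd_dev}` and `${journal_dev}` are placeholders for the data and journal devices (e.g. /dev/vdf and /dev/vde from this report):

    # Run the same prepare that the playbook runs and inspect the return code;
    # when partprobe fails inside ceph-disk, prepare exits non-zero even though
    # parts of its stdout still say "The operation has completed successfully."
    ceph-disk prepare --cluster ceph "${osd_dev}" "${journal_dev}"
    echo "ceph-disk prepare exit code: $?"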
Created a new bug for ceph-disk: Bug 1336756. Closing this bug as a duplicate of Bug 1335938.

*** This bug has been marked as a duplicate of bug 1335938 ***