Description of problem:

Command `ceph-disk prepare ...` sometimes fails to prepare a disk for a Ceph OSD with the following error:

ceph-disk: Error: partprobe /dev/vdb failed : Error: Error informing the kernel about modifications to partition /dev/vdb1 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/vdb1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)

Version-Release number of selected component (if applicable):
ceph-base-10.2.0-1.el7cp.x86_64
ceph-common-10.2.0-1.el7cp.x86_64
ceph-osd-10.2.0-1.el7cp.x86_64
ceph-selinux-10.2.0-1.el7cp.x86_64
libcephfs1-10.2.0-1.el7cp.x86_64
python-cephfs-10.2.0-1.el7cp.x86_64

How reproducible:
Roughly 40% of attempts on our VMs.

Steps to Reproduce:
1. Create and install a node for a Ceph OSD with at least two spare disks.
2. Run the disk preparation command for a Ceph OSD. Device /dev/vdb is targeted for the journal, /dev/vdc for OSD data. If you have more spare disks, you can repeat this command for each "OSD data" device.

   # ceph-disk prepare --cluster ceph /dev/vdc /dev/vdb

3. Before trying again, clean up both the journal and OSD data devices:

   # sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vdb
   # sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vdc

Actual results:
Sometimes the ceph-disk command fails with the following (or a similar) error:

# ceph-disk prepare --cluster ceph /dev/vdc /dev/vdb
prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
The operation has completed successfully.
ceph-disk: Error: partprobe /dev/vdb failed : Error: Error informing the kernel about modifications to partition /dev/vdb1 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/vdb1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
# echo $?
1

Expected results:
The ceph-disk command should properly prepare the disk for the Ceph OSD.

Additional info:
I discovered this issue while testing USM.
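Since the "Device or resource busy" failure is transient (partprobe races with udev holding the device open), a common mitigation is to retry the command a few times, letting udev settle between attempts. Below is a minimal sketch of that retry pattern in POSIX shell; the `retry` helper name and attempt count are illustrative (not part of ceph-disk), and it is demonstrated on a harmless no-op rather than a real device:

```shell
# Illustrative retry helper: run a command up to N times, pausing between
# attempts. In the real scenario you would also run `udevadm settle`
# before each retry so pending udev events finish first.
retry() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Real usage would insert: udevadm settle --timeout=5
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Demonstrated on a no-op; the real invocation would look like:
#   retry 5 partprobe /dev/vdb
retry 3 true && echo "command succeeded"
```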
Thanks for the steps to reproduce, that's very helpful. I think I have all the information I need now.
Small update/note: I didn't see this issue and wasn't able to reproduce it on VMs in an OpenStack environment, but I did see it (and can reproduce it) on KVM VMs running on our (different) physical servers.
I've tried updating parted to the Fedora 22 version (parted-3.2-16.fc22), as suggested in the upstream issue [1], and I can confirm that it fixes the issue. It was originally failing with parted-3.1-23.el7.x86_64. [1] http://tracker.ceph.com/issues/15918
https://github.com/ceph/ceph/pull/9195 is the PR to master; still undergoing review upstream.
Hi Daniel, would you be so kind as to provide me with access to a machine where I can reproduce the problem? I've collected enough expertise now to make use of it, and I can't seem to reproduce it on CentOS 7.2 (VM or bare metal). Thanks!
See comment 7. This BZ needs to be resolved ASAP as it is a blocker for Beta 1 (5/31).
I've opened bug 1339705 against parted to track an improvement in partprobe.
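The failure mode here is that the kernel has not yet registered the new partition when the tooling moves on. One simple way to confirm the kernel actually sees a partition is to poll /proc/partitions. A hedged sketch follows; the `wait_for_partition` helper is hypothetical and not part of parted or ceph-disk:

```shell
# Hypothetical helper: poll /proc/partitions until the named partition
# (e.g. "vdb1") appears, giving up after a number of one-second tries.
wait_for_partition() {
    name=$1
    tries=${2:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if grep -qw "$name" /proc/partitions; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Illustrative usage:
#   wait_for_partition vdb1 10 || echo "kernel never registered vdb1"
```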
qa_ack given
Tested and VERIFIED on VMs according to comment 0 and comment 9, with the following packages:

# rpm -qa parted ceph-osd
ceph-osd-10.2.2-5.el7cp.x86_64
parted-3.1-26.el7.x86_64

I'll also try to retest it on real HW.
Tested on a physical HW server without any problem, with the following packages:

parted-3.1-26.el7.x86_64
ceph-osd-10.2.2-2.el7cp.x86_64

>> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html