Bug 1125273 - bcache device registration may temporarily fail (device busy)
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: bcache-tools
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Rolf Fokkens
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-07-31 13:19 UTC by Ian Pilcher
Modified: 2014-10-04 06:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-04 06:44:23 UTC


Attachments
Make bcache_register distinguish between fatal and (possibly) non-fatal errors (1.03 KB, patch)
2014-07-31 13:19 UTC, Ian Pilcher
C program to register a bcache device, trying again when appropriate (1.63 KB, text/plain)
2014-07-31 13:20 UTC, Ian Pilcher
Update bcache-register to use C helper (65 bytes, text/plain)
2014-07-31 13:22 UTC, Ian Pilcher

Description Ian Pilcher 2014-07-31 13:19:22 UTC
Created attachment 922915
Make bcache_register distinguish between fatal and (possibly) non-fatal errors

This is (probably) my last attempt at finding someone who cares that this is broken ...

TL;DR version is that registration of a bcache device (backing or cache device) will fail if the in-kernel registration function (register_bcache) cannot get exclusive access to the device.  This can happen, for example, when the device is a partition on an MD RAID device (e.g. /dev/md126p5); udev starts a number of helper processes when a new RAID partition appears, and this can prevent register_bcache from getting exclusive access to the device.

Currently, the result is that register_bcache returns -EINVAL, the echo command in /usr/lib/udev/bcache-register exits with a status of 1, the device is not registered, and the bcache device does not get created.  This is particularly troublesome if one's root filesystem happens to reside on said bcache device.

Fixing this will require changes in 2 places:

* The kernel function (register_bcache) needs to return a different error code
  when the device is (possibly temporarily) busy.  (It currently returns -EINVAL
  for all errors.)  -EBUSY seems like the obvious candidate.

* Userspace needs to use this information and try again in the "device busy"
  case.
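
As a rough illustration of the userspace half (this is not the attached helper, just a minimal sketch): write the device path to /sys/fs/bcache/register and retry only when the write fails with EBUSY.  It assumes a kernel in which register_bcache has already been changed to return -EBUSY for a busy device; on current kernels the write fails with EINVAL and this loop would give up after the first attempt.  The function names, the retry count, and the delay are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Sysfs file that bcache exposes for registering devices. */
#define BCACHE_REGISTER "/sys/fs/bcache/register"

/*
 * Write the device path to the register file once.
 * Returns 0 on success, or the errno value of the failed open/write.
 */
static int try_register(const char *register_path, const char *device)
{
    int fd = open(register_path, O_WRONLY);
    if (fd < 0)
        return errno;

    size_t len = strlen(device);
    ssize_t written = write(fd, device, len);
    int err = 0;

    if (written < 0)
        err = errno;
    else if ((size_t)written != len)
        err = EIO;              /* short write: treat as fatal */

    close(fd);
    return err;
}

/*
 * Retry registration while the kernel reports EBUSY.
 * ASSUMPTION: the kernel returns -EBUSY when it cannot get exclusive
 * access to the device; every other error is treated as fatal.
 * Returns 0 on success, or the errno value of the last attempt.
 */
static int register_with_retry(const char *register_path, const char *device,
                               int max_tries, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    int err = EINVAL;           /* returned if max_tries <= 0 */

    for (int attempt = 0; attempt < max_tries; attempt++) {
        err = try_register(register_path, device);
        if (err != EBUSY)
            return err;         /* success or a fatal error */
        if (attempt + 1 < max_tries)
            nanosleep(&delay, NULL);   /* give the udev helpers time to finish */
    }
    return err;                 /* still busy after max_tries attempts */
}
```

A udev helper could then call something like register_with_retry(BCACHE_REGISTER, "/dev/md126p5", 10, 100) and exit nonzero only if the result is nonzero.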

I am going to attach several files to this bug that show a *possible* implementation of the approach outlined above.  I want to be very, very clear that I have no intention of playing the kernel patch bikeshed game.  My hope is that there is sufficient interest in making this work in Fedora that someone with sufficient "street cred" can get a response out of the folks on the linux-bcache mailing list.

Comment 1 Ian Pilcher 2014-07-31 13:20:50 UTC
Created attachment 922917
C program to register a bcache device, trying again when appropriate

Comment 2 Ian Pilcher 2014-07-31 13:22:00 UTC
Created attachment 922929
Update bcache-register to use C helper

Comment 3 Rolf Fokkens 2014-08-13 19:52:36 UTC
Your suggested changes affect both the kernel and userspace, but bcache-tools covers only the userspace part. So I agree with you that this should be addressed upstream first.

I found your upstream post:
http://thread.gmane.org/gmane.linux.kernel.bcache.devel/2594

I must have received your post by email, but it escaped my attention. I'll join the discussion over there first, because reproducing the problem should be the first step.

Comment 4 Rolf Fokkens 2014-09-06 08:15:11 UTC
https://github.com/g2p/bcache-tools/issues/14

Comment 5 Rolf Fokkens 2014-10-04 06:44:23 UTC
Upstream needs to respond, but doesn't.

