Bug 1125273

Summary: bcache device registration may temporarily fail (device busy)
Product: [Fedora] Fedora
Reporter: Ian Pilcher <ipilcher>
Component: bcache-tools
Assignee: Rolf Fokkens <rolf>
Status: CLOSED CANTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 20
CC: ignatenko, jreznik, rolf
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-10-04 06:44:23 UTC
Type: Bug
Attachments (description, flags):
* Make bcache_register distinguish between fatal and (possibly) non-fatal errors (flags: none)
* C program to register a bcache device, trying again when appropriate (flags: none)
* Update bcache-register to use C helper (flags: none)

Description Ian Pilcher 2014-07-31 13:19:22 UTC
Created attachment 922915
Make bcache_register distinguish between fatal and (possibly) non-fatal errors

This is (probably) my last attempt at finding someone who cares that this is broken ...

TL;DR version is that registration of a bcache device (backing or cache device) will fail if the in-kernel registration function (register_bcache) cannot get exclusive access to the device.  This can happen, for example, when the device is a partition on an MD RAID device (e.g. /dev/md126p5); udev starts a number of helper processes when a new RAID partition appears, and this can prevent register_bcache from getting exclusive access to the device.
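To make "exclusive access" concrete, here is a minimal userspace sketch, not taken from any of the attachments. It relies on the behaviour documented in open(2): on Linux, opening a block device with O_EXCL (without O_CREAT) fails with EBUSY while the device is in use by the system. Whether this matches the exact check made inside register_bcache is an assumption, and the function name probe_exclusive is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a block device exclusively.
 * Returns 0 if the exclusive open succeeded, otherwise the errno value
 * reported by open(2) (for example EBUSY while another holder has the
 * device claimed, or ENOENT if the path does not exist).
 */
int probe_exclusive(const char *dev)
{
    int fd = open(dev, O_RDONLY | O_EXCL);

    if (fd < 0)
        return errno;

    close(fd);
    return 0;
}
```

Calling probe_exclusive("/dev/md126p5") repeatedly right after the partition appears would be one way to check whether the device is briefly held by something else.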

Currently, the result is that register_bcache returns -EINVAL, the echo command in /usr/lib/udev/bcache-register exits with a status of 1, the device is not registered, and the bcache device does not get created.  This is particularly troublesome if one's root filesystem happens to reside on said bcache device.

Fixing this will require changes in 2 places:

* The kernel function (register_bcache) needs to return a different error code
  when the device is (possibly temporarily) busy.  (It currently returns -EINVAL
  for all errors.)  -EBUSY seems like the obvious candidate.

* Userspace needs to use this information and try again in the "device busy"
  case.
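The userspace half of this could look roughly like the sketch below. It is an illustration only, not the attached helper, and it assumes the kernel change has been made so that a write to the registration file fails with EBUSY when the device is temporarily busy. The function names, the retry count, and the delay are all hypothetical; the registration file would normally be /sys/fs/bcache/register.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* One registration attempt: write the device path to the register file.
 * Returns 0 on success or a positive errno value on failure. */
static int write_once(const char *reg_path, const char *dev)
{
    int fd = open(reg_path, O_WRONLY);
    if (fd < 0)
        return errno;

    size_t len = strlen(dev);
    ssize_t n = write(fd, dev, len);
    int err = 0;

    if (n < 0)
        err = errno;          /* kernel rejected the write */
    else if ((size_t)n != len)
        err = EIO;            /* short write, treat as fatal */

    close(fd);
    return err;
}

/* Retry only while the failure is EBUSY (assumed to mean "possibly
 * temporary"); any other error is treated as fatal and returned at once.
 * Returns 0 on success or the last errno value seen. */
int register_with_retry(const char *reg_path, const char *dev,
                        int max_tries, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    int err = EINVAL;         /* returned if max_tries <= 0 */

    for (int i = 0; i < max_tries; i++) {
        err = write_once(reg_path, dev);
        if (err != EBUSY)
            break;
        if (i + 1 < max_tries)
            nanosleep(&delay, NULL);
    }
    return err;
}
```

A udev helper could call register_with_retry("/sys/fs/bcache/register", "/dev/md126p5", 10, 100) and exit non-zero only if the result is still non-zero. With the current kernel behaviour (-EINVAL for every error) the loop would never retry, which is why the kernel half of the change is needed first.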

I am going to attach several files to this bug that show a *possible* implementation of the approach outlined above.  I want to be very, very clear that I have no intention of playing the kernel patch bikeshed game.  My hope is that there is sufficient interest in making this work in Fedora that someone with sufficient "street cred" can get a response out of the folks on the linux-bcache mailing list.

Comment 1 Ian Pilcher 2014-07-31 13:20:50 UTC
Created attachment 922917
C program to register a bcache device, trying again when appropriate

Comment 2 Ian Pilcher 2014-07-31 13:22:00 UTC
Created attachment 922929
Update bcache-register to use C helper

Comment 3 Rolf Fokkens 2014-08-13 19:52:36 UTC
Your suggested changes affect both the kernel and userspace, but bcache-tools covers only the userspace part. So I agree with you that this should be addressed upstream first.

I found your upstream post:
http://thread.gmane.org/gmane.linux.kernel.bcache.devel/2594

I must have received your post by email, but it escaped my attention. I'll join the discussion over there first, because reproducing the problem should be the first step.

Comment 4 Rolf Fokkens 2014-09-06 08:15:11 UTC
https://github.com/g2p/bcache-tools/issues/14

Comment 5 Rolf Fokkens 2014-10-04 06:44:23 UTC
Upstream needs to respond, but doesn't.