Created attachment 922915 [details]
Make bcache_register distinguish between fatal and (possibly) non-fatal errors
This is (probably) my last attempt at finding someone who cares that this is broken ...
TL;DR version is that registration of a bcache device (backing or cache device) will fail if the in-kernel registration function (register_bcache) cannot get exclusive access to the device. This can happen, for example, when the device is a partition on an MD RAID device (e.g. /dev/md126p5); udev starts a number of helper processes when a new RAID partition appears, and this can prevent register_bcache from getting exclusive access to the device.
Currently, the result is that register_bcache returns -EINVAL, the echo command in /usr/lib/udev/bcache-register exits with a status of 1, the device is not registered, and the bcache device does not get created. This is particularly troublesome if one's root filesystem happens to reside on said bcache device.
Fixing this will require changes in 2 places:
* The kernel function (register_bcache) needs to return a different error code
when the device is (possibly temporarily) busy. (It currently returns -EINVAL
for all errors.) -EBUSY seems like the obvious candidate.
* Userspace needs to use this information and try again in the "device busy" case.
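The userspace half of the approach above could look roughly like the following. This is a minimal sketch, not the attached program: the names `write_once` and `register_with_retry`, the retry count, and the delay are all hypothetical, and it assumes the kernel has already been changed to return -EBUSY for the "device busy" case (today it returns -EINVAL for every error, so this loop would never retry).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Write the device name to the registration file once.
 * Returns 0 on success or a negative errno value on failure. */
static int write_once(const char *reg_path, const char *dev)
{
    int fd = open(reg_path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(dev);
    ssize_t n = write(fd, dev, len);
    /* Capture the error before close() can overwrite errno. */
    int err = (n < 0) ? errno : (((size_t)n == len) ? 0 : EIO);

    close(fd);
    return -err;
}

/* Hypothetical helper: retry only while the (assumed) kernel change
 * reports -EBUSY; any other result is returned immediately. */
int register_with_retry(const char *reg_path, const char *dev,
                        int max_tries, long delay_ms)
{
    assert(reg_path != NULL && dev != NULL);

    int rc = -EINVAL; /* returned unchanged if max_tries <= 0 */
    for (int i = 0; i < max_tries; i++) {
        rc = write_once(reg_path, dev);
        if (rc != -EBUSY)
            return rc; /* success, or an error not worth retrying */

        if (i + 1 < max_tries) {
            struct timespec ts;
            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000L;
            nanosleep(&ts, NULL); /* give the udev helpers time to let go */
        }
    }
    return rc; /* still busy after max_tries attempts */
}
```

A udev helper might call it as `register_with_retry("/sys/fs/bcache/register", "/dev/md126p5", 10, 100)` and exit non-zero on a negative return. Retrying only on -EBUSY keeps genuinely invalid devices failing fast.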
I am going to attach several files to this bug that show a *possible* implementation of the approach outlined above. I want to be very, very clear that I have no intention of playing the kernel patch bikeshed game. My hope is that there is sufficient interest in making this work in Fedora that someone with sufficient "street cred" can get a response out of the folks on the linux-bcache mailing list.
Created attachment 922917 [details]
C program to register a bcache device, trying again when appropriate
Created attachment 922929 [details]
Update bcache-register to use C helper
Your suggested changes affect both the kernel and the userspace part. Bcache-tools covers only the userspace part, so I agree with you that this should be addressed upstream first.
I found your upstream post:
I must have received your post by email, but it escaped my attention. I'll join the discussion over there first, because reproducing the problem should be the first step.
Upstream needs to respond, but doesn't.