Created attachment 835884 (Reproducer)

I cannot create an MD RAID with the new blivet. 'b.devicetree.processActions(dryRun=False)' creates the RAID, but the subsequent 'b.reset()' destroys it. In the log I can see 'ERROR:blivet: failed to scan md array blivet00', after which blivet calls 'mdadm -S'. See the attachment for the complete reproducer and log.

Version-Release number of selected component (if applicable):
python-blivet-0.31-1.fc21.noarch

How reproducible:
always

The same code was working a couple of weeks ago. I suspect the recent RAID rewrites caused this.
Created attachment 835885 (blivet log)
INFO:program: Running... mdadm --examine --export /dev/sdc1
INFO:program: MD_LEVEL=raid6
INFO:program: MD_DEVICES=4
INFO:program: MD_NAME=rawhide:blivet00
INFO:program: MD_ARRAY_SIZE=195.04MB
INFO:program: MD_UUID=2bcefded:e757bcb3:05b3cd2c:e0337c3b
INFO:program: MD_UPDATE_TIME=1386864551
INFO:program: MD_DEV_UUID=e3923673:09396bb5:66a9dd4f:eafaa2f0
INFO:program: MD_EVENTS=1
DEBUG:program: Return code: 0
INFO:program: Running... mdadm --examine --brief /dev/sdc1
INFO:program: ARRAY /dev/md/blivet00 metadata=1.2 UUID=2bcefded:e757bcb3:05b3cd2c:e0337c3b name=rawhide:blivet00
DEBUG:program: Return code: 0
<snip>
ERROR:blivet: failed to create md array: memberDevices cannot be greater than totalDevices

This happens because of the way devices are discovered: we instantiate an MDRaidArrayDevice with just one member device, then add the other members as we find them.
This behaviour in the MDRaidArrayDevice constructor was most probably introduced in commit 1256be6ee0795ade15edcd75bfcdf5eee6cda9d5, but that doesn't mean the behaviour is wrong: when would it ever be right for memberDevices to be larger than totalDevices? I think what we should really do is get this right in the constructor and make a few of those semi-redundant parameters optional. The problem is that the constructor gets called in two completely distinct contexts: when setting up a plan for a new md RAID device, and when discovering the current state of an existing one. The expectations on the parameters need to differ between those two contexts. There is already a parameter that distinguishes them, the exists parameter, so some of the constructor's checks should depend on whether exists is True.
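A minimal sketch of what exists-dependent checks could look like. SketchMDArray and add_member are hypothetical, simplified names, not blivet's MDRaidArrayDevice API; only the parameter names memberDevices, totalDevices and exists come from this discussion. When exists is False (planning), the semi-redundant counts default from the parent list and must agree with it. When exists is True (discovery), the object may start with a single member and grow as further members are found, as described for the scan above.

```python
class SketchMDArray:
    """Hypothetical, simplified stand-in for an MD array device (not blivet's API)."""

    def __init__(self, name, level, parents=None, memberDevices=None,
                 totalDevices=None, exists=False):
        self.name = name
        self.level = level
        self.exists = exists
        self.parents = list(parents or [])

        if exists:
            # Discovery: members are found one at a time, so having fewer
            # parents than totalDevices is normal. Only reject counts that
            # contradict each other or the parents seen so far.
            if (memberDevices is not None and totalDevices is not None
                    and memberDevices > totalDevices):
                raise ValueError("memberDevices cannot be greater than totalDevices")
            if totalDevices is not None and len(self.parents) > totalDevices:
                raise ValueError("more parents than totalDevices")
        else:
            # Planning: every parent is known up front, so the counts are
            # optional and default to the size of the parent list.
            if totalDevices is None:
                totalDevices = len(self.parents)
            if memberDevices is None:
                memberDevices = totalDevices
            if memberDevices > totalDevices:
                raise ValueError("memberDevices cannot be greater than totalDevices")
            if totalDevices != len(self.parents):
                raise ValueError("totalDevices must equal the number of parents")

        self.memberDevices = memberDevices
        self.totalDevices = totalDevices

    def add_member(self, device):
        """Record another member found during discovery."""
        if self.totalDevices is not None and len(self.parents) >= self.totalDevices:
            raise ValueError("array already has totalDevices members")
        self.parents.append(device)


# Discovery: start with the one member seen in the log (sdc1), then add the
# rest. The other device names here are made up for illustration.
found = SketchMDArray("blivet00", "raid6", parents=["sdc1"],
                      memberDevices=4, totalDevices=4, exists=True)
for dev in ["sdd1", "sde1", "sdf1"]:
    found.add_member(dev)
print(len(found.parents))  # 4
```

Keeping the strict consistency checks for planned arrays while relaxing them for existing ones means a partially discovered array is never rejected merely for being incomplete.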
On today's rawhide (python-blivet-0.40-1.fc21.noarch) the reproducer script works well and creates the MD RAID, so for me the bug is fixed. I never pass a memberDevices value higher than totalDevices (see the reproducer); blivet must have messed it up on its own, and that has since been fixed.
I'm not convinced that this good situation will last. handleUdevMDMemberFormat() invokes the MDRaidArrayDevice constructor, but at that point the function does not have enough information to determine totalDevices, only memberDevices. I think handleUdevMDMemberFormat() should be beefed up so that the totalDevices information is available. I believe this can be done using mdadm --detail, and that this approach should be explored further. I'm confident that mdadm --detail yields that information; I've already written the code and tests for it. What isn't so obvious is how to use it properly in handleUdevMDMemberFormat(). The MDRaidArrayDevice constructor should be beefed up as well.
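The mdadm --examine --export output in the log already reports MD_DEVICES=4 for this array, in a simple KEY=VALUE format. A hypothetical helper that pulls a device count out of that format could look like the sketch below. The function names are made up, and it is an assumption that MD_DEVICES corresponds to blivet's totalDevices (it may instead count only active slots when spares are present) and that mdadm --detail --export output could be parsed the same way; both should be checked against the mdadm documentation before relying on this.

```python
def parse_mdadm_export(output):
    """Parse KEY=VALUE lines as printed by mdadm's --export option (hypothetical helper)."""
    info = {}
    for line in output.splitlines():
        # Split on the first '=' only; values such as UUIDs contain ':' but not '='.
        key, sep, value = line.strip().partition("=")
        if sep and key:
            info[key] = value
    return info


def device_count_from_export(output):
    """Return MD_DEVICES as an int, or None if the key is absent."""
    value = parse_mdadm_export(output).get("MD_DEVICES")
    return int(value) if value is not None else None


# Sample copied from the 'mdadm --examine --export /dev/sdc1' output in the log.
SAMPLE = """\
MD_LEVEL=raid6
MD_DEVICES=4
MD_NAME=rawhide:blivet00
MD_ARRAY_SIZE=195.04MB
MD_UUID=2bcefded:e757bcb3:05b3cd2c:e0337c3b
MD_UPDATE_TIME=1386864551
MD_DEV_UUID=e3923673:09396bb5:66a9dd4f:eafaa2f0
MD_EVENTS=1
"""

print(device_count_from_export(SAMPLE))  # 4
```

Keeping the parsing separate from running mdadm makes it easy to test against captured output like the log above.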
OK. I'm going to close this, but I'll hold on to the code in case this comes up again. It looks to me as though the array is now being found by getDeviceByUuid(), so the code that raised the error last time is never reached. If you could grab the logs for the working code the next time something like this happens, that would be much appreciated. Thanks! - mulhern