Description of problem:
Customer ran yum update, and after updating to the latest mdadm package, assembly of arrays using the Intel Matrix Storage Manager (IMSM) metadata format (-e imsm) breaks.

Version-Release number of selected component (if applicable):
The version updated to was mdadm-3.4-14.el7.x86_64. Reverting to the older mdadm-3.3.2-7.el7.x86_64 allows the imsm functionality to work again.

How reproducible:
Hard failure after the update; no recovery or access to the mdadm volume.

Steps to Reproduce:
1. Update to the latest mdadm package, mdadm-3.4-14.el7.x86_64.
2. Attempt to assemble the RAID.
3. Assembly fails:

mdadm -A /dev/md/imsm0 /dev/nvme[0-7]n1 -e imsm
mdadm: /dev/nvme0n1 is not attached to Intel(R) RAID controller.
mdadm: No OROM/EFI properties for /dev/nvme0n1
mdadm: no RAID superblock on /dev/nvme0n1
mdadm: /dev/nvme0n1 has no superblock - assembly aborted

Actual results:
Fails to assemble.

Expected results:
Works as it did with the prior mdadm-3.3.2-7.el7.x86_64 package.

Additional info:
Reverting to the older package resolved the issue:

# rpm -Uvh --force --nodeps mdadm-3.3.2-7.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:mdadm-3.3.2-7.el7                ################################# [ 50%]
Cleaning up / removing...
   2:mdadm-3.4-14.el7                 ################################# [100%]

# mdadm --assemble /dev/md/imsm0 /dev/nvme[0-7]n1
mdadm: /dev/nvme0n1 is busy - skipping
mdadm: /dev/nvme1n1 is busy - skipping
mdadm: /dev/nvme2n1 is busy - skipping
mdadm: /dev/nvme3n1 is busy - skipping
mdadm: /dev/nvme4n1 is busy - skipping
mdadm: /dev/nvme5n1 is busy - skipping
mdadm: /dev/nvme6n1 is busy - skipping
mdadm: /dev/nvme7n1 is busy - skipping

# mdadm --assemble --scan
# mount /dev/md0 /mnt/fast/

Checking the diffs between the new and prior versions, there are a lot of imsm changes.
Please provide more details - is this their boot device or a data device? What kernel are they running? Did they update the initramfs (dracut -f) after updating?

Jes
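For reference, regenerating the initramfs for the running kernel would look like the following (standard dracut usage, not specific to this report):

# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)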
Hello Jes,

This is a data md device; they are not booting from it. This is a very special case of Intel IMSM compatibility where the newer mdadm is somehow no longer compatible with the IMSM metadata format.

XiaoNi, does your system have the Intel Matrix Storage Manager interface that makes use of the IMSM metadata format?

Kernel is 3.10.0-327.22.2.el7.x86_64
Working mdadm is mdadm-3.3.2-7.el7.x86_64
Failing mdadm is mdadm-3.4-14.el7.x86_64
I attached a diff here of the prior (working) and the newer failing mdadm. As I mentioned already, there are significant IMSM changes in the newer version.

Thanks
Laurence
Created attachment 1220428 [details]
Patch of diff between mdadm-3.3.2-7 and mdadm-3.4-14

Many IMSM changes in the newer version. Likely one or more of these are responsible for the new compatibility issue.
Laurence,

I don't see your point about the data format having changed - this is most likely related to the NVMe OROM detection. Please don't attach a diff of the entire package - that really serves no purpose. If you want to look at specific changes, look at the git log.

Jes
Hello Jes,

I am not specifically saying the format has changed. What I am saying is that the changes in the new version somehow no longer allow mdadm to parse the IMSM metadata. After the update, the customer could no longer assemble his existing, very large data RAID device built with the prior version.

mdadm -A /dev/md/imsm0 /dev/nvme[0-7]n1 -e imsm
mdadm: /dev/nvme0n1 is not attached to Intel(R) RAID controller.
mdadm: No OROM/EFI properties for /dev/nvme0n1
mdadm: no RAID superblock on /dev/nvme0n1                   ****** Note
mdadm: /dev/nvme0n1 has no superblock - assembly aborted    ****** Note

Downgrade JUST the mdadm package, with nothing else changed, and the superblock is found and the md device is assembled:

# mdadm --assemble /dev/md/imsm0 /dev/nvme[0-7]n1
# mdadm --assemble --scan
# mount /dev/md0 /mnt/fast/

Thanks
Laurence
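As an aside, if the older package is still available in the configured repos, the cleaner way to step back (instead of rpm --force --nodeps) would be:

# yum downgrade mdadm-3.3.2-7.el7

That assumes yum can still see the old version; the rpm invocation above achieves the same end result.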
Laurence,

A blanket diff of the entire set of changes since 3.3.2 adds zero value. If you are worried about a specific git commit, post that, but not a diff of several years of work!!!

Upstream users use mdadm-3.4 with NVMe devices just fine, so it is not simply the changes from 3.3.2 to 3.4 that break things.

Jes
If their system doesn't have an IMSM-compatible BIOS and they created an array using IMSM, then they are running an unsupported configuration. I don't know that system; you'll have to provide that information.
I have reached out to the customer asking about the reason they used the -e imsm option when creating the array - specifically, whether or not the R930 they have required that option. I will post the update I get from them in the BZ.

Thanks
Laurence
Customer's response suggests they were advised to use -e imsm when creating the array at inception:

"Laurence, It's been on the R930 since inception. I don't know if it was specifically required that we use the -e imsm option, but it was in the information that we were given in order to get this created. I'll get more on this to you once I find it. Regards, Jonathan"

I am still looking into what the R930 specifically has for IMSM support. Going through the specs so far has not called that out.

Thanks
Laurence
Systems that don't have the IMSM support would fail the array creation, from what I can see here:

[root@dhcp-49-29 ~]# mdadm --create --verbose /dev/md/imsm0 --level=linear --raid-devices=2 /dev/sdb1 /dev/sdc1 -e imsm
mdmon: Cannot create this array on device /dev/sdb1
mdadm: /dev/sdb1 is not suitable for this array.
mdmon: Cannot create this array on device /dev/sdc1
mdadm: /dev/sdc1 is not suitable for this array.
mdadm: create aborted

Repeat with no imsm and it works:

[root@dhcp-49-29 ~]# mdadm --create --verbose /dev/md/imsm0 --level=linear --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm: Defaulting to version 1.2 metadata
[ 1161.836487] md: bind<sdb1>
[ 1161.849927] md: bind<sdc1>
[ 1161.864506] md: linear personality registered for level -1
[ 1161.890780] md127: detected capacity change from 0 to 8199927848448
mdadm: array /dev/md/imsm0 started.

This seems to confirm by lab testing that the Dell R930 does indeed support "-e imsm", or it would have failed when the array was initially created by the customer.

Thanks
Laurence
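Quick addendum: mdadm can also report the platform's IMSM capabilities directly. Assuming a recent enough mdadm, the following should print the OROM/EFI properties (supported RAID levels, chunk sizes, port count), or an error if no compatible controller is present:

# mdadm --detail-platform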
I have reached out to the customer and asked that they let us know when I can log in to their system to review and possibly re-test the process they are using. I want to make sure we capture all the specifics here, because we now have multiple tests in-house where we are unable to reproduce this assemble error after updating mdadm. Granted, we have not been testing with NVMe targets here, and I am unable to do so in my lab; neither can Xiao. I have two SSD drives, but that does not represent the nuances of NVMe.

Thanks everybody for all your assistance so far, it's much appreciated.

Regards
Laurence
My apologies. When I read a few of Laurence's questions a few comments back, I jumped in with the answers I knew. I didn't read the original problem description; I was merely answering the questions asked.

Now I have read the original problem description, and I don't think we have a bug here; I think this is "working as designed".

Specifically, I mentioned in one of my earlier posts that imsm raid arrays have unique restrictions, like having to use SATA ports all on the same chipset, so if you have two SATA controllers, you cannot allow the disks in an imsm array to span the SATA ports from one controller to the other. At the time I mentioned that I wasn't sure how the NVMe devices were falling under those chipsets. I had my doubts that they were, but I left it at that.

After reading the problem description, I think the issue is that the old mdadm ignored the fact that the NVMe devices were not actually attached to a SATA controller the BIOS recognizes (and can therefore utilize at BIOS time to turn into a RAID device), while the new mdadm has much more stringent checks to make sure that any device passed in as an imsm device actually meets the requirements for the BIOS to be able to access and utilize it. After all, if the BIOS can't access it, then using the imsm format is not required at all, and can be misleading to people who think that if it's in imsm format it should be visible to the machine's BIOS. In the event that the BIOS can't recognize it, for whatever reason, the imsm format should be rejected and the user should end up using the native md format instead.

I think what's happening here is that the user created an imsm array themselves on the NVMe devices; it worked with the old mdadm because it was lenient about the non-conformant devices, and the new mdadm is kicking it out as invalid. If you diff the changes between mdadm versions, I'm betting that you'll find some additional checks relating to the parent device and the constituent devices, and if you disable one or more of those, things will work again.

You could possibly add an "allow imsm on non-BIOS connected devices" flag to mdadm to get around this problem, but the real answer might be to tell the customer "you shouldn't have done that, can you please back up your data, reformat using the mdadm native md raid array format, restore your data, then carry on?"
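A quick way to separate the two failure modes here (metadata actually gone vs. the platform check rejecting the device) is to examine a member device directly with the older mdadm, which still loads the metadata; assuming the device names from the original report:

# mdadm -E /dev/nvme0n1

If that prints an imsm container superblock, the metadata is intact on disk and it is only the new platform validation that is refusing to assemble it.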
Hello Doug,

Yes, that's in line with my original thinking: the newer mdadm is not backwards compatible with arrays that leniency in the original mdadm allowed to be created. This is a multi-terabyte array they created, about 16TB if I recall, so they are unable to back up and restore. In addition, I think others may have fallen into this same trap in how they originally created their arrays, so we should add the compatibility flag in fixing mdadm and make customers aware.

Thanks
Laurence
Thanks for following up on this Doug - still catching up from Plumbers travel.

If the customer did something this broken by mistake, then we are not going to add any flags to work around it. Doing so means we would have to carry support for this indefinitely into the future, something which we neither want to do nor have the capacity for. There is nothing in the documentation stating that a customer should use -e imsm; they have come up with this on their own. This is the only case of this I have ever seen, and said customer will have to convert their data eventually in either case.

Jes
Jes,

I follow that thought, but here is my concern from the CEE view. Clearly one was able to create the array and assemble it with the older mdadm, without any warnings around the use of imsm when it was not actually needed. This means that, whether or not they did something wrong, the prior version did not enforce doing the right thing. Most customers would consider this a backwards compatibility issue and would not see it the way you described, even though I am in agreement with you.

What is still not fully captured here is how the customer's NVMe devices connect to the R930; I am still waiting on that. If their NVMe devices plug into a backplane that feeds ports on the BIOS-embedded matrix array controller, then it would have been correct to create the array with imsm metadata. If not, then what Doug described likely happened here. I am pressing the customer for more detailed information.

Thanks
Laurence
Laurence,

It's really simple: if the BIOS saw the drives, they would have created the arrays there and used the IMSM format. The fact that they dug out the imsm switch themselves, despite it not being said anywhere that they should do so, is their problem. It is not listed as a recommended flag for this use case anywhere.

Adding a broken backwards-compatibility switch to help one customer who did something wrong means carrying the support for it for close to a decade. This will cause issues and cost for devel and QE, and raise the risk of other bugs being introduced, to work around the case of one customer doing something wrong.

The only case you could argue for this would be if they had created the RAID in a system with IMSM support and then moved the drives to one without. However, I am fairly certain Intel's documentation states that isn't supported.

Jes
Hello Jes,

OK, I will work at getting the customer to recreate their array. I did check lspci, and it seems to confirm that these are non-BIOS-connected NVMe cards, so (as always) Doug was correct.

[loberman@dhcp-33-21 sosreport-icronus.01724820-20161109090030]$ grep "Non-Volatile memory controller" lspci
0d:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
0e:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
0f:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
41:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
82:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
c1:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
c4:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
c5:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])

I will close as notabug.

Regards
Laurence
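A cross-check worth noting, assuming the same sosreport lspci file: an Intel controller running in RAID mode would show up with the PCI RAID class code, so no matches from something like

$ grep -i "RAID bus controller" lspci

would further support the conclusion that these NVMe cards are plain PCIe devices with no BIOS RAID involvement.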
It is worth noting that it *might* be possible to help the customer out in this situation. If a person has eight drives and creates an imsm array on them using the older mdadm, then it might be possible to find a combination of settings that allows you to create a new md raid array over the top of the existing imsm array, such that the new mdadm-based array would come up and all their data would be intact.

For imsm devices, the superblock is at the end of the device, so you would have to use mdadm array superblock version 1.0, and you would have to use the entire block device, just like imsm does. You would then need to match the array type/level, the disk ordering, and the chunk size, and finally you would need to pass the --assume-clean flag on the --create command to cause it to just write out the superblock and not try to do any sort of parity or other data verification on the device. You could then assemble the device readonly and attempt a read-only mount to verify the data. If the data is not intact, you can unmount the device, stop the device, then recreate with different settings (a hypothetical sketch of this sequence follows below).

One word of warning here though: if you can't find a set of settings for mdadm that will create the array intact, I don't know for certain whether -e imsm will accept --assume-clean. That should be verified before even attempting this with the customer.

We can check to see if this is even possible if you can get the following information. Have the customer downgrade to the old mdadm and assemble their array, then collect:

cat /proc/mdstat
cat /etc/mdadm.conf
ls -l /dev/nvme*
mdadm -D /dev/md0
mdadm -D /dev/md/*
mdadm -E /dev/nvme*

That should give us enough to determine if we can help them upgrade to an mdadm-native array without losing their data.
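For illustration only, a sketch of the recreate-over-the-top sequence described above. The level, chunk size, and disk order shown are placeholders; every one of them must be matched to the customer's actual array (which is exactly what the information requested above would tell us) before anything is written:

# Placeholder values: --level, --chunk, and the device order MUST match
# the original array exactly, as determined from the mdadm -D / -E output.
mdadm --create /dev/md0 --metadata=1.0 --assume-clean \
      --level=5 --chunk=128 --raid-devices=8 \
      /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 \
      /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1
# Keep the array read-only while verifying the data
mdadm --readonly /dev/md0
mount -o ro /dev/md0 /mnt/fast
# If the data is wrong: umount, mdadm --stop /dev/md0, and retry with
# different settings.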
Doug, They are already on the older mdadm because they cannot assemble with the newer version. :) I will get the additional information from them. Thanks Laurence
Created attachment 1221299 [details]
Data requested by Doug