Bug 1394017 - Latest update to mdadm-3.4-14 breaks the imsm Intel Raid interoperability
Summary: Latest update to mdadm-3.4-14 breaks the imsm Intel Raid interoperability
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: mdadm
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jes Sorensen
QA Contact: Storage QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-10 20:18 UTC by loberman
Modified: 2020-01-17 16:09 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-24 11:34:42 UTC
Target Upstream Version:


Attachments
Data requested by Doug (24.19 KB, text/plain)
2016-11-16 18:07 UTC, loberman

Description loberman 2016-11-10 20:18:41 UTC
Description of problem:
Customer ran yum update and after updating to the latest mdadm package the interface via imsm for the Intel Matrix Storage Manager metadata format via -e imsm breaks.

Version-Release number of selected component (if applicable):
The version updated to was mdadm-3.4-14.el7.x86_64.

Reverting back to the older version of mdadm-3.3.2-7.el7.x86_64 allows the imsm functionality to once again work.

How reproducible:
This was a hard failure after the update; no recovery or access to the mdadm volume.

Steps to Reproduce:
1. Update to the latest mdadm package mdadm-3.4-14.el7.x86_64
2. Attempt to assemble the raid
3. Assembly fails with:
mdadm -A /dev/md/imsm0 /dev/nvme[0-7]n1 -e imsm
mdadm: /dev/nvme0n1 is not attached to Intel(R) RAID controller.
mdadm: No OROM/EFI properties for /dev/nvme0n1
mdadm: no RAID superblock on /dev/nvme0n1
mdadm: /dev/nvme0n1 has no superblock - assembly aborted

Actual results:
Fails to assemble

Expected results:
Works as it did prior with the mdadm-3.3.2-7.el7.x86_64 package.

Additional info:
Reverting to the older package resolved this:
# rpm -Uvh --force --nodeps mdadm-3.3.2-7.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:mdadm-3.3.2-7.el7                ################################# [ 50%]
Cleaning up / removing...
   2:mdadm-3.4-14.el7                 ################################# [100%]
#  mdadm --assemble /dev/md/imsm0 /dev/nvme[0-7]n1
mdadm: /dev/nvme0n1 is busy - skipping
mdadm: /dev/nvme1n1 is busy - skipping
mdadm: /dev/nvme2n1 is busy - skipping
mdadm: /dev/nvme3n1 is busy - skipping
mdadm: /dev/nvme4n1 is busy - skipping
mdadm: /dev/nvme5n1 is busy - skipping
mdadm: /dev/nvme6n1 is busy - skipping
mdadm: /dev/nvme7n1 is busy - skipping
# mdadm --assemble --scan
# mount /dev/md0 /mnt/fast/

Checking the diffs between the new and prior versions shows a lot of imsm changes.

Comment 2 Jes Sorensen 2016-11-14 13:30:30 UTC
Please provide more details - is this their boot device or a data device?

What kernel are they running?

Did they update the initramfs after updating? dracut -f ?

Jes
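
For reference, a typical way to refresh the initramfs after an mdadm update (a sketch; the image path assumes the distribution's default naming):

# rebuild the initramfs for the currently running kernel so it
# carries the updated mdadm binary and config
dracut -f /boot/initramfs-$(uname -r).img $(uname -r)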

Comment 3 loberman 2016-11-14 13:50:45 UTC
Hello Jes,

This is a data md device. They are not booting from it.
This is a very special case of Intel IMSM compatibility here where the newer mdadm is somehow no longer compatible with the IMSM metadata format.

XiaoNi, does your system have the Intel Matrix Storage Manager interface that makes use of the IMSM metadata format?

Kernel is 3.10.0-327.22.2.el7.x86_64

Working mdadm is mdadm-3.3.2-7.el7.x86_64
Failing mdadm is mdadm-3.4-14.el7.x86_64

Comment 4 loberman 2016-11-14 13:52:45 UTC
I attached a diff here of the prior (working) and the newer (failing) mdadm.
As I had mentioned already, there are significant IMSM changes in the newer version.

Thanks
Laurence

Comment 5 loberman 2016-11-14 13:55:04 UTC
Created attachment 1220428 [details]
Patch of diff between mdadm-3.3.2-7 and mdadm-3.4-14

Many IMSM changes in the newer version. Likely one or more of these are responsible for the new compatibility issue.

Comment 6 Jes Sorensen 2016-11-14 14:02:18 UTC
Laurence,

I don't see your point about the data format having changed - this is most
likely related to the NVME OROM detection.

Please don't attach a diff of the entire package - that really serves no
purpose. If you want to look at specific changes, look at the git log.

Jes
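
For reference, a targeted way to review just those changes (a sketch; assumes upstream's mdadm-X.Y tag naming and that the imsm handler lives in super-intel.c):

# list only the imsm metadata handler commits between the two releases
git log --oneline mdadm-3.3.2..mdadm-3.4 -- super-intel.c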

Comment 7 loberman 2016-11-14 14:07:26 UTC
Hello Jes,

I am not specifically saying the format has changed. What I am saying is that the changes in the new version somehow no longer allow mdadm to parse the imsm metadata.

After the update, the customer could no longer assemble their existing very large data raid device built with the prior version.

mdadm -A /dev/md/imsm0 /dev/nvme[0-7]n1 -e imsm
mdadm: /dev/nvme0n1 is not attached to Intel(R) RAID controller.
mdadm: No OROM/EFI properties for /dev/nvme0n1
mdadm: no RAID superblock on /dev/nvme0n1      ****** Note
mdadm: /dev/nvme0n1 has no superblock - assembly aborted ***** Note

After downgrading JUST the mdadm package, with nothing else changed, the superblock is found and the md device assembles:

#  mdadm --assemble /dev/md/imsm0 /dev/nvme[0-7]n1
# mdadm --assemble --scan
# mount /dev/md0 /mnt/fast/

Thanks
Laurence

Comment 9 Jes Sorensen 2016-11-14 14:23:02 UTC
Laurence,

A diff of the entire set of changes since 3.3.2 adds zero value. If
you are worried about a specific git commit, post that, but not a diff of
several years of work!!!

Upstream users use mdadm-3.4 with NVME devices just fine, so it is not
simply the changes from 3.3.2 to 3.4 that break things.

Jes

Comment 12 Jes Sorensen 2016-11-14 14:28:38 UTC
If their system doesn't have an IMSM compatible BIOS and they created an 
array using IMSM then they are running an unsupported configuration.

I don't know that system, you'll have to provide that information.

Comment 13 loberman 2016-11-14 14:36:57 UTC
I have reached out to the customer asking why they used the -e imsm option when creating the array, and specifically whether the R930 they have required that option or not.

I will post the update I get from them in the BZ.

Thanks
Laurence

Comment 14 loberman 2016-11-14 14:44:42 UTC
The customer's response suggests they were advised to use -e imsm when creating the array at inception.

"Laurence,
It's been on the R930 since inception.
I don’t know if it was specifically required that we use the -e imsm option but it was in the information that we were given in order to get this created.
I'll get more on this to you once I find it.
Regards,
Jonathan
"

I am still looking into specifically what the R930 offers for IMSM support. Going through the specs has so far not called that out.

Thanks
Laurence

Comment 15 loberman 2016-11-14 16:01:23 UTC
Systems that don't have imsm support would fail the array creation, from what I can see here.

[root@dhcp-49-29 ~]#  mdadm --create --verbose /dev/md/imsm0  --level=linear --raid-devices=2 /dev/sdb1 /dev/sdc1 -e imsm

mdmon: Cannot create this array on device /dev/sdb1
mdadm: /dev/sdb1 is not suitable for this array.
mdmon: Cannot create this array on device /dev/sdc1
mdadm: /dev/sdc1 is not suitable for this array.
mdadm: create aborted

Repeat with no imsm option and it works:

[root@dhcp-49-29 ~]#  mdadm --create --verbose /dev/md/imsm0  --level=linear --raid-devices=2 /dev/sdb1 /dev/sdc1 

mdadm: Defaulting to version 1.2 metadata
[ 1161.836487] md: bind<sdb1>
[ 1161.849927] md: bind<sdc1>
[ 1161.864506] md: linear personality registered for level -1
[ 1161.890780] md127: detected capacity change from 0 to 8199927848448
mdadm: array /dev/md/imsm0 started.

This lab testing seems to confirm that the Dell R930 does indeed support "-e imsm", or creation would have failed when the customer initially built the array.

Thanks
Laurence
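
For reference, mdadm can also report the IMSM capabilities it detects directly, which avoids digging through the specs (a sketch; run on the system in question):

# show the IMSM OROM/EFI platform details mdadm detects
mdadm --detail-platform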

Comment 34 loberman 2016-11-16 14:44:33 UTC
I have reached out to the customer and asked that they let me know when I can log in to their system to review, and possibly re-test, the process they are using.

I want to make sure we capture all the specifics here because we now have multiple tests in-house where we are unable to reproduce this assemble error after updating mdadm.

Granted, we have not been testing with NVME targets here; I am unable to do so in my lab, and neither can Xiao.

I have two SSD drives, but that does not represent the nuances of NVME.

Thanks everybody for all your assistance so far; it's much appreciated.

Regards
Laurence

Comment 35 Doug Ledford 2016-11-16 15:37:49 UTC
My apologies. When I read a few of Laurence's questions a few comments back, I jumped in with a few answers I knew to give. I didn't read the original problem description; I was merely answering the questions asked. Now I have read the original problem description, and I don't think we have a bug here; I think this is "working as designed".

Specifically, I mentioned in one of my earlier posts that imsm raid arrays have unique restrictions, like needing to use SATA ports all on the same chipset, so if you have two SATA controllers, you cannot allow the disks in an imsm array to span the SATA ports from one controller to the other. At the time I mentioned that I wasn't sure how the NVMe devices were falling under those chipsets. I had my doubts that they were, but I left it at that.

After reading the problem description, I think the issue is that the old mdadm ignored the fact that the NVMe devices were not actually attached to a SATA controller the BIOS recognizes (and therefore can utilize at BIOS time to turn into a RAID device), while the new mdadm has much more stringent checks to make sure that any device passed in as an imsm device actually meets the requirements for the BIOS to be able to access and utilize it. After all, if the BIOS can't access it, then using the imsm format is not required at all, and it can be misleading to people who think that if it's in imsm format it should be visible to the machine's BIOS. In the event that the BIOS can't recognize it, for whatever reason, the imsm format should be rejected and the user should end up using the native md format instead.

I think what's happening here is that the user created an imsm array themselves on the NVMe devices; it worked with the old mdadm because it was lenient about the non-conformant devices, and the new mdadm is kicking it out as invalid. If you diff the changes between mdadm versions, I'm betting that you'll find some additional checks relating to the parent device and the constituent devices, and if you disable one or more of those, things will work again. You could possibly add an "allow imsm on non-BIOS connected devices" flag to mdadm to get around this problem, but the real answer might be to tell the customer "you shouldn't have done that; can you please back up your data, reformat using mdadm's native md raid array format, restore your data, then carry on?"
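
A possible stopgap for data recovery (a sketch, assuming this mdadm build retains the IMSM_NO_PLATFORM testing override that upstream's super-intel.c checks; explicitly not a supported configuration):

# skip the OROM/platform checks long enough to assemble and copy the data off
IMSM_NO_PLATFORM=1 mdadm -A /dev/md/imsm0 /dev/nvme[0-7]n1 -e imsm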

Comment 36 loberman 2016-11-16 15:49:44 UTC
Hello Doug

Yes, that's in line with my original thinking: the newer mdadm is not backwards compatible with the leniency in the original mdadm that allowed this to happen.

This is a multi-terabyte array they created, about 16TB if I recall, so they are unable to back up and restore.

In addition, I think others may have fallen into this same trap in how they originally created their arrays, so we should add the compatibility flag when fixing mdadm and make customers aware.

Thanks
Laurence

Comment 37 Jes Sorensen 2016-11-16 15:57:06 UTC
Thanks for following-up on this Doug - still catching up from Plumbers travel.

If the customer did something this broken by mistake, then we are not going
to add any flags to work around this. Doing so means we would have to carry
support for this indefinitely into the future, something which we neither want
to nor have the capacity to do.

There is nothing in the documentation stating that a customer should use
-e imsm. They have come up with this on their own.

This is the only case of this I have ever seen, and said customer will have
to convert their data eventually in either case.

Jes

Comment 38 loberman 2016-11-16 16:09:31 UTC
Jes, I follow that thought, but here is my concern from the CEE view.

Clearly one was able to create the array and assemble it with the older mdadm without any warnings around the use of imsm when it was not actually needed.

This means that even if they did something wrong, the prior version did not enforce doing the right thing.

Most customers would consider this a backwards compatibility issue and would not see it the way you described, even though I am in agreement with you.

What is still not fully captured here is how the customer's NVME devices connect to the R930. I am still waiting on that.

If their NVME devices plug into a backplane that feeds ports on the BIOS-embedded matrix array controller, then creating them with imsm metadata would have been correct.

If not, then what Doug described likely happened here.

I am pressing the customer for more detailed information.

Thanks
Laurence

Comment 39 Jes Sorensen 2016-11-16 16:18:41 UTC
Laurence,

It's really simple: if the BIOS saw the drives, they would have created the
arrays there and used the IMSM format.

The fact that they dug out the imsm switch themselves, despite it not being
said anywhere that they should do so, is their problem. It is not listed as
a recommended flag for this use case anywhere.

Adding a broken backwards compatibility switch to help one customer who did
something wrong means carrying the support for this for close to a decade.
This will cause issues and cost for devel and QE and raise the risk of other
bugs being introduced, to work around the case that one customer did something
wrong.

The only case you could argue for this would be if they had created the RAID
in a system with IMSM support and then moved the drives to one without. However,
I am fairly certain Intel's documentation states that isn't supported.

Jes

Comment 40 loberman 2016-11-16 16:20:26 UTC
Hello Jes,

OK, I will work at getting the customer to recreate their array.

I did check lspci and it seems to confirm that these are non-BIOS-connected NVME cards, so (as always) Doug was correct.

[loberman@dhcp-33-21 sosreport-icronus.01724820-20161109090030]$ grep "Non-Volatile memory controller" lspci
0d:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
0e:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
0f:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
41:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
82:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
c1:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
c4:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])
c5:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172X [144d:a821] (rev 01) (prog-if 02 [NVM Express])

I will close as notabug.

Regards
Laurence

Comment 41 Doug Ledford 2016-11-16 16:59:51 UTC
It is worth noting that it *might* be possible to help the customer out in this situation. If a person has eight drives and creates an imsm array on them using the older mdadm, then it might be possible to find a combination of settings that allows you to create a new md raid array over the top of the existing imsm array such that the new mdadm based array would come up with all their data intact.

For imsm devices, the superblock is at the end of the device, so you would have to use mdadm array superblock version 1.0, and you would have to use the entire block device just like imsm does. You would then need to match the array type/level, the disk ordering, and the chunk size, and finally you would need to pass the --assume-clean flag on the --create command to cause it to just write out the superblock and not try to do any sort of parity or other data verification on the device. You could then assemble the device read-only and attempt a read-only mount to verify the data. If the data is not intact, you can unmount the device, stop the device, then recreate with different settings.

One word of warning here, though: if you can't find a set of settings for mdadm that will create the array intact, I don't know for certain whether -e imsm will accept --assume-clean. That should be verified before even attempting this with the customer. We can check whether this is even possible if you can get the following information:

Have the customer downgrade to the old mdadm and assemble their array, then collect the following specific information:

cat /proc/mdstat
cat /etc/mdadm.conf
ls -l /dev/nvme*
mdadm -D /dev/md0
mdadm -D /dev/md/*
mdadm -E /dev/nvme*

That should give us enough to determine if we can help them upgrade to an mdadm native array without losing their data.
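
To make the procedure concrete, a sketch of the re-create step; the level, chunk size, and device order below are placeholders that must be taken from the mdadm -D / -E output, and everything should be verified read-only before trusting it:

# version 1.0 superblock sits at the end of the device, like imsm;
# --assume-clean writes only the superblock and skips any resync
mdadm --create /dev/md0 --metadata=1.0 --assume-clean \
      --level=0 --chunk=128 --raid-devices=8 /dev/nvme[0-7]n1
# verify with a read-only mount before doing anything else
mount -o ro /dev/md0 /mnt/fast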

Comment 42 loberman 2016-11-16 17:14:56 UTC
Doug,

They are already on the older mdadm because they cannot assemble with the newer version. :)

I will get the additional information from them.

Thanks
Laurence

Comment 43 loberman 2016-11-16 18:07:38 UTC
Created attachment 1221299 [details]
Data requested by Doug
