Bug 1380034

Summary: mdadm examine output formatting is inconsistent
Product: [Fedora] Fedora Reporter: David Lehman <dlehman>
Component: mdadm Assignee: Nigel Croxon <ncroxon>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 25 CC: agk, awilliam, blivet-maint-list, dledford, dlehman, extras-qa, Jes.Sorensen, ncroxon, robatino, sbueno, vpodzime, xni
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1379865 Environment:
Last Closed: 2017-05-22 14:57:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description David Lehman 2016-09-28 14:24:14 UTC
+++ This bug was initially created as a clone of Bug #1379865 +++

<snip>

--- Additional comment from David Lehman on 2016-09-28 10:21:47 EDT ---

mdadm is inconsistent in output key format across metadata formats. For v1, it uses "Raid Level :". For intel/imsm/isw, it uses "RAID Level :".

This can be worked around in libblockdev, but it should be fixed in mdadm given that the command line tool is the only API mdadm provides.
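
As an illustration of the kind of workaround meant here, the following is a minimal sketch (not libblockdev's actual code) of a parser that normalizes key case and spacing, so that "Raid Level" and "RAID Level" land under the same key. The function name parse_key_value_output is hypothetical.

```python
import re


def parse_key_value_output(text):
    """Parse 'Key : Value' lines into a dict with normalized keys.

    Keys are stripped, collapsed to single spaces and lowercased, so
    'Raid Level' (v1 metadata) and 'RAID Level' (imsm) both become
    'raid level'. The first occurrence of a key wins.
    """
    result = {}
    for line in text.splitlines():
        # Split on the first ' : ' only; values such as timestamps and
        # UUIDs contain ':' but not ' : '.
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # header lines such as '/dev/vdb1:' carry no value
        norm = re.sub(r"\s+", " ", key.strip()).lower()
        result.setdefault(norm, value.strip())
    return result


v1_line = "     Raid Level : raid1"
imsm_line = "     RAID Level : 0"

print(parse_key_value_output(v1_line))
print(parse_key_value_output(imsm_line))
```

Normalizing the key does not make the values comparable: a consumer would still have to reconcile the different value spellings for the RAID level separately.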

Comment 1 Jes Sorensen 2016-09-28 15:28:42 UTC
I agree that this is rather messy; however, any change like this has to be
addressed on the RAID mailing list linux-raid.org.

Before making any change to this, I want to be sure all distros are aware of
it coming.

Jes

Comment 2 Adam Williamson 2016-10-03 23:53:28 UTC
For the record, here's a comparison of mdadm --examine -E output for a software RAID set (created by anaconda) and an Intel firmware RAID set. Software RAID:

-----

/dev/vdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : f3cb52a4:e4f1abec:79e47da7:4fbe0cb6
           Name : localhost.localdomain:root  (local to host localhost.localdomain)
  Creation Time : Mon Oct  3 23:45:22 2016
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 31438848 (14.99 GiB 16.10 GB)
     Array Size : 15719424 (14.99 GiB 16.10 GB)
    Data Offset : 16384 sectors
   Super Offset : 8 sectors
   Unused Space : before=16296 sectors, after=0 sectors
          State : active
    Device UUID : f53c5cc8:577db78e:d7bf784b:1d153ff7

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Oct  3 23:48:38 2016
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 669ccd86 - correct
         Events : 32


   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

-----

Intel firmware RAID:

-----

/dev/sda:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.0.00
    Orig Family : 47426418
         Family : 47426418
     Generation : 00000001
     Attributes : All supported
           UUID : 7fe61893:94a3e502:ce92b4f6:4c0e5884
       Checksum : 97fe4ab0 correct
    MPB Sectors : 1
          Disks : 2
   RAID Devices : 1

  Disk00 Serial : 5QM2XY4V
          State : active
             Id : 00000000
    Usable Size : 976768264 (465.76 GiB 500.11 GB)

[Volume0]:
           UUID : 1b60a788:db78aca8:10553d1b:b8a61316
     RAID Level : 0
        Members : 2
          Slots : [UU]
    Failed disk : none
      This Slot : 0
     Array Size : 1953536000 (931.52 GiB 1000.21 GB)
   Per Dev Size : 976768264 (465.76 GiB 500.11 GB)
  Sector Offset : 0
    Num Stripes : 3815500
     Chunk Size : 128 KiB
       Reserved : 0
  Migrate State : idle
      Map State : normal
    Dirty State : clean

  Disk01 Serial : 9VM1BT6B
          State : active
             Id : 00010000
    Usable Size : 976768264 (465.76 GiB 500.11 GB)

-----

I can see how this is a pain for libblockdev to deal with.
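
One concrete source of that pain: in the Intel output, keys such as UUID and State appear several times (once for the container, once per volume or disk), so a flat key/value dictionary silently drops or overwrites values. The sketch below is one possible way to handle that, assuming blank lines separate blocks and that lines ending in ":" are block headers; parse_blocks and the "_header" key are hypothetical names, and the sample is an excerpt of the output above.

```python
def parse_blocks(text):
    """Split 'Key : Value' output into blocks separated by blank lines.

    Returns a list of dicts, one per block. A header line such as
    '/dev/sda:' or '[Volume0]:' is stored under the '_header' key of
    the block it opens.
    """
    blocks = []
    current = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # A blank line closes the current block, if there is one.
            if current:
                blocks.append(current)
                current = {}
            continue
        key, sep, value = stripped.partition(" : ")
        if sep:
            current[key.strip().lower()] = value.strip()
        elif stripped.endswith(":"):
            current["_header"] = stripped[:-1]
    if current:
        blocks.append(current)
    return blocks


sample = """\
/dev/sda:
          Magic : Intel Raid ISM Cfg Sig.
           UUID : 7fe61893:94a3e502:ce92b4f6:4c0e5884
   RAID Devices : 1

  Disk00 Serial : 5QM2XY4V
          State : active

[Volume0]:
           UUID : 1b60a788:db78aca8:10553d1b:b8a61316
     RAID Level : 0
"""

for block in parse_blocks(sample):
    print(block.get("_header"), block.get("uuid"))
```

With this layout the container UUID and the volume UUID stay in separate blocks instead of colliding under one key.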

Comment 3 Jes Sorensen 2016-10-04 02:07:53 UTC
And I'll put it on the record once more: This is not a Fedora problem. If
you want to see this changed, you have to take it to the upstream lists and
open a discussion about it there.

The Intel Firmware Format has a number of very different properties compared
to the native format. It is not possible to just map those into that of the
mdadm native formats.

Throw DDF into the equation and it gets even more entertaining.

Jes

Comment 4 Jes Sorensen 2016-10-07 03:47:57 UTC
So Anaconda really should be using the --export flag to obtain this
information, like this:

[root@noisybay ~]# mdadm --export --detail /dev/md11
MD_LEVEL=raid5
MD_DEVICES=4
MD_METADATA=1.2
MD_UUID=985dbeef:a881e498:b561c940:63adb2d9
MD_DEVNAME=11
MD_NAME=noisybay.lga.redhat.com:11
MD_DEVICE_sdd2_ROLE=0
MD_DEVICE_sdd2_DEV=/dev/sdd2
MD_DEVICE_sde1_ROLE=spare
MD_DEVICE_sde1_DEV=/dev/sde1
MD_DEVICE_sde2_ROLE=1
MD_DEVICE_sde2_DEV=/dev/sde2
MD_DEVICE_sdf2_ROLE=2
MD_DEVICE_sdf2_DEV=/dev/sdf2
MD_DEVICE_sdg2_ROLE=3
MD_DEVICE_sdg2_DEV=/dev/sdg2
[root@noisybay ~]# mdadm --export --detail /dev/md127
MD_LEVEL=container
MD_DEVICES=2
MD_METADATA=imsm
MD_UUID=5e563e31:bf1b6f18:3aeefc45:49213b67
MD_DEVNAME=imsm0
MD_DEVICE_sdb_ROLE=spare
MD_DEVICE_sdb_DEV=/dev/sdb
MD_DEVICE_sdc_ROLE=spare
MD_DEVICE_sdc_DEV=/dev/sdc
[root@noisybay ~]# mdadm --export --detail /dev/md126
MD_LEVEL=raid1
MD_DEVICES=2
MD_CONTAINER=/dev/md/imsm0
MD_MEMBER=0
MD_UUID=71fce733:487d83ab:a5f2c74e:ae857eee
MD_DEVNAME=IMSM00_0
MD_DEVICE_sdb_ROLE=0
MD_DEVICE_sdb_DEV=/dev/sdb
MD_DEVICE_sdc_ROLE=1
MD_DEVICE_sdc_DEV=/dev/sdc

Does this resolve the problem in a reasonable way for Anaconda?

Jes
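
For comparison with the human-readable format, the KEY=VALUE lines shown in this comment are straightforward to consume. The following is a minimal sketch under the assumption that per-device keys always follow the MD_DEVICE_<name>_ROLE / MD_DEVICE_<name>_DEV pattern seen above; parse_export is a hypothetical helper, and the sample is the /dev/md126 output from this comment.

```python
import re

# Per-device keys as they appear in the output above.
DEVICE_KEY = re.compile(r"^MD_DEVICE_(?P<dev>.+)_(?P<field>ROLE|DEV)$")


def parse_export(text):
    """Parse `mdadm --export` style KEY=VALUE lines.

    Returns (props, devices): array-level properties keyed by their
    MD_* name, and a dict of member devices with 'role' and 'dev'.
    """
    props = {}
    devices = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip anything that is not KEY=VALUE
        match = DEVICE_KEY.match(key)
        if match:
            member = devices.setdefault(match.group("dev"), {})
            member[match.group("field").lower()] = value
        else:
            props[key] = value
    return props, devices


sample = """\
MD_LEVEL=raid1
MD_DEVICES=2
MD_CONTAINER=/dev/md/imsm0
MD_MEMBER=0
MD_UUID=71fce733:487d83ab:a5f2c74e:ae857eee
MD_DEVNAME=IMSM00_0
MD_DEVICE_sdb_ROLE=0
MD_DEVICE_sdb_DEV=/dev/sdb
MD_DEVICE_sdc_ROLE=1
MD_DEVICE_sdc_DEV=/dev/sdc
"""

props, devices = parse_export(sample)
print(props["MD_LEVEL"], sorted(devices))
```

The key names are the same for the native (1.2) array and the imsm volume in the output above, which is the consistency the free-form output lacks.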

Comment 5 Adam Williamson 2016-10-07 05:56:58 UTC
It's actually libblockdev, not anaconda. It currently uses three forms of mdadm output to get different data:

mdadm --examine -E
mdadm --detail
mdadm --examine --brief

because no single format seems to provide everything.

--examine and --detail are different things, so let's take them separately. First, --examine. libblockdev tries to get the RAID level, number of devices, name, array size, array UUID, update time, device UUID, number of events, metadata version and chunk size from the --examine -E output. From the --examine --brief output it tries to get the device name (e.g. /dev/md/root or whatever) and the metadata version (again...not sure why it uses both output formats for this, maybe they found that sometimes one has it but the other does not).

Looking at the --examine --export output, at least for my test software RAID set, it has *most* of those values, but it does not have the metadata version and it does not have the device name. So --examine --export could not fully work alone as a replacement, I don't think.

Now looking at --detail, libblockdev reads from that output: version, creation time, raid level, UUID, name, array size, used dev size, raid devices, total devices, active devices, working devices, failed devices, spare devices, and state. Looking at the --detail --export output, that seems to be missing the creation time, array size, 'used dev size', state, and it represents devices rather differently; I'm not sure how equivalent the amount of information it provides is.

So I don't think the --export formats can simply be used as easy replacements for what libblockdev is currently doing, no. I don't know if all the information that's 'missing' from the --export formatted outputs is strictly *required* - there's obvious overlaps between what info libblockdev gets from --examine and what it gets from --detail, and I don't know if it really needs to be able to get all those things from both commands or if that could be rejigged - but it's obviously not just straightforward.

If you want to look at what libblockdev's actually doing with mdadm output, check https://github.com/rhinstaller/libblockdev/blob/master/src/plugins/mdraid.c , functions `get_detail_data_from_table`, `get_examine_data_from_table`, `bd_md_examine` and `bd_md_detail`.

Comment 6 Adam Williamson 2016-10-07 06:24:15 UTC
In fact, now that I think about it, I'm pretty sure dlehman/vpodzime said something to the effect that they used to use the --export format, but had to switch because it didn't have the chunk size.

Comment 7 Jes Sorensen 2016-10-07 12:33:19 UTC
If something specific is missing from the --export output, please post that
to linux-raid.org and we can look at fixing it. These issues need
to go to the list and not a Fedora bugzilla.

Parsing the non --export format is definitely not the way to go.

Comment 8 Doug Ledford 2016-10-07 16:58:06 UTC
(In reply to Adam Williamson from comment #5)
> It's actually libblockdev, not anaconda. It currently uses three forms of
> mdadm output to get different data:
> 
> mdadm --examine -E
> mdadm --detail
> mdadm --examine --brief
> 
> because no single format seems to provide everything.

So, detail is meant to only be used on running arrays (and as such it queries the kernel about the information for the device passed in on the command line).

Examine is meant to only be used on constituent devices, so it reads its information from the superblock directly.

Examine brief is intended to provide a direct drop in ARRAY line for mdadm.conf that will bring the array in question back just as it is now, including if it is currently running and the name in use does not match the name in the name field of the superblock, then it will spit out a different name on the ARRAY line that will bring the device back as it currently is now (IIRC, it might have changed since I last worked on this code).
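
To make the ARRAY line concrete, here is a minimal sketch of pulling the device name and metadata version out of such a line, which is what comment #5 says libblockdev reads from the --examine --brief output. The sample lines are hypothetical, assembled from the UUIDs and name shown in comment #2 rather than captured from a real run, and parse_array_line is an illustrative helper, not mdadm or libblockdev code. It assumes whitespace-separated key=value words after an optional device path.

```python
def parse_array_line(line):
    """Parse an mdadm.conf-style ARRAY line into a dict.

    The device path is optional (assumed absent when the first word
    after ARRAY already looks like key=value); remaining words are
    key=value pairs, stored with lowercased keys.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "ARRAY":
        raise ValueError("not an ARRAY line: %r" % line)
    rest = tokens[1:]
    device = None
    if rest and "=" not in rest[0]:
        device = rest.pop(0)
    info = {"device": device}
    for token in rest:
        key, sep, value = token.partition("=")
        if sep:
            info[key.lower()] = value
    return info


# Hypothetical samples built from values in comment #2.
native = ("ARRAY /dev/md/root metadata=1.2 "
          "UUID=f3cb52a4:e4f1abec:79e47da7:4fbe0cb6 "
          "name=localhost.localdomain:root")
container = "ARRAY metadata=imsm UUID=7fe61893:94a3e502:ce92b4f6:4c0e5884"

print(parse_array_line(native)["device"])
print(parse_array_line(container)["metadata"])
```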

I only bring this up because the different modes should produce slightly different output.  For instance, examine will get the device uuid, whereas detail only really wants to deal with the whole device and hence only provides the array uuid, and not any of the device uuids.  Stuff like that.

I have no idea what libblockdev is doing with all this information, so I won't comment on whether or not it's truly needed.  But, I do wonder why it uses --examine --brief as there is nothing there that can't be gathered from --examine and a knowledge of how the devices are currently brought up.

I would also parrot Jes' suggestion that we address the deficiencies of mdadm --export upstream.  The export code was written specifically to support programs like libblockdev, but it was obviously written prior to libblockdev's current incarnation and it made assumptions about what it thought anaconda or udev would need.  Those assumptions were wrong and you guys are requesting more/different information, so that just needs to be worked out.

Comment 9 Vratislav Podzimek 2016-10-13 08:58:57 UTC
So, to sum this all up a bit:

1. 'mdadm --examine --export' is missing things from 'mdadm --examine' libblockdev needs to gather (e.g. chunk size, but also see below)

2. 'mdadm --examine' is missing information about the MD RAID device (if applicable/activated) as well as metadata version; 'mdadm --examine --brief' provides these values.

3. 'mdadm --examine --export' may or may not provide info about metadata version depending on the technology (provided for IMSM RAID)

4. 'mdadm --examine' output differs significantly for e.g. IMSM RAID and "standard" SW RAID

5. 'mdadm --detail --export' is missing things from 'mdadm --detail' libblockdev needs to gather

A solution would be to add all the missing pieces to the '--export' outputs (for both --examine and --detail). However, what about a different approach which more and more storage-related utilities choose these days -- a JSON output via a newly added '--json' option? Producing canonicalized and complete structured data would save us a lot of trouble. Now I know I should propose this upstream, but maybe we can discuss it here to reach a consensus between a few people first?

Comment 10 Jes Sorensen 2016-10-13 13:51:55 UTC
OK, let me state this one more time: 

This needs to be raised on the linux-raid.org mailing list -
these requests do NOT belong in a Fedora bugzilla.

Second, I can tell you right now that I will not be adding a JSON output
option to mdadm. The --export option handles it just fine and in a much
nicer way than JSON would do.

Jes

Comment 11 Nigel Croxon 2017-05-22 14:57:37 UTC
I am closing this BZ.

If you have pushed your issues upstream and the patches are ready to pull into Fedora, then please reopen or open a new BZ.

-Nigel