Bug 1369625

Summary: mdraid devices got unexpectedly switched over to dmraid devices
Product: Fedora
Reporter: Bruno Wolff III <bruno>
Component: mdadm
Assignee: Jes Sorensen <Jes.Sorensen>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 24
CC: agk, bruno, dledford, heinzm, Jes.Sorensen, xni
Target Milestone: ---
Target Release: ---
Flags: heinzm: needinfo-
Hardware: Unspecified
OS: Linux
Last Closed: 2016-08-24 21:07:16 UTC

Description Bruno Wolff III 2016-08-24 02:47:20 UTC
User-Agent:       Mozilla/5.0 (X11; Fedora; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0
Build Identifier: 

I noticed today that an f24 system and an f25 system using mdraid ended up using dmraid for all raid arrays except /boot. I'm filing this against mdadm, but the problem more likely lies with dracut or systemd.

I don't know if there is risk of corruption or not, but using the wrong drivers certainly seems dangerous.

I am also most used to fixing arrays with mdadm, but it won't work on dmraid devices. I had a device get failed out of an array and I am not sure how to safely add it back; that is what led me to notice this.
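
For reference, this is the kind of mdadm sequence I would normally use to re-add a
failed member, taking /dev/sdc1 in /dev/md15 from the f24 output below as the
example (just a sketch - I have not run it while things look misassembled):

mdadm --detail /dev/md15              # confirm the array state and which member is failed
mdadm /dev/md15 --remove /dev/sdc1    # drop the failed member from the array
mdadm /dev/md15 --add /dev/sdc1       # add it back so it resyncs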

f24:
[root@wolff bruno]# df
Filesystem     1K-blocks      Used Available Use% Mounted on
devtmpfs         1540444         0   1540444   0% /dev
tmpfs            1550340         4   1550336   1% /dev/shm
tmpfs            1550340      1144   1549196   1% /run
tmpfs            1550340         0   1550340   0% /sys/fs/cgroup
/dev/dm-1       82435688  66685684  15649740  81% /
tmpfs            1550340        32   1550308   1% /tmp
/dev/dm-2       34872904  26429028   6649380  80% /qmail
/dev/md11        1015692    147772    850488  15% /boot
/dev/dm-3      213607872 201361028  12013316  95% /home
tmpfs             310072         0    310072   0% /run/user/500
[root@wolff bruno]# cat /proc/mdstat
Personalities : [raid1] 
md14 : active raid1 sda4[1] sdb4[2]
      217148624 blocks super 1.2 [2/2] [UU]
      
md15 : active raid1 sdd1[0] sdc1[1](F)
      35564360 blocks super 1.2 [2/1] [U_]
      
md13 : active raid1 sdb3[2] sda3[1]
      83884984 blocks super 1.2 [2/2] [UU]
      
md12 : active raid1 sda2[1] sdb2[2]
      10484664 blocks super 1.2 [2/2] [UU]
      
md11 : active raid1 sdb1[2] sda1[1]
      1048564 blocks super 1.0 [2/2] [UU]
      
unused devices: <none>

f25:
[bruno@cerberus ~]$ df
Filesystem      1K-blocks      Used  Available Use% Mounted on
devtmpfs         16422820         0   16422820   0% /dev
tmpfs            16435504       116   16435388   1% /dev/shm
tmpfs            16435504      2380   16433124   1% /run
tmpfs            16435504         0   16435504   0% /sys/fs/cgroup
/dev/dm-0       264092676  93179836  157474684  38% /
tmpfs            16435504       244   16435260   1% /tmp
/dev/md125         999288    175472     755004  19% /boot
/dev/dm-2      1591087480 336256604 1173985260  23% /home
tmpfs             3287100        16    3287084   1% /run/user/1000
tmpfs             3287100         0    3287100   0% /run/user/0
[bruno@cerberus ~]$ cat /proc/mdstat
Personalities : [raid1] 
md124 : active raid1 sda5[0] sdb5[1]
      1616586752 blocks super 1.2 [2/2] [UU]
      bitmap: 2/13 pages [8KB], 65536KB chunk

md125 : active raid1 sdb3[1] sda3[0]
      1049536 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sda2[0] sdb2[1]
      67110912 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sda1[0] sdb1[1]
      268437504 blocks super 1.2 [2/2] [UU]
      bitmap: 0/3 pages [0KB], 65536KB chunk

unused devices: <none>


Reproducible: Didn't try

Steps to Reproduce:
I'm not sure how to trigger it or when the issue started.

Comment 1 Jes Sorensen 2016-08-24 15:51:46 UTC
Hi,

The fact that the devices show up in /proc/mdstat makes me think they are
still assembled correctly as mdadm devices, and just got renamed during boot?
If dmraid is assembling the arrays before mdadm gets to them (it really
shouldn't even be able to do so), that is definitely a bug.
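
One way to check would be to ask dmraid directly what it thinks it owns
(assuming the dmraid tool is even installed on those boxes - just a sketch):

dmraid -r    # list block devices dmraid recognizes as fake-RAID members
dmraid -s    # show any RAID sets dmraid has discovered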

Jes

Comment 2 Bruno Wolff III 2016-08-24 17:20:03 UTC
That's actually encouraging, as my data is less likely to get toasted. However, mdadm doesn't want to mess with the devices under those names, or something.
[root@cerberus bruno]# mdadm -D /dev/dm-0
mdadm: /dev/dm-0 does not appear to be an md device

Comment 3 Jes Sorensen 2016-08-24 17:42:55 UTC
Ouch!

If they are assembled as dmraid that is a serious bug that needs to be fixed
ASAP.

Do the devices show up in /dev/md/ as well?

Jes

Comment 4 Bruno Wolff III 2016-08-24 18:17:58 UTC
[root@cerberus bruno]# ls /dev/md*
/dev/md124  /dev/md125  /dev/md126  /dev/md127

/dev/md:
boot  home  root  swap

So it does look like they show up, but those names don't show in df.

Comment 5 Jes Sorensen 2016-08-24 18:30:38 UTC
Gotcha - well that is good news at least. The names used for the actual mount
don't matter all that much, as long as the major/minor of the device
is the same as those found in /dev/md/
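
For example, comparing the major/minor pairs directly should settle it - using
/dev/md/root and /dev/dm-0 here purely as an example pair:

ls -lL /dev/md/root /dev/dm-0                 # ls prints "major, minor" for block devices
stat -L -c '%t:%T %n' /dev/md/root /dev/dm-0  # same numbers, in hex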

Jes

Comment 6 Bruno Wolff III 2016-08-24 19:21:19 UTC
Do you think it is (somewhat) safe to use mdadm on the /dev/md* names, or should I wait until we know what is happening before touching anything?

Comment 7 Jes Sorensen 2016-08-24 19:29:06 UTC
As long as you just examine the device with mdadm -D you should be safe.
I don't think you can do any damage to the devices either if they are
mistakenly assembled by dmraid.

I have never been in the unfortunate situation where my arrays got assembled
by anything other than mdadm, so I'm not sure if it is even possible - hopefully it
isn't :)

Heinz - is it possible that Bruno's arrays were assembled as dmraid, or is it
just a naming bug here?

Cheers,
Jes

Comment 8 Bruno Wolff III 2016-08-24 19:50:20 UTC
The following suggests that they are set up as md devices but incorrectly named. I don't know if mdadm checks for naming patterns before accessing a device, and that is why access via the dm name doesn't work, or if there is some real difference in accessing it that way.
[root@cerberus bruno]# mdadm -D /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Fri Jun  5 18:05:57 2015
     Raid Level : raid1
     Array Size : 268437504 (256.00 GiB 274.88 GB)
  Used Dev Size : 268437504 (256.00 GiB 274.88 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Aug 24 14:47:29 2016
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : localhost.localdomain:root
           UUID : 7f4fcca0:13b1445f:a91ff455:6bb1ab48
         Events : 455

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
[root@cerberus bruno]# mdadm -D /dev/dm-0
mdadm: /dev/dm-0 does not appear to be an md device

Comment 9 Doug Ledford 2016-08-24 19:52:58 UTC
I think the devices are assembled as LVM-managed raid devices, where LVM is merely passing the raid task on through to mdadm. Run lvscan on the system and see what it says about the dm-? devices.
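
For example - just lvscan plus dmsetup, to see what each dm-? device actually
is; the exact output will depend on the setup:

lvscan           # list any LVM logical volumes
dmsetup ls       # list all device-mapper devices by name
dmsetup table    # show each device's target type (linear, raid, crypt, ...)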

Comment 10 Jes Sorensen 2016-08-24 20:04:09 UTC
If you can access them via /dev/md/<X> then you should be fine, and it's
almost certainly a naming issue.

It is not impossible that mdadm does something silly with the naming here.

Jes

Comment 11 Bruno Wolff III 2016-08-24 20:13:36 UTC
I'm not using lvm. lvscan -a doesn't return any output.

Comment 12 Jes Sorensen 2016-08-24 20:17:36 UTC
Lost Heinz's NEEDINFO request there - so putting it back.

In this case, it sounds like something incorrectly renamed the device at
assembly time. If you mv the device from /dev/dm-0 to /dev/md<X>,
does mdadm start treating it the way you would expect?

Comment 13 Bruno Wolff III 2016-08-24 20:33:40 UTC
Should I overwrite the existing device of that name or pick some new /dev/mdXXX name?

Comment 14 Jes Sorensen 2016-08-24 20:40:20 UTC
/dev/mdX names should be generated at boot time - so if you simply do
mv /dev/dm-0 /dev/md<X> that ought to be safe.

Jes

Comment 15 Bruno Wolff III 2016-08-24 20:44:12 UTC
[root@cerberus bruno]# mv /dev/dm-0 /dev/md0
You have new mail in /home/bruno/Maildir
[root@cerberus bruno]# mdadm -D /dev/md0
mdadm: /dev/md0 does not appear to be an md device

Comment 16 Jes Sorensen 2016-08-24 20:46:50 UTC
So much for that idea :(

Do you have anything in /dev/mapper/ ?

Comment 17 Bruno Wolff III 2016-08-24 20:51:39 UTC
So this was my screw-up. I am using LUKS and was used to seeing the LUKS names, not the dm names, in df, and I hadn't looked at it in a while. When I moved the name, df started showing what I was used to, and I now realize that I was trying to use the LUKS device name with mdadm, not the md device name. So this was a false alarm. I think there was a change to df output and I just got confused by it, though I really shouldn't have been.
Sorry to have bothered you guys.
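
For anyone else who gets confused the same way, lsblk shows the whole stack
(md array -> LUKS mapping -> filesystem) in one view, which would have made
this obvious right away:

lsblk -o NAME,TYPE,MOUNTPOINT   # TYPE shows raid1 for the md arrays and crypt for the LUKS mappings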

Comment 18 Jes Sorensen 2016-08-24 20:56:46 UTC
Bruno,

No worries - just to be sure I understand you completely, this is not a bug
after all and we can close it?

Thanks,
Jes

Comment 19 Bruno Wolff III 2016-08-24 21:07:16 UTC
Yes it is not a bug. I thought I closed it as part of my last comment, but apparently I didn't. It should be closed as I submit this.