Bug 1015514 - Check for DM_UDEV_DISABLE_OTHER_RULES_FLAG instead of DM_UDEV_DISABLE_DISK_RULES_FLAG in 65-md-incremental.rules
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: mdadm
Version: 6.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assigned To: Jes Sorensen
QA Contact: XiaoNi
Reported: 2013-10-04 08:29 EDT by Peter Rajnoha
Modified: 2013-11-21 07:08 EST
CC List: 4 users
Fixed In Version: mdadm-3.2.6-7.el6
Doc Type: Bug Fix
Clones: 1015515, 1015521
Last Closed: 2013-11-21 07:08:09 EST
Type: Bug


Attachments
Check for DM_UDEV_DISABLE_OTHER_RULES_FLAG (655 bytes, patch)
2013-10-04 08:29 EDT, Peter Rajnoha

Description Peter Rajnoha 2013-10-04 08:29:51 EDT
Created attachment 807607 [details]
Check for DM_UDEV_DISABLE_OTHER_RULES_FLAG

DM_UDEV_DISABLE_DISK_RULES_FLAG is currently checked in /lib/udev/rules.d/65-md-incremental.rules to avoid scanning inappropriate device-mapper devices. However, this flag should be used only in 13-dm-disk.rules; it is meant for device-mapper's internal use, since those rules belong to device-mapper.

Any external rules should check DM_UDEV_DISABLE_OTHER_RULES_FLAG instead to coordinate properly with device-mapper devices. Please see the attached patch.
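
For illustration, the change essentially swaps the flag that 65-md-incremental.rules tests before skipping a device. This is only a sketch based on the rule lines quoted later in this bug (comment 6); the exact line and spacing in the shipped rules file may differ:

  # incorrect: this flag belongs to device-mapper's own 13-dm-disk.rules
  ENV{DM_UDEV_DISABLE_DISK_RULES_FLAG}=="1", GOTO="dm_change_end"

  # correct: external rules should coordinate via the OTHER_RULES flag
  ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}=="1", GOTO="dm_change_end"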
Comment 4 XiaoNi 2013-10-21 03:39:41 EDT
Hi Jes

   I have checked, and the patch is already in mdadm-3.2.6-7.el6. But I have no good idea how to reproduce the issue; could you give me some suggestions for reproducing it? Thanks very much.
Comment 5 Jes Sorensen 2013-11-06 04:43:35 EST
Xiao,

This was requested by Peter for the DM tools; the flags are set on their side
and I don't deal with them directly on the mdadm side, so I think we need to
ask Peter for recommendations here.

Thanks,
Jes
Comment 6 Peter Rajnoha 2013-11-06 07:02:42 EST
Sorry for the delay and for not answering comment #3; I forgot to write back as I was busy with other things...

Well, for example, the following may be used to test the functionality:

---> STEP 1 - create PV/VG and 2 LVs

[root@rhel6-a ~]# pvcreate /dev/sda
  Physical volume "/dev/sda" successfully created

[root@rhel6-a ~]# vgcreate vg /dev/sda
  Volume group "vg" successfully created

[root@rhel6-a ~]# lvcreate -L512m vg
  Logical volume "lvol0" created

[root@rhel6-a ~]# lvcreate -L512m vg
  Logical volume "lvol1" created


---> STEP 2 - create an MD array on top of the LVs, e.g. a mirror. Use the 1.1 metadata format so we're sure the metadata is at the beginning of the LVs (wipefs will show that in the "offset" column as well).

[root@rhel6-a ~]# mdadm --create /dev/md0 --metadata="1.1" --level=1 --raid-devices=2 /dev/vg/lvol0 /dev/vg/lvol1
mdadm: array /dev/md0 started.

[root@rhel6-a ~]# lsblk /dev/sda
NAME              MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                 8:0    0     4G  0 disk  
|-vg-lvol0 (dm-2) 253:2    0   512M  0 lvm   
| `-md0             9:0    0 511.7M  0 raid1 
`-vg-lvol1 (dm-3) 253:3    0   512M  0 lvm   
  `-md0             9:0    0 511.7M  0 raid1 



[root@rhel6-a ~]# wipefs /dev/vg/lvol0
offset               type
----------------------------------------------------------------
0x0                  linux_raid_member   [raid]
                     LABEL: rhel6-a:0
                     UUID:  81189229-3816-fda3-8bed-9f6e3a5d45e5

[root@rhel6-a ~]# wipefs /dev/vg/lvol1
offset               type
----------------------------------------------------------------
0x0                  linux_raid_member   [raid]
                     LABEL: rhel6-a:0
                     UUID:  81189229-3816-fda3-8bed-9f6e3a5d45e5


---> STEP 3 - stop the MD array and remove LVs

[root@rhel6-a ~]# mdadm -S /dev/md0
mdadm: stopped /dev/md0

[root@rhel6-a ~]# lvremove -ff vg/lvol0 vg/lvol1
  Logical volume "lvol0" successfully removed
  Logical volume "lvol1" successfully removed

[root@rhel6-a ~]# lsblk /dev/sda
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda    8:0    0   4G  0 disk 


---> STEP 4 - kill the running udevd daemon and run it in debug mode (so we can check exactly what it's doing)

[root@rhel6-a ~]# killall udevd 

[root@rhel6-a ~]# udevd --debug &> udev_log


---> STEP 5 - create the same LVs again - they will occupy exactly the same extents as the original LVs (the same offsets...)

[root@rhel6-a ~]# lvcreate -L512M vg
  Logical volume "lvol0" created

[root@rhel6-a ~]# lvcreate -L512M vg
  Logical volume "lvol1" created

[root@rhel6-a ~]# lsblk /dev/sda
NAME              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                 8:0    0    4G  0 disk 
|-vg-lvol0 (dm-2) 253:2    0  512M  0 lvm  
`-vg-lvol1 (dm-3) 253:3    0  512M  0 lvm 

---> STEP 6 - if everything works correctly, LVM should have a chance to wipe the MD array signature during "LV zeroing" (wiping the start of the LV so it's clean and ready for use). To prove this, udev should not trigger any scan: the MD array signature is cleared by LVM *before* udev has a chance to see it, so there's no interference between LVM and re-assembly of the old MD array from stale metadata that should be ignored.

[root@rhel6-a ~]# grep mdadm udev_log

[root@rhel6-a ~]# echo $?
1

---> no mdadm call found, which is what we want!
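
An additional, optional sanity check, not strictly part of the steps above (this assumes wipefs prints nothing when it finds no signature): wipefs on the freshly created LVs should no longer report the linux_raid_member signature, since LVM zeroed the start of each LV:

[root@rhel6-a ~]# wipefs /dev/vg/lvol0
[root@rhel6-a ~]# wipefs /dev/vg/lvol1

---> no signature reported, the stale MD metadata is gone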

====

Now, to prove that it *does* interfere if the DM_UDEV_DISABLE_* flags are not used correctly, you can do:

---> do STEP 3

---> edit /lib/udev/rules.d/13-dm-disk.rules and comment out this line like this:

  #ENV{DM_NOSCAN}=="1", GOTO="dm_watch"

---> also, in /lib/udev/rules.d/65-md-incremental.rules, put back the original DM_UDEV_DISABLE_DISK_RULES_FLAG check instead of the correct DM_UDEV_DISABLE_OTHER_RULES_FLAG check:

  ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}=="1", GOTO="dm_change_end"

changed back to the original and incorrect:

  ENV{DM_UDEV_DISABLE_DISK_RULES_FLAG}=="1", GOTO="dm_change_end"

---> do STEP 4, STEP 5 and STEP 6

---> now, STEP 6 should show interference with mdadm:

[root@rhel6-a ~]# grep mdadm udev_log
1383737656.891739 [3119] udev_rules_apply_to_event: RUN '/sbin/mdadm -I $env{DEVNAME}' /lib/udev/rules.d/65-md-incremental.rules:28
1383737656.894028 [3119] util_run_program: '/sbin/mdadm -I /dev/dm-2' started
1383737656.907670 [3119] util_run_program: '/sbin/mdadm' (stderr) 'mdadm: /dev/dm-2 attached to /dev/md/0, not enough to start safely.'
1383737656.907798 [3119] util_run_program: '/sbin/mdadm -I /dev/dm-2' returned with exitcode 0
1383737656.920539 [3119] udev_rules_apply_to_event: RUN '/sbin/mdadm -I $env{DEVNAME}' /lib/udev/rules.d/65-md-incremental.rules:56
1383737656.921442 [3119] util_run_program: '/sbin/mdadm -I /dev/dm-2' started
1383737656.922217 [3119] util_run_program: '/sbin/mdadm' (stderr) 'mdadm: cannot open /dev/dm-2: Device or resource busy.'
1383737656.922303 [3119] util_run_program: '/sbin/mdadm -I /dev/dm-2' returned with exitcode 1
1383737658.030467 [3118] udev_rules_apply_to_event: RUN '/sbin/mdadm -I $env{DEVNAME}' /lib/udev/rules.d/65-md-incremental.rules:28
1383737658.031526 [3118] util_run_program: '/sbin/mdadm -I /dev/dm-3' started
1383737658.041573 [3118] util_run_program: '/sbin/mdadm' (stderr) 'mdadm: metadata mismatch between /dev/dm-3 and chosen array /dev/md/0'
1383737658.041728 [3118] util_run_program: '/sbin/mdadm -I /dev/dm-3' returned with exitcode 2
1383737658.046832 [3118] udev_rules_apply_to_event: RUN '/sbin/mdadm -I $env{DEVNAME}' /lib/udev/rules.d/65-md-incremental.rules:56
1383737658.047541 [3118] util_run_program: '/sbin/mdadm -I /dev/dm-3' started
1383737658.055847 [3118] util_run_program: '/sbin/mdadm' (stderr) 'mdadm: metadata mismatch between /dev/dm-3 and chosen array /dev/md/0'
1383737658.056052 [3118] util_run_program: '/sbin/mdadm -I /dev/dm-3' returned with exitcode 2

[root@rhel6-a ~]# echo $?
0

And this is wrong, because mdadm touches the LV before LVM has a chance to wipe its start and remove any stale metadata from other block subsystems or whatever else might be there. So that's the other way round - it also proves that before the patch, these things worked incorrectly.

Hope that helps; if anything is unclear, feel free to ask... Again, sorry for the delay.
Comment 7 Peter Rajnoha 2013-11-06 07:12:11 EST
(This is just a simple and somewhat artificial example; other tests might be quite complex, but this is the gist of the solution that should be tested - if it works here, it works in other, more complex situations.)
Comment 8 errata-xmlrpc 2013-11-21 07:08:09 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1643.html
