Bug 1069597

Summary: blivet.reset() activates MD RAID on multipath devices
Product: Red Hat Enterprise Linux 7
Component: python-blivet
Version: 7.0
Reporter: Jan Safranek <jsafrane>
Assignee: David Lehman <dlehman>
QA Contact: Release Test Team <release-test-team-automation>
CC: jstodola
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Fixed In Version: python-blivet-0.61.15.9-1
Cloned As: 1070062 (view as bug list)
Bug Blocks: 1070062
Last Closed: 2015-11-19 08:44:59 UTC
Attachments:
  reproducer (basically blivet.reset() caller)
  good blivet log (md127 is not created)
  bad blivet log (md127 is created)

Description Jan Safranek 2014-02-25 11:18:54 UTC
Description of problem:
On a machine with a deactivated MD RAID array on multipath devices (see below),
'lmi storage list' finishes with a traceback:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/cmpi_pywbem_bindings.py", line 82, in __call__
    return self.meth(*args, **kwds)
  File "/usr/lib64/python2.7/site-packages/cmpi_pywbem_bindings.py", line 507, in get_instance
    pinst = self.proxy.MI_getInstance(env, op, plist)
  File "/usr/lib/python2.7/site-packages/pywbem/cim_provider2.py", line 1802, in MI_getInstance
    propertyList)
  File "/usr/lib/python2.7/site-packages/pywbem/cim_provider2.py", line 551, in MI_getInstance
    rval = self.get_instance(env=env, model=model)
  File "/usr/lib/python2.7/site-packages/lmi/providers/cmpi_logging.py", line 266, in _wrapper
    result = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/lmi/storage/LMI_MDRAIDFormatProvider.py", line 60, in get_instance
    model['MDUUID'] = fmt.mdUuid
AttributeError: 'MultipathMember' object has no attribute 'mdUuid'

'lmi storage list' calls GetInstance() on LMI_MDRAIDFormat internally.
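
For reference, a sketch of the kind of CIM calls involved (assumed connection
parameters; not the exact code path of the lmi shell):

import pywbem

# Connect to the CIMOM and fetch LMI_MDRAIDFormat instances; the provider's
# get_instance() is what raises the AttributeError shown above.
conn = pywbem.WBEMConnection("https://localhost:5989",
                             ("pegasus", "password"),
                             default_namespace="root/cimv2")
for path in conn.EnumerateInstanceNames("LMI_MDRAIDFormat"):
    inst = conn.GetInstance(path)
    print(inst["MDUUID"])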



Version-Release number of selected component (if applicable):
openlmi-storage-0.7.1-5.el7.noarch
python-blivet-0.18.27-1.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. create two multipath devices
2. using lmi, create a raid with these two multipath devices:
   lmi storage raid create 0 mpatha mpathb

3. deactivate the array *outside OpenLMI*, don't remove its metadata
   mdadm -S /dev/md127

4. restart OpenLMI provider:
   service tog-pegasus restart

5. notice that OpenLMI provider *activated* the array:
   lmi storage list
   mdadm -D  /dev/md127

6. go to 3 and repeat

Actual results:
step 5 for the first time:
- lmi storage list does not list the MD RAID
- but mdadm -D shows it

step 5 for the second time:
- lmi storage list throws a traceback


Expected results:
- the RAID is *not* activated at step 5 
- no traceback

Comment 2 Jan Safranek 2014-02-25 12:44:45 UTC
Something is wrong in blivet or below. A simple blivet.reset() activates MD RAID on multipath devices, while I do not see any mdadm --create or --assemble call in its log (nor in strace).

I _guess_ something triggers multipathd, which triggers MD RAID activation, just as if the multipath devices had just been assembled.

Reproducer:

1) get two multipath devices

2) create raid:
   mdadm -C -l 0 -n 2 /dev/md127 /dev/mapper/mpath{a,b}

3) stop the raid:
   mdadm -S /dev/md127

4) initialize blivet (a minimal sketch of such a caller is shown after these steps)
   python blivet_init.py

5) see that the raid is active
   mdadm -D /dev/md127

6) go to 4) until the bug is reproduced (= MD RAID gets magically activated); usually one or two rounds are needed.
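
For reference, a minimal sketch of a blivet.reset() caller along the lines of
the attached blivet_init.py (an assumed illustration, not the actual
attachment):

import blivet

# Re-scanning the system's block devices with reset() is all that is needed
# to trigger the behaviour described above.
b = blivet.Blivet()
b.reset()

for dev in b.devices:
    print(dev.name, dev.type)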

In 'udevadm monitor' I can see:

- b.reset() triggers change events on my iSCSI devices:
KERNEL[12499.159123] change   /devices/platform/host3/session1/target3:0:0/3:0:0:1/block/sdg (block)
UDEV  [12499.182739] change   /devices/platform/host3/session1/target3:0:0/3:0:0:1/block/sdg (block)
KERNEL[12499.184022] change   /devices/platform/host3/session1/target3:0:0/3:0:0:2/block/sdh (block)
UDEV  [12499.199825] change   /devices/platform/host3/session1/target3:0:0/3:0:0:2/block/sdh (block)
KERNEL[12499.213178] change   /devices/platform/host4/session2/target4:0:0/4:0:0:1/block/sde (block)
UDEV  [12499.229363] change   /devices/platform/host4/session2/target4:0:0/4:0:0:1/block/sde (block)
KERNEL[12499.254153] change   /devices/platform/host4/session2/target4:0:0/4:0:0:2/block/sdf (block)
UDEV  [12499.269382] change   /devices/platform/host4/session2/target4:0:0/4:0:0:2/block/sdf (block)

- md127 gets added and removed:
KERNEL[12499.359647] add      /devices/virtual/bdi/9:127 (bdi)
UDEV  [12499.360280] add      /devices/virtual/bdi/9:127 (bdi)
KERNEL[12499.360497] add      /devices/virtual/block/md127 (block)
UDEV  [12499.362262] change   /devices/virtual/block/dm-0 (block)
KERNEL[12499.374738] change   /devices/virtual/block/md127 (block)
UDEV  [12499.408290] add      /devices/virtual/block/md127 (block)
UDEV  [12499.420455] change   /devices/virtual/block/md127 (block)
UDEV  [12499.575393] change   /devices/virtual/block/dm-1 (block)
KERNEL[12499.593315] change   /devices/virtual/block/md127 (block)
KERNEL[12499.615377] change   /devices/virtual/block/md127 (block)
KERNEL[12499.615392] change   /devices/virtual/block/md127 (block)
KERNEL[12499.628675] remove   /devices/virtual/bdi/9:127 (bdi)
UDEV  [12499.628689] remove   /devices/virtual/bdi/9:127 (bdi)
KERNEL[12499.628734] remove   /devices/virtual/block/md127 (block)
UDEV  [12499.652990] change   /devices/virtual/block/md127 (block)
UDEV  [12499.664355] change   /devices/virtual/block/md127 (block)
UDEV  [12499.684930] change   /devices/virtual/block/md127 (block)
UDEV  [12499.688831] remove   /devices/virtual/block/md127 (block)


- and sometimes this md127 is only added and not removed:
KERNEL[13003.512605] change   /devices/virtual/block/dm-0 (block)
KERNEL[13003.542639] change   /devices/virtual/block/dm-1 (block)
KERNEL[13003.569595] add      /devices/virtual/bdi/9:127 (bdi)
UDEV  [13003.570605] add      /devices/virtual/bdi/9:127 (bdi)
KERNEL[13003.570834] add      /devices/virtual/block/md127 (block)
UDEV  [13003.573644] change   /devices/virtual/block/dm-0 (block)
KERNEL[13003.598806] change   /devices/virtual/block/md127 (block)
UDEV  [13003.602232] add      /devices/virtual/block/md127 (block)
UDEV  [13003.639707] change   /devices/virtual/block/md127 (block)
UDEV  [13003.799472] change   /devices/virtual/block/dm-1 (block)

Now, what creates this md127 device?

Comment 3 Jan Safranek 2014-02-25 12:47:00 UTC
Created attachment 867383 [details]
reproducer (basically blivet.reset() caller)

Comment 4 Jan Safranek 2014-02-25 12:49:10 UTC
Created attachment 867384 [details]
good blivet log (md127 is not created)

Comment 5 Jan Safranek 2014-02-25 13:00:04 UTC
Created attachment 867400 [details]
bad blivet log (md127 is created)

Comment 6 David Lehman 2014-02-25 17:35:56 UTC
Those change events are generated any time a read-write file descriptor to the mpath device is closed, which happens every time you reset a Blivet instance (which deletes/closes a libparted reference to the disk). I have no idea how this can be avoided. Something that opens the device read-only until it is asked to write to it would have to happen in libparted, and I think it would be a contentious topic.
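
A minimal way to observe this effect without blivet at all, assuming a
multipath device /dev/mapper/mpatha and root privileges:

import os

# Opening a block device read-write and closing it without writing anything
# is enough for udev to synthesize a "change" event on it, which can in turn
# trigger md auto-assembly rules.
fd = os.open("/dev/mapper/mpatha", os.O_RDWR)
os.close(fd)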

Comment 7 Jan Safranek 2014-02-26 07:54:22 UTC
Ok, so let's accept for now that the RAID gets activated. I'll leave this bug open so that libparted can eventually be fixed, and I'm going to clone this bug to fix OpenLMI - it should not throw a traceback when this happens.
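
A sketch of the kind of guard the OpenLMI provider could add (illustrative
only; the function name is made up and the actual fix is tracked in the
cloned bug):

import pywbem

def fill_md_uuid(fmt, model):
    """Fill in MDUUID only when the format really is an MD member."""
    if not hasattr(fmt, "mdUuid"):      # e.g. a MultipathMember format
        raise pywbem.CIMError(pywbem.CIM_ERR_NOT_FOUND,
                              "the device is not an MD RAID member")
    model['MDUUID'] = fmt.mdUuid
    return model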

Comment 8 Jan Safranek 2014-02-26 08:01:14 UTC
Wait, why is blivet/libparted opening iscsi drives /dev/sdg - sdf? They are multipath members, any write operation on them can be harmful. It should scan the multipath device only. And if they weren't scanned, maybe the raid wouldn't be activated.

Comment 10 David Lehman 2015-06-11 18:53:29 UTC
(In reply to Jan Safranek from comment #8)
> Wait, why is blivet/libparted opening iscsi drives /dev/sdg - sdf? They are
> multipath members, any write operation on them can be harmful. It should
> scan the multipath device only. And if they weren't scanned, maybe the raid
> wouldn't be activated.

Before the days of auto-activation via udev, blivet began using parted.Device instances to provide several bits of information about block devices. Some examples are size, read/write status, vendor, and model. On blivet's master branch I have changed this as part of the preparation for uevent handling.
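
For illustration, the kind of parted.Device usage being referred to (a
sketch, not blivet's actual code; /dev/mapper/mpatha is an assumed path):

import parted

# Constructing a parted.Device opens the underlying block device node; its
# attributes provide the size, vendor/model and read-only status mentioned
# above. Closing the read-write descriptor afterwards is what produces the
# change events described in comment 6.
dev = parted.getDevice("/dev/mapper/mpatha")
print(dev.model)                    # vendor/model string
print(dev.length * dev.sectorSize)  # size in bytes
print(dev.readOnly)                 # read-only status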

Comment 11 David Lehman 2015-06-19 13:20:51 UTC
(In reply to Jan Safranek from comment #8)
> Wait, why is blivet/libparted opening iscsi drives /dev/sdg - sdf? They are
> multipath members, any write operation on them can be harmful. It should
> scan the multipath device only. And if they weren't scanned, maybe the raid
> wouldn't be activated.

I am fairly certain that udev synthesizing change events on the mpath devices themselves is what is causing the md arrays to get activated. There is no reason to believe otherwise unless /proc/mdstat shows the members to be on SCSI instead of device-mapper.
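
A quick sketch of that check:

# Print the member lines from /proc/mdstat; dm-* members mean the array sits
# on the multipath devices, while sd* members would mean the raw SCSI paths
# were used instead.
with open("/proc/mdstat") as f:
    for line in f:
        if line.startswith("md"):
            print(line.rstrip())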

Comment 13 Jan Stodola 2015-08-20 09:17:39 UTC
Retested with python-blivet-0.61.15.22-1.el7; this issue is no longer reproducible and the RAID array is not activated.

Moving to VERIFIED.

Comment 14 errata-xmlrpc 2015-11-19 08:44:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2232.html