Bug 1196666 - MDRaidError: mdexamine failed for /dev/mapper/mpathe1: 1
Summary: MDRaidError: mdexamine failed for /dev/mapper/mpathe1: 1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: python-blivet
Version: 7.1
Hardware: ppc64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: David Lehman
QA Contact: Release Test Team
URL:
Whiteboard: abrt_hash:4c5461c881f21ed253e37710e17...
Depends On:
Blocks:
 
Reported: 2015-02-26 13:45 UTC by Jaromír Cápík
Modified: 2021-09-03 14:07 UTC
CC List: 4 users

Fixed In Version: python-blivet-0.61.15.48-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-03 23:49:33 UTC
Target Upstream Version:
Embargoed:


Attachments
File: anaconda-tb (1.58 MB, text/plain) - 2015-02-26 13:46 UTC, Jaromír Cápík
File: anaconda.log (57.02 KB, text/plain) - 2015-02-26 13:46 UTC, Jaromír Cápík
File: environ (404 bytes, text/plain) - 2015-02-26 13:46 UTC, Jaromír Cápík
File: lsblk_output (5.96 KB, text/plain) - 2015-02-26 13:46 UTC, Jaromír Cápík
File: nmcli_dev_list (2.80 KB, text/plain) - 2015-02-26 13:46 UTC, Jaromír Cápík
File: os_info (510 bytes, text/plain) - 2015-02-26 13:46 UTC, Jaromír Cápík
File: program.log (50.17 KB, text/plain) - 2015-02-26 13:46 UTC, Jaromír Cápík
File: storage.log (1.00 MB, text/plain) - 2015-02-26 13:46 UTC, Jaromír Cápík
File: syslog (113.55 KB, text/plain) - 2015-02-26 13:46 UTC, Jaromír Cápík
File: ifcfg.log (9.88 KB, text/plain) - 2015-02-26 13:46 UTC, Jaromír Cápík
File: packaging.log (185.85 KB, text/plain) - 2015-02-26 13:46 UTC, Jaromír Cápík


Links
Red Hat Product Errata RHBA-2016:2168 (normal, SHIPPED_LIVE): python-blivet bug fix and enhancement update - last updated 2016-11-03 13:15:34 UTC

Description Jaromír Cápík 2015-02-26 13:45:52 UTC
Description of problem:
I switched to the shell and created one primary partition (5GB) on mpathe and mpathf, then did a rescan.

Version-Release number of selected component:
anaconda-19.31.111-1

The following was filed automatically by anaconda:
anaconda 19.31.111-1 exception report
Traceback (most recent call first):
  File "/usr/lib/python2.7/site-packages/blivet/devicelibs/mdraid.py", line 280, in mdexamine
    raise MDRaidError("mdexamine failed for %s: %s" % (device, e))
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1809, in handleUdevDeviceFormat
    info.update(mdraid.mdexamine(device.path))
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1232, in addUdevDevice
    self.handleUdevDeviceFormat(info, device)
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 2169, in _populate
    self.addUdevDevice(dev)
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 2104, in populate
    self._populate()
  File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 483, in reset
    self.devicetree.populate(cleanupOnly=cleanupOnly)
  File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 186, in storageInitialize
    storage.reset()
  File "/usr/lib64/python2.7/threading.py", line 764, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 211, in run
    threading.Thread.run(self, *args, **kwargs)
MDRaidError: mdexamine failed for /dev/mapper/mpathe1: 1

Additional info:
cmdline:        /usr/bin/python  /sbin/anaconda
cmdline_file:   ro vnc noeject
executable:     /sbin/anaconda
hashmarkername: anaconda
kernel:         3.10.0-210.el7.ppc64
product:        Red Hat Enterprise Linux 7
release:        Red Hat Enterprise Linux ComputeNode release 7.1 Beta (Maipo)
release_type:   pre-release
type:           anaconda
version:        Red Hat Enterprise Linux ComputeNode
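
For context on the traceback: blivet's mdexamine wraps "mdadm --examine --export" and raises MDRaidError when mdadm exits non-zero (see the program.log excerpt in comment 13). Below is a minimal standalone sketch of that behaviour, not blivet's actual implementation, using the device path from this report:

# Standalone sketch (not blivet's actual code) of an mdexamine-style helper:
# run "mdadm --examine --export" and raise on a non-zero exit status, which is
# what produces "mdexamine failed for /dev/mapper/mpathe1: 1".
import subprocess

class MDRaidError(Exception):
    pass

def mdexamine(device):
    """Return mdadm's KEY=VALUE export for an md member, or raise MDRaidError."""
    proc = subprocess.Popen(["mdadm", "--examine", "--export", device],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _err = proc.communicate()
    if proc.returncode != 0:
        # mdadm exits 1 when it finds no md superblock on the device
        raise MDRaidError("mdexamine failed for %s: %s" % (device, proc.returncode))
    return dict(line.split("=", 1) for line in out.decode().splitlines() if "=" in line)

if __name__ == "__main__":
    try:
        print(mdexamine("/dev/mapper/mpathe1"))
    except MDRaidError as exc:
        print(exc)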

Comment 1 Jaromír Cápík 2015-02-26 13:46:01 UTC
Created attachment 995593 [details]
File: anaconda-tb

Comment 2 Jaromír Cápík 2015-02-26 13:46:03 UTC
Created attachment 995594 [details]
File: anaconda.log

Comment 3 Jaromír Cápík 2015-02-26 13:46:04 UTC
Created attachment 995595 [details]
File: environ

Comment 4 Jaromír Cápík 2015-02-26 13:46:05 UTC
Created attachment 995596 [details]
File: lsblk_output

Comment 5 Jaromír Cápík 2015-02-26 13:46:07 UTC
Created attachment 995597 [details]
File: nmcli_dev_list

Comment 6 Jaromír Cápík 2015-02-26 13:46:08 UTC
Created attachment 995598 [details]
File: os_info

Comment 7 Jaromír Cápík 2015-02-26 13:46:10 UTC
Created attachment 995599 [details]
File: program.log

Comment 8 Jaromír Cápík 2015-02-26 13:46:16 UTC
Created attachment 995600 [details]
File: storage.log

Comment 9 Jaromír Cápík 2015-02-26 13:46:19 UTC
Created attachment 995601 [details]
File: syslog

Comment 10 Jaromír Cápík 2015-02-26 13:46:20 UTC
Created attachment 995602 [details]
File: ifcfg.log

Comment 11 Jaromír Cápík 2015-02-26 13:46:23 UTC
Created attachment 995603 [details]
File: packaging.log

Comment 13 mulhern 2015-04-07 14:09:19 UTC
The problem is here:

13:31:55,056 INFO program: Running... mdadm --examine --export /dev/mapper/mpathe1
13:31:55,067 INFO program: mdadm: No md superblock detected on /dev/mapper/mpathe1.
13:31:55,067 DEBUG program: Return code: 1

Please give specifics about how you created the partition.

We should handle the mdexamine failure more elegantly, certainly.

But if there is a good reason why mdexamine fails, then it is impossible to handle the format and, ultimately, to model the array device.
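
A minimal sketch of the more elegant handling suggested above, assuming a hypothetical handle_md_member_format() helper rather than blivet's real API (the actual fix is the pull request linked in comment 25): catch the failure, log it, and skip md-member handling for that device instead of letting the exception abort the rescan.

# Illustrative only: tolerate a failed mdexamine instead of crashing.
# The names below (handle_md_member_format, mdexamine, MDRaidError) are
# stand-ins, not blivet's actual API.
import logging

log = logging.getLogger("storage")

def handle_md_member_format(info, device_path, mdexamine, MDRaidError):
    """Merge mdadm metadata into the udev info dict; skip the device on failure."""
    try:
        md_info = mdexamine(device_path)
    except MDRaidError as exc:
        # No md superblock (mdadm exited non-zero): note it and move on rather
        # than letting the exception abort the whole device-tree populate().
        log.warning("mdexamine failed for %s (%s); not treating it as an md member",
                    device_path, exc)
        return info
    info.update(md_info)
    return info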

Comment 14 Jaromír Cápík 2015-06-16 15:02:07 UTC
Hello Anne.

I did so many re-installs that it is hard to recall all the details. But I believe I just switched to the shell and created two primary partitions on the mentioned devices using fdisk. The tests were done on an IBM Power7R2 with dual-port SAS drives (that's why they're reachable as multipath devices).
Unfortunately, I don't remember whether there was a software RAID that I had to clean prior to making the partitions. The thing is that anaconda is often unable to clean existing RAID stuff, and doing manual tasks from the console is the only way to free up space.

Please let me know whether this info is sufficient.

Thanks,
Jaromir.

Comment 15 mulhern 2015-06-22 17:58:58 UTC
The call to mdraid.mdexamine has moved to handleUdevMDMemberFormat().

If mdexamine fails, then we have no information about the array device of which this device is a member. Also, if it has no superblock, we probably should not count it as an array member even if udev says so. So cleanup around this problem could be awfully tricky.
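
As a rough illustration of that point, a hedged sketch (hypothetical helper names, not handleUdevMDMemberFormat itself) of treating "udev says md member, but mdadm finds no superblock" as "not an array member":

# Hypothetical consistency check: only count a device as an md array member
# if mdadm can actually read an md superblock from it, regardless of what
# udev/blkid reported.
import os
import subprocess

def has_md_superblock(path):
    """True only if "mdadm --examine" succeeds on the device."""
    with open(os.devnull, "w") as devnull:
        rc = subprocess.call(["mdadm", "--examine", path],
                             stdout=devnull, stderr=devnull)
    return rc == 0

def is_md_member(udev_info, path):
    # udev/blkid report md members as ID_FS_TYPE == "linux_raid_member"
    return udev_info.get("ID_FS_TYPE") == "linux_raid_member" and has_md_superblock(path)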

Reassigning...

Comment 16 David Lehman 2015-06-23 13:59:40 UTC
In general, if you make a mess of your system by improperly removing old metadata, you will have problems. Based on comment 14 I think this is a problem of poor system administration or just one of the hazards of doing QA on the installer. There are too many real problems for my team to be spending time chasing self-inflicted ones like this.

Comment 17 Jaromír Cápík 2015-06-23 19:41:39 UTC
Hello David.

What do you mean by "improperly removing old metadata"? Is mdadm --zero-superblock proper or improper? I always do it when removing raid partitions manually. I'm not doing QA, I just need to install Fedora often and this is what I experienced. If the installer allows people to rescan, it should be able to handle all situations or at least explain what's wrong with the drive instead of crashing. Especially when you need to use hardware that was previously used by a different user and the hard drives can contain any kind of unspecified mess. I'm really not trying to test all corner scenarios. These are all common use-cases.

Regards,
Jaromir.

Comment 18 David Lehman 2015-06-24 13:11:48 UTC
(In reply to Jaromír Cápík from comment #17)
> Hello David.
> 
> What do you mean by "improperly removing old metadata"? Is mdadm
> --zero-superblock proper or improper? I always do it when removing raid
> partitions manually. I'm not doing QA, I just need to install Fedora often

That, followed by 'wipefs -a', would be much better and would have avoided this bug. But I see, after realizing that you removed the superblock, that this is actually a more straightforward bug than I thought. We should not be crashing if mdadm returns a non-zero exit code when examining an md member. We should simply move on.

> and this is what I experienced. If the installer allows people to rescan, it
> should be able to handle all situations

This is not possible.

> or at least explain what's wrong
> with the drive instead of crashing. Especially when you need to use

It is not possible to identify what is wrong with the drive in many cases. The problem could lie on the disk itself, in udev, in blkid, or in the kernel, among other places.

> hardware that was previously used by a different user and the hard drives can
> contain any kind of unspecified mess. I'm really not trying to test all
> corner scenarios. These are all common use-cases.

It is common for people to try to clear disks that contain partitions, lvm, and even more by simply running 'dd if=/dev/zero of=<disk> bs=1M count=1', which leaves metadata all over the disk and causes myriad problems down the road. Common does not equal acceptable.

> 
> Regards,
> Jaromir.
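
For reference, the cleanup sequence described at the top of this comment ('mdadm --zero-superblock' on the former member, then 'wipefs -a'), as a small hedged sketch; it is destructive and shown purely as an illustration, with the device path being an example:

# Sketch of the suggested cleanup: zero the md superblock on a former RAID
# member, then let wipefs remove any remaining signatures (filesystem, LVM,
# and so on). Destroys data on the given device.
import subprocess

def scrub_former_md_member(device):
    # Remove the md superblock first; ignore the exit status in case there is
    # no superblock left to zero.
    subprocess.call(["mdadm", "--zero-superblock", device])
    # Then wipe every remaining metadata signature from the device.
    subprocess.check_call(["wipefs", "-a", device])

if __name__ == "__main__":
    scrub_former_md_member("/dev/mapper/mpathe1")  # example path from this report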

Comment 19 Jaromír Cápík 2015-06-24 16:59:38 UTC
(In reply to David Lehman from comment #18)
> (In reply to Jaromír Cápík from comment #17)
> > Hello David.
> > 
> > What do you mean by "improperly removing old metadata"? Is mdadm
> > --zero-superblock proper or improper? I always do it when removing raid
> > partitions manually. I'm not doing QA, I just need to install Fedora often
> 
> That, followed by 'wipefs -a', would be much better and would have avoided
> this bug. But I see, after realizing that you removed the superblock, that
> this is actually a more straightforward bug than I thought. We should not
> be crashing if mdadm returns a non-zero exit code when examining an md
> member. We should simply move on.
> 
> > and this is what I experienced. If the installer allows people to rescan, it
> > should be able to handle all situations
> 
> This is not possible.

Right :]

 
> > or at least explain what's wrong
> > with the drive instead of crashing. Especially when you need to use
> 
> It is not possible to identify what is wrong with the drive in many cases.
> The problem could lie on the disk itself, in udev, in blkid, or in the
> kernel, among other places.

Users probably do not care where the problem lies. I admit the complexity is high here, but there must be a way out of this. In the worst case, when everything fails, the installer should at least say "Sorry, there's something rotten/unexpected here that we're unable to handle" and offer the user the option to tear down all the currently active devices, clean the drives completely, and start from scratch, instead of crashing and forcing the user to reboot and repeat the whole procedure again. That is quite hostile and discouraging for all sorts of users, no matter whether they're experienced or not.


> > hardware that was previously used by a different user and the hard drives can
> > contain any kind of unspecified mess. I'm really not trying to test all
> > corner scenarios. These are all common use-cases.
> 
> It is common for people to try to clear disks that contain partitions, lvm,
> and even more by simply running 'dd if=/dev/zero of=<disk> bs=1M count=1',
> which leaves metadata all over the disk and causes myriad problems down the
> road. Common does not equal acceptable.

Well, as stated above, users often take over hardware from someone else, with undefined content. They should not be punished for the previous user's mistakes. I still believe we need to be able to handle a mess on the drives more gracefully.

Jaromir.

Comment 20 David Lehman 2015-06-30 16:18:49 UTC
I tend to think that if someone gives you a hard drive it is reasonable to expect you to sanitize its contents before trying to use it. That is not anaconda's job.

Comment 21 Jaromír Cápík 2015-07-02 17:43:15 UTC
(In reply to David Lehman from comment #20)
> I tend to think that if someone gives you a hard drive it is reasonable to
> expect you to sanitize its contents before trying to use it.

Why just a hard drive? People often get the whole computer, and you then need some software for cleaning the content. Why can't we do that for the user when it means much lower overhead in thinks he/she needs to do? Disk cleaning is offered by almost all Linux distributions, and I fully understand and support the reasons why it's there.


> That is not anaconda's job.

Why not? If it's only about cleaning the drive, then it's a simple task. Why would we tell users to search for other software for cleaning the drive when we can do that for them at once? The opposite would make the whole process a bit cumbersome: the user would have to download a different ISO file with Parted Magic? And that isn't available for all architectures. You can argue that the user can switch to the console and clean it by hand. But that isn't a very comfortable or easy way for non-skilled users.

Comment 22 Jaromír Cápík 2015-07-02 17:44:29 UTC
s|thinks|things

Comment 23 David Lehman 2015-07-10 21:03:48 UTC
A few examples off the top of my head illustrating that the problem is not as simple as you suggest:

1. Something's really wrong here. Do you want me to clear this whole disk or just the partition(s) we think are problematic? (The problem here is that you cannot meaningfully clear the entire disk unless you can successfully recognize and assemble the full stack(s) of devices on it. How do you resolve this?)
2. The problem could be thing A or thing B. Which is it?
3. I see that this device might be formatted as a thing X or a thing Y. Which one should I pay attention to and which should I clear off? Or should I just leave them both for the next installer to have problems with as well?

These are all situations that arise, generally from poor administration. The funny thing is that people don't properly administer their systems because they feel sure that the OS installer will clean up their messes for them. The only other possible explanation is sheer laziness (or carelessness, for those who prefer a kinder word).

Comment 25 David Lehman 2016-05-27 17:34:57 UTC
https://github.com/rhinstaller/blivet/pull/430

Comment 33 errata-xmlrpc 2016-11-03 23:49:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2168.html

