Red Hat Bugzilla – Bug 629719
random failures when notifying kernel of changes to partitioned md (isw) device's partition table
Last modified: 2010-09-22 00:09:16 EDT
The following was filed automatically by anaconda:
anaconda 14.15 exception report
Traceback (most recent call first):
File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/formats/__init__.py", line 266, in create
raise FormatCreateError("invalid device specification", self.device)
File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/formats/fs.py", line 849, in create
DeviceFormat.create(self, *args, **kwargs)
File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/deviceaction.py", line 290, in execute
File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 700, in processActions
File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/__init__.py", line 313, in doIt
File "/usr/lib64/python2.7/site-packages/pyanaconda/packages.py", line 109, in turnOnFilesystems
File "/usr/lib64/python2.7/site-packages/pyanaconda/dispatch.py", line 212, in moveStep
rc = stepFunc(self.anaconda)
File "/usr/lib64/python2.7/site-packages/pyanaconda/dispatch.py", line 131, in gotoNext
File "/usr/lib64/python2.7/site-packages/pyanaconda/gui.py", line 1174, in nextClicked
FormatCreateError: ('invalid device specification', '/dev/md127p3')
Created an attachment (id=442710)
Attached traceback automatically from anaconda.
I'm not sure I can reproduce this. I just tried and noticed /dev/md127p3 was of type 'unknown'. So I'm not sure if it was left like that by the previous installation (which generated the traceback above) or the installation before that (which ended in a traceback caused by the iscsi target).
I now deleted all linux partitions and will see whether I can reproduce this with no unknown-type partitions :)
*** Bug 629730 has been marked as a duplicate of this bug. ***
I first tried deleting the unknown-type partition from within anaconda using fdisk and then just continuing. This led to the same error, and I was not sure whether the fdisk changes were being reflected.
So I rebooted and gave it another try, and the unknown-type partition was back (I figure this bug causes that type of partition). I went ahead and deleted the unknown-type partition in the anaconda partitioning tool, but the same thing happened anyway; see the duplicate bug.
By the way, this is Fedora 14 Alpha RC4 x86_64 with 4 SATA-II disks in a RAID 10 :)
Adding Fedora 14 Beta Blocker according to https://fedoraproject.org/wiki/QA:Testcase_Install_to_BIOS_RAID which has Beta release level on https://fedoraproject.org/wiki/Test_Results:Fedora_14_Alpha_RC4_Install#General_Tests
I encountered this with F14-Alpha x86_64 on 2 disks in RAID 1. The partitioning also failed on md127p3.
Partition   Mount   FS
md127p1     /boot   ext2
md127p2     --      swap
md127p3     /       ext4
md127p4     --      extended
md127p5     /home   ext4
Discussed at today's blocker review meeting. Accepted as a Beta blocker under the criterion "The installer must be able to create and install to software, hardware or BIOS RAID-0, RAID-1 or RAID-5 partitions for anything except /boot".
What are the plans for addressing and assessing this bug? We'd love to have more information before the next blocker meeting this Friday.
(In reply to comment #7)
> I encountered this with F14-Alpha x86_64 on 2 RAID 1 disks. The partitioning
> also failed on /md127p3.
Ben, can you please attach the exception report to this bug so we can use that information to isolate the problem?
Unfortunately, it's my work machine and I had to get it installed. I didn't see a way to save the results from Anaconda before rebooting (well, I suppose a USB stick would have worked, but hindsight is 20/20), and it's not really available for reinstalling any time soon.
I ended up dropping to a TTY while Anaconda asked the earlier questions and using fdisk and mkfs to get the partitioning right. That went without a hitch. I don't have any other RAID instances. Maybe a call-out to the devel list will get someone. I can get the specs (hard drive and motherboard) when I get in tomorrow.
Discussed at 2010-09-10 blocker review meeting. No essential change from last week, but given Ben's inability to test further, we will have to close this as INSUFFICIENT_DATA if we cannot reproduce it in TC1 testing.
Is the information attached by Sandro Mathys not sufficient?
I just got it to go through without a crash (other than bug #632799). I imagine anaconda doesn't like the partition followed by a format.
Created attachment 446642
anaconda traceback (F14 Beta TC1)
Reproduced with F14 Beta TC1 (x86_64 install DVD)
(In reply to comment #4)
> I first tried deleting the unknown-type partition within anaconda with fdisk
> and then just continuing - this led to the same error and I was not sure
> whether the fdisk-changed were being reflected.
Please do not modify the partition table outside of anaconda without forcing anaconda to reset its storage objects. It will not help you.
(In reply to comment #15)
> Reproduced with F14 Beta TC1 (x86_64 install DVD)
Please describe in detail what you did: all partitioning-related operations and choices made in anaconda, as well as any activities carried out on the shell on tty2.
> (In reply to comment #15)
> > Reproduced with F14 Beta TC1 (x86_64 install DVD)
> Please describe in detail what you did: all partitioning-related operations and
> choices made in anaconda, as well as any activities carried out on the shell on
I really didn't do anything fancy. This time I kept it as simple as possible, i.e. I didn't really change anything and didn't use tty2. I used a Swiss German keyboard layout, used the BIOS RAID and no single HDD/SSD, set the Europe/Zurich (non-UTC) timezone, chose to review the partitioning layout but didn't change anything, and the install failed.
So, the partitioning layout looked like this:
- md127p1: ntfs (windows hidden system whatever)
- md127p2: ntfs (windows 7 C:\)
- md127p3: ext4 (/boot)
- md127p4: extended
- md127p5: LVM
sandro, when you posted comment #17, are you talking about the attempt logged in comment #13, or did you do a new attempt but not post the logs from that attempt yet?
(In reply to comment #18)
> sandro, when you posted comment #17, are you talking about the attempt logged
> in comment #13, or did you do a new attempt but not post the logs from that
> attempt yet?
comment #17 refers to comment #15, sorry for the confusion.
I found a test machine with the same raid controller, although mine will only do basic striping or mirroring. I ran the f14-alpha installer on it, starting with an uninitialized (unpartitioned/unformatted) mirror array. I specified /boot, /, swap, and /home partitions on the mirror and hit the same error when trying to create the ext4 filesystem on md127p2 for /.
The problem appears to be somewhere below anaconda: pyparted, parted, or the kernel (md). Since this is only happening on these partitioned md devices, my money is on the kernel/md.
Immediately after doing the parted commit to add the second partition, we try to zero out the beginning and end of the new partition to wipe any residual metadata -- this fails with the error -ENOENT. If I switch to the shell on tty2, /dev and /proc/partitions agree that md127 has one partition, while parted shows it having two partitions. The system's still up, so feel free to ask for more data.
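The wipe step described above (zeroing the start and end of a freshly created partition to clear residual metadata) can be sketched roughly as follows. This is a minimal illustration, not anaconda's actual code: `wipe_partition_ends` and `WIPE_BYTES` are hypothetical names, and the wipe size is an assumption. The relevant point is that if the kernel never created the partition's device node, `os.open` fails with `FileNotFoundError` (errno ENOENT), which matches the -ENOENT failure reported here.

```python
import os

# Hypothetical wipe size; the size anaconda actually uses may differ.
WIPE_BYTES = 1024 * 1024


def _write_all(fd: int, data: bytes) -> None:
    """Write all of `data`, looping over short writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def wipe_partition_ends(path: str, nbytes: int = WIPE_BYTES) -> None:
    """Zero the first and last `nbytes` of `path` to clear stale metadata.

    Raises FileNotFoundError (errno ENOENT) if the device node does not
    exist, e.g. because the kernel was never told about the new partition.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # Seeking to the end gives the size for block devices and files alike.
        size = os.lseek(fd, 0, os.SEEK_END)
        count = min(nbytes, size)
        zeros = bytes(count)
        # Wipe the beginning of the partition.
        os.lseek(fd, 0, os.SEEK_SET)
        _write_all(fd, zeros)
        # Wipe the end of the partition.
        os.lseek(fd, size - count, os.SEEK_SET)
        _write_all(fd, zeros)
        os.fsync(fd)
    finally:
        os.close(fd)
```

Calling `wipe_partition_ends("/dev/md127p2")` while /proc/partitions still lists only md127p1 would be expected to raise `FileNotFoundError` rather than write anything.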
Will upload anaconda traceback shortly...
Created attachment 447585
traceback from reproducer using different case
We are seeing seemingly random failures when parted is trying to notify the kernel of changes to the md device's partition table. This is with intel isw fwraid via md, which is the only fwraid type anaconda uses md for. The others use dmraid.
The failures are also reproducible using the parted utility from the shell -- no errors are displayed but, after exiting, the kernel shows fewer partitions than are actually present.
We are assembling the array using mdadm -I on each of the member disks in turn, in case that matters.
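To check for the mismatch described above from the shell on tty2, one can compare what parted reports against what the kernel lists in /proc/partitions. A minimal sketch, assuming the standard /proc/partitions column layout (major, minor, #blocks, name); the function names are hypothetical and not part of anaconda or parted.

```python
import re
from typing import List, Tuple


def parse_proc_partitions(text: str) -> List[Tuple[int, int, int, str]]:
    """Parse /proc/partitions into (major, minor, blocks, name) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the "major minor #blocks name" header and blank lines.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, minor, blocks, name = fields
        entries.append((int(major), int(minor), int(blocks), name))
    return entries


def kernel_partitions(text: str, disk: str) -> List[str]:
    """Return the partition names the kernel currently knows for `disk`.

    The kernel inserts a 'p' between the disk name and the partition number
    when the disk name ends in a digit (md127 -> md127p1, sda -> sda1).
    """
    sep = "p" if disk[-1].isdigit() else ""
    pattern = re.compile(re.escape(disk + sep) + r"\d+")
    return [name for _, _, _, name in parse_proc_partitions(text)
            if pattern.fullmatch(name)]
```

On the affected system, something like `kernel_partitions(open("/proc/partitions").read(), "md127")` could then be compared with the partition list printed by `parted /dev/md127 print`; a shorter list from the kernel would indicate the failure seen here.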
Given that this is a random kernel failure (as I guessed! yay me!), how practical is it for us to say 'just try again' is a workaround? Does anyone who's hit this have a feel for how likely you are to get a success in a couple of tries? Is there any other workaround?
Fedora Bugzappers volunteer triage team
So far I tried to install F14 Alpha/Beta at least 4 times and the issue always stopped me, i.e. trying again doesn't seem to be a valid workaround.
I tried this test case on F-14-Beta-TC1. When I created the RAID set the first time, the install failed. I tried it again and it passed the second time.
Created attachment 447753
Attaching udevadm info per request from dlehman. This is from a system with a "nVidia Corporation CK804 Serial ATA Controller"
# udevadm info --query=all --name=dm-1 > /tmp/dm-1.udevdb
(In reply to comment #23)
> of tries? Is there any other workaround?
This is only with intel isw bios/fw raid, so we should have the workaround of passing "noiswmd" on boot command line to force dmraid instead of md for these arrays, but we've turned up some bugs in dmraid and/or the device-mapper udev rules that prevent the workaround from being productive.
Created attachment 447757
(In reply to comment #26)
> Created attachment 447753 [details]
> Attaching udevadm info per request from dlehman. This is from a system with a
> "nVidia Corporation CK804 Serial ATA Controller"
> # udevadm info --query=all --name=dm-1 > /tmp/dm-1.udevdb
Oops, using correct device this time (dm-2)
(In reply to comment #22)
> The failures are also reproducible using the parted utility from the shell --
> no errors are displayed but, after exiting, the kernel shows fewer partitions
> than are actually present.
> We are assembling the array using mdadm -I on each of the member disks in turn,
> in case that matters.
Can you give me a list of commands to try to reproduce this? I'll ask the upstream md maintainer to take a look.
Some insight on this from my side (as the former anaconda BIOS RAID maintainer and former parted maintainer). F-14 contains parted-2.3, which has switched from using the reread-partition-table ioctl, which fdisk and parted-2.1 (in F-13) use, to using blkpg. blkpg allows parted to remove the kernel's knowledge of partitions one partition at a time, and to add new partitions on the fly even though some existing partitions are busy. In particular, these 2 commits are relevant:
It could very well be that blkpg does not play well with mdraid devices. I'll create a parted patch that makes parted use the reread-partition-table ioctl again for mdraid sets, which can then be used for testing.
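The difference between the two kernel interfaces discussed above can be sketched in Python with `fcntl` and `ctypes`. This is only an illustration, not parted's code (parted is written in C) and not the patches attached to this bug. It assumes the ioctl numbers from `<linux/fs.h>` (`BLKRRPART = _IO(0x12,95)`, `BLKPG = _IO(0x12,105)`) and the struct layouts from `<linux/blkpg.h>`; the wrapper function names are hypothetical.

```python
import ctypes
import fcntl
import os


def _IO(type_: int, nr: int) -> int:
    """Linux _IO(): no data direction and no size, so just (type << 8) | nr."""
    return (type_ << 8) | nr


BLKRRPART = _IO(0x12, 95)   # re-read the whole partition table
BLKPG = _IO(0x12, 105)      # add or delete a single partition

# Operations from <linux/blkpg.h>.
BLKPG_ADD_PARTITION = 1
BLKPG_DEL_PARTITION = 2


class BlkpgPartition(ctypes.Structure):
    _fields_ = [
        ("start", ctypes.c_longlong),     # offset in bytes
        ("length", ctypes.c_longlong),    # length in bytes
        ("pno", ctypes.c_int),            # partition number
        ("devname", ctypes.c_char * 64),
        ("volname", ctypes.c_char * 64),
    ]


class BlkpgIoctlArg(ctypes.Structure):
    _fields_ = [
        ("op", ctypes.c_int),
        ("flags", ctypes.c_int),
        ("datalen", ctypes.c_int),
        ("data", ctypes.c_void_p),        # points at a BlkpgPartition
    ]


def reread_partition_table(dev: str) -> None:
    """Old approach (fdisk, parted-2.1): ask the kernel to re-read everything.

    Typically fails with EBUSY if any partition on the device is in use.
    """
    fd = os.open(dev, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKRRPART)
    finally:
        os.close(fd)


def blkpg_request(op: int, pno: int, start: int = 0, length: int = 0):
    """Build a BLKPG request for one partition (byte offsets).

    Returns the partition struct too, so the caller keeps it alive while
    the kernel follows the `data` pointer.
    """
    part = BlkpgPartition(start=start, length=length, pno=pno)
    arg = BlkpgIoctlArg(op=op, flags=0, datalen=ctypes.sizeof(part),
                        data=ctypes.addressof(part))
    return arg, part


def blkpg_apply(dev: str, op: int, pno: int, start: int = 0, length: int = 0) -> None:
    """New approach (parted-2.3): tell the kernel about one partition at a time."""
    arg, part = blkpg_request(op, pno, start, length)  # `part` must stay alive
    fd = os.open(dev, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKPG, arg)
    finally:
        os.close(fd)
```

For example, `blkpg_apply("/dev/md127", BLKPG_DEL_PARTITION, 2)` would ask the kernel to forget md127p2 only, whereas `reread_partition_table("/dev/md127")` replaces the kernel's whole view of the table in one step. Both need root and a real block device.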
Here is a scratch build which I think might fix this:
Dave Lehman, can you test this? Just dropping the libparted.so.0 file into an updates.img should do the trick for anaconda; for testing from tty2 you need to set LD_LIBRARY_PATH.
I'll attach the 2 patches which I've added to the srpm from which this scratch build was done.
Created attachment 448010
PATCH 1/2 for parted which might fix / workaround this
Created attachment 448011
PATCH 2/2 for parted which might fix / workaround this
Thinking more about this, the use of blkext majors rather than the main disk major for partitions (which the kernel does for partitions > 15 on SCSI disks and for any and all md partitions) is likely the cause of these mdraid partition issues.
So we may have similar issues with normal disks with > 15 partitions.
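Which devices are affected by this can be checked by looking for the blkext major in /proc/partitions. A minimal sketch, assuming the blkext major is 259 (`BLOCK_EXT_MAJOR` in `<linux/major.h>`) and the usual four-column /proc/partitions layout; `blkext_devices` is a hypothetical helper name.

```python
from typing import List

# Major number the kernel uses for dynamically allocated "extended"
# partition devices (BLOCK_EXT_MAJOR in <linux/major.h>).
BLOCK_EXT_MAJOR = 259


def blkext_devices(proc_partitions: str) -> List[str]:
    """Return names of entries in /proc/partitions that use the blkext major."""
    names = []
    for line in proc_partitions.splitlines():
        fields = line.split()
        # Data lines have four columns and start with a numeric major.
        if len(fields) == 4 and fields[0].isdigit() and int(fields[0]) == BLOCK_EXT_MAJOR:
            names.append(fields[3])
    return names
```

On a system like the ones in this report, `blkext_devices(open("/proc/partitions").read())` would be expected to list every md127pN partition, plus any partitions numbered above 15 on plain SCSI/SATA disks.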
parted-2.3-2.fc14 has been submitted as an update for Fedora 14.
It turns out this is actually a parted bug; see bug 634980.
RC1 is now out and should include a fix for this:
Please test and confirm this. I've tested a case which we think is hitting the same issue - >15 partitions on a single disk - but we need to make sure this RAID issue is actually the same bug, and the fix works for it. Thanks!
Can confirm that this is fixed in Beta RC2 :)
Awesome, thanks for testing!
Also, I'm sure Dave is very sorry for accusing you of modifying things and denying it ;)
Moving to VERIFIED based on comment#38
parted-2.3-2.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with:
su -c 'yum --enablerepo=updates-testing update parted'
You can provide feedback for this update here: https://admin.fedoraproject.org/updates/parted-2.3-2.fc14
Moving back to VERIFIED, still based on comment#38
Sandro, can you +1 the update in Bodhi? Thanks!
parted-2.3-2.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.