Red Hat Bugzilla – Bug 591949
Segmentation fault during imsm container creation
Last modified: 2010-11-15 09:32:57 EST
Description of problem:
When trying to create imsm container for all raids (5 10 1) segmentation fault appear.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
mdadm --zero-superblock /dev/sdb
mdadm --zero-superblock /dev/sdc
mdadm -C /dev/md/imsm0 -e imsm -n 2 /dev/sdb /dev/sdc -R
Segmentation fault here appears!!!
Container creation failure.
Container creation succeeds.
My short investigation result is that patch "mdadm-3.1.2-container.patch" breaks container creation.
Created attachment 414508 [details]
Fix: Segmentation fault during imsm container creation
Please review this patch and check if it does not break your intention in mdadm-3.1.2-container.patch.
If it is ok, please include it in next snapshot of RHEL 6.0
It most certainly does break the intent of the patch. I'll have to look for a different solution, but the whole point of the patch is to be able to identify how many disks belong to a container whether looking at the container or a subarray. Currently, raid_disks is set to 0 for containers no matter how many disks are actually needed to start cleanly. My patch used the first volume's info to set the container's info. It was based upon the understanding that all members of a container must be members of the subarrays, so the raid_disks should always be the same. However, if the subarray hasn't been created yet, then I can see us returning invalid information. These seems wrong to me. Why don't we always simply set raid_disks to the number of disks in the container at the container level? Then my patch could be dropped and we could just use raid_disks of the container directly.
Created attachment 414794 [details]
Number of disks in a container - patch proposal
Look at my patch proposal.
Maybe it will be enough to set correct raid_disks value?
(In reply to comment #4)
> Created an attachment (id=414794) [details]
> Number of disks in a container - patch proposal
> Look at my patch proposal.
> Maybe it will be enough to set correct raid_disks value?
No, that doesn't look right because that list can have extra/stale disks...
(In reply to comment #3)
> don't we always simply set raid_disks to the number of disks in the container
> at the container level? Then my patch could be dropped and we could just use
> raid_disks of the container directly.
This is difficult with external metadata because we can't trust the metadata until container activation. With native metadata we have the kernel to disambiguate the metadata after each disk is added and a record in the metadata of how many spare disks to expect.
With external metadata we don't get a consistent view of the metadata until the container is activated, and with imsm in particular we never know how many spares might be out there until we count them.
raid_disks needs to be zero for the container-level info because there is no reliable correlation with the number of disks in an inactive container with the number of disks needed to start the array.
There are two cases to handle:
In this case we want all present devices (members, stale devices, spares) to be included in the container. The present behaviour.
In this case we don't care if we miss some future spares as we expect them to be hot added later, we just don't want to activate too early. What I think we need in this case is more smarts in Incremental_container() to get a count of valid active devices from the current container contents. The goal being to have the code do the right thing when we have enough self-consistent drives.
Am I correct in assuming that this is the end goal you are seeking? E.g. to start a 4-disk raid5 array once we have 4 disks that all agree that they are active members, and not wait until after the udev-settle event to start the container?
One more case, mdadm -IRs. In making mdadm -I behave like it does for native arrays (aka, you no longer need the --nodegraded flag), it meant we should only start the array if all devices are present. So, mdadm -I would wait for 5 out of 5 (unless the array was already degraded on last shutdown in which case the same 4 out of 5 that it had on last shutdown would be correct). Then we would start the array in degraded mode even if not all previously available drives are present when we get the call for mdadm -IRs.
The mdadm's package has not been changed from Snap 2.
It contains critical defect for imsm RAID described in this record.
We can not run any MD RAID regression tests in RHEL6.0.
Could you remove affected patch from mdadm code and push new package to the next snapshot?
Yes, there will be an updated mdadm in the next snap build.
Your patch idea would work for container creation because all drives are present during creation (normally), and would work for assembly for the same reason, but totally fails during incremental assembly as it merely counts the number of drives present and not the number of drives needed.
The solution I used turned out to be merely to check super->anchor->num_raid_devices and only call get_imsm_super_info() on a volume if num_raid_devices is > 0, that way we have a volume to get info from. This works fine because during container creation we don't really care about the number of raid disks like we do during incremental assembly, and if we ever tried to assemble a container that has no valid defined arrays then things don't work, but they aren't really expected to either as the drives won't actually be listed as belonging to an array when they aren't used yet.
My testing method was as follows:
mdadm -C /dev/md/imsm0 -n2 -e imsm /dev/sd[cd]
mdadm -S /dev/md/imsm0
mdadm -I /dev/sdc (fails on my system because it is technically considered a spare drive by super-intel.c and as such it is given the spare drive uuid and that matches multiple arrays on my machine, this is considered correct behaviour and wouldn't be a problem if I had created a subarray in the container before I stopped the array)
mdadm -C /dev/md/imsm0 -n2 -e imsm /dev/sd[cd]
mdadm -C /dev/md/raid1 -n2 -l1 /dev/md/imsm0 (actually, /dev/md125 in my case as we don't create the /dev/md/imsm0 link on /dev/md/imsm0 creation like we should, still investigating that, but it's a minor issue and only effects creation of imsm containers and not incremental assembly of imsm containers)
mdadm -S /dev/md/raid1
mdadm -S /dev/md/imsm0
mdadm -I /dev/sdc (here we get notification that 1 drive is added, but the array is not started)
mdadm -I /dev/sdd (here we get notification that we now have 2 drives and that the array is started and that the subarray is started)
(In reply to comment #14)
> mdadm -C /dev/md/imsm0 -n2 -e imsm /dev/sd[cd]
> mdadm -C /dev/md/raid1 -n2 -l1 /dev/md/imsm0 (actually, /dev/md125 in my case
> as we don't create the /dev/md/imsm0 link on /dev/md/imsm0 creation like we
> should, still investigating that, but it's a minor issue and only effects
> creation of imsm containers and not incremental assembly of imsm containers)
> mdadm -S /dev/md/raid1
> mdadm -S /dev/md/imsm0
> mdadm -I /dev/sdc (here we get notification that 1 drive is added, but the
> array is not started)
> mdadm -I /dev/sdd (here we get notification that we now have 2 drives and that
> the array is started and that the subarray is started)
...but this still seems to miss the stale disk case where sdd has knowledge that sdc is stale. I am putting the finishing touches on a patch that revalidates disks in the container as they are added which should handle this case as well.
(In reply to comment #15)
> (In reply to comment #14)
> > mdadm -C /dev/md/imsm0 -n2 -e imsm /dev/sd[cd]
> > mdadm -C /dev/md/raid1 -n2 -l1 /dev/md/imsm0 (actually, /dev/md125 in my case
> > as we don't create the /dev/md/imsm0 link on /dev/md/imsm0 creation like we
> > should, still investigating that, but it's a minor issue and only effects
> > creation of imsm containers and not incremental assembly of imsm containers)
> > mdadm -S /dev/md/raid1
> > mdadm -S /dev/md/imsm0
> > mdadm -I /dev/sdc (here we get notification that 1 drive is added, but the
> > array is not started)
> > mdadm -I /dev/sdd (here we get notification that we now have 2 drives and that
> > the array is started and that the subarray is started)
> ...but this still seems to miss the stale disk case where sdd has knowledge
> that sdc is stale. I am putting the finishing touches on a patch that
> revalidates disks in the container as they are added which should handle this
> case as well.
Yes and no. Yes, it misses that sdc is stale, but at the same time it doesn't count only active disks it counts all disks. Right now the test in Incremental.c is basically if (num_disks >= raid_disks || (runstop > 0 && num_disks >= (raid_disks - max_degraded))) where num_disks is our current disk count and it doesn't matter if the disk is stale or a spare or whatever. This is obviously a trickable test, but it's what's there now.
As far as Incremental.c is concerned, I'm getting raid_disks from the subarray volume info, not num_disks. What you seem to be referring to is changing num_disks in the case that one or more disks that we are currently counting is stale. That's completely different from what I'm doing (and it's a perfectly valid thing to do, I'm just saying it's different from what this patch is doing).
Anyway, this patch is in for Beta2, we can integrate your patch for its improvements when it's ready. We will need a new bug to track your patch though.
Patch is included & applied.
1 bug/problem found
special hw not aviable at the time
Can you verify that this is fixed. If it is fixed, please mark the "Verified" field with the "Intel" value.
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.