Red Hat Bugzilla – Bug 737076
Use after free issue in raid1/raid10 driver
Last modified: 2011-09-15 11:15:29 EDT
Description of problem:
I've been working upstream on this and have been testing a fix. But since the fix might not be in the upstream kernel before beta freeze, I want to submit a Fedora bug to request Nice to Have status.
The upstream bug is:
Unfortunately, bugzilla.kernel.org is currently unavailable due to the kernel.org compromise.
Neil Brown has provided a patch in that bug report that I have tested, and it appears to fix the issue. Neil also mentioned that raid 10 has a similar problem and that he will submit fixes for both.
The effect of the bug is a kernel crash within hours of booting a system that uses raid 1. My systems never lasted more than a day with 3.1 kernels before the fix.
The fix did not make it into rc5, but I am hoping it might make it into rc6.
The last status is that I emailed Neil out of band (since I can't update the bug right now) to say that the fix seems to resolve the problem, as I have had machines running for 3+ days now without a crash.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
(In reply to comment #0)
> Description of problem:
> I've been working upstream on this and have been testing a fix. But since the
> fix might not be in the upstream kernel before beta freeze, I want to submit a
> Fedora bug to request Nice to Have status.
> The upstream bug is:
> Unfortunately bugzilla.kernel.org is unavailable due to the kernel.org
> compromise, right now.
> Neil Brown has provided a patch in that bug report that I have tested and it
> appears fixes the issue. Neil also mentioned that raid 10 has a similar problem
> and that he will submit fixes for both.
Has it been sent to lkml or some other list? As you said, we can't get to the bug so we can't get to the patch so we can't even attempt to include it.
kernel.org being down is really slowing down upstream releases too, so I'm not sure when rc6 will happen and even what will be in it. We basically have until roughly Sept 12 or 13 to get whatever fixes we want built and into updates stable for it to be included in Beta.
The patch has not been posted to lkml and was not in Linus' github repo as of last night. I also don't think it made it to linux-next, which Neil was going to try to do. But linux-next has also been adversely affected by the kernel.org issues.
I have a copy of only the raid 1 patch, not the analogous patch for raid 10. The upstream bug also does not include the raid 10 patch.
Created attachment 522359 [details]
raid 1 patch
I am attaching the patch I am running. I don't know if this is going to be the final version that Neil will submit, but it seems to have fixed my issue. Again this only fixes raid 1, not raid 10.
Neil sent a pull request that includes the fix for this:
This is the specific commit, though there are some others also in the pull request:
(In reply to comment #4)
> Neil sent a pull request that includes the fix for this:
> This is the specific commit, though there are some others also in the pull
> request:
Yep, I saw that this morning. Linus is pulling in a number of things so hopefully that gets into rc6
Linus has pulled this, so it should end up in rc6.
rc6 has been tagged. Hopefully there is still time to use an rc6 based kernel for beta.
(In reply to comment #7)
> rc6 has been tagged. Hopefully there is still time to use an rc6 based kernel
> for beta.
Please test this build when it completes. It also has the debugging options disabled:
I am now running 3.1.0-0.rc6.git0.0.fc17.i686.PAE and 3.1.0-0.rc6.git0.0.fc16.i686.PAE and didn't see any obvious kernel related regressions after running them for a few minutes.
If both systems are still up tomorrow morning, then the raid issue is likely fixed, but it's probably best to wait two days before closing the bug (assuming no raid-related crashes).
kernel-3.1.0-0.rc6.git0.0.fc16 has been submitted as an update for Fedora 16.
Both systems made it through the night without crashing, so most likely this bug is fixed, though I'd like to see them make it another day before closing this.
Things are still looking good, so I think we can be confident this issue is fixed, and the bug can be closed when the update is in f16 proper.
kernel-3.1.0-0.rc6.git0.0.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.