Bug 737076 - Use after free issue in raid1/raid10 driver
Summary: Use after free issue in raid1/raid10 driver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F16-accepted, F16FinalFreezeExcept
 
Reported: 2011-09-09 14:17 UTC by Bruno Wolff III
Modified: 2011-09-15 15:15 UTC
CC List: 6 users

Fixed In Version: kernel-3.1.0-0.rc6.git0.0.fc16
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-09-15 15:15:29 UTC
Type: ---
Embargoed:


Attachments
raid 1 patch (926 bytes, patch)
2011-09-09 16:05 UTC, Bruno Wolff III

Description Bruno Wolff III 2011-09-09 14:17:52 UTC
Description of problem:
I've been working upstream on this and have been testing a fix. But since the fix might not be in the upstream kernel before beta freeze, I want to submit a Fedora bug to request Nice to Have status.

The upstream bug is:
https://bugzilla.kernel.org/show_bug.cgi?id=41862

Unfortunately, bugzilla.kernel.org is currently unavailable due to the kernel.org compromise.

Neil Brown has provided a patch in that bug report that I have tested, and it appears to fix the issue. Neil also mentioned that raid 10 has a similar problem and that he will submit fixes for both.

The effect of the bug is a kernel crash, typically within hours of booting a system that uses raid 1. Before the fix, my systems never lasted more than a day with 3.1 kernels.

The fix did not make it into rc5, but I am hoping it might make it into rc6.

The latest status: I emailed Neil out of band (since I can't update the bug right now) to say that the fix seems to resolve the problem, as my machines have now been running for 3+ days without a crash.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Josh Boyer 2011-09-09 14:29:12 UTC
(In reply to comment #0)
> Description of problem:
> I've been working upstream on this and have been testing a fix. But since the
> fix might not be in the upstream kernel before beta freeze, I want to submit a
> Fedora bug to request Nice to Have status.
> 
> The upstream bug is:
> https://bugzilla.kernel.org/show_bug.cgi?id=41862
> 
> Unfortunately bugzilla.kernel.org is unavailable due to the kernel.org
> compromise, right now.
> 
> Neil Brown has provided a patch in that bug report that I have tested and it
> appears fixes the issue. Neil also mentioned that raid 10 has a similar problem
> and that he will submit fixes for both.

Has it been sent to lkml or some other list?  As you said, we can't get to the bug, so we can't get to the patch, so we can't even attempt to include it.

kernel.org being down is really slowing down upstream releases too, so I'm not sure when rc6 will happen or even what will be in it.  We basically have until roughly Sept 12 or 13 to get whatever fixes we want built and into updates stable for them to be included in Beta.

Comment 2 Bruno Wolff III 2011-09-09 16:03:27 UTC
The patch has not been posted to lkml and was not in Linus' github repo as of last night. I also don't think it made it into linux-next, which Neil was going to try to get it into, but linux-next has also been adversely affected by the kernel.org issues.
I have a copy of only the raid 1 patch, not the analogous patch for raid 10. The upstream bug also does not include the raid 10 patch.

Comment 3 Bruno Wolff III 2011-09-09 16:05:50 UTC
Created attachment 522359
raid 1 patch

I am attaching the patch I am running. I don't know if this will be the final version that Neil submits, but it seems to have fixed my issue. Again, this only fixes raid 1, not raid 10.

Comment 4 Bruno Wolff III 2011-09-10 12:06:33 UTC
Neil sent a pull request that includes the fix for this:
http://lkml.org/lkml/2011/9/10/16
This is the specific commit, though the pull request also includes some others:
http://neil.brown.name/git?p=md;a=commit;h=079fa166a2874985ae58b2e21e26e1cbc91127d4

Comment 5 Josh Boyer 2011-09-10 13:19:39 UTC
(In reply to comment #4)
> Neil sent a pull request that includes the fix for this:
> http://lkml.org/lkml/2011/9/10/16
> This is the specific commit, though there are some others also in the pull
> request:
> http://neil.brown.name/git?p=md;a=commit;h=079fa166a2874985ae58b2e21e26e1cbc91127d4

Yep, I saw that this morning.  Linus is pulling in a number of things, so hopefully that gets into rc6.

Comment 6 Bruno Wolff III 2011-09-11 12:43:16 UTC
Linus has pulled this, so it should end up in rc6.

Comment 7 Bruno Wolff III 2011-09-12 21:47:50 UTC
rc6 has been tagged. Hopefully there is still time to use an rc6 based kernel for beta.

Comment 8 Josh Boyer 2011-09-12 22:36:22 UTC
(In reply to comment #7)
> rc6 has been tagged. Hopefully there is still time to use an rc6 based kernel
> for beta.

Please test this build when it completes.  It also has the debugging options disabled:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3346292

Comment 9 Bruno Wolff III 2011-09-13 03:30:19 UTC
I am now running 3.1.0-0.rc6.git0.0.fc17.i686.PAE and 3.1.0-0.rc6.git0.0.fc16.i686.PAE and didn't see any obvious kernel related regressions after running them for a few minutes.
If both systems are still up tomorrow morning, then the raid issue is likely fixed, but it's probably best to wait two days before closing the bug (assuming no raid-related crashes).

Comment 10 Fedora Update System 2011-09-13 10:19:54 UTC
kernel-3.1.0-0.rc6.git0.0.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.1.0-0.rc6.git0.0.fc16

Comment 11 Bruno Wolff III 2011-09-13 11:42:17 UTC
Both systems made it through the night without crashing, so most likely this bug is fixed, though I'd like to see them make it another day before closing this.

Comment 12 Bruno Wolff III 2011-09-14 20:32:00 UTC
Things are still looking good, so I think we can be confident this issue is fixed and the bug can be closed when the update is in f16 proper.

Comment 13 Fedora Update System 2011-09-15 15:15:04 UTC
kernel-3.1.0-0.rc6.git0.0.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

