Bug 737076 - Use after free issue in raid1/raid10 driver
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Priority: unspecified
Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Blocks: F16-accepted/F16FinalFreezeExcept
Reported: 2011-09-09 10:17 EDT by Bruno Wolff III
Modified: 2011-09-15 11:15 EDT

Fixed In Version: kernel-3.1.0-0.rc6.git0.0.fc16
Doc Type: Bug Fix
Last Closed: 2011-09-15 11:15:29 EDT

Attachments
raid 1 patch (926 bytes, patch)
2011-09-09 12:05 EDT, Bruno Wolff III

Description Bruno Wolff III 2011-09-09 10:17:52 EDT
Description of problem:
I've been working upstream on this and have been testing a fix. But since the fix might not be in the upstream kernel before beta freeze, I want to submit a Fedora bug to request Nice to Have status.

The upstream bug is:
https://bugzilla.kernel.org/show_bug.cgi?id=41862

Unfortunately, bugzilla.kernel.org is currently unavailable due to the kernel.org compromise.

Neil Brown has provided a patch in that bug report that I have tested, and it appears to fix the issue. Neil also mentioned that raid 10 has a similar problem and that he will submit fixes for both.

The effect of the bug is that the kernel crashes within hours of booting a system that uses raid 1. My systems never lasted more than a day with 3.1 kernels before the fix.

The fix did not make it into rc5, but I am hoping it might make it into rc6.

The latest status is that I emailed Neil out of band (since I can't update the upstream bug right now) to say that the fix seems to resolve the problem: I have had machines running for 3+ days now without a crash.

Comment 1 Josh Boyer 2011-09-09 10:29:12 EDT
(In reply to comment #0)
> Description of problem:
> I've been working upstream on this and have been testing a fix. But since the
> fix might not be in the upstream kernel before beta freeze, I want to submit a
> Fedora bug to request Nice to Have status.
> 
> The upstream bug is:
> https://bugzilla.kernel.org/show_bug.cgi?id=41862
> 
> Unfortunately bugzilla.kernel.org is unavailable due to the kernel.org
> compromise, right now.
> 
> Neil Brown has provided a patch in that bug report that I have tested and it
> appears fixes the issue. Neil also mentioned that raid 10 has a similar problem
> and that he will submit fixes for both.

Has it been sent to lkml or some other list?  As you said, we can't get to the bug, so we can't get to the patch, so we can't even attempt to include it.

kernel.org being down is really slowing down upstream releases too, so I'm not sure when rc6 will happen or even what will be in it.  We basically have until roughly Sept 12 or 13 to get whatever fixes we want built and into updates stable for them to be included in Beta.
Comment 2 Bruno Wolff III 2011-09-09 12:03:27 EDT
The patch has not been posted to lkml and was not in Linus' github repo as of last night. I also don't think it made it to linux-next, which Neil was going to try to do. But linux-next has also been adversely affected by the kernel.org issues.
I have a copy of only the raid 1 patch, not the analogous patch for raid 10. The upstream bug also does not include the raid 10 patch.
Comment 3 Bruno Wolff III 2011-09-09 12:05:50 EDT
Created attachment 522359
raid 1 patch

I am attaching the patch I am running. I don't know if this is going to be the final version that Neil will submit, but it seems to have fixed my issue. Again, this only fixes raid 1, not raid 10.
Comment 4 Bruno Wolff III 2011-09-10 08:06:33 EDT
Neil sent a pull request that includes the fix for this:
http://lkml.org/lkml/2011/9/10/16
This is the specific commit, though there are some others also in the pull request:
http://neil.brown.name/git?p=md;a=commit;h=079fa166a2874985ae58b2e21e26e1cbc91127d4
Comment 5 Josh Boyer 2011-09-10 09:19:39 EDT
(In reply to comment #4)
> Neil sent a pull request that includes the fix for this:
> http://lkml.org/lkml/2011/9/10/16
> This is the specific commit, though there are some others also in the pull
> request:
> http://neil.brown.name/git?p=md;a=commit;h=079fa166a2874985ae58b2e21e26e1cbc91127d4

Yep, I saw that this morning.  Linus is pulling in a number of things, so hopefully that gets into rc6.
Comment 6 Bruno Wolff III 2011-09-11 08:43:16 EDT
Linus has pulled this, so it should end up in rc6.
Comment 7 Bruno Wolff III 2011-09-12 17:47:50 EDT
rc6 has been tagged. Hopefully there is still time to use an rc6 based kernel for beta.
Comment 8 Josh Boyer 2011-09-12 18:36:22 EDT
(In reply to comment #7)
> rc6 has been tagged. Hopefully there is still time to use an rc6 based kernel
> for beta.

Please test this build when it completes.  It also has the debugging options disabled:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3346292
Comment 9 Bruno Wolff III 2011-09-12 23:30:19 EDT
I am now running 3.1.0-0.rc6.git0.0.fc17.i686.PAE and 3.1.0-0.rc6.git0.0.fc16.i686.PAE and didn't see any obvious kernel-related regressions after running them for a few minutes.
If both systems are still up tomorrow morning, then the raid issue is likely fixed, but it's probably best to wait two days before closing the bug (assuming no raid-related crashes).
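For anyone checking several machines, here is a minimal sketch of how one might compare a kernel release string against the build named in the Fixed In Version field. It assumes the pre-release naming seen in this report (for example 3.1.0-0.rc6.git0.0.fc16.i686.PAE). The helper names parse_prerelease and at_or_after_fixed_build are hypothetical, and the comparison only makes sense for 3.1 pre-release kernels, so treat it as an illustration rather than a definitive check.

```python
import platform
import re

# Assumption: Fedora pre-release kernel strings look like the ones quoted in
# this bug, e.g. "3.1.0-0.rc6.git0.0.fc16.i686.PAE".
PRERELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-0\.rc(\d+)\.git(\d+)\.")

# Taken from the "Fixed In Version" field, without the "kernel-" prefix.
FIXED_IN = "3.1.0-0.rc6.git0.0.fc16"


def parse_prerelease(release):
    """Return (major, minor, patch, rc, git) as ints, or None if no match."""
    match = PRERELEASE.match(release)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def at_or_after_fixed_build(release, fixed=FIXED_IN):
    """True if `release` sorts at or after `fixed` (hypothetical helper)."""
    parsed = parse_prerelease(release)
    target = parse_prerelease(fixed)
    if parsed is None:
        raise ValueError("unrecognised kernel string: %r" % release)
    if target is None:
        raise ValueError("unrecognised kernel string: %r" % fixed)
    # Tuples compare element by element, so rc6 sorts after rc5.
    return parsed >= target


if __name__ == "__main__":
    running = platform.release()  # kernel release of the running system
    try:
        print(running, at_or_after_fixed_build(running))
    except ValueError as err:
        # Final releases and other naming schemes are outside this sketch.
        print(err)
```

Strings that do not follow the pre-release pattern raise ValueError instead of being guessed at, since this sketch makes no claim about which later kernels carry the fix.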
Comment 10 Fedora Update System 2011-09-13 06:19:54 EDT
kernel-3.1.0-0.rc6.git0.0.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.1.0-0.rc6.git0.0.fc16
Comment 11 Bruno Wolff III 2011-09-13 07:42:17 EDT
Both systems made it through the night without crashing, so most likely this bug is fixed, though I'd like to see them make it another day before closing this.
Comment 12 Bruno Wolff III 2011-09-14 16:32:00 EDT
Things are still looking good, so I think we can be confident this issue is fixed and the bug can be closed when the update is in f16 proper.
Comment 13 Fedora Update System 2011-09-15 11:15:04 EDT
kernel-3.1.0-0.rc6.git0.0.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.
