Bug 1496836 - [RH 7.5 bug] Request for upstream commit 3664847d95e6 to be merged into RHEL 7.5/7.4
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.4
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Nigel Croxon
QA Contact: guazhang@redhat.com
Keywords: TestOnly, ZStream
Depends On:
Blocks: 1442258 1535883
Reported: 2017-09-28 10:38 EDT by John Pittman
Modified: 2018-04-10 18:16 EDT
CC List: 15 users

See Also:
Fixed In Version: kernel-3.10.0-820.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1535883
Environment:
Last Closed: 2018-04-10 18:14:43 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2018:1062 | None | None | None | 2018-04-10 18:16 EDT

Description John Pittman 2017-09-28 10:38:53 EDT
Description of problem:

Request for upstream commit 3664847d95e6 to be merged into RHEL 7.5/7.4. This patch was recently added to the latest -rc by Linus under commit 12fcf66e74b1.

md/raid5: fix a race condition in stripe batch
https://github.com/torvalds/linux/commit/3664847d95e60a9a943858b7800f8484669740fc

There have been reports of RHEL 7 systems crashing due to this bug in large environments when using Lustre.  Testing with the mentioned patch indicates that it fixes the issue.

Version-Release number of selected component (if applicable):

kernel-3.10.0-693.2.2.el7

Actual results:

System crash

Expected results:

No crash
Comment 3 Nigel Croxon 2017-10-06 12:34:47 EDT
A test kernel is available to verify the fix:

http://people.redhat.com/ncroxon/rhel7/.rhel75_Oct6/kernel-3.10.0-726.el7.test.x86_64.rpm 

Please test as soon as possible and give feedback on your results.

-Nigel
Comment 4 Stephane Thiell 2017-10-06 13:19:35 EDT
Hi Nigel,

While others might be able to test, we had to put our new hardware into production very recently, so while I was able to test last week, we're no longer in a position to test. :/
We have servers connected to multiple JBODs and using mdraid. I applied the patch to 7.4 myself and it's working fine now. We've been using the patch for more than a month now, maybe even 6 weeks. We had about 1 or 2 kernel panics per week without the patch (in 7.3 and 7.4). The tiny patch fixes a race condition that is well explained by the MD kernel maintainer. I hope you'll still consider it for RH 7.4/7.5. Sorry for not being more helpful right now, but I prefer to be honest with you. :)

Thanks for your help!

Stephane
Comment 6 Nigel Croxon 2017-10-06 14:51:07 EDT
I will be including commit ID 3664847d95e60a9a943858b7800f8484669740fc in RHEL7.5.

Are you ok with closing this issue?

-Nigel
Comment 7 Stephane Thiell 2017-10-06 16:42:51 EDT
Great news, thanks Nigel!

Is there any plan to integrate this patch into a RHEL7.4 kernel update?

Thanks,

Stephane
Comment 8 Nigel Croxon 2017-10-09 09:29:58 EDT
Hello Stephane,

Yes, we will backport this to a RHEL7.4 kernel update.

-Nigel
Comment 9 Stephane Thiell 2017-10-09 10:53:01 EDT
Thanks much. And yes, I'm ok with closing this issue now.

Best,

Stephane
Comment 18 Nigel Croxon 2018-01-09 15:03:54 EST
I fixed this in RHEL7.5 with the following commit:
commit 4f386c203da33195fa5ac0379aebec7ab1948e2e
Author: Nigel Croxon <ncroxon@redhat.com>
Date:   Thu Nov 2 19:14:22 2017 -0400

-Nigel
Comment 23 guazhang@redhat.com 2018-01-19 01:45:00 EST
Hello

I did a code review and verified that the patches are applied correctly in kernel-3.10.0-820.el7, so I will move this bug to "VERIFIED".
Comment 24 errata-xmlrpc 2018-04-10 18:14:43 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062
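
For anyone wanting a quick check of whether a RHEL 7 kernel is at or past the build listed under Fixed In Version (kernel-3.10.0-820.el7), the following is a minimal sketch, not an official tool. The function names parse_release and at_or_past_fixed_build are hypothetical, and the comparison is an assumption that only makes sense for main-stream 3.10.0 el7 builds: a RHEL 7.4 z-stream kernel carrying the backport would have a lower build number and would still report False here.

```python
import re

# Taken from "Fixed In Version: kernel-3.10.0-820.el7" in this bug.
FIXED_BUILD = (3, 10, 0, 820)


def parse_release(release):
    """Extract (major, minor, patch, build) from a kernel release string.

    Accepts strings such as 'kernel-3.10.0-693.2.2.el7' or
    '3.10.0-820.el7.x86_64'. Z-stream suffixes like '.2.2' are ignored.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def at_or_past_fixed_build(release):
    """True if the build number is >= the fixed main-stream build.

    Assumption: this does not detect z-stream backports, which keep the
    older base build number (for example 693 for RHEL 7.4).
    """
    return parse_release(release) >= FIXED_BUILD
```

On a running system, the string returned by Python's platform.release() can be passed to at_or_past_fixed_build. The advisory linked above remains the authoritative list of fixed packages.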
