Bug 147580

Summary: Race condition in md subsystem causes panic
Product: Red Hat Enterprise Linux 3
Reporter: James Olin Oden <james.oden>
Component: kernel
Assignee: Doug Ledford <dledford>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 3.0
CC: petrides, riel
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-05-18 13:29:16 UTC
Type: ---
Attachments (Description / Flags):
panic log at normal logging level (will have interspersed md output) / none
Panic log at loglevel 1 / none
Test script that reads /proc/mdstat in loop / none
Test script that creates and destroys md devices in a loop / none
Add locking around displaying mddev info / none

Description James Olin Oden 2005-02-09 15:34:57 UTC
Description of problem:
There is a race condition in the md subsystem: if members and md 
devices are being removed while /proc/mdstat is being read, a kernel 
panic can occur.  Ultimately, it boils down to a lack of locking on the 
internal md data structures that are scanned in order to 
produce the output of /proc/mdstat.  This bug has already been fixed 
in the 2.6 stream (actually it was fixed in 2.5), and I think it was 
backported to the 2.4.22 kernel (Neil Brown <neilb@cse.unsw.edu.au> 
created the patches for both).

Here is an example of the oops and panic:

   Unable to handle kernel NULL pointer dereference at virtual 
address 000003d8
   Oops: 0000
   raid1 usbserial parport_pc lp parport autofs4 audit 
iptable_filter   ip_tables e1
   00 floppy sg scsi_mod microcode ke<y6bd>medv: m omud1s esdtevop 
   npmudt: u sunbb-uinhdci<    
hudcs1bc3,o1r>e                                i
   : exporCtPU_r: d e v( h3
   c13EI)                  d
    m  d: 0 0un60:b[in<f8da<hfd4ecc123>,]0>
   Nmdo:t  etaxipnortte_d
   evE(FhLAdcGS12:) 0     rd
   010286            0

   EIP is at raid1_status [raid1] 0x13 (2.4.21-27.0.1.ELsmp/i686)
   eax: f7bb0a80   ebx: f7e2c400   ecx: 00000006   edx: f6485980
   esi: 00000000   edi: 00000000   ebp: f6485980   esp: f586ff18
   ds: 0068   es: 0068   ss: 0068
   Process cat (pid: 6618, stackpage=f586f000)
   Stack: f7bb0a94 0001f500 c0185757 c3bb3108 f7e2c400 f7bb0a94 
f7bb0a94 0001f500
          c0218bd3 f6485980 f7bb0a80 0000fa80 00000000 00000000 
f6485980 f7bb0a80
          000000e4 c0185213 f6485980 f7bb0a80 f586ff74 f6485998 
00000000 00000002
   Call Trace:   [<c0185757>] seq_printf [kernel] 0x47 (0xf586ff20)
   [<c0218bd3>] md_seq_show [kernel] 0x153 (0xf586ff38)
   [<c0185213>] seq_read [kernel] 0x173 (0xf586ff5c)
   [<c0164<0867>r>a] isdy1s: _rmiearrdo [r kerrensyenl]c  0wxa9s 7n 
   y< f6>f9m4in)di
   h:e d[,m< crd0e_1sdtoa_rst6yinnecg7( )0n 9e>gxo]tt  t siymsse_i.gf
   alta t.6.4.  [ekxerintienl]g                                      
   x49 (0xf586ffa8)             0

   Code: 8b 87 d8 03 00 00 89 44 2<46<>4md>: 0 mcd2 8 sbt op8p7e dd.4
    0md3: <4u>nb i00nd <0h0dc1 c57,1> 4
   4md<:4> e x24por 0t_4rd
   v                      e
    pmdan: icu:nb iFandta<lhd cex1c4e,0pt>i
   : export_rdev(hdc14)

Version-Release number of selected component (if applicable):

How reproducible:
Always with the test scripts.  Under normal system operation the bug 
is not likely to happen, but if someone had to fail a disk while 
monitoring software was reading /proc/mdstat, the panic could hit 
at that most inopportune moment.

Steps to Reproduce:
1.  Start running test script that creates and removes md devices in 
a loop.
2.  Run test script that cats /proc/mdstat in a loop.
Actual results:
In a short amount of time the panic will occur.

Expected results:
No panic.
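
For readers without access to the attachments, the reader side of step 2 can be sketched roughly as follows. This is not the attached test script (which is not reproduced in this report); it is a minimal C equivalent of running cat on a file in a loop, and the function name read_in_loop and its parameters are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not the attached script: open `path`, read it to
 * EOF, and close it again, `iterations` times, much as `cat` in a loop
 * would.  Returns the total number of bytes read, or -1 if the file
 * cannot be opened.
 */
static long read_in_loop(const char *path, int iterations)
{
    char buf[4096];
    long total = 0;

    for (int i = 0; i < iterations; i++) {
        FILE *f = fopen(path, "r");
        size_t n;

        if (!f)
            return -1;
        /* Each full read of /proc/mdstat makes the kernel walk the md device list. */
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            total += (long)n;
        fclose(f);
    }
    return total;
}
```

Calling read_in_loop("/proc/mdstat", some_large_count) while another process creates and destroys md devices would approximate the reproduction described above; the iteration count is left to the tester.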

Additional info:
I am going to attach two test scripts that you can use to reproduce this.
I am also trying to figure out how to patch the kernel to avoid this 
problem, but I don't mind if you produce a patch first (-:.   I also 
have just sent an email to Neil Brown about this issue.  Will gladly 
copy the responsible engineer on correspondence if requested.

Comment 1 James Olin Oden 2005-02-09 15:36:51 UTC
Created attachment 110871 [details]
panic log at normal logging level (will have interspersed md output)

Comment 2 James Olin Oden 2005-02-09 15:38:17 UTC
Created attachment 110872 [details]
Panic log at loglevel 1

Comment 3 James Olin Oden 2005-02-09 15:39:09 UTC
Created attachment 110873 [details]
Test script that reads /proc/mdstat in loop

Comment 4 James Olin Oden 2005-02-09 15:41:16 UTC
Created attachment 110874 [details]
Test script that creates and destroys md devices in a loop

You will probably want to change the partitions used and perhaps the names of the
md devices to work with whatever system you test with.  Also, just so you know,
we were testing on SMP systems when doing this.  One had the E7501 chipset and the
other the E7520 chipset (Nocona/Lindhurst).

Comment 6 James Olin Oden 2005-02-10 21:47:30 UTC
Created attachment 110942 [details]
Add locking around displaying mddev info

It took me a while to grok the fix in 2.5.23 by Neil Brown and then translate it to 
2.4.21 + N Red Hat patches, but I think I got it.  Basically, I just needed to add
locking on the mddev structure before displaying it in md_seq_show().  The patch is
very tiny, but seems to work.
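
The idea behind the fix can be illustrated with a user-space analogy. This is not the attached patch and does not use the kernel's types or locking primitives; struct fake_mddev, fake_stop() and fake_show() are hypothetical names, and a pthread mutex stands in for whatever lock the real patch takes on the mddev structure.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for an md device; not the kernel's mddev structure. */
struct fake_mddev {
    pthread_mutex_t lock;  /* guards pers */
    const char *pers;      /* personality name; NULL once the array is stopped */
};

/* Writer side: tear the device down while holding the lock. */
static void fake_stop(struct fake_mddev *dev)
{
    pthread_mutex_lock(&dev->lock);
    dev->pers = NULL;
    pthread_mutex_unlock(&dev->lock);
}

/*
 * Reader side: take the same lock before looking at the device, so the
 * check of pers and its use cannot be separated by a concurrent stop.
 * Returns the number of characters snprintf() would have written.
 */
static int fake_show(struct fake_mddev *dev, char *buf, size_t len)
{
    int n;

    pthread_mutex_lock(&dev->lock);
    if (dev->pers)
        n = snprintf(buf, len, "md0 : active %s\n", dev->pers);
    else
        n = snprintf(buf, len, "md0 : inactive\n");
    pthread_mutex_unlock(&dev->lock);
    return n;
}
```

Without the lock in fake_show(), a concurrent fake_stop() could clear pers between the check and the use, which is the same shape of failure as the NULL pointer dereference in raid1_status shown in the oops above.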

Comment 7 James Olin Oden 2005-02-11 14:26:38 UTC
Ran the test scripts against the patched kernel all night and it's still going 
without any problems.  This seems to fix the problem.


Comment 8 James Olin Oden 2005-02-16 20:17:52 UTC
Just wondering if you had a moment to look at this?

Comment 9 Doug Ledford 2005-02-18 07:55:11 UTC
The patch looks sane to me.  I'm currently testing it.  If it passes
testing, I'll submit it for review and possible inclusion in our next
kernel update.

Comment 10 Doug Ledford 2005-02-18 09:46:04 UTC
The patch has passed my testing so far (it's hard to say it's right
since the problem only reproduces occasionally, but at least it
doesn't deadlock or anything like that).  It's been submitted for
review and possible inclusion in the next kernel update (for both
AS2.1 and RHEL3).

Comment 11 Ernie Petrides 2005-02-23 22:52:27 UTC
A fix for this problem has just been committed to the RHEL3 U5
patch pool this afternoon (in kernel version 2.4.21-28.EL).

Comment 12 James Olin Oden 2005-02-24 01:41:07 UTC
Thanks much...james

Comment 13 Tim Powers 2005-05-18 13:29:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.