Bug 495858

Summary: show_partition() oops when race with rescan_partitions().
Product: Red Hat Enterprise Linux 4 Reporter: Wengang Wang <wen.gang.wang>
Component: kernelAssignee: Jeff Moyer <jmoyer>
Status: CLOSED ERRATA QA Contact: Evan McNabb <emcnabb>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.7CC: bmr, dejohnso, dhoward, greg.marsden, iannis, jolsa, lmcilroy, moshiro, sghosh, tao, tuju
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 507758 (view as bug list) Environment:
Last Closed: 2011-02-16 15:18:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 507758, 512310    
Attachments:
Description Flags
the patch fixes the oops
none
Protect the gendisk->part array with rcu
none
systemtap script to print potential culprits for the panic none

Description Wengang Wang 2009-04-15 07:33:57 UTC
show_partition() oops when it races with rescan_partitions().
there is no syncing between them. when rescan_partitions() deleted a partition
the show_partition() access the point(got before the deletion), it oops.

solution:
add ruc lock to sync the operations between show_partition and rescan_partitions().

Comment 1 Wengang Wang 2009-04-15 07:38:19 UTC
Created attachment 339635 [details]
the patch fixes the oops

this patch fixes the oops.
it's based kernel 2.6.9-80.EL
please have it included for next release.

Comment 2 Bryn M. Reeves 2009-05-12 13:45:03 UTC
Looks a bit like this:

Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: <ffffffff80255e23>{show_partition+149}
PML4 10b3ef067 PGD 10ac36067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: 8021q nfsd exportfs lockd nfs_acl md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core ide_dump cciss_dump scsi_dump diskdump zlib_deflate emcpdm(U) emcpgpx(U) emcpmpx(U) emcp(U) button battery ac ohci_hcd hw_random k8_edac edac_mc tg3 e1000(U) bonding(U) floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla6312 qla2400 qla2300 qla2xxx scsi_transport_fc cciss sd_mod scsi_mod
Pid: 6732, comm: cmaperfd Tainted: P      2.6.9-67.0.22.ELsmp
RIP: 0010:[<ffffffff80255e23>] <ffffffff80255e23>{show_partition+149}
RSP: 0018:000001010b39fe68  EFLAGS: 00010293
RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffffffff8033c3f7 RSI: fffffffffffffffe RDI: 000001010b39fe68
RBP: 00000102fc711a00 R08: 00000000ffffffff R09: 0000000000000006
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 000001010cae5280 R14: 00000000000001a8 R15: 0000000000000010
FS:  0000002a95b767e0(0000) GS:ffffffff804f3700(005b) knlGS:00000000f7fba6c0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000008 CR3: 00000002fc7be000 CR4: 00000000000006e0
Process cmaperfd (pid: 6732, threadinfo 000001010b39e000, task 000001010b5697f0)
Stack: 0000000031796400 0000000000000246 00000000f7fb9000 0000000000000246
       0000000000000246 00000102fc711a00 000001010cae5280 0000000000000000
       00000000000003f0 ffffffff80198a96
Call Trace:<ffffffff80198a96>{seq_read+445} <ffffffff8017b0f4>{vfs_read+207}
       <ffffffff8017b350>{sys_read+69} <ffffffff80126737>{cstar_do_call+27}

Comment 3 Bryn M. Reeves 2009-05-12 13:52:13 UTC
What is the upstream status of the patch in comment #1? This isn't in linux-2.6.git and I can't find it having been posted for review anywhere.

If this has been submitted, can you provide a link to the thread where this was discussed?

Comment 4 Wengang Wang 2009-05-13 01:54:26 UTC
I think it is already committed in upstream at
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.29.y.git;a=commitdiff;h=e71bf0d0ee89e51b92776391c5634938236977d5

just that commit includes more. my patch is a part of it(surely have something changed to fit the RHEL4 kernel), but I think it's enough to solve the race here.

and yes to comment #2, to be exactly
@     Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
@     <ffffffff802514bf>{show_partition+209}
@     PML4 3a5cd7067 PGD 1830ac067 PMD 0
@     Oops: 0000 [1] SMP
@     CPU 7
@     Modules linked in: oracleasm(U) emcpdm(U) emcpgpx(U) emcpmpx(U) emcp(U)
@ emcplib(U) qla2300 qla2xxx
@     netconsole netdump joydev md5 ipv6 sunrpc ide_scsi dm_mod button battery
@ ac ohci_hcd hw_random tg3
@     e1000 bond2(U) bond1(U) bond0(U) floppy sg ext3 jbd raid1 mptscsih mptsas
@ mptspi mptfc mptscsi
@     mptbase scsi_transport_fc sd_mod scsi_mod
@     Pid: 5216, comm: iostat Tainted: P   M  2.6.9-42.0.3.6.1.ELsmp
@     RIP: 0010:[<ffffffff802514bf>] <ffffffff802514bf>{show_partition+209}
@     RSP: 0018:000001030e5a9e68  EFLAGS: 00010206
@     RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000071
@     RDX: 0000000000000000 RSI: ffffffff803364ef RDI: 000001032f9e4080
@     RBP: 00000100012e8400 R08: 00000000fffffffb R09: 000001030e5a9e68
@     R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
@     R13: 000001032f9e4080 R14: 00000000000001b9 R15: 000000000000001c
@     FS:  0000002a959a1da0(0000) GS:ffffffff804e5400(0000)
@ knlGS:00000000f611fbb0
@     CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
@     CR2: 0000000000000008 CR3: 000000000e0c0000 CR4: 00000000000006e0
@     Process iostat (pid: 5216, threadinfo 000001030e5a8000, task
@ 00000100eecbe7f0)
@     Stack: 0000003168686473 0000000000000246 00000100fbec3a00
@ 0000000000000246
@            0000000000000246 00000100012e8400 000001032f9e4080
@ 0000000000000000
@            00000000000003e4 ffffffff80195e4a
@     Call Trace:<ffffffff80195e4a>{seq_read+445}
@ <ffffffff8017949b>{vfs_read+207}
@            <ffffffff801796f2>{sys_read+69}
@ <ffffffff8011026a>{system_call+126}
@    
@    
@     Code: 4c 8b 42 08 8b 55 00 49 d1 e8 e8 f3 4c f4 ff ff c3 eb a1 48
@     RIP <ffffffff802514bf>{show_partition+209} RSP <000001030e5a9e68>

Comment 7 Lachlan McIlroy 2009-05-21 07:10:02 UTC
I have been able to reproduce this bug with:

# while true; do cat /proc/partitions > /dev/null; done &
# while true; do hdparm -z /dev/sda; done

where /dev/sda had 4 primary partitions.  I left the test
running in a VM for about 15 mins before it panicked.

Comment 8 Lachlan McIlroy 2009-06-02 01:07:56 UTC
Adding issue 302472.  Setting priority/severity to match.

Comment 10 Jerome Marchand 2009-06-22 14:56:02 UTC
There some other issue with partitions statistics which are already solved upstream. I am looking for which fixes should be backported. This one is definitely part of them.

Comment 15 Jeff Moyer 2009-07-01 21:50:38 UTC
Created attachment 350206 [details]
Protect the gendisk->part array with rcu

Hi,

This patch is exactly the same as the last, except it adds the rcu protection to diskstats_show.  Nice work on a minimal backport, by the way.

I wasn't able to reproduce the original oops because of the sheer number of hotplug events generated by the rescanning.  The test case, for me, was basically a fork bomb.  ;-)  So, if someone who can reproduce this problem is willing to test a kernel, I'd greatly appreciate it.  There's an x86_64 binary rpm here:
    http://people.redhat.com/jmoyer/bz495858/

Cheers,
Jeff

Comment 16 Jeff Moyer 2009-07-01 22:06:24 UTC
To be clear, on the testing front I'd like to know that the original reproducer is still fixed.  I'd also like to know what happens if you replace /proc/partitions with /proc/diskstats in both the unpatched and the patched kernels.

Thanks!

Comment 17 Wengang Wang 2009-07-02 01:57:11 UTC
actually, I have no good test case for the original problem.
I just guess the problem is because of the race, and I added some sleeps in code to trigger it.

Comment 18 Lachlan McIlroy 2009-07-03 05:43:16 UTC
I thought I was able to reproduce this bug since the test case in this BZ has reliably triggered this panic in the past but today I cannot get it to work.

Comment 20 Vivek Goyal 2009-07-07 18:10:53 UTC
Committed in 89.5.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 23 Jeff Moyer 2009-07-10 13:04:18 UTC
The issue you describe is essentially a fork bomb.  It is not a good idea to run hdparm -z in a tight loop as the root user.  Given that this can only be done as root, and given that this is not something that is done in normal operations, I think we don't need to address this problem.  If you ever see it in normal operations, then definitely file a bug.

Thanks for testing!

Comment 25 Jeff Moyer 2009-07-10 13:35:55 UTC
Sorry, I'm usually better about not being vague!

The fork bomb issue (in other words, the hotplug events spawned by re-reading the partition tables) is not something that should happen in real-world use.  I think that if we see OOM kills related to hotplug events in the wild, then we should definitely address it in a different bugzilla.

Comment 28 Chris Van Hoof 2009-07-23 18:09:15 UTC
Created attachment 354902 [details]
systemtap script to print potential culprits for the panic

Quick stap script that prints the time, parent, child, uid, and function called in an effort to track down processes that trigger this race condition.  I've found a number of custom scripts that look to be contributing factors.

If anything this should give an idea of what to disable if hitting this panic frequently.

Comment 32 Vivek Goyal 2010-10-18 21:43:44 UTC
Some errors/confusions while adding the bz to errata tool. Returning bz to MODIFIED state so that it can be added to errata.

Comment 35 errata-xmlrpc 2011-02-16 15:18:41 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html