Bug 233653 - race condition in nbd driver triggers BUG in kunmap and kernel panic
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Neil Horman
QA Contact: Martin Jenner
Reported: 2007-03-23 12:27 EDT by Paul Clements
Modified: 2010-10-22 09:59 EDT
Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Last Closed: 2007-11-15 11:23:17 EST

Attachments
patch #1 (9.34 KB, patch), 2007-03-23 12:27 EDT, Paul Clements
patch #2 (1.85 KB, patch), 2007-03-23 12:28 EDT, Paul Clements

Description Paul Clements 2007-03-23 12:27:00 EDT
Description of problem:
There is a race condition in the nbd driver that triggers the following BUG in
kunmap:


Mar 18 10:54:28 dm1 kernel: ------------[ cut here ]------------
Mar 18 10:54:28 dm1 kernel: kernel BUG at mm/highmem.c:193!
Mar 18 10:54:28 dm1 kernel: invalid operand: 0000 [#1]
Mar 18 10:54:28 dm1 kernel: SMP
Mar 18 10:54:28 dm1 kernel: Modules linked in: nbd dgrp(U) parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc button battery ac md5 ipv6 joydev uhci_hcd ehci_hcd hw_random e1000 floppy sg st dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod ata_piix libata megaraid_mbox megaraid_mm mptscsih mptsas mptspi mptfc mptscsi mptbase sd_mod scsi_mod
Mar 18 10:54:28 dm1 kernel: CPU:    3
Mar 18 10:54:28 dm1 kernel: EIP:    0060:[<c014b6c2>]    Not tainted VLI
Mar 18 10:54:28 dm1 kernel: EFLAGS: 00010246   (2.6.9-42.0.3.ELsmp)
Mar 18 10:54:28 dm1 kernel: EIP is at kunmap_high+0x42/0x80
Mar 18 10:54:28 dm1 kernel: eax: 000000c7   ebx: 00000000   ecx: c043d008   edx: 00000000
Mar 18 10:54:28 dm1 kernel: esi: d67f2a00   edi: 00001000   ebp: 00000000   esp: cb3d1c38
Mar 18 10:54:28 dm1 kernel: ds: 007b   es: 007b   ss: 0068
Mar 18 10:54:28 dm1 kernel: Process pdflush (pid: 27445, threadinfo=cb3d1000 task=de4dd6b0)
Mar 18 10:54:28 dm1 kernel: Stack: d3ea6610 f8a385ac d516a180 f8a3b920 e4010e0c 13956025 01000000 e4010e0c
Mar 18 10:54:28 dm1 kernel:        e4010e0c 17000000 00903afc 00100000 f7abd028 f8a3b920 f8a3b934 f7abd028
Mar 18 10:54:28 dm1 kernel:        00000008 f8a38a2d e4010e0c f7abd028 f7abd028 00000008 c0224448 dea7772c
Mar 18 10:54:28 dm1 kernel: Call Trace:
Mar 18 10:54:28 dm1 kernel:  [<f8a385ac>] nbd_send_req+0x283/0x2ec [nbd]
Mar 18 10:54:28 dm1 kernel:  [<f8a38a2d>] do_nbd_request+0x142/0x1cd [nbd]
Mar 18 10:54:28 dm1 kernel:  [<c0224448>] __generic_unplug_device+0x2b/0x2d
Mar 18 10:54:28 dm1 kernel:  [<c02255eb>] __make_request+0x421/0x46c
Mar 18 10:54:28 dm1 kernel:  [<c02257c4>] generic_make_request+0x18e/0x19e
Mar 18 10:54:28 dm1 kernel:  [<c015f484>] bio_clone+0x84/0x9c
Mar 18 10:54:28 dm1 kernel:  [<f8884bce>] make_request+0x2a0/0x2cd [raid1]
Mar 18 10:54:28 dm1 kernel:  [<f8884bce>] make_request+0x2a0/0x2cd [raid1]
Mar 18 10:54:28 dm1 kernel:  [<c02257c4>] generic_make_request+0x18e/0x19e
Mar 18 10:54:28 dm1 kernel:  [<c01204f5>] autoremove_wake_function+0x0/0x2d
Mar 18 10:54:28 dm1 kernel:  [<c022589e>] submit_bio+0xca/0xd2
Mar 18 10:54:28 dm1 kernel:  [<c0129e39>] __mod_timer+0x101/0x10b
Mar 18 10:54:28 dm1 kernel:  [<c015f2bd>] bio_alloc+0x100/0x168
Mar 18 10:54:28 dm1 kernel:  [<c015ec74>] submit_bh+0x141/0x166
Mar 18 10:54:28 dm1 kernel:  [<c015d74d>] __block_write_full_page+0x1f0/0x2ea
Mar 18 10:54:28 dm1 kernel:  [<f89038e4>] ext3_get_block+0x0/0x6c [ext3]
Mar 18 10:54:28 dm1 kernel:  [<c015eabc>] block_write_full_page+0xc5/0xce
Mar 18 10:54:28 dm1 kernel:  [<f89038e4>] ext3_get_block+0x0/0x6c [ext3]
Mar 18 10:54:28 dm1 kernel:  [<f890425a>] ext3_ordered_writepage+0xce/0x13a [ext3]
Mar 18 10:54:28 dm1 kernel:  [<f890416c>] bget_one+0x0/0x7 [ext3]
Mar 18 10:54:28 dm1 kernel:  [<c0178962>] mpage_writepages+0x1c2/0x314
Mar 18 10:54:28 dm1 kernel:  [<f890418c>] ext3_ordered_writepage+0x0/0x13a [ext3]
Mar 18 10:54:28 dm1 kernel:  [<c014597c>] mapping_tagged+0x2b/0x33
Mar 18 10:54:28 dm1 kernel:  [<c01772cc>] __sync_single_inode+0x5f/0x1c1
Mar 18 10:54:28 dm1 kernel:  [<c0177660>] sync_sb_inodes+0x1a7/0x274
Mar 18 10:54:28 dm1 kernel:  [<c0145b04>] pdflush+0x0/0x1e
Mar 18 10:54:28 dm1 kernel:  [<c01777be>] writeback_inodes+0x91/0xde
Mar 18 10:54:28 dm1 kernel:  [<c0145286>] wb_kupdate+0x7b/0xde
Mar 18 10:54:28 dm1 kernel:  [<c0145a70>] __pdflush+0xec/0x180
Mar 18 10:54:28 dm1 kernel:  [<c0145b1e>] pdflush+0x1a/0x1e
Mar 18 10:54:28 dm1 kernel:  [<c014520b>] wb_kupdate+0x0/0xde
Mar 18 10:54:28 dm1 kernel:  [<c0145b04>] pdflush+0x0/0x1e
Mar 18 10:54:28 dm1 kernel:  [<c01341ed>] kthread+0x73/0x9b
Mar 18 10:54:28 dm1 kernel:  [<c013417a>] kthread+0x0/0x9b
Mar 18 10:54:28 dm1 kernel:  [<c01041f5>] kernel_thread_helper+0x5/0xb
Mar 18 10:54:28 dm1 kernel: Code: 08 0f 0b b7 00 51 60 2e c0 05 00 00 00 01 31 db c1 e8 0c 8b 14 85 20 b2 43 c0 4a 85 d2 89 14 85 20 b2 43 c0 74 05 4a 74 0a eb 17 <0f> 0b c1 00 51 60 2e c0 31 db 81 3d c0 cf 32 c0 c0 cf 32 c0 0f
Mar 18 10:54:28 dm1 kernel:  <0>Fatal exception: panic in 5 seconds


Version-Release number of selected component (if applicable):

RHEL 4 kernel version: 2.6.9-42.0.3.ELsmp


How reproducible: Unknown; the failure is intermittent.


Steps to Reproduce:
1. The problem occurs when an I/O is sent over nbd and the reply for that I/O
comes back from the server before the sending routine has completed. The pages
are then freed before they are kunmapped, which triggers the BUG. On SMP
systems the sequence is as follows:

CPU0                                    CPU1
do_nbd_request
    add req to queuelist
    nbd_send_request
        send req head
        for each bio
            kmap
            send
                                        nbd_read_stat
                                            nbd_find_request
                                            nbd_end_request
            kunmap

When CPU1 finishes nbd_end_request, the request and all its associated
bios are freed. So when CPU0 then calls kunmap, whose argument is derived
from the last bio, it may crash.
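
The interleaving above can be replayed in userspace. The following is a
minimal Python sketch, not kernel code and not the attached patches: the
names Request, Device, active_req, run and window are all hypothetical, and
"freed" stands in for the request being completed by nbd_end_request. It
shows one possible way to close such a race, under the assumption that the
receiver may simply wait until the sender has finished with the request.

```python
import threading


class Request:
    """Stand-in for a block request whose pages the sender has mapped."""

    def __init__(self):
        self.freed = False


class Device:
    """Stand-in for device state shared by the sender and the receiver."""

    def __init__(self):
        self.cond = threading.Condition()
        self.active_req = None  # hypothetical: request still being sent


def run(guarded, window=0.5):
    """Replay the interleaving; return True if 'kunmap' saw a freed request."""
    dev, req = Device(), Request()
    reply_arrived = threading.Event()
    receiver_done = threading.Event()
    result = {"use_after_free": False}

    def sender():
        if guarded:
            with dev.cond:
                dev.active_req = req          # mark the request as in flight
        reply_arrived.set()                   # data sent; server replies at once
        receiver_done.wait(timeout=window)    # give the receiver time to run
        result["use_after_free"] = req.freed  # 'kunmap' touches the request
        if guarded:
            with dev.cond:
                dev.active_req = None         # the send path is finished
                dev.cond.notify_all()

    def receiver():
        reply_arrived.wait()
        if guarded:
            with dev.cond:
                # Do not complete a request the sender is still using.
                dev.cond.wait_for(lambda: dev.active_req is not req)
        req.freed = True                      # completion frees the request
        receiver_done.set()

    threads = [threading.Thread(target=sender),
               threading.Thread(target=receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["use_after_free"]


if __name__ == "__main__":
    print("unguarded, kunmap saw freed request:", run(guarded=False))
    print("guarded, kunmap saw freed request:", run(guarded=True))
```

With guarded=False the receiver frees the request inside the sender's
window, so the final "kunmap" step observes a freed request. With
guarded=True the receiver blocks until the sender clears active_req, so the
request is still live when the sender is done with it.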

Actual results: kernel panic

Expected results: no panic

Additional info: The two attached patches fix this problem. They went into the
mainline kernel on 19 Nov 2005.
Comment 1 Paul Clements 2007-03-23 12:27:00 EDT
Created attachment 150774
patch #1
Comment 2 Paul Clements 2007-03-23 12:28:26 EDT
Created attachment 150775
patch #2
Comment 3 RHEL Product and Program Management 2007-04-10 07:05:22 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 4 RHEL Product and Program Management 2007-04-18 18:25:49 EDT
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.
Comment 6 Jason Baron 2007-06-20 15:41:28 EDT
Committed in stream U6 build 55.10. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 9 errata-xmlrpc 2007-11-15 11:23:17 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html
