155447 – dm-multipath oopses when the last path fails

Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	device-mapper-multipath
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Alasdair Kergon

Reported:	2005-04-20 13:22 UTC by Lars Marowsky-Bree
Modified:	2007-11-30 22:11 UTC
CC List:	5 users
Last Closed:	2005-04-21 10:25:44 UTC
Type:	---

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Novell	78986	0	None	None	None	Never

Description Lars Marowsky-Bree 2005-04-20 13:22:30 UTC

Description of problem:

dm-multipath oopses when the last path fails. The I/O is requeued, but
dispatch_queued_ios() doesn't seem able to cope when map_io fails. I don't
understand why; the oops appears to happen somewhere in the endio path:

Unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
f8e048ed
*pde = 00102001
Oops: 0000 [#1]
SMP 
CPU:    0
EIP:    0060:[<f8e048ed>]    Tainted: G  U
EFLAGS: 00010206   (2.6.5-7.165-bigsmp SLES9_SP2_BRANCH-200504201212570000) 
EIP is at multipath_end_io+0x5d/0x260 [dm_multipath]
eax: 00000000   ebx: cdcde280   ecx: fffffffb   edx: 00000000
esi: e8b80bc0   edi: ffffffa1   ebp: f0a035e4   esp: f4029eac
ds: 007b   es: 007b   ss: 0068
Process kmpathd/0 (pid: 23278, threadinfo=f4028000 task=f3524000)
Stack: e8b80c40 00001000 0004aa90 00000000 00000001 cdcde290 00000000 fffffffb 
       e5f3a960 f0f7e960 e8b80bc0 f889b580 e5f3a968 f8d79080 f8e04890 00000000 
       e8b80bc0 00001000 f889b510 fffffffb c017be38 00000001 00000296 00000296 
Call Trace: 
 [<f889b580>] clone_endio+0x70/0x120 [dm_mod]
 [<f8e04890>] multipath_end_io+0x0/0x260 [dm_multipath]
 [<f889b510>] clone_endio+0x0/0x120 [dm_mod]
 [<c017be38>] bio_endio+0x68/0xa0
 [<f8e04e0c>] process_queued_ios+0x12c/0x140 [dm_multipath]
 [<c013c066>] worker_thread+0x186/0x230
 [<f8e04ce0>] process_queued_ios+0x0/0x140 [dm_multipath]
 [<c01238f0>] default_wake_function+0x0/0x10
 [<c01238f0>] default_wake_function+0x0/0x10
 [<c013bee0>] worker_thread+0x0/0x230
 [<c013fdd9>] kthread+0xf9/0x12d
 [<c013fce0>] kthread+0x0/0x12d
 [<c0107005>] kernel_thread_helper+0x5/0x10

Code: 8b 40 10 c7 04 24 58 5b e0 f8 83 c0 14 89 44 24 04 e8 1d 70  
 Dumping to block device (104,5) on CPU 0 ...


Version-Release number of selected component (if applicable):

2.6.5, with the latest DM patches applied.

This happens all the time and is 100% reproducible.

Comment 1 Lars Marowsky-Bree 2005-04-20 17:46:51 UTC

One more data point: the oops is independent of the workqueue patch.

Comment 2 Lars Marowsky-Bree 2005-04-20 20:55:30 UTC

A slightly better trace, from a kernel compiled with frame pointers etc.:

Unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
f8dedd48
*pde = 330bc001
Oops: 0000 [#1]
SMP 
CPU:    0
EIP:    0060:[<f8dedd48>]    Tainted: G  U
EFLAGS: 00010206   (2.6.5-7.165-biglmb ) 
EIP is at multipath_end_io+0x58/0x380 [dm_multipath]
eax: 00000000   ebx: f0621a38   ecx: fffffffb   edx: 00000000
esi: f0629cec   edi: f3b11084   ebp: f39d5ea0   esp: f39d5e64
ds: 007b   es: 007b   ss: 0068
Process kmpathd/0 (pid: 22795, threadinfo=f39d4000 task=f3cb4c60)
Stack: 27fc854a c045e800 00000088 00000000 f3cb4c60 00000001 f3b11098 00000000 
       f0629cec 00000046 f39d5ef0 c029269d fffffffb f0627914 f0626914 f39d5ec8 
       f88f754d f062791c f8dac080 f8dedcf0 00000000 f0621a38 f0621a38 00001000 
Call Trace: 
 [<c029269d>] generic_make_request+0x10d/0x1f0
 [<f88f754d>] clone_endio+0x6d/0x110 [dm_mod]
 [<f8dedcf0>] multipath_end_io+0x0/0x380 [dm_multipath]
 [<f88f74e0>] clone_endio+0x0/0x110 [dm_mod]
 [<c018d25b>] bio_endio+0x5b/0x90
 [<f8dee3ec>] process_queued_ios+0x19c/0x220 [dm_multipath]
 [<c0142ba0>] worker_thread+0x1a0/0x2e0
 [<f8dee250>] process_queued_ios+0x0/0x220 [dm_multipath]
 [<c01260c0>] default_wake_function+0x0/0x10
 [<c01260c0>] default_wake_function+0x0/0x10
 [<c01476dc>] kthread+0xec/0x11c
 [<c0142a00>] worker_thread+0x0/0x2e0
 [<c01475f0>] kthread+0x0/0x11c
 [<c0107005>] kernel_thread_helper+0x5/0x10

Code: 8b 40 10 c7 04 24 7c f1 de f8 83 c0 14 89 44 24 04 e8 f2 16

Comment 3 Lars Marowsky-Bree 2005-04-21 10:25:44 UTC

Cough, cough. Looks like I introduced this bug myself in my extended logging
patch. I'll attach a cleaned-up version of that patch to the respective bug soon.
