Description of problem:
dm-multipath oopses when the last path fails. IO is requeued, but dispatch_queued_ios() doesn't seem able to cope if map_io fails. I don't understand why; it seems to oops somewhere in the endio path:

Unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
f8e048ed
*pde = 00102001
Oops: 0000 [#1]
SMP
CPU: 0
EIP: 0060:[<f8e048ed>] Tainted: G U
EFLAGS: 00010206 (2.6.5-7.165-bigsmp SLES9_SP2_BRANCH-200504201212570000)
EIP is at multipath_end_io+0x5d/0x260 [dm_multipath]
eax: 00000000 ebx: cdcde280 ecx: fffffffb edx: 00000000
esi: e8b80bc0 edi: ffffffa1 ebp: f0a035e4 esp: f4029eac
ds: 007b es: 007b ss: 0068
Process kmpathd/0 (pid: 23278, threadinfo=f4028000 task=f3524000)
Stack: e8b80c40 00001000 0004aa90 00000000 00000001 cdcde290 00000000 fffffffb
       e5f3a960 f0f7e960 e8b80bc0 f889b580 e5f3a968 f8d79080 f8e04890 00000000
       e8b80bc0 00001000 f889b510 fffffffb c017be38 00000001 00000296 00000296
Call Trace:
 [<f889b580>] clone_endio+0x70/0x120 [dm_mod]
 [<f8e04890>] multipath_end_io+0x0/0x260 [dm_multipath]
 [<f889b510>] clone_endio+0x0/0x120 [dm_mod]
 [<c017be38>] bio_endio+0x68/0xa0
 [<f8e04e0c>] process_queued_ios+0x12c/0x140 [dm_multipath]
 [<c013c066>] worker_thread+0x186/0x230
 [<f8e04ce0>] process_queued_ios+0x0/0x140 [dm_multipath]
 [<c01238f0>] default_wake_function+0x0/0x10
 [<c01238f0>] default_wake_function+0x0/0x10
 [<c013bee0>] worker_thread+0x0/0x230
 [<c013fdd9>] kthread+0xf9/0x12d
 [<c013fce0>] kthread+0x0/0x12d
 [<c0107005>] kernel_thread_helper+0x5/0x10

Code: 8b 40 10 c7 04 24 58 5b e0 f8 83 c0 14 89 44 24 04 e8 1d 70
Dumping to block device (104,5) on CPU 0 ...

Version-Release number of selected component (if applicable):
2.6.5 but with latest DM patches applied.

This happens all the time and is 100% reproducible.
It's independent of the workqueue patch, just to add a data point.
Slightly better trace from a kernel compiled with frame pointers etc.:

Unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
f8dedd48
*pde = 330bc001
Oops: 0000 [#1]
SMP
CPU: 0
EIP: 0060:[<f8dedd48>] Tainted: G U
EFLAGS: 00010206 (2.6.5-7.165-biglmb )
EIP is at multipath_end_io+0x58/0x380 [dm_multipath]
eax: 00000000 ebx: f0621a38 ecx: fffffffb edx: 00000000
esi: f0629cec edi: f3b11084 ebp: f39d5ea0 esp: f39d5e64
ds: 007b es: 007b ss: 0068
Process kmpathd/0 (pid: 22795, threadinfo=f39d4000 task=f3cb4c60)
Stack: 27fc854a c045e800 00000088 00000000 f3cb4c60 00000001 f3b11098 00000000
       f0629cec 00000046 f39d5ef0 c029269d fffffffb f0627914 f0626914 f39d5ec8
       f88f754d f062791c f8dac080 f8dedcf0 00000000 f0621a38 f0621a38 00001000
Call Trace:
 [<c029269d>] generic_make_request+0x10d/0x1f0
 [<f88f754d>] clone_endio+0x6d/0x110 [dm_mod]
 [<f8dedcf0>] multipath_end_io+0x0/0x380 [dm_multipath]
 [<f88f74e0>] clone_endio+0x0/0x110 [dm_mod]
 [<c018d25b>] bio_endio+0x5b/0x90
 [<f8dee3ec>] process_queued_ios+0x19c/0x220 [dm_multipath]
 [<c0142ba0>] worker_thread+0x1a0/0x2e0
 [<f8dee250>] process_queued_ios+0x0/0x220 [dm_multipath]
 [<c01260c0>] default_wake_function+0x0/0x10
 [<c01260c0>] default_wake_function+0x0/0x10
 [<c01476dc>] kthread+0xec/0x11c
 [<c0142a00>] worker_thread+0x0/0x2e0
 [<c01475f0>] kthread+0x0/0x11c
 [<c0107005>] kernel_thread_helper+0x5/0x10

Code: 8b 40 10 c7 04 24 7c f1 de f8 83 c0 14 89 44 24 04 e8 f2 16
Cough, cough. Looks like I introduced this bug myself in my extended logging patch. I'll attach a cleaned-up version of that patch to the respective bug soon.