Bug 441832
Summary: | mptscsi race between hotremove and mptscsih_bus_reset | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Bryn M. Reeves <bmr>
Component: | kernel | Assignee: | Doug Ledford <dledford>
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 5.2 | CC: | duck, dzickus, knweiss, peterm, tao
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2009-01-20 19:49:38 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 391501, 409971, 448732 | |
Description
Bryn M. Reeves
2008-04-10 13:54:37 UTC
Created attachment 301986
Check for NULL pointers when retrieving vdevice / vdevice->vtarget in mptscsih_bus_reset
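The attached patch is not reproduced in this report. As a rough illustration only, the user-space C sketch below models the kind of check the attachment describes: fetch the per-device data from `hostdata`, then verify both it and its `vtarget` pointer before using them. All struct, enum and function names here (`sdev_model`, `vdevice_model`, `vtarget_model`, `bus_reset_model`, `RESET_*`) are hypothetical stand-ins, not the mptscsih API, and what the real handler returns for a vanished device is an assumption. The oops quoted later in this report fits this failure mode: the faulting bytes `48 8b 02` decode to `mov (%rdx),%rax`, and both RDX and CR2 are zero, i.e. a load through a NULL pointer.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's per-target and
 * per-device data (VirtTarget / VirtDevice in mptscsih). Field names
 * are illustrative only. */
struct vtarget_model {
	int channel;
};

struct vdevice_model {
	struct vtarget_model *vtarget;
};

/* Hypothetical stand-in for struct scsi_device: hostdata points at the
 * driver's per-device data and may already be cleared once the device
 * has been hot-removed or offlined. */
struct sdev_model {
	void *hostdata;
};

/* Hypothetical result codes; the real handler returns the SCSI
 * midlayer's SUCCESS/FAILED values. */
enum reset_result {
	RESET_ISSUED,
	RESET_SKIPPED_NO_DEVICE
};

/* Sketch of the guard: look up vdevice and vdevice->vtarget and bail
 * out instead of dereferencing NULL. On success, *channel_out receives
 * the channel the bus reset would target; on failure it is untouched. */
static enum reset_result bus_reset_model(const struct sdev_model *sdev,
					 int *channel_out)
{
	/* Read hostdata once into a local so the check and the use below
	 * operate on the same pointer value. */
	const struct vdevice_model *vdevice = sdev->hostdata;

	if (vdevice == NULL || vdevice->vtarget == NULL)
		return RESET_SKIPPED_NO_DEVICE;

	/* In the driver, this is where the bus-reset task management
	 * request would be issued for vdevice->vtarget->channel. */
	*channel_out = vdevice->vtarget->channel;
	return RESET_ISSUED;
}
```

A check like this only avoids dereferencing a pointer that has already been cleared; on its own it does not serialize against a hot-remove that frees the structure after the check, which in a real driver would need locking or reference counting.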
Comment #0 should read "hot-removed or offlined" - you don't actually need to disconnect the drive, just have it offlined so that the kernel removes it.

Eric posted a patch containing this change (along with one or two others :) to linux-scsi last year: http://marc.info/?l=linux-scsi&m=119008142831206&w=2

    $ diffstat /tmp/mpt-linux-scsi.patch
     mptscsih.c | 1498 +++++++++++++++++++++++++++---------------------------------
     mptscsih.h |    8
     2 files changed, 666 insertions(+), 840 deletions(-)
    $ diffstat /tmp/mpt_bus_reset.patch
     mptscsih.c |    3 +++
     1 file changed, 3 insertion

Just seen now on F9 RHTS x86_64 ibm-taroko.rhts.bos.redhat.com, Job 24616, kernel-2.6.25-14.fc9.x86_64, during startup, no device removal/plugging:
http://rhts.redhat.com/testlogs/24616/89682/749716/3448895-test_log--tools-gdb-gdb-any-EXTERNALWATCHDOG.log

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    IP: [<ffffffff880a37d6>] :mptscsih:mptscsih_bus_reset+0xa6/0x109
    PGD 3e172067 PUD 3edcf067 PMD 3e0ca067 PTE 0
    Oops: 0000 [1] SMP
    CPU 1
    Modules linked in: nfs lockd nfs_acl bridge bnep rfcomm l2cap bluetooth sunrpc ipv6 cpufreq_ondemand acpi_cpufreq freq_table loop dm_multipath sr_mod cdrom pata_acpi ata_generic ppdev snd_hda_intel parport_pc snd_seq_dummy parport snd_seq_oss floppy snd_seq_midi_event snd_seq snd_seq_device firewire_ohci firewire_core pcspkr serio_raw snd_pcm_oss snd_mixer_oss crc_itu_t snd_pcm i2c_i801 snd_timer snd_page_alloc ahci i2c_core iTCO_wdt ata_piix iTCO_vendor_support snd_hwdep libata button i82975x_edac snd edac_core tg3 soundcore sg dm_snapshot dm_zero dm_mirror dm_mod shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
    Pid: 462, comm: scsi_eh_0 Not tainted 2.6.25-14.fc9.x86_64 #1
    RIP: 0010:[<ffffffff880a37d6>] [<ffffffff880a37d6>] :mptscsih:mptscsih_bus_reset+0xa6/0x109
    RSP: 0018:ffff81003e061dd0 EFLAGS: 00010246
    RAX: ffff81003f3e3802 RBX: ffff81003f3e2c80 RCX: 000000000000000a
    RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffffffff814e64e4
    RBP: ffff81003e061e00 R08: 0000000000000002 R09: 0000000000000000
    R10: ffffffff8806027f R11: ffffffff814e6900 R12: ffff81003c94c3c0
    R13: ffff81003e4e7000 R14: ffff81003e4e7008 R15: ffff81003f3e2800
    FS: 0000000000000000(0000) GS:ffff81003f802680(0000) knlGS:0000000000000000
    CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 000000003edf8000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process scsi_eh_0 (pid: 462, threadinfo ffff81003e060000, task ffff81003e096000)
    Stack: ffff81003f3e3810 0000000000000000 ffff81003c94c3c0 0000000000002003
     0000000000000000 ffff81003e061ee0 ffff81003e061e20 ffffffff880541e1
     ffff81003c94c3c0 0000000000000000 ffff81003e061e60 ffffffff88054f53
    Call Trace:
     [<ffffffff880541e1>] :scsi_mod:scsi_try_bus_reset+0x52/0xde
     [<ffffffff88054f53>] :scsi_mod:scsi_eh_ready_devs+0x2d3/0x4af
     [<ffffffff8805562f>] :scsi_mod:scsi_error_handler+0x352/0x4f1
     [<ffffffff81026ae5>] ? __wake_up_common+0x46/0x75
     [<ffffffff880552dd>] ? :scsi_mod:scsi_error_handler+0x0/0x4f1
     [<ffffffff810477e3>] kthread+0x49/0x76
     [<ffffffff8100ccf8>] child_rip+0xa/0x12
     [<ffffffff8104779a>] ? kthread+0x0/0x76
     [<ffffffff8100ccee>] ? child_rip+0x0/0x12
    Code: 00 00 49 8b 04 24 b9 28 00 00 00 48 8b 90 88 00 00 00 41 8a 85 98 00 00 00 84 c0 74 0e 31 c9 3c 02 0f 94 c1 8d 0c cd 02 00 00 00 <48> 8b 02 45 31 c9 45 31 c0 48 89 df be 04 00 00 00 0f b6 50 0b
    RIP [<ffffffff880a37d6>] :mptscsih:mptscsih_bus_reset+0xa6/0x109
     RSP <ffff81003e061dd0>
    CR2: 0000000000000000
    ---[ end trace 0e0ecc73240609da ]---

in kernel-2.6.18-107.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA.
For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2009-0225.html