586761 – Oops in cdrom_analyze_sense_data (race with cdrom_pc_intr?)

Bug 586761 - Oops in cdrom_analyze_sense_data (race with cdrom_pc_intr?)

Summary: Oops in cdrom_analyze_sense_data (race with cdrom_pc_intr?)

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.8
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Prarit Bhargava
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2010-04-28 10:28 UTC by Bryn M. Reeves
Modified:	2018-12-05 15:14 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-10-25 18:47:35 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
dmesg leading up to panic (33.34 KB, text/plain) 2010-04-28 10:32 UTC, Bryn M. Reeves	no flags	Details
dmesg from bootup of affected system (85.39 KB, application/octet-stream) 2011-04-07 16:39 UTC, John Ruemker	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Legacy)	36847	0	None	None	None	Never

Description Bryn M. Reeves 2010-04-28 10:28:29 UTC

Description of problem:
Unable to handle kernel paging request at 0000000100000248 RIP: 
<ffffffff8028166c>{cdrom_analyze_sense_data+53}
PML4 1dc1b3067 PGD 0 
Oops: 0000 [1] SMP 
CPU 1 
Modules linked in: netconsole netdump autofs4 i2c_dev i2c_core ib_srp ib_sdp ib_ipoib inet_lro md5 ipv6 rdma_ucm rdma_cm iw_cm ib_addr ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core dm_round_robin dm_multipath button battery ac ohci_hcd hw_random k8_edac edac_mc tg3 bonding(U) floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd raid1 raid0 dm_mod qla2300 qla2xxx scsi_transport_fc mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 15825, comm: oracle Not tainted 2.6.9-89.0.20.ELsmp
RIP: 0010:[<ffffffff8028166c>] <ffffffff8028166c>{cdrom_analyze_sense_data+53}
RSP: 0000:00000101fabcbe18  EFLAGS: 00010246
RAX: 0000000000000200 RBX: 0000000000000000 RCX: 0000000100000246
RDX: 0000000100000200 RSI: 00000100c03cbd18 RDI: ffffffff804e4f90
RBP: 0000000100000246 R08: 00000000000d0cdd R09: 000000000f8d0954
R10: 0000000000000246 R11: 0000000000000246 R12: 00000100c03cbd18
R13: ffffffff804e4f90 R14: ffffffff804e4f90 R15: 0000000000000001
FS:  0000002a975ad920(0000) GS:ffffffff80505000(0000) knlGS:00000000f6aed6c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000100000248 CR3: 0000000001032000 CR4: 00000000000006e0
Process oracle (pid: 15825, threadinfo 0000010054214000, task 00000100716ca030)
Stack: 0000000000002706 ffffffff804e4f90 0000010080011460 0000000000000001 
       0000769a0788937e 00000101f37377f0 000001008007fc80 0000000000000000 
       00000101fabcbe88 ffffffff80132cc7 
Call Trace:<IRQ> <ffffffff80132cc7>{activate_task+124} <ffffffff80281caf>{cdrom_end_request+129} 
       <ffffffff802829e0>{cdrom_pc_intr+229} <ffffffff802828fb>{cdrom_pc_intr+0} 
       <ffffffff802767f5>{ide_intr+379} <ffffffff80112ff6>{handle_IRQ_event+41} 
       <ffffffff80113270>{do_IRQ+197} <ffffffff801108c3>{ret_from_intr+0} 
        <EOI> 

Code: 0f b6 45 02 83 e0 0f 0f b6 c0 83 f8 06 77 37 89 c0 ff 24 c5 
RIP <ffffffff8028166c>{cdrom_analyze_sense_data+53} RSP <00000101fabcbe18>
CR2: 0000000100000248


Version-Release number of selected component (if applicable):
2.6.9-89.20.EL

How reproducible:
Reported to have occurred several times.

Steps to Reproduce:
Unknown - awaiting clarification from reporter.

Actual results:
Above oops.

Expected results:
No oops.

Additional info:

Comment 1 Bryn M. Reeves 2010-04-28 10:32:52 UTC

Created attachment 409758 [details]
dmesg leading up to panic

dmesg leading up to panic in comment #0

Comment 3 Bryn M. Reeves 2010-04-28 10:42:17 UTC

At the time of the oops, CPU1 was processing an IDE interrupt that ended up calling cdrom_analyze_sense_data:

crash> set 15825
    PID: 15825
COMMAND: "oracle"
   TASK: 100716ca030  [THREAD_INFO: 10054214000]
    CPU: 1
  STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 15825  TASK: 100716ca030       CPU: 1   COMMAND: "oracle"
 #0 [101fabcbc50] netpoll_start_netdump at ffffffffa02ac366
 #1 [101fabcbc80] do_page_fault at ffffffff80124c50
 #2 [101fabcbd60] error_exit at ffffffff80110e21
    [exception RIP: cdrom_analyze_sense_data+53]
    RIP: ffffffff8028166c  RSP: 00000101fabcbe18  RFLAGS: 00010246
    RAX: 0000000000000200  RBX: 0000000000000000  RCX: 0000000100000246
    RDX: 0000000100000200  RSI: 00000100c03cbd18  RDI: ffffffff804e4f90
    RBP: 0000000100000246   R8: 00000000000d0cdd   R9: 000000000f8d0954
    R10: 0000000000000246  R11: 0000000000000246  R12: 00000100c03cbd18
    R13: ffffffff804e4f90  R14: ffffffff804e4f90  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #3 [101fabcbe90] cdrom_end_request at ffffffff80281caf
 #4 [101fabcbed0] cdrom_pc_intr at ffffffff802829e0
 #5 [101fabcbf10] ide_intr at ffffffff802767f5
 #6 [101fabcbf50] handle_IRQ_event at ffffffff80112ff6
 #7 [101fabcbf80] do_IRQ at ffffffff80113270
--- <IRQ stack> ---
 #8 [10054215f58] ret_from_intr at ffffffff801108c3
    RIP: ffffffffff6000a5  RSP: 0000007fbfffe0c8  RFLAGS: 00000246
    RAX: 00000000ee74565f  RBX: 000482078624efcb  RCX: 00000000000d08f5
    RDX: 000000000000038a  RSI: 0000000000000000  RDI: 0000007fbfffe0f0
    RBP: 0000007fbfffe100   R8: 00000000000d0cdd   R9: 000000000f8d0954
    R10: 000000004ba1667a  R11: 0000000000000246  R12: 000000000a19e480
    R13: 0000000060004650  R14: 0000000000004d58  R15: 0000000000003cd8
    ORIG_RAX: ffffffffffffff0f  CS: 0033  SS: 002b

The actual oops takes place in cdrom_log_sense():

388 void cdrom_analyze_sense_data(ide_drive_t *drive,
389                               struct request *failed_command,
390                               struct request_sense *sense)
391 {
392         unsigned long sector;
393         unsigned long bio_sectors;
394         unsigned long valid;
395
396         if (!cdrom_log_sense(drive, failed_command, sense)) <---
397                 return;
[...]

342 static int cdrom_log_sense(ide_drive_t *drive, struct request *rq,
343                            struct request_sense *sense)
344 {
345         int log = 0;
346
347         if (!sense || !rq || (rq->flags & REQ_QUIET))
348                 return 0;
349
350         switch (sense->sense_key) { <--- 1st access to *sense
351                 case NO_SENSE: case RECOVERED_ERROR:
352                         break;
[...]

Pulling out the ide_drive_t:

crash> ide_drive_s ffffffff804e4f90
struct ide_drive_s {
 name = "hdc",
 driver_req = "\000\000\000\000\000\000\000\000\000",
 queue = 0x10080de03d0,
 rq = 0x0,
 next = 0xffffffff804e4f90,
 driver = 0xffffffff804228c0,
 driver_data = 0x10080011400,
 id = 0x10037ff5e00,
 proc = 0x100081a8080,
 settings = 0x10080db7b00,
 devfs_name = "ide/host0/bus1/target0/lun0\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
 hwif = 0xffffffff804e4e48,
 sleep = 0,
 service_start = 4425133547,
 service_time = 0,
 timeout = 0,
 special = {
   all = 0,
   b = {
     set_geometry = 0,
     recalibrate = 0,
     set_multmode = 0,
     set_tune = 0,
     serviced = 0,
     reserved = 0
   }
 },
 select = {
   all = 160,
   b = {
     head = 0,
     unit = 0,
     bit5 = 1,
     lba = 0,
     bit7 = 1
   }
 },
 keep_settings = 0 '\0',
 autodma = 1 '\001',
 using_dma = 1 '\001',
 retry_pio = 0 '\0',
 state = 0 '\0',
 waiting_for_dma = 0 '\0',
 unmask = 1 '\001',
 bswap = 0 '\0',
 dsc_overlap = 0 '\0',
 nice1 = 1 '\001',
 present = 1,
 dead = 0,
 id_read = 1,
 noprobe = 0,
 removable = 1,
 attach = 0,
 is_flash = 0,
 forced_geom = 0,
 no_unmask = 0,
 no_io_32bit = 0,
 atapi_overlap = 0,
 nice0 = 0,
 nice2 = 0,
 doorlocking = 0,
 autotune = 1,
 remap_0_to_1 = 0,
 blocked = 0,
 vdma = 0,
 stroke = 0,
 addressing = 0,
 scsi = 0,
 quirk_list = 0 '\0',
 suspend_reset = 0 '\0',
 init_speed = 12 '\f',
 pio_speed = 0 '\0',
 current_speed = 66 'B',
 dn = 2 '\002',
 wcache = 0 '\0',
 acoustic = 0 '\0',
 media = 5 '\005',
 ctl = 8 '\b',
 ready_stat = 0 '\0',
 mult_count = 0 '\0',
 mult_req = 0 '\0',
 tune_req = 0 '\0',
 io_32bit = 1 '\001',
 bad_wstat = 161 '<A1>',
 nowerr = 0 '\0',
 sect0 = 0 '\0',
 head = 0 '\0',
 sect = 0 '\0',
 bios_head = 0 '\0',
 bios_sect = 0 '\0',
 doing_barrier = 0 '\0',
 bios_cyl = 0,
 cyl = 0,
 drive_data = 0,
 usage = 1,
 failures = 0,
 max_failures = 1,
 capacity64 = 0,
 probed_capacity = 8388604,
 lun = 0,
 crc_count = 0,
 list = {
   next = 0xffffffff80422a18,
   prev = 0xffffffff80422a18
 },
 gendev = {
   node = {
     next = 0xffffffff804e5700,
     prev = 0xffffffff804e5700
   },
   bus_list = {
     next = 0xffffffff80421c68,
     prev = 0xffffffff80421c68
   },
   driver_list = {
     next = 0xffffffff804e50e0,
     prev = 0xffffffff804e50e0
   },
   children = {
     next = 0xffffffff804e50f0,
     prev = 0xffffffff804e50f0
   },
   parent = 0xffffffff804e56d0,
   kobj = {
     k_name = 0xffffffff804e5110 "1.0",
     name = "1.0\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
     kref = {
       refcount = {
         counter = 5
       }
     },
     entry = {
       next = 0x101fa84b1b0,
       prev = 0xffffffff804e5738
     },
     parent = 0xffffffff804e5718,
     kset = 0xffffffff804190a0,
     ktype = 0x0,
     dentry = 0x1000818b318
   },
   bus_id = "1.0\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
   bus = 0xffffffff80421b40,
   driver = 0xffffffff80422960,
   driver_data = 0xffffffff804e4f90,
   platform_data = 0x0,
   power = {
     power_state = 0,
     prev_state = 0,
     saved_state = 0x0,
     pm_users = {
       counter = 0
     },
     pm_parent = 0xffffffff804e56d0,
     entry = {
       next = 0x101fa84b238,
       prev = 0xffffffff804e57c0
     }
   },
   power_state = 0,
   saved_state = 0x0,
   detach_state = 0,
   dma_mask = 0x0,
   coherent_dma_mask = 0,
   dma_pools = {
     next = 0xffffffff804e51e8,
     prev = 0xffffffff804e51e8
   },
   dma_mem = 0x0,
   release = 0xffffffff80279c38 <drive_release_dev>
 },
 gendev_rel_sem = {
   count = {
     counter = 0
   },
   sleepers = 0,
   wait = {
     lock = {
       lock = 1,
       magic = 3735899821
     },
     task_list = {
       next = 0xffffffff804e5218,
       prev = 0xffffffff804e5218
     }
   }
 },
 disk = 0x10037ff5a00
}

From that get back to the rq via the hwgroup from the hwif:

crash> ide_drive_s ffffffff804e4f90 | grep hwif
 hwif = 0xffffffff804e4e48,

crash> hwif_s 0xffffffff804e4e48|grep hwg
 hwgroup = 0x10037ff5200,

Using:
struct request *rq = HWGROUP(drive)->rq;
#define HWGROUP(drive) ((ide_hwgroup_t *)(HWIF(drive)->hwgroup))
#define HWIF(drive) ((ide_hwif_t *)((drive)->hwif))

crash> hwgroup_s 0x10037ff5200 | grep rq
 rq = 0x10080011460,

crash> request 0x10080011460
struct request {
 queuelist = {
   next = 0x10080011460,
   prev = 0x10080011460
 },
 flags = 264256,
 sector = 0,
 nr_sectors = 0,
 current_nr_sectors = 0,
 hard_sector = 0,
 hard_nr_sectors = 0,
 hard_cur_sectors = 0,
 nr_cbio_segments = 0,
 nr_cbio_sectors = 0,
 cbio = 0x0,
 bio = 0x0,
 biotail = 0x0,
 elevator_private = 0x0,
 rq_status = -1,
 rq_disk = 0x10037ff5a00,
 errors = 0,
 start_time = 0,
 nr_phys_segments = 0,
 nr_hw_segments = 0,
 tag = 0,
 buffer = 0x100c03cbd18 "<FF>?<F9><A8>",
 ref_count = 0,
 q = 0x0,
 rl = 0x0,
 waiting = 0x0,
 special = 0x0,
 cmd_len = 0,
 cmd = "\003\000\000\000\022\000\000\000\000\000\000\000\000\000\000",
 data_len = 0,
 data = 0x100c03cbcaa,
 sense_len = 18,
 sense = 0x0,
 timeout = 60000,
 pm = 0x0
}

At this point, rq->buffer should point to the "original" (failed) request but it's garbage:

crash> request 0x100c03cbd18
struct request {
 queuelist = {
   next = 0x100a8f93fff,
   prev = 0xffffffff801f1077
 },
 flags = 0,
 sector = 18446744071564103241,
 nr_sectors = 4397712,
 current_nr_sectors = 937394277,
 hard_sector = 937394277,
 hard_nr_sectors = 18446744071563492786,
 hard_cur_sectors = 0,
 nr_cbio_segments = 0,
 nr_cbio_sectors = 18446744071563558405,
 cbio = 0x101f6055180,
 bio = 0x101f8769028,
 biotail = 0xa31190,
 elevator_private = 0x101f88e49f8,
 rq_status = 48,
 rq_disk = 0x100c03cbe68,
 errors = 937394279,
 start_time = 937394277,
 nr_phys_segments = 32869,
 nr_hw_segments = 14303,
 tag = 0,
 buffer = 0x37df8065 <Address 0x37df8065 out of bounds>,
 ref_count = -165768824,
 q = 0x10038def030,
 rl = 0xa31190,
 waiting = 0x101fa07a040,
 special = 0x101f88e49f8,
 cmd_len = 2148977923,
 cmd = "<FF><FF><FF><FF>\000\000\000\000\000\000\000\000\000\000\000",
 data_len = 0,
 data = 0x2a95573,
 sense_len = 18,
 sense = 0x100000246, <---- crash value
 timeout = 3588521672,
 pm = 0x101f2a87148
}

At the same time on CPU0 hald is issuing an open(2) on the same IDE CD device that's issuing a cdrom_check_status as part of check_disk_change()->media_changed()->ide_cdrom_check_media_change_real():

crash> set 7210
    PID: 7210
COMMAND: "hald"
   TASK: 100790b27f0  [THREAD_INFO: 10078e0a000]
    CPU: 0
  STATE: TASK_UNINTERRUPTIBLE 
crash> bt
PID: 7210   TASK: 100790b27f0       CPU: 0   COMMAND: "hald"
 #0 [10078e0b738] schedule at ffffffff80314c64
 #1 [10078e0b810] wait_for_completion at ffffffff80314ea8
 #2 [10078e0b890] ide_do_drive_cmd at ffffffff80276973
 #3 [10078e0b900] cdrom_queue_packet_command at ffffffff80282b99
 #4 [10078e0b970] cdrom_check_status at ffffffff802839bc
 #5 [10078e0bab0] ide_cdrom_check_media_change_real at ffffffff80284802
 #6 [10078e0bac0] media_changed at ffffffff802888ec
 #7 [10078e0bae0] check_disk_change at ffffffff8018388c
 #8 [10078e0bb00] cdrom_open at ffffffff80288475
 #9 [10078e0bdc0] idecd_open at ffffffff80284b5c
#10 [10078e0bdf0] do_open at ffffffff80183c31
#11 [10078e0be40] blkdev_open at ffffffff80184088
#12 [10078e0be60] __dentry_open at ffffffff8017b8c1
#13 [10078e0bea0] filp_open at ffffffff8017ba9a
#14 [10078e0bf50] sys_open at ffffffff8017bc89
#15 [10078e0bf80] system_call at ffffffff801102f6
    RIP: 0000002a959d25e2  RSP: 0000007fbffff808  RFLAGS: 00000202
    RAX: 0000000000000002  RBX: ffffffff801102f6  RCX: 0000000000649c70
    RDX: 0000000000000000  RSI: 0000000000000880  RDI: 00000000006b7db0
    RBP: 0000000000000000   R8: 0000000000000000   R9: 000000000f8d0952
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
    R13: 00000000006b3ef0  R14: 00000000006b72f0  R15: 0000000000000000
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This isn't code that I'm too familiar with but I'm trying to understand if the hald->open() and cdrom_pc_intr() paths are racing here or if there's something else going on to have left the bogus pointer in the request..

Comment 5 Mauro Carvalho Chehab 2010-06-01 23:40:32 UTC

There's a known issue with some IDE CDROM drivers, where it takes a long time to check if a CDROM is inserted, and if the CDROM is empty or not. The proper fix is to lock subsequent calls, to avoid OOPS/Panic. This is what happens on RHEL5.

As hald periodically checks if a CDROM is inserted, in order to mount the driver and open a window at the X interface, if, for some reason, the previous check didn't ended, this could cause troubles.

On RHEL5, the adopted solution were to lock the calls to disk status stuff. Sometimes, this results on long wait periods (see bz#500416).

While such lock is not backported to RHEL4, a workaround for this issue is to disable hald polling for CDROM.

Comment 9 Mauro Carvalho Chehab 2010-06-02 14:34:28 UTC

(In reply to comment #8)
 
> Another option would be to provide the customer a custom kernel with cdrom
> compiled as module, and blacklist it via /etc/modprobe.conf    

Another solution that won't require using a custom kernel would be to disable it via boot parameter, with something like:

hdc=none
(assuming that the cdrom is at hdc on all machines)

Comment 14 David Jeffery 2010-07-21 17:40:53 UTC

Another case with a recent 4.8 kernel looks to be suffering from the same root issue.

From the core:

PID: 2504   TASK: 10094902030       CPU: 0   COMMAND: "ls"
 #0 [ffffffff8046e610] netpoll_start_netdump at ffffffffa016c366
 #1 [ffffffff8046e640] do_page_fault at ffffffff80124c50
 #2 [ffffffff8046e720] error_exit at ffffffff80110e21
    [exception RIP: __end_that_request_first+202]
    RIP: ffffffff80257a6d  RSP: ffffffff8046e7d8  RFLAGS: 00010006
    RAX: 0000000000000000  RBX: 0000000000000046  RCX: 0000000000000046
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00000100e5ddf968
    RBP: 0000014a00000000   R8: 000000000000ea69   R9: 0000000000000206
    R10: ffffffff802769c9  R11: ffffffff802769c9  R12: 0000000000000000
    R13: 00000100e5ddf968  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #3 [ffffffff8046e810] cdrom_end_request at ffffffff80281d07
 #4 [ffffffff8046e850] cdrom_pc_intr at ffffffff802829f0
 #5 [ffffffff8046e890] ide_intr at ffffffff80276805
 #6 [ffffffff8046e8d0] handle_IRQ_event at ffffffff80112ff6
 #7 [ffffffff8046e900] do_IRQ at ffffffff80113270
--- <IRQ stack> ---
 #8 [1009723bf58] ret_from_intr at ffffffff801108c3
    RIP: 00000037fcc106b3  RSP: 0000007fbfffee78  RFLAGS: 00000287
    RAX: 0000000000000017  RBX: 0000007fbfffeef0  RCX: 00000037fd033a18
    RDX: 00000000000005e8  RSI: 0000000000000000  RDI: 00000037fd033a18
    RBP: 0000007fbffff1f0   R8: 0000000000000000   R9: 00000037fd034000
    R10: 0000000000000812  R11: 0000000000000206  R12: 00000037fd033a18
    R13: 00000037fd038028  R14: 0000002a956664a0  R15: 0000007fbffff298
    ORIG_RAX: ffffffffffffff0e  CS: 0033  SS: 002b


In this case, the ide-cd driver made it a little past cdrom_analyze_sense_data() and died in __end_that_request_first().  The bad request at 0x100e5ddf968 is a stack address and looks to have been from a call to cdrom_check_status() from within ide_cdrom_drive_status().  The cdrom_check_status() has completed and the process continued on to call cdrom_get_media_event()

crash> bt 3881
PID: 3881   TASK: 100ed033030       CPU: 0   COMMAND: "coda"
 #0 [100e5ddf6b8] schedule at ffffffff80314c74
 #1 [100e5ddf790] wait_for_completion at ffffffff80314eb8
 #2 [100e5ddf810] ide_do_drive_cmd at ffffffff80276983
 #3 [100e5ddf880] cdrom_queue_packet_command at ffffffff80282ba9
 #4 [100e5ddf8f0] ide_cdrom_packet at ffffffff80284154
 #5 [100e5ddfa20] cdrom_get_media_event at ffffffff80287767
 #6 [100e5ddfaa0] ide_cdrom_drive_status at ffffffff8028469e
 #7 [100e5ddfb00] cdrom_open at ffffffff80287d8f
 #8 [100e5ddfdc0] idecd_open at ffffffff80284b6c
 #9 [100e5ddfdf0] do_open at ffffffff80183e15
#10 [100e5ddfe40] blkdev_open at ffffffff8018409c
#11 [100e5ddfe60] __dentry_open at ffffffff8017b8d5
#12 [100e5ddfea0] filp_open at ffffffff8017baae
#13 [100e5ddff50] sys32_open at ffffffff80128d8a

This double attempt to complete the request is being caused by the way the ide-cd's sense requests interact with the noop io scheduler.  From the attached dmesg and looking at the new cores, the original system and the new case were all using the noop scheduler instead of the default cfq.  In RHEL4, the noop scheduler isn't honoring the ELEVATOR_INSERT_FRONT flag.  Instead, it always appends requests to the end of the queue.

The ide-cd driver using a single, pre-allocated request (cdrom_info.request_sense_request) for each cdrom to gather sense information.  Whenever this special request is used, ide uses ELEVATOR_INSERT_FRONT to place it at the front of the queue.  This should insure that the single special request is run right after the failed command.  Since the noop scheduler ignores ELEVATOR_INSERT_FRONT, if another command was already in the queue and it also fails, the request_sense_request request will be used again, overwriting the pointers to the original failed request and being inserted into the queue list a second time.  Thus, the queue list is now also corrupted.

Sense blkdev_dequeue_request() uses list_del_init(), no poisoned pointers are ever inserted into the corrupted list.  Instead, the corrupt list causes the request_sense_request to be executed multiple times.  The extra completion then causes the system to crash since the failed command was already freed by the first completion of request_sense_request.

In summary, the RHEL4 noop io scheduler's current behavior which ignores ELEVATOR_INSERT_FRONT is incompatible with ide-cd's handling of sense requests.

Comment 15 Rick Beldin 2010-07-22 19:30:43 UTC

I'm trying to follow this more closely and a bit confused by "noop scheduler ignores ELEVATOR_INSERT_FRONT".  Looking at: 

drivers/block/noop-iosched.c in 2.6.9-89.0.26.EL

 38 void elevator_noop_add_request(request_queue_t *q, struct request *rq,
 39                                int where)
 40 {
 41         struct list_head *insert = q->queue_head.prev;
 42 
 43         if (where == ELEVATOR_INSERT_FRONT)
 44                 insert = &q->queue_head;
 45 
 46         list_add_tail(&rq->queuelist, &q->queue_head);
 47 
 48         /*
 49          * new merges must not precede this barrier
 50          */
 51         if (rq->flags & REQ_HARDBARRIER)
 52                 q->last_merge = NULL;
 53         else if (!q->last_merge)
 54                 q->last_merge = rq;
 55 }

It seems to be that this would put our request at the head of the queue. 

ide_do_drive_cmd() seems a little tricky and may try to use ELEVATOR_INSERT_BACK.  That would seem to indicate the problem is in the driver, not the elevator. 

Or did I misunderstand?

Comment 20 John Ruemker 2011-04-07 16:39:10 UTC

Created attachment 490591 [details]
dmesg from bootup of affected system

Here is their dmesg from startup of the system that panicked.  We're still waiting to hear back on whether they were using the IDE drive, or what they were doing at the time.  They have 11 "identical" servers, and only 6 of them have demonstrated the issue thus far.  They are unsure what the difference is between them.

FWIW, they've removed noop for now and have been stable since.

Comment 24 Justin I. Nevill 2011-11-08 15:24:38 UTC

Prarit,

Any update on where we stand with this or what else needs to be done?

Thanks very much,
Justin

Note You need to log in before you can comment on or make changes to this bug.