Description of problem:
In RHEL3's driver, DRIVER_LOCK blocks interrupts while aic7xxx_done_cmds_complete() runs its while loop. The AS2.1 driver has no such lock there. RHEL3's driver does not seem to have this problem.

Version-Release number of selected component (if applicable):
e.27

How reproducible:
fleetingly

Steps to Reproduce:
1. Run e.27 with aic_7xxx
2. Wait
3.

Actual results:

Expected results:

Additional info:
See issue tracker 28301

It looks like the real kernel trace would be something like this:

  EIP is at aic7xxx_handle_scsiint [aic7xxx] 0x258
  eax: 0000000d   ebx: f7745084   ecx: f8848000   edx: 00000000
  esi: f8848000   edi: 00000000   ebp: 00000000   esp: c22f9bc8
  ds: 0018   es: 0018   ss: 0018
  Process swapper (pid: 0, stackpage=c22f9000)
  Stack: 00000013 f8a89000 f7370b20 00000206 d83d7178 c0ab6f00 00000020 d83d7040
         c0201c37 d83d7040 c2653360 000004ec 00000001 00000056 ea57dd6c 00000001
         00000286 cdeae5a0 c011970d ea57c000 00000282 d83d7040 00000001 03000014

  aic7xxx_isr
  do_aic7xxx_isr
  handle_IRQ_event
  do_IRQ
  call_do_IRQ
  aic7xxx_done_cmds_complete    (takes SCSI interrupt)
  aic7xxx_handle_scsiint
  aic7xxx_isr
  aic7xxx_abort
  scsi_abort
  scsi_old_times_out
  __run_timers
  run_local_timers
  smp_apic_timer_interrupt [kernel] 0xb8
  do_IRQ [kernel] 0xe3
  default_idle [kernel] 0x0

The CPU was idle, took a timer interrupt, and ran some timer functions that were due, including scsi_old_times_out(). That function determined that a SCSI command had timed out, so it issued an abort. This led to aic7xxx_done_cmds_complete(), which took a SCSI interrupt while it was executing. In handling the interrupt, aic7xxx_isr() called aic7xxx_handle_scsiint(), which panicked on a NULL reference 0x258 bytes into the function.

So my guess is that it was touching something that aic7xxx_done_cmds_complete() was modifying when it took the SCSI interrupt. What I find interesting are the changes made to that function between AS2.1 and RHEL3.
Here's the AS2.1 version:

  static void
  aic7xxx_done_cmds_complete(struct aic7xxx_host *p)
  {
    Scsi_Cmnd *cmd;

    while (p->completeq.head != NULL)
    {
      cmd = p->completeq.head;
      p->completeq.head = (Scsi_Cmnd *)cmd->host_scribble;
      cmd->host_scribble = NULL;
      cmd->scsi_done(cmd);
    }
  }

Here's the RHEL3 version:

  static void
  aic7xxx_done_cmds_complete(struct aic7xxx_host *p)
  {
    Scsi_Cmnd *cmd;
  #if LINUX_VERSION_CODE < KERNEL_VERSION(2,1,95)
    unsigned int cpu_flags = 0;
  #endif

    DRIVER_LOCK
    while (p->completeq.head != NULL)
    {
      cmd = p->completeq.head;
      p->completeq.head = (Scsi_Cmnd *)cmd->host_scribble;
      cmd->host_scribble = NULL;
      cmd->scsi_done(cmd);
    }
    DRIVER_UNLOCK
  }

DRIVER_LOCK blocks interrupts while the while loop runs. The AS2.1 version would appear vulnerable unless interrupts were already blocked.
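To make the suspected race concrete, here is a small user-space C model of the two variants. It is only a sketch, not driver code: struct cmd, struct host, fake_isr(), raise_irq(), drain() and run_model() are all hypothetical names, and the irq_blocked flag merely stands in for what DRIVER_LOCK is described as doing above (blocking interrupts). The model raises a simulated interrupt in the middle of every loop iteration and counts how many times the handler ran while the queue was being drained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Scsi_Cmnd; only the queue link is modelled. */
struct cmd {
    struct cmd *host_scribble;   /* next command on the completion queue */
    bool done;                   /* set where the driver would call scsi_done() */
};

/* Hypothetical stand-in for struct aic7xxx_host. */
struct host {
    struct cmd *completeq_head;
    bool irq_blocked;            /* models DRIVER_LOCK having blocked interrupts */
    bool irq_pending;            /* an interrupt arrived while blocked */
    bool in_drain;               /* true while the while loop is running */
    int  isr_runs_in_drain;      /* handler runs that landed inside the loop */
};

/* Models the SCSI interrupt handler; records whether it hit the loop. */
static void fake_isr(struct host *h)
{
    if (h->in_drain)
        h->isr_runs_in_drain++;
}

/* Models the hardware raising an interrupt: deferred if blocked. */
static void raise_irq(struct host *h)
{
    if (h->irq_blocked)
        h->irq_pending = true;
    else
        fake_isr(h);
}

/* Same loop shape as aic7xxx_done_cmds_complete(), with an optional lock. */
static void drain(struct host *h, bool use_lock)
{
    if (use_lock)
        h->irq_blocked = true;           /* DRIVER_LOCK (modelled) */

    h->in_drain = true;
    while (h->completeq_head != NULL) {
        struct cmd *c = h->completeq_head;
        h->completeq_head = c->host_scribble;
        c->host_scribble = NULL;
        raise_irq(h);                    /* interrupt arrives mid-loop */
        c->done = true;                  /* stands in for cmd->scsi_done(cmd) */
    }
    h->in_drain = false;

    if (use_lock) {                      /* DRIVER_UNLOCK (modelled) */
        h->irq_blocked = false;
        if (h->irq_pending) {
            h->irq_pending = false;
            fake_isr(h);                 /* deferred interrupt runs after the loop */
        }
    }
}

/* Drains a two-command queue; returns handler runs that hit the loop. */
int run_model(bool use_lock)
{
    struct cmd b = { .host_scribble = NULL, .done = false };
    struct cmd a = { .host_scribble = &b,   .done = false };
    struct host h = { .completeq_head = &a };

    drain(&h, use_lock);
    assert(a.done && b.done && h.completeq_head == NULL);
    return h.isr_runs_in_drain;
}
```

In this model, run_model(false) (the AS2.1 shape) lets the handler run inside the loop on every iteration, while run_model(true) (the RHEL3 shape) defers it until after the unlock. Whether the real handler actually corrupts anything in that window is still only my guess; the model shows the window exists, not what goes wrong in it.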