We've got a kernel crash with kernel-2.6.18-164.el5:
-------------------------------------------------------
Sep 16 18:52:12 test kernel: BUG: warning at drivers/ata/libata-sff.c:1327/ata_sff_hsm_move() (Tainted: G )
Sep 16 18:52:12 test kernel: [<f898f391>] ata_sff_hsm_move+0x69a/0x6e9 [libata]
Sep 16 18:52:12 test kernel: [<c041d83c>] __wake_up+0x2a/0x3d
Sep 16 18:52:12 test kernel: [<f8990083>] ata_sff_interrupt+0x12b/0x1bc [libata]
Sep 16 18:52:12 test kernel: [<c044d72d>] handle_IRQ_event+0x45/0x8c
Sep 16 18:52:12 test kernel: [<c044d7f8>] __do_IRQ+0x84/0xd6
Sep 16 18:52:12 test kernel: [<c044d774>] __do_IRQ+0x0/0xd6
Sep 16 18:52:12 test kernel: [<c04074b2>] do_IRQ+0x99/0xc3
Sep 16 18:52:12 test kernel: [<c0405946>] common_interrupt+0x1a/0x20
Sep 16 18:52:12 test kernel: [<c05fd380>] _spin_unlock_irqrestore+0x8/0x9
Sep 16 18:52:12 test kernel: [<f8852a7b>] scsi_dispatch_cmd+0x214/0x281 [scsi_mod]
Sep 16 18:52:12 test kernel: [<f8857303>] scsi_request_fn+0x247/0x2f9 [scsi_mod]
Sep 16 18:52:12 test kernel: [<c04df30f>] blk_run_queue+0x37/0x63
Sep 16 18:52:12 test kernel: [<f8856478>] scsi_next_command+0x25/0x2f [scsi_mod]
Sep 16 18:52:12 test kernel: [<f88565aa>] scsi_end_request+0xa1/0xab [scsi_mod]
Sep 16 18:52:12 test kernel: [<f88566f4>] scsi_io_completion+0x140/0x2ea [scsi_mod]
Sep 16 18:52:12 test kernel: [<f882f9b1>] sd_rw_intr+0x2aa/0x2ef [sd_mod]
Sep 16 18:52:12 test kernel: [<f88523b9>] scsi_finish_command+0x73/0x77 [scsi_mod]
Sep 16 18:52:12 test kernel: [<c04df7a0>] blk_done_softirq+0x4d/0x58
Sep 16 18:52:12 test kernel: [<c0428ecb>] __do_softirq+0x87/0x114
Sep 16 18:52:12 test kernel: [<c04073cf>] do_softirq+0x52/0x9c
Sep 16 18:52:12 test kernel: [<c044d774>] __do_IRQ+0x0/0xd6
Sep 16 18:52:12 test kernel: [<c04074ce>] do_IRQ+0xb5/0xc3
Sep 16 18:52:12 test kernel: [<c0405946>] common_interrupt+0x1a/0x20
Sep 16 18:52:12 test kernel: [<c0403ce7>] mwait_idle+0x25/0x38
Sep 16 18:52:12 test kernel: [<c0403ca8>] cpu_idle+0x9f/0xb9
Sep 16 18:52:12 test kernel: =======================
-------------------------------------------------------
For ata_sff_hsm_move() to hit this BUG warning, the ata_port task state must have an unexpected value. Which driver are you using? Would you please attach the output of dmesg after booting, lsmod, and "lspci -vvxxx"? Also, do you have a way to reproduce this? If not, can you describe what kind of load the system was under, and whether the crash happens consistently? Thanks.
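Existing logs can help answer whether the warning recurs, and on which port. One option is to scan the syslog for the libata-sff warning and note which `ataN` port reports an exception right after it. The sketch below is minimal. It assumes the RHEL5 syslog line format seen in this report: a leading `Mon DD HH:MM:SS` stamp, with the warning followed by libata's `exception Emask` line. `scan` and the regular expressions are hypothetical helpers and are not part of any existing tool.

```python
import re
from collections import Counter

# Both warning formats seen in this report: "BUG: warning at ..." (older
# RHEL5 kernels) and "WARNING: at ..." (2.6.18-238.9.1.el5xen).
WARN_RE = re.compile(r"(?:BUG: warning|WARNING:) at drivers/ata/libata-sff\.c:\d+")
# libata error-handler line naming the port, e.g. "ata2.00: exception Emask ...".
EXC_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: exception Emask")
# Leading syslog timestamp, e.g. "Jul 30 20:09:52" or "Nov 9 08:20:58".
TS_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d)")


def scan(lines):
    """Return (timestamps of each libata-sff warning,
    Counter of ports whose exception report follows a warning)."""
    hits = []
    ports = Counter()
    pending = False  # set by a warning, cleared by the next exception line
    for line in lines:
        if WARN_RE.search(line):
            ts = TS_RE.match(line)
            hits.append(ts.group(1) if ts else None)
            pending = True
            continue
        exc = EXC_RE.search(line)
        if exc and pending:
            ports[exc.group(1)] += 1
            pending = False
    return hits, ports
```

For example, pass it an open file such as `/var/log/messages` (and its rotated copies). A warning that keeps pairing with the same port points at one link rather than at the whole controller. Port numbers are assigned per boot, so compare them only between boots with the same hardware layout.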
Created attachment 362518: the requested info
I can confirm this bug: it has consistently caused the same SATA port to fail and, with it, the RAID 5 array that port belongs to. So far it has occurred on the following dates, while there was no significant load on the server:

30th July 20:09
5th July 11:12
9th June 19:05
28th May 05:39

Monitoring showed no issues except for a dramatic increase in IO wait, from a usual 0.2s to ~1.3s. After a reboot the drive is re-added to the array and behaves normally (until the next time).

Controller being used [6 ports]: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b1)

Jul 30 20:09:52 serv kernel: BUG: warning at drivers/ata/libata-sff.c:1327/ata_sff_hsm_move() (Tainted: G )
Jul 30 20:09:52 serv kernel:
Jul 30 20:09:52 serv kernel: Call Trace:
Jul 30 20:09:52 serv kernel: <IRQ> [<ffffffff880f788d>] :libata:ata_sff_hsm_move+0x6cb/0x722
Jul 30 20:09:52 serv kernel: [<ffffffff8002e1c9>] __wake_up+0x38/0x4f
Jul 30 20:09:52 serv kernel: [<ffffffff880f8738>] :libata:ata_sff_interrupt+0x148/0x1d8
Jul 30 20:09:52 serv kernel: [<ffffffff80010bd1>] handle_IRQ_event+0x51/0xa6
Jul 30 20:09:52 serv kernel: [<ffffffff800baec9>] __do_IRQ+0xa4/0x103
Jul 30 20:09:52 serv kernel: [<ffffffff800123da>] __do_softirq+0x89/0x133
Jul 30 20:09:52 serv kernel: [<ffffffff8006ca11>] do_IRQ+0xe7/0xf5
Jul 30 20:09:52 serv kernel: [<ffffffff80057289>] mwait_idle+0x0/0x4a
Jul 30 20:09:52 serv kernel: [<ffffffff8005d615>] ret_from_intr+0x0/0xa
Jul 30 20:09:52 serv kernel: <EOI> [<ffffffff800572bf>] mwait_idle+0x36/0x4a
Jul 30 20:09:52 serv kernel: [<ffffffff80049477>] cpu_idle+0x95/0xb8
Jul 30 20:09:52 serv kernel: [<ffffffff8007797a>] start_secondary+0x498/0x4a7
Jul 30 20:09:52 serv kernel:
Jul 30 20:09:52 serv kernel: ata2.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6
Jul 30 20:09:52 serv kernel: ata2.00: BMDMA stat 0x66
Jul 30 20:09:52 serv kernel: ata2: SError: { UnrecovData HostInt 10B8B BadCRC }
Jul 30 20:09:52 serv kernel: ata2.00: cmd 25/00:e8:57:e4:33/00:01:04:00:00/e0 tag 0 dma 249856 in
Jul 30 20:09:52 serv kernel: res 51/84:b9:86:e4:33/84:01:04:00:00/04 Emask 0x70 (host bus error)
Jul 30 20:09:52 serv kernel: ata2.00: status: { DRDY ERR }
Jul 30 20:09:52 serv kernel: ata2.00: error: { ICRC ABRT }
Jul 30 20:09:52 serv kernel: ata2: hard resetting link
Jul 30 20:09:52 serv kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 30 20:09:52 serv kernel: ata2.00: configured for UDMA/33
Jul 30 20:09:52 serv kernel: ata2.01: configured for UDMA/133
Jul 30 20:09:52 serv kernel: ata2: EH complete

Attached: comment3-bz524243.tgz
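The `SErr` value on the exception line and the `SError: { ... }` line after it carry the same information. The sketch below decodes an SError value into the names libata prints. The bit positions follow the SATA SError register. The short names are reproduced from libata's error-handler report and should be checked against `include/linux/ata.h` and `drivers/ata/libata-eh.c` for the kernel in question. `decode_serror` is a hypothetical helper.

```python
# (bit, name) pairs: SATA SError bit positions with the short names libata
# prints in its "SError: { ... }" line (assumed; verify for your kernel).
SERROR_BITS = [
    (1 << 0, "RecovData"),
    (1 << 1, "RecovComm"),
    (1 << 8, "UnrecovData"),
    (1 << 9, "Persist"),
    (1 << 10, "Proto"),
    (1 << 11, "HostInt"),
    (1 << 16, "PHYRdyChg"),
    (1 << 17, "PHYInt"),
    (1 << 18, "CommWake"),
    (1 << 19, "10B8B"),
    (1 << 20, "Dispar"),
    (1 << 21, "BadCRC"),
    (1 << 22, "Handshk"),
    (1 << 23, "LinkSeq"),
    (1 << 24, "TrStaTrns"),
    (1 << 25, "UnrecFIS"),
    (1 << 26, "DevExch"),
]


def decode_serror(serr):
    """Return the names of the bits set in an SError value, lowest bit first."""
    return [name for bit, name in SERROR_BITS if serr & bit]
```

Applied to the two values in this report, `0x280900` yields UnrecovData, HostInt, 10B8B and BadCRC. `0x280100` yields the same set without HostInt. Both match the kernel's own SError lines. BadCRC and 10B8B are errors detected on the SATA link itself (CRC and 8b/10b decoding failures).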
Created attachment 435641: Information related to comment #3
Contents:
comment3-dmesg-after-reboot.txt
comment3-dmesg-before-reboot.txt
comment3-lsmod.txt
comment3-lspci.txt
I can also confirm this problem.
Kernel: 2.6.18-194.8.1
Kernel parameter: elevator=deadline
ext3 options: barrier=1,defaults

Nov 9 08:20:58 share kernel: BUG: warning at drivers/ata/libata-sff.c:1327/ata_sff_hsm_move() (Tainted: P )
Nov 9 08:20:58 share kernel: [<f8c65524>] ata_sff_hsm_move+0x68c/0x6de [libata]
Nov 9 08:20:58 share kernel: [<c041f274>] fairsched_schedule+0x352/0x5ee
Nov 9 08:20:58 share kernel: [<f8c6621c>] ata_sff_interrupt+0x12b/0x1bb [libata]
Nov 9 08:20:58 share kernel: [<c0456271>] handle_IRQ_event+0x45/0x8c
Nov 9 08:20:58 share kernel: [<c045633c>] __do_IRQ+0x84/0xd6
Nov 9 08:20:58 share kernel: [<c0406575>] do_IRQ+0xa3/0xb2
Nov 9 08:20:58 share kernel: [<c0625d9a>] common_interrupt+0x1a/0x20
Nov 9 08:20:58 share kernel: [<c0403d7a>] mwait_idle+0x25/0x38
Nov 9 08:20:58 share kernel: [<c0403d3f>] cpu_idle+0x7a/0x90
Nov 9 08:20:58 share kernel: [<c0764c17>] start_kernel+0x399/0x3a1
Nov 9 08:20:58 share kernel: =======================
Nov 9 08:20:58 share kernel: ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov 9 08:20:58 share kernel: ata3.01: BMDMA stat 0x66
Nov 9 08:20:58 share kernel: ata3.01: cmd 25/00:00:c7:13:db/00:01:34:00:00/f0 tag 0 dma 131072 in
Nov 9 08:20:58 share kernel: res 51/84:00:c6:14:db/84:00:34:00:00/f0 Emask 0x30 (host bus error)
Nov 9 08:20:58 share kernel: ata3.01: status: { DRDY ERR }
Nov 9 08:20:58 share kernel: ata3.01: error: { ICRC ABRT }
Nov 9 08:20:58 share kernel: ata3: soft resetting link
Nov 9 08:20:58 share kernel: ata3.00: configured for UDMA/133
Nov 9 08:20:58 share kernel: ata3.01: configured for UDMA/133
Nov 9 08:20:58 share kernel: ata3: EH complete
Nov 9 08:20:58 share kernel: BUG: warning at drivers/ata/libata-sff.c:1327/ata_sff_hsm_move() (Tainted: P )
Nov 9 08:20:58 share kernel: [<f8c65524>] ata_sff_hsm_move+0x68c/0x6de [libata]
Nov 9 08:20:58 share kernel: [<f8c6621c>] ata_sff_interrupt+0x12b/0x1bb [libata]
Nov 9 08:20:58 share kernel: [<c0456271>] handle_IRQ_event+0x45/0x8c
Nov 9 08:20:59 share kernel: [<c045633c>] __do_IRQ+0x84/0xd6
Nov 9 08:20:59 share kernel: [<c0406575>] do_IRQ+0xa3/0xb2
Nov 9 08:20:59 share kernel: [<c0625d9a>] common_interrupt+0x1a/0x20
Nov 9 08:20:59 share kernel: [<c0403d7a>] mwait_idle+0x25/0x38
Nov 9 08:20:59 share kernel: [<c0403d3f>] cpu_idle+0x7a/0x90
Nov 9 08:20:59 share kernel: [<c0764c17>] start_kernel+0x399/0x3a1
Nov 9 08:20:59 share kernel: =======================
Nov 9 08:20:59 share kernel: ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov 9 08:20:59 share kernel: ata3.01: BMDMA stat 0x66
Nov 9 08:20:59 share kernel: ata3.01: cmd 25/00:00:c7:13:db/00:01:34:00:00/f0 tag 0 dma 131072 in
Nov 9 08:20:59 share kernel: res 51/84:cf:f8:13:db/84:00:34:00:00/f0 Emask 0x30 (host bus error)
Nov 9 08:20:59 share kernel: ata3.01: status: { DRDY ERR }
Nov 9 08:20:59 share kernel: ata3.01: error: { ICRC ABRT }
Nov 9 08:20:59 share kernel: ata3: soft resetting link
Nov 9 08:20:59 share kernel: ata3.00: configured for UDMA/133
Nov 9 08:20:59 share kernel: ata3.01: configured for UDMA/133
Nov 9 08:20:59 share kernel: ata3: EH complete
Nov 9 08:20:59 share kernel: SCSI device sdc: 781420655 512-byte hdwr sectors (400087 MB)
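The `cmd`/`res` lines are libata's dump of the taskfile registers. The fields appear in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. In a `res` line the first two bytes are the status and error registers. That is where `status: { DRDY ERR }` and `error: { ICRC ABRT }` come from.

The sketch below decodes these lines and assumes 512-byte sectors, as reported for sdc above. It covers only a subset of the error-register bits. `parse_taskfile`, `transfer_bytes` and `names` are hypothetical helpers.

```python
import re

# ATA status register bits printed in "status: { ... }".
STATUS_BITS = [(0x80, "BUSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
# Subset of ATA error register bits printed in "error: { ... }".
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]

_BYTES5 = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
TF_RE = re.compile(r"(cmd|res) ([0-9a-f]{2})/" + _BYTES5 + "/" + _BYTES5 + r"/([0-9a-f]{2})")


def parse_taskfile(text):
    """Parse a libata 'cmd' or 'res' taskfile dump into register values."""
    m = TF_RE.search(text)
    if m is None:
        raise ValueError("no taskfile dump found")
    kind, first, low, hob, device = m.groups()
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (int(x, 16) for x in hob.split(":"))
    return {
        "kind": kind,
        "command": int(first, 16),  # opcode for 'cmd', status register for 'res'
        "feature": feature,         # error register for 'res'
        "nsect": (hob_nsect << 8) | nsect,  # 16-bit count for LBA48 commands
        "lba": (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal,
        "device": int(device, 16),
    }


def transfer_bytes(cmd_tf, sector_size=512):
    """Bytes requested by an LBA48 command; a count of 0 means 65536 sectors."""
    return (cmd_tf["nsect"] or 65536) * sector_size


def names(value, table):
    """Names of the bits from `table` that are set in `value`."""
    return [name for bit, name in table if value & bit]
```

Each READ DMA EXT (0x25) command in this report carries a 16-bit sector count: 0x01e8, 0x0100 and 0x0300. Multiplied by 512 bytes, these give 249856, 131072 and 393216 bytes, matching the `dma ... in` sizes libata prints. Each `res` status of 0x51 with error 0x84 decodes to DRDY ERR with ICRC ABRT.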
Created attachment 459058: logs for comment #7 (messages, lsmod, lspci)
May 27 03:27:38 xn6 kernel: WARNING: at drivers/ata/libata-sff.c:1327 ata_sff_hsm_move()
May 27 03:27:38 xn6 kernel: Call Trace:
May 27 03:27:38 xn6 kernel: <IRQ> [<ffffffff880d44a5>] :libata:ata_sff_hsm_move+0x601/0x658
May 27 03:27:39 xn6 kernel: [<ffffffff880d535e>] :libata:ata_sff_interrupt+0x148/0x1da
May 27 03:27:39 xn6 kernel: [<ffffffff802116a8>] handle_IRQ_event+0x55/0xae
May 27 03:27:40 xn6 kernel: [<ffffffff802b5339>] __do_IRQ+0xa4/0x103
May 27 03:27:40 xn6 kernel: [<ffffffff80294b52>] run_timer_softirq+0x23a/0x249
May 27 03:27:40 xn6 kernel: [<ffffffff8026e6fa>] do_IRQ+0xc6/0xcf
May 27 03:27:40 xn6 kernel: [<ffffffff803b78b1>] evtchn_do_upcall+0x147/0x201
May 27 03:27:40 xn6 kernel: [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
May 27 03:27:40 xn6 kernel: <EOI> [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
May 27 03:27:40 xn6 kernel: [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
May 27 03:27:40 xn6 kernel: [<ffffffff8026ff2a>] raw_safe_halt+0x84/0xa8
May 27 03:27:40 xn6 kernel: [<ffffffff8026d4e1>] xen_idle+0x38/0x4a
May 27 03:27:40 xn6 kernel: [<ffffffff8024b10d>] cpu_idle+0x97/0xba
May 27 03:27:40 xn6 kernel: [<ffffffff806a6b0f>] start_kernel+0x21f/0x224
May 27 03:27:40 xn6 kernel: [<ffffffff806a61e5>] _sinittext+0x1e5/0x1eb
May 27 03:27:40 xn6 kernel:
May 27 03:27:40 xn6 kernel: ata2: limiting SATA link speed to 1.5 Gbps
May 27 03:27:40 xn6 kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6
May 27 03:27:40 xn6 kernel: ata2.00: BMDMA stat 0x26
May 27 03:27:40 xn6 kernel: ata2: SError: { UnrecovData 10B8B BadCRC }
May 27 03:27:40 xn6 kernel: ata2.00: cmd 25/00:00:42:73:d5/00:03:02:00:00/e0 tag 0 dma 393216 in
May 27 03:27:40 xn6 kernel: res 51/84:af:93:73:d5/84:02:02:00:00/e0 Emask 0x30 (host bus error)
May 27 03:27:40 xn6 kernel: ata2.00: status: { DRDY ERR }
May 27 03:27:40 xn6 kernel: ata2.00: error: { ICRC ABRT }
May 27 03:27:40 xn6 kernel: ata2: hard resetting link

Linux 2.6.18-238.9.1.el5xen #1 SMP Tue Apr 12 18:53:56 EDT 2011 x86_64

lspci:
00:00.0 Host bridge: Intel Corporation Core Processor DMI (rev 11)
00:03.0 PCI bridge: Intel Corporation Core Processor PCI Express Root Port 1 (rev 11)
00:05.0 PCI bridge: Intel Corporation Core Processor PCI Express Root Port 3 (rev 11)
00:08.0 System peripheral: Intel Corporation Core Processor System Management Registers (rev 11)
00:08.1 System peripheral: Intel Corporation Core Processor Semaphore and Scratchpad Registers (rev 11)
00:08.2 System peripheral: Intel Corporation Core Processor System Control and Status Registers (rev 11)
00:08.3 System peripheral: Intel Corporation Core Processor Miscellaneous Registers (rev 11)
00:10.0 System peripheral: Intel Corporation Core Processor QPI Link (rev 11)
00:10.1 System peripheral: Intel Corporation Core Processor QPI Routing and Protocol Registers (rev 11)
00:1a.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 05)
00:1c.4 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 (rev 05)
00:1c.5 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 6 (rev 05)
00:1d.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation 3400 Series Chipset LPC Interface Controller (rev 05)
00:1f.2 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA IDE Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller (rev 05)
00:1f.5 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 2 port SATA IDE Controller (rev 05)
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
05:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
06:03.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)

May 26 19:12:21 xn6 kernel: ata1.00: ATA-8: WDC WD5002ABYS-02B1B0, 02.03B03, max UDMA/133
May 26 19:12:21 xn6 kernel: ata1.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
May 26 19:12:21 xn6 kernel: ata1.00: configured for UDMA/133
May 26 19:12:21 xn6 kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 26 19:12:21 xn6 kernel: ata2.00: ATA-8: WDC WD5002ABYS-02B1B0, 02.03B03, max UDMA/133
May 26 19:12:21 xn6 kernel: ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
May 26 19:12:21 xn6 kernel: ata2.00: configured for UDMA/133
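The link lines report `SStatus` and `SControl` in hex. In the SATA register layout, SStatus holds three fields:

- DET (bits 3:0): device detection.
- SPD (bits 7:4): negotiated speed.
- IPM (bits 11:8): interface power state.

SControl uses the same positions for its speed cap and its Partial/Slumber disable bits. The decoder below is a small sketch of that layout. The function names and the speed table are illustrative assumptions, not libata code.

```python
# Speed generation codes used by both SStatus.SPD and SControl.SPD.
SPD_NAMES = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def decode_sstatus(value):
    """Return (DET, negotiated speed or None, IPM) from an SStatus value."""
    det = value & 0xF          # 3 = device present, PHY communication established
    spd = (value >> 4) & 0xF   # 0 = no speed negotiated
    ipm = (value >> 8) & 0xF   # 1 = interface in the active state
    return det, SPD_NAMES.get(spd), ipm


def decode_scontrol(value):
    """Return (speed cap or None, Partial disabled, Slumber disabled)."""
    spd = (value >> 4) & 0xF   # 0 = no speed restriction
    ipm = (value >> 8) & 0xF   # bit 0: Partial disabled, bit 1: Slumber disabled
    return SPD_NAMES.get(spd), bool(ipm & 1), bool(ipm & 2)
```

At boot, `SStatus 123` on xn6's ata2 decodes to an active 3.0 Gbps link with no speed cap (`SControl 300`). On the Marvell host, `SStatus 113` with `SControl 310` is an active 1.5 Gbps link capped at Gen1. That is the same kind of cap the "limiting SATA link speed to 1.5 Gbps" message on xn6 refers to.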
A related issue has been observed in the recent 3.0.x kernel series; see: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/965213
You may have a related report in your own bug tracker: https://bugzilla.redhat.com/show_bug.cgi?id=474552
This Bugzilla has been reviewed by Red Hat and is not planned to be addressed in Red Hat Enterprise Linux 5; it is therefore being closed. If this bug is critical to production systems, please contact your Red Hat support representative and provide a sufficient business justification in order to re-open it.