Bug 524243 - kernel crash in libata
Summary: kernel crash in libata
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: David Milburn
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-09-18 14:02 UTC by Levente Farkas
Modified: 2013-10-30 22:17 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-10-30 22:17:45 UTC
Target Upstream Version:
Embargoed:


Attachments
the requested info (11.75 KB, application/octet-stream)
2009-09-24 15:34 UTC, Levente Farkas
Information related to comment #3 (16.98 KB, application/x-gzip)
2010-07-30 20:58 UTC, rob
logs for comment #7 (36.43 KB, application/x-gzip)
2010-11-09 09:45 UTC, customer.joe

Description Levente Farkas 2009-09-18 14:02:10 UTC
We got a kernel crash with kernel-2.6.18-164.el5:
-------------------------------------------------------
Sep 16 18:52:12 test kernel: BUG: warning at drivers/ata/libata-sff.c:1327/ata_sff_hsm_move() (Tainted: G     )     
Sep 16 18:52:12 test kernel:  [<f898f391>] ata_sff_hsm_move+0x69a/0x6e9 [libata]
Sep 16 18:52:12 test kernel:  [<c041d83c>] __wake_up+0x2a/0x3d
Sep 16 18:52:12 test kernel:  [<f8990083>] ata_sff_interrupt+0x12b/0x1bc [libata]
Sep 16 18:52:12 test kernel:  [<c044d72d>] handle_IRQ_event+0x45/0x8c
Sep 16 18:52:12 test kernel:  [<c044d7f8>] __do_IRQ+0x84/0xd6
Sep 16 18:52:12 test kernel:  [<c044d774>] __do_IRQ+0x0/0xd6
Sep 16 18:52:12 test kernel:  [<c04074b2>] do_IRQ+0x99/0xc3
Sep 16 18:52:12 test kernel:  [<c0405946>] common_interrupt+0x1a/0x20
Sep 16 18:52:12 test kernel:  [<c05fd380>] _spin_unlock_irqrestore+0x8/0x9
Sep 16 18:52:12 test kernel:  [<f8852a7b>] scsi_dispatch_cmd+0x214/0x281 [scsi_mod]
Sep 16 18:52:12 test kernel:  [<f8857303>] scsi_request_fn+0x247/0x2f9 [scsi_mod]
Sep 16 18:52:12 test kernel:  [<c04df30f>] blk_run_queue+0x37/0x63
Sep 16 18:52:12 test kernel:  [<f8856478>] scsi_next_command+0x25/0x2f [scsi_mod]
Sep 16 18:52:12 test kernel:  [<f88565aa>] scsi_end_request+0xa1/0xab [scsi_mod]
Sep 16 18:52:12 test kernel:  [<f88566f4>] scsi_io_completion+0x140/0x2ea [scsi_mod]
Sep 16 18:52:12 test kernel:  [<f882f9b1>] sd_rw_intr+0x2aa/0x2ef [sd_mod]
Sep 16 18:52:12 test kernel:  [<f88523b9>] scsi_finish_command+0x73/0x77 [scsi_mod]
Sep 16 18:52:12 test kernel:  [<c04df7a0>] blk_done_softirq+0x4d/0x58
Sep 16 18:52:12 test kernel:  [<c0428ecb>] __do_softirq+0x87/0x114
Sep 16 18:52:12 test kernel:  [<c04073cf>] do_softirq+0x52/0x9c
Sep 16 18:52:12 test kernel:  [<c044d774>] __do_IRQ+0x0/0xd6
Sep 16 18:52:12 test kernel:  [<c04074ce>] do_IRQ+0xb5/0xc3
Sep 16 18:52:12 test kernel:  [<c0405946>] common_interrupt+0x1a/0x20
Sep 16 18:52:12 test kernel:  [<c0403ce7>] mwait_idle+0x25/0x38
Sep 16 18:52:12 test kernel:  [<c0403ca8>] cpu_idle+0x9f/0xb9
Sep 16 18:52:12 test kernel:  =======================
-------------------------------------------------------

Comment 1 David Milburn 2009-09-22 17:46:22 UTC
For ata_sff_hsm_move() to hit this BUG(), the ata_port task state must hold an
unexpected value. What driver are you using?

Would you please attach the output of dmesg after booting, lsmod, and "lspci -vvxxx"? Also, do you have a way to reproduce the problem? If not, can you describe what
kind of load the system was under, and whether the crash happens consistently? Thanks.
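
One way to check how consistently the warning recurs is to count its occurrences per day in the saved syslog. The following is a minimal sketch, not part of the original request: it assumes the classic "Mon DD HH:MM:SS host kernel: ..." syslog layout seen in this report, and the function name count_hsm_warnings is hypothetical.

```python
import re
from collections import Counter

# Matches both spellings of the warning that appear in this report:
#   "BUG: warning at drivers/ata/libata-sff.c:1327/ata_sff_hsm_move()"
#   "WARNING: at drivers/ata/libata-sff.c:1327 ata_sff_hsm_move()"
# The syslog prefix layout is an assumption.
WARNING_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"(?:BUG: warning at|WARNING: at)\s+"
    r"drivers/ata/libata-sff\.c:\d+[/ ]ata_sff_hsm_move\(\)"
)


def count_hsm_warnings(lines):
    """Return a Counter mapping 'Mon D' to the number of warnings logged that day."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.match(line)
        if match:
            counts[f"{match['month']} {int(match['day'])}"] += 1
    return counts
```

To use it, pass any iterable of log lines, for example an open file object for a copy of /var/log/messages. Several hits on the same day, as opposed to isolated hits weeks apart, would be worth mentioning in the report.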

Comment 2 Levente Farkas 2009-09-24 15:34:43 UTC
Created attachment 362518
the requested info

Comment 5 rob 2010-07-30 20:56:21 UTC
I can confirm this bug: it has consistently caused the same SATA port to fail, which in turn fails the RAID 5 array that the port belongs to.
So far it has occurred on the following dates, each time with no significant load on the server. Monitoring showed no issues except for a dramatic increase in IO wait from the usual 0.2s to ~1.3s.
After a reboot the drive is re-added to the array and behaves normally (until the next time).

30th July 20:09
5th July 11:12
9th June 19:05
28th May 05:39

Controller being used [6 ports]: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b1)

Jul 30 20:09:52 serv kernel: BUG: warning at drivers/ata/libata-sff.c:1327/ata_sff_hsm_move() (Tainted: G     )
Jul 30 20:09:52 serv kernel:
Jul 30 20:09:52 serv kernel: Call Trace:
Jul 30 20:09:52 serv kernel:  <IRQ>  [<ffffffff880f788d>] :libata:ata_sff_hsm_move+0x6cb/0x722
Jul 30 20:09:52 serv kernel:  [<ffffffff8002e1c9>] __wake_up+0x38/0x4f
Jul 30 20:09:52 serv kernel:  [<ffffffff880f8738>] :libata:ata_sff_interrupt+0x148/0x1d8
Jul 30 20:09:52 serv kernel:  [<ffffffff80010bd1>] handle_IRQ_event+0x51/0xa6
Jul 30 20:09:52 serv kernel:  [<ffffffff800baec9>] __do_IRQ+0xa4/0x103
Jul 30 20:09:52 serv kernel:  [<ffffffff800123da>] __do_softirq+0x89/0x133
Jul 30 20:09:52 serv kernel:  [<ffffffff8006ca11>] do_IRQ+0xe7/0xf5
Jul 30 20:09:52 serv kernel:  [<ffffffff80057289>] mwait_idle+0x0/0x4a
Jul 30 20:09:52 serv kernel:  [<ffffffff8005d615>] ret_from_intr+0x0/0xa
Jul 30 20:09:52 serv kernel:  <EOI>  [<ffffffff800572bf>] mwait_idle+0x36/0x4a
Jul 30 20:09:52 serv kernel:  [<ffffffff80049477>] cpu_idle+0x95/0xb8
Jul 30 20:09:52 serv kernel:  [<ffffffff8007797a>] start_secondary+0x498/0x4a7
Jul 30 20:09:52 serv kernel:
Jul 30 20:09:52 serv kernel: ata2.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6
Jul 30 20:09:52 serv kernel: ata2.00: BMDMA stat 0x66
Jul 30 20:09:52 serv kernel: ata2: SError: { UnrecovData HostInt 10B8B BadCRC }
Jul 30 20:09:52 serv kernel: ata2.00: cmd 25/00:e8:57:e4:33/00:01:04:00:00/e0 tag 0 dma 249856 in
Jul 30 20:09:52 serv kernel:          res 51/84:b9:86:e4:33/84:01:04:00:00/04 Emask 0x70 (host bus error)
Jul 30 20:09:52 serv kernel: ata2.00: status: { DRDY ERR }
Jul 30 20:09:52 serv kernel: ata2.00: error: { ICRC ABRT }
Jul 30 20:09:52 serv kernel: ata2: hard resetting link
Jul 30 20:09:52 serv kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 30 20:09:52 serv kernel: ata2.00: configured for UDMA/33
Jul 30 20:09:52 serv kernel: ata2.01: configured for UDMA/133
Jul 30 20:09:52 serv kernel: ata2: EH complete

Attached: comment3-bz524243.tgz
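
The SError value in the log above (0x280900) can be checked against the flag names the kernel printed next to it. The following is a minimal sketch that decodes the bitmask. The four bits set here are confirmed by the log line itself. The remaining names are listed from memory of libata's error-handling output and should be treated as an assumption. The function name decode_serror is hypothetical.

```python
# SError bit position -> short name, as printed in libata's "SError: { ... }" line.
# Only the names for bits 8, 11, 19 and 21 are confirmed by the logs in this
# report; the others are an assumption based on libata's usual output.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of all known SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]
```

For 0x280900 this yields UnrecovData, HostInt, 10B8B and BadCRC, which matches the kernel's own decoding. These are link-level data integrity errors rather than errors reported by the disk media.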

Comment 6 rob 2010-07-30 20:58:00 UTC
Created attachment 435641
Information related to comment #3

Contents:
comment3-dmesg-after-reboot.txt
comment3-dmesg-before-reboot.txt
comment3-lsmod.txt
comment3-lspci.txt

Comment 7 customer.joe 2010-11-09 09:43:02 UTC
I can also confirm this problem with kernel 2.6.18-194.8.1.

kernel parameter: elevator=deadline
ext3 options: barrier=1,defaults


Nov  9 08:20:58 share kernel: BUG: warning at drivers/ata/libata-sff.c:1327/ata_sff_hsm_move() (Tainted:  P     )
Nov  9 08:20:58 share kernel:  [<f8c65524>] ata_sff_hsm_move+0x68c/0x6de [libata]
Nov  9 08:20:58 share kernel:  [<c041f274>] fairsched_schedule+0x352/0x5ee
Nov  9 08:20:58 share kernel:  [<f8c6621c>] ata_sff_interrupt+0x12b/0x1bb [libata]
Nov  9 08:20:58 share kernel:  [<c0456271>] handle_IRQ_event+0x45/0x8c
Nov  9 08:20:58 share kernel:  [<c045633c>] __do_IRQ+0x84/0xd6
Nov  9 08:20:58 share kernel:  [<c0406575>] do_IRQ+0xa3/0xb2
Nov  9 08:20:58 share kernel:  [<c0625d9a>] common_interrupt+0x1a/0x20
Nov  9 08:20:58 share kernel:  [<c0403d7a>] mwait_idle+0x25/0x38
Nov  9 08:20:58 share kernel:  [<c0403d3f>] cpu_idle+0x7a/0x90
Nov  9 08:20:58 share kernel:  [<c0764c17>] start_kernel+0x399/0x3a1
Nov  9 08:20:58 share kernel:  =======================
Nov  9 08:20:58 share kernel: ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov  9 08:20:58 share kernel: ata3.01: BMDMA stat 0x66
Nov  9 08:20:58 share kernel: ata3.01: cmd 25/00:00:c7:13:db/00:01:34:00:00/f0 tag 0 dma 131072 in
Nov  9 08:20:58 share kernel:          res 51/84:00:c6:14:db/84:00:34:00:00/f0 Emask 0x30 (host bus error)
Nov  9 08:20:58 share kernel: ata3.01: status: { DRDY ERR }
Nov  9 08:20:58 share kernel: ata3.01: error: { ICRC ABRT }
Nov  9 08:20:58 share kernel: ata3: soft resetting link
Nov  9 08:20:58 share kernel: ata3.00: configured for UDMA/133
Nov  9 08:20:58 share kernel: ata3.01: configured for UDMA/133
Nov  9 08:20:58 share kernel: ata3: EH complete
Nov  9 08:20:58 share kernel: BUG: warning at drivers/ata/libata-sff.c:1327/ata_sff_hsm_move() (Tainted:  P     )
Nov  9 08:20:58 share kernel:  [<f8c65524>] ata_sff_hsm_move+0x68c/0x6de [libata]
Nov  9 08:20:58 share kernel:  [<f8c6621c>] ata_sff_interrupt+0x12b/0x1bb [libata]
Nov  9 08:20:58 share kernel:  [<c0456271>] handle_IRQ_event+0x45/0x8c
Nov  9 08:20:59 share kernel:  [<c045633c>] __do_IRQ+0x84/0xd6
Nov  9 08:20:59 share kernel:  [<c0406575>] do_IRQ+0xa3/0xb2
Nov  9 08:20:59 share kernel:  [<c0625d9a>] common_interrupt+0x1a/0x20
Nov  9 08:20:59 share kernel:  [<c0403d7a>] mwait_idle+0x25/0x38
Nov  9 08:20:59 share kernel:  [<c0403d3f>] cpu_idle+0x7a/0x90
Nov  9 08:20:59 share kernel:  [<c0764c17>] start_kernel+0x399/0x3a1
Nov  9 08:20:59 share kernel:  =======================
Nov  9 08:20:59 share kernel: ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov  9 08:20:59 share kernel: ata3.01: BMDMA stat 0x66
Nov  9 08:20:59 share kernel: ata3.01: cmd 25/00:00:c7:13:db/00:01:34:00:00/f0 tag 0 dma 131072 in
Nov  9 08:20:59 share kernel:          res 51/84:cf:f8:13:db/84:00:34:00:00/f0 Emask 0x30 (host bus error)
Nov  9 08:20:59 share kernel: ata3.01: status: { DRDY ERR }
Nov  9 08:20:59 share kernel: ata3.01: error: { ICRC ABRT }
Nov  9 08:20:59 share kernel: ata3: soft resetting link
Nov  9 08:20:59 share kernel: ata3.00: configured for UDMA/133
Nov  9 08:20:59 share kernel: ata3.01: configured for UDMA/133
Nov  9 08:20:59 share kernel: ata3: EH complete
Nov  9 08:20:59 share kernel: SCSI device sdc: 781420655 512-byte hdwr sectors (400087 MB)
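
The "cmd" and "res" lines in these logs are register dumps and can be decoded by hand. The following is a minimal sketch under stated assumptions: the field order is taken to be command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, sectors are 512 bytes (as the "512-byte hdwr sectors" line reports), and the command is a 48-bit one such as 0x25 (READ DMA EXT). The function and table names are hypothetical.

```python
import re

HEX = r"([0-9a-f]{2})"
# "cmd" or "res", one byte, five bytes, five HOB bytes, device byte.
TF_RE = re.compile(
    r"\b(cmd|res)\s+" + HEX + "/" + ":".join([HEX] * 5)
    + "/" + ":".join([HEX] * 5) + "/" + HEX
)
FIELDS = ("first", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah", "device")

# ATA status and error register bits that appear in the kernel's own decoding.
STATUS_BITS = ((0x80, "BUSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR"))
ERROR_BITS = ((0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT"))


def flags(value, table):
    """Return the names of the bits from table that are set in value."""
    return [name for mask, name in table if value & mask]


def decode_taskfile(line):
    """Decode a libata 'cmd' or 'res' log line; return None if it does not match."""
    match = TF_RE.search(line)
    if match is None:
        return None
    kind = match.group(1)
    tf = dict(zip(FIELDS, (int(h, 16) for h in match.groups()[1:])))
    lba = (tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
           | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"])
    if kind == "cmd":
        # In a 48-bit command a sector count of 0 means 65536 sectors.
        sectors = (tf["hob_nsect"] << 8 | tf["nsect"]) or 65536
        return {"kind": kind, "opcode": tf["first"], "lba": lba,
                "sectors": sectors, "bytes": sectors * 512}
    # For a result line the first byte is the status register and the
    # second is the error register.
    return {"kind": kind, "status": flags(tf["first"], STATUS_BITS),
            "error": flags(tf["feature"], ERROR_BITS), "lba": lba}
```

Applied to the "cmd 25/00:00:c7:13:db/00:01:34:00:00/f0" line above, the sector count is 0x0100 = 256, that is 131072 bytes, which agrees with the "dma 131072 in" suffix. The result byte pair 51/84 decodes to DRDY ERR and ICRC ABRT, the same flags the kernel prints.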

Comment 8 customer.joe 2010-11-09 09:45:25 UTC
Created attachment 459058
logs for comment #7

messages, lsmod, lspci

Comment 9 Wes 2011-05-27 07:58:52 UTC
May 27 03:27:38 xn6 kernel: WARNING: at drivers/ata/libata-sff.c:1327 ata_sff_hsm_move()
May 27 03:27:38 xn6 kernel: Call Trace:
May 27 03:27:38 xn6 kernel:  <IRQ>  [<ffffffff880d44a5>] :libata:ata_sff_hsm_move+0x601/0x658
May 27 03:27:39 xn6 kernel:  [<ffffffff880d535e>] :libata:ata_sff_interrupt+0x148/0x1da
May 27 03:27:39 xn6 kernel:  [<ffffffff802116a8>] handle_IRQ_event+0x55/0xae
May 27 03:27:40 xn6 kernel:  [<ffffffff802b5339>] __do_IRQ+0xa4/0x103
May 27 03:27:40 xn6 kernel:  [<ffffffff80294b52>] run_timer_softirq+0x23a/0x249
May 27 03:27:40 xn6 kernel:  [<ffffffff8026e6fa>] do_IRQ+0xc6/0xcf
May 27 03:27:40 xn6 kernel:  [<ffffffff803b78b1>] evtchn_do_upcall+0x147/0x201
May 27 03:27:40 xn6 kernel:  [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
May 27 03:27:40 xn6 kernel:  <EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
May 27 03:27:40 xn6 kernel:  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
May 27 03:27:40 xn6 kernel:  [<ffffffff8026ff2a>] raw_safe_halt+0x84/0xa8
May 27 03:27:40 xn6 kernel:  [<ffffffff8026d4e1>] xen_idle+0x38/0x4a
May 27 03:27:40 xn6 kernel:  [<ffffffff8024b10d>] cpu_idle+0x97/0xba
May 27 03:27:40 xn6 kernel:  [<ffffffff806a6b0f>] start_kernel+0x21f/0x224
May 27 03:27:40 xn6 kernel:  [<ffffffff806a61e5>] _sinittext+0x1e5/0x1eb
May 27 03:27:40 xn6 kernel: 
May 27 03:27:40 xn6 kernel: ata2: limiting SATA link speed to 1.5 Gbps
May 27 03:27:40 xn6 kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6
May 27 03:27:40 xn6 kernel: ata2.00: BMDMA stat 0x26
May 27 03:27:40 xn6 kernel: ata2: SError: { UnrecovData 10B8B BadCRC }
May 27 03:27:40 xn6 kernel: ata2.00: cmd 25/00:00:42:73:d5/00:03:02:00:00/e0 tag 0 dma 393216 in
May 27 03:27:40 xn6 kernel:          res 51/84:af:93:73:d5/84:02:02:00:00/e0 Emask 0x30 (host bus error)
May 27 03:27:40 xn6 kernel: ata2.00: status: { DRDY ERR }
May 27 03:27:40 xn6 kernel: ata2.00: error: { ICRC ABRT }
May 27 03:27:40 xn6 kernel: ata2: hard resetting link


Linux 2.6.18-238.9.1.el5xen #1 SMP Tue Apr 12 18:53:56 EDT 2011 x86_64

lspci
00:00.0 Host bridge: Intel Corporation Core Processor DMI (rev 11)
00:03.0 PCI bridge: Intel Corporation Core Processor PCI Express Root Port 1 (rev 11)
00:05.0 PCI bridge: Intel Corporation Core Processor PCI Express Root Port 3 (rev 11)
00:08.0 System peripheral: Intel Corporation Core Processor System Management Registers (rev 11)
00:08.1 System peripheral: Intel Corporation Core Processor Semaphore and Scratchpad Registers (rev 11)
00:08.2 System peripheral: Intel Corporation Core Processor System Control and Status Registers (rev 11)
00:08.3 System peripheral: Intel Corporation Core Processor Miscellaneous Registers (rev 11)
00:10.0 System peripheral: Intel Corporation Core Processor QPI Link (rev 11)
00:10.1 System peripheral: Intel Corporation Core Processor QPI Routing and Protocol Registers (rev 11)
00:1a.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 05)
00:1c.4 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 (rev 05)
00:1c.5 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 6 (rev 05)
00:1d.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation 3400 Series Chipset LPC Interface Controller (rev 05)
00:1f.2 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA IDE Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller (rev 05)
00:1f.5 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 2 port SATA IDE Controller (rev 05)
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
05:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
06:03.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)


May 26 19:12:21 xn6 kernel: ata1.00: ATA-8: WDC WD5002ABYS-02B1B0, 02.03B03, max UDMA/133
May 26 19:12:21 xn6 kernel: ata1.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
May 26 19:12:21 xn6 kernel: ata1.00: configured for UDMA/133
May 26 19:12:21 xn6 kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 26 19:12:21 xn6 kernel: ata2.00: ATA-8: WDC WD5002ABYS-02B1B0, 02.03B03, max UDMA/133
May 26 19:12:21 xn6 kernel: ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
May 26 19:12:21 xn6 kernel: ata2.00: configured for UDMA/133
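
The SStatus values printed in the "SATA link up" lines (123 at boot here, 113 after the reset in comment 5) encode the negotiated link speed. The following is a minimal sketch of the decoding, assuming the standard SATA SStatus layout of DET in bits 0-3, SPD in bits 4-7 and IPM in bits 8-11. The function name decode_sstatus and the wording of the descriptions are hypothetical.

```python
# Field value -> description for the three SStatus fields.
DET = {0: "no device", 1: "device present, no PHY communication",
       3: "device present, PHY communication established", 4: "PHY offline"}
SPD = {0: "no speed negotiated", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
IPM = {0: "not present", 1: "active", 2: "partial", 6: "slumber"}


def decode_sstatus(value):
    """Split an SStatus register value into its DET, SPD and IPM fields."""
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "detection": DET.get(det, "reserved"),
        "speed": SPD.get(spd, "reserved"),
        "power": IPM.get(ipm, "reserved"),
    }
```

The kernel prints the register in hexadecimal without a prefix, so the logged "SStatus 123" should be passed as int("123", 16). It decodes to a 3.0 Gbps link, while "SStatus 113" decodes to 1.5 Gbps, consistent with the link speed being lowered after errors.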

Comment 10 melchiaros 2012-03-26 14:23:14 UTC
A related issue is observed in the recent 3.0.x kernel series. See:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/965213

Comment 11 melchiaros 2012-03-26 14:29:35 UTC
You may have a related report in your own bug tracker:

https://bugzilla.redhat.com/show_bug.cgi?id=474552

Comment 12 John Feeney 2013-10-30 22:17:45 UTC
This Bugzilla has been reviewed by Red Hat and is not planned on being
addressed in Red Hat Enterprise Linux 5, and therefore is being closed.
If this bug is critical to production systems, please contact your Red
Hat support representative and provide a sufficient business justification
in order to re-open it.

