Bug 228979 - Bug in drivers/ata/libata-core.c:4602
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Reported: 2007-02-16 05:52 EST by Marcus Haebler
Modified: 2007-11-30 17:11 EST
Fixed In Version: kernel-2.6.20-1.2925.fc6
Doc Type: Bug Fix
Last Closed: 2007-03-17 18:14:41 EDT

Description Marcus Haebler 2007-02-16 05:52:44 EST
Description of problem:

Enabling NCQ (as described here: http://linux-ata.org/faq.html#ncq) results
in a flood of error messages of the following type:

Feb 14 08:20:23 saruman2 kernel: BUG: warning: (ap->ops->error_handler && ata_tag_valid(ap->active_tag)) at drivers/ata/libata-core.c:4602/ata_qc_issue() (Not tainted)
Feb 14 08:20:23 saruman2 kernel:  [<c0405018>] dump_trace+0x69/0x1b6
Feb 14 08:20:23 saruman2 kernel:  [<c040517d>] show_trace_log_lvl+0x18/0x2c
Feb 14 08:20:23 saruman2 kernel:  [<c0405778>] show_trace+0xf/0x11
Feb 14 08:20:23 saruman2 kernel:  [<c0405875>] dump_stack+0x15/0x17
Feb 14 08:20:23 saruman2 kernel:  [<f8950876>] ata_qc_issue+0x5d/0x549 [libata]
Feb 14 08:20:23 saruman2 kernel:  [<f895538e>] ata_scsi_translate+0xb8/0xfe [libata]
Feb 14 08:20:23 saruman2 kernel:  [<f8956813>] ata_scsi_queuecmd+0xf2/0x111 [libata]
Feb 14 08:20:23 saruman2 kernel:  [<f88c1a92>] scsi_dispatch_cmd+0x231/0x2af [scsi_mod]
Feb 14 08:20:23 saruman2 kernel:  [<f88c6542>] scsi_request_fn+0x27d/0x33a [scsi_mod]
Feb 14 08:20:23 saruman2 kernel:  [<c04e4e74>] __generic_unplug_device+0x1d/0x1f
Feb 14 08:20:23 saruman2 kernel:  [<c04e6915>] __make_request+0x37f/0x3c5
Feb 14 08:20:23 saruman2 kernel:  [<c04e41a8>] generic_make_request+0x29d/0x2ad
Feb 14 08:20:23 saruman2 kernel:  [<f8911be6>] handle_stripe+0x1ff0/0x218f [raid456]
Feb 14 08:20:23 saruman2 kernel:  [<f8912dc7>] make_request+0x5a9/0x685 [raid456]
Feb 14 08:20:23 saruman2 kernel:  [<c04e41a8>] generic_make_request+0x29d/0x2ad
Feb 14 08:20:23 saruman2 kernel:  [<c04e5fa6>] submit_bio+0xfc/0x103
Feb 14 08:20:23 saruman2 kernel:  [<c0491d85>] submit_bh+0x12b/0x14b
Feb 14 08:20:23 saruman2 kernel:  [<c0493462>] __block_write_full_page+0x294/0x3ae
Feb 14 08:20:23 saruman2 kernel:  [<c0493864>] block_write_full_page+0xd4/0xdc
Feb 14 08:20:23 saruman2 kernel:  [<c045d6f2>] generic_writepages+0x18c/0x2bd
Feb 14 08:20:23 saruman2 kernel:  [<c045d843>] do_writepages+0x20/0x30
Feb 14 08:20:23 saruman2 kernel:  [<c048e8ab>] __writeback_single_inode+0x1c3/0x2ff
Feb 14 08:20:23 saruman2 kernel:  [<c048ed53>] sync_sb_inodes+0x19a/0x248
Feb 14 08:20:23 saruman2 kernel:  [<c048f122>] writeback_inodes+0x7b/0xc5
Feb 14 08:20:23 saruman2 kernel:  [<c045dc9d>] wb_kupdate+0x7b/0xde
Feb 14 08:20:23 saruman2 kernel:  [<c045e123>] pdflush+0x111/0x1a7
Feb 14 08:20:23 saruman2 kernel:  [<c0439810>] kthread+0xc0/0xec
Feb 14 08:20:23 saruman2 kernel:  [<c0404c03>] kernel_thread_helper+0x7/0x10
Feb 14 08:20:23 saruman2 kernel:  =======================



Version-Release number of selected component (if applicable):
# uname -a
Linux saruman2 2.6.19-1.2895.fc6 #1 SMP Wed Jan 10 19:28:18 EST 2007 i686 i686 i386 GNU/Linux


How reproducible:
Always.

Steps to Reproduce:
1. echo 31 > /sys/block/sdX/device/queue_depth, where sdX is one of my five
Seagate 320 GB SATA drives (controller set to AHCI in the BIOS of my Asus
P5B-Deluxe motherboard)
2. Error messages start flooding the messages log
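
A minimal sketch of step 1 as a small shell helper that reads the queue depth back after setting it. The function names and the SYSFS_ROOT variable are hypothetical and only there so the helper can be tried against a scratch directory; on a real system the write needs root, and the kernel may clamp the value to what the drive and controller support.

```shell
#!/bin/sh
# Hypothetical helpers: SYSFS_ROOT and the function names are illustrative
# and not part of any kernel or distribution tooling.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the current queue depth of a disk, e.g. "sda".
get_queue_depth() {
    cat "$SYSFS_ROOT/block/$1/device/queue_depth"
}

# Write a new queue depth, then print the value that is actually in effect.
set_queue_depth() {
    echo "$2" > "$SYSFS_ROOT/block/$1/device/queue_depth" && get_queue_depth "$1"
}

# Example, left commented out so that sourcing this file changes nothing:
# set_queue_depth sdX 31
```

Reading the value back makes it easy to see whether the requested depth was accepted before watching the log.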
  
Actual results:
Error messages & bad performance due to log message flooding
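
To put a number on the flooding, the warnings can be counted in the log. This is only a sketch: the function name is hypothetical, and /var/log/messages is assumed to be where syslog writes kernel messages on this system.

```shell
#!/bin/sh
# Hypothetical helper: count the libata warnings in a log file.
# The default path /var/log/messages is an assumption about the syslog setup.
count_ata_warnings() {
    # -F matches the pattern as a fixed string, -c prints the number of matching lines.
    grep -cF 'libata-core.c:4602/ata_qc_issue()' "${1:-/var/log/messages}"
}

# Example, left commented out so that sourcing this file reads nothing:
# count_ata_warnings /var/log/messages
```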

Expected results:
NCQ enabled, improved performance 

Additional info:

There seems to be one other report of this:
http://lkml.org/lkml/2007/1/23/273

There are multiple software RAID arrays (levels 0, 1 and 5) running on those HDDs.
Comment 1 Marcus Haebler 2007-02-17 08:41:57 EST
Problem is present in 2.6.19-1.2911.fc6 as well. Same error message - including
same line number.
Comment 2 Marcus Haebler 2007-02-18 12:52:13 EST
Is this bug the reason for NCQ being disabled by default? See 

  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=223564

and

  http://bugzilla.kernel.org/show_bug.cgi?id=7987
Comment 3 Michael Cronenworth 2007-02-18 13:04:59 EST
If I try to set a queue_depth, my system becomes unstable, as your bug report states.

Per bug 223564 I have similar hardware. (Asus P5B-E)
Comment 4 Marcus Haebler 2007-02-21 01:36:59 EST
I installed a vanilla 2.6.20.1 kernel today to prepare my system for a patch
from Tejun Heo. Under the unpatched vanilla 2.6.20.1 kernel, NCQ is enabled at
boot time and the kernel BUG messages described above have vanished. The
problem might actually be related to the JMB363 rather than the Intel ICH8R.

Reference: http://lkml.org/lkml/2007/2/21/11

Comment 5 Michael Cronenworth 2007-02-21 01:50:35 EST
Can this be backported to 2.6.19.3 (latest FC6 kernel)?

Thanks for the update. I'll CC.
Comment 6 Michael Cronenworth 2007-03-12 21:37:55 EDT
I've loaded the 2.6.20-1.2925.fc6 RPM from testing and it enabled NCQ. Looks
like the only way to go is to use 2.6.20.
