Bug 456078

Summary: Timeouts in wait_drive_not_busy with TEAC DV-W28ECW and similar
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.6
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: rc
Target Release: ---
Keywords: OtherQA, Reopened
Reporter: Bryn M. Reeves <bmr>
Assignee: Prarit Bhargava <prarit>
QA Contact: Martin Jenner <mjenner>
CC: cward, dmair, emcnabb, james.brown, jrfuller, peterm, sandy.garza, tao, vgoyal
Doc Type: Bug Fix
Last Closed: 2009-05-18 19:23:19 UTC
Bug Blocks: 391511, 456484, 461297
Attachments:
  Patch increasing timeout in wait_drive_not_busy (flags: none)
  RHEL4 fix for this issue (flags: none)

Description Bryn M. Reeves 2008-07-21 12:37:18 UTC
Description of problem:

Some TEAC drives take a long time (>= 6 ms) to return to non-Busy status
following an IDENTIFY command, causing the following messages to be logged:

08:20:36 hex kernel: hda: irq timeout: status=0xd0 { Busy }
08:20:36 hex kernel: hda: irq timeout: error=0x00
08:20:36 hex kernel: hda: ATAPI reset complete
08:20:36 hex kernel: VFS: busy inodes on changed media 
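
For readers decoding the status byte in these messages: 0xd0 is BSY (0x80) plus DRDY (0x40) and DSC (0x10), and while BSY is set the other bits are not meaningful, which is why only "Busy" is printed. The following is a minimal user-space sketch of that decoding, not the driver's code. The bit values are the standard ATA status register bits; the helper name decode_status and the exact label strings are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Standard ATA status register bits. */
#define ATA_BUSY 0x80 /* BSY: device busy, remaining bits not valid */
#define ATA_DRDY 0x40 /* device ready */
#define ATA_DF   0x20 /* device fault */
#define ATA_DSC  0x10 /* seek complete */
#define ATA_DRQ  0x08 /* data request */
#define ATA_CORR 0x04 /* corrected error */
#define ATA_IDX  0x02 /* index */
#define ATA_ERR  0x01 /* error */

/*
 * Hypothetical helper: render a status byte in a form similar to the
 * log lines in this report. When BSY is set, only "Busy" is reported
 * because the other bits cannot be trusted.
 */
static void decode_status(uint8_t stat, char *buf, size_t len)
{
    if (stat & ATA_BUSY) {
        snprintf(buf, len, "status=0x%02x { Busy }", (unsigned)stat);
        return;
    }
    snprintf(buf, len, "status=0x%02x {%s%s%s%s%s%s%s }", (unsigned)stat,
             (stat & ATA_DRDY) ? " DriveReady" : "",
             (stat & ATA_DF)   ? " DeviceFault" : "",
             (stat & ATA_DSC)  ? " SeekComplete" : "",
             (stat & ATA_DRQ)  ? " DataRequest" : "",
             (stat & ATA_CORR) ? " CorrectedError" : "",
             (stat & ATA_IDX)  ? " Index" : "",
             (stat & ATA_ERR)  ? " Error" : "");
}
```

Calling decode_status(0xd0, buf, sizeof buf) yields the "status=0xd0 { Busy }" form seen above.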

Version-Release number of selected component (if applicable):


How reproducible:
Infrequently but depends on drive/usage. Some users report the drives to be
unusable while others see only infrequent failures.

Steps to Reproduce:
1. cat /proc/ide/<drive>/identify
  
Actual results:
22 failures in 60,000 accesses:

 kernel: hda: task_in_intr: status=0xd0 { Busy }
 kernel: hda: task_in_intr: error=0xd0LastFailedSense 0x0d

Expected results:
No error logged. Drive usable as normal.

Additional info:
Reported upstream as kernel bug 10887:
http://bugzilla.kernel.org/show_bug.cgi?id=10887

And fixed in commit f54feafa6d47d0aa1a96adefdc763b708b02f94f:

Author: Bartlomiej Zolnierkiewicz <bzolnier>
Date:   Fri Jun 20 20:53:33 2008 +0200

    ide: increase timeout in wait_drive_not_busy()
    
    Some ATAPI devices take longer than the current max timeout value to
    become ready (i.e. TEAC DV-W28ECW takes 6 ms) so increase the timeout
    value to 10 ms.
    
    This fixes kernel.org bugzilla bug #10887:
    http://bugzilla.kernel.org/show_bug.cgi?id=10887
    
    Reported-by: Masanari Iida <standby24x7>
    Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier>
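
To illustrate why a larger retry budget helps, here is a minimal user-space sketch of a bounded busy-poll, not the actual kernel function. Everything in it is hypothetical: the simulated drive, the helper names, and the 10 us polling step. The commit message gives only the new 10 ms limit and the 6 ms drive latency; the 1 ms "before" budget used in the usage note is an assumption.

```c
#include <stdint.h>

#define BUSY_STAT 0x80 /* ATA status register BSY bit */

/* Hypothetical simulated drive that keeps BSY set for busy_us microseconds. */
struct sim_drive {
    unsigned busy_us;    /* how long the drive stays busy */
    unsigned elapsed_us; /* simulated time spent waiting so far */
};

/* Simulated status read: 0xd0 while busy, 0x50 (ready) afterwards. */
static uint8_t sim_read_status(const struct sim_drive *d)
{
    return d->elapsed_us < d->busy_us ? 0xd0 : 0x50;
}

/* Simulated delay: advances the drive's clock instead of sleeping. */
static void sim_udelay(struct sim_drive *d, unsigned us)
{
    d->elapsed_us += us;
}

/*
 * Poll the status register in 10 us steps, at most max_retries times.
 * Returns 0 once BSY clears, -1 if the drive is still busy when the
 * retry budget is exhausted.
 */
static int wait_not_busy(struct sim_drive *d, unsigned max_retries)
{
    unsigned retries;

    for (retries = 0; retries < max_retries; retries++) {
        if (!(sim_read_status(d) & BUSY_STAT))
            return 0;
        sim_udelay(d, 10);
    }
    return (sim_read_status(d) & BUSY_STAT) ? -1 : 0;
}
```

With a simulated drive that stays busy for 6000 us, wait_not_busy(&d, 100) (a 1 ms budget) times out, while wait_not_busy(&d, 1000) (a 10 ms budget) succeeds after 6 ms of polling.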

Comment 1 Bryn M. Reeves 2008-07-21 12:37:20 UTC
Created attachment 312253
Patch increasing timeout in wait_drive_not_busy

Comment 2 Bryn M. Reeves 2008-07-21 12:37:59 UTC
Anecdotally, this also affects drives from other vendors.


Comment 3 Bryn M. Reeves 2008-07-21 12:38:49 UTC
Related to bug 453808 for the same drive model:

qc timeout probing TEAC DV-28E-V CD/DVD drive on SATA/PATA bridge

Comment 5 Alan Cox 2008-07-23 19:44:40 UTC
I'm happy with that proposed patch


Comment 7 RHEL Program Management 2008-09-03 12:57:29 UTC
Updating PM score.

Comment 14 RHEL Program Management 2009-03-17 13:45:04 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.

Comment 22 Prarit Bhargava 2009-03-24 13:43:41 UTC
Created attachment 336467
RHEL4 fix for this issue

Comment 23 Sandy Garza 2009-03-26 14:20:15 UTC
HP has begun some validation of the patch. Will post results shortly.

Comment 24 Ludek Smid 2009-03-26 19:22:00 UTC
(In reply to comment #23)
> HP has begun some validation of the patch. Will post results shortly.  
Are the test results available?

Comment 25 Sandy Garza 2009-03-26 19:41:56 UTC
Here are our results for now. We are in the middle of the i686 testing.

So far we have completed testing x86_64 bz kernel. Test was successful. None of the following messages were reported by "dmesg":

08:20:36 hex kernel: hda: irq timeout: status=0xd0 { Busy }
08:20:36 hex kernel: hda: irq timeout: error=0x00
08:20:36 hex kernel: hda: ATAPI reset complete
08:20:36 hex kernel: VFS: busy inodes on changed media

Currently we are running the same test on i686 kernel. Test has successfully run for 30+ minutes and there are no "hda" related messages by "dmesg" so far.

Comment 26 Sandy Garza 2009-03-26 20:15:22 UTC
The test on the i686 bz kernel ran successfully for over an hour. "dmesg" did not report any "hda" timeouts or errors.

Thank you,
Sandy

Comment 28 Vivek Goyal 2009-03-31 15:42:58 UTC
Committed in 86.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 30 Chris Ward 2009-04-09 07:43:07 UTC
~~ Attention Partners! Snap 3 Released ~~
RHEL 4.8 Snapshot 3 has been released on partners.redhat.com. There
should be a fix present that resolves this bug.

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have found a NEW bug, clone this
bug and describe the issues you encountered. Further questions can be
directed to your Red Hat Partner Manager.

If you have VERIFIED the bug fix, please select your PartnerID from the
Verified field above. Please leave a comment with your test result details.
Include which arches were tested, the package version, and any applicable logs.

Comment 31 Chris Ward 2009-04-16 13:13:26 UTC
~~ Attention! Snap 4 Released ~~
RHEL 4.8 Snapshot 4 has been released on partners.redhat.com. There
should be a fix present that resolves this bug. There's not much more time to test. Please report back results ASAP.

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have found a NEW bug, clone this
bug and describe the issues you encountered. Further questions can be
directed to your Red Hat Partner Manager.

If you have VERIFIED the bug fix, please select your PartnerID from the
Verified field above. Please leave a comment with your test result details.
Include which arches were tested, the package version, and any applicable logs.

Comment 33 Bryn M. Reeves 2009-04-21 15:36:29 UTC
This patch has already had a fair amount of testing mileage both here and upstream. It'd still be nice to get it verified in 86.EL though.

Comment 35 errata-xmlrpc 2009-05-18 19:23:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html