Bug 1651706 - kernel: sd 0:0:9:0: timing out command, waited XXXs - which command timed out?
Summary: kernel: sd 0:0:9:0: timing out command, waited XXXs - which command timed out?
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.6
Hardware: Unspecified
OS: Unspecified
unspecified
low
Target Milestone: rc
: ---
Assignee: Maurizio Lombardi
QA Contact: guazhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1711360
TreeView+ depends on / blocked
 
Reported: 2018-11-20 15:55 UTC by Rainer Beyel
Modified: 2021-09-06 12:45 UTC (History)
7 users (show)

Fixed In Version: kernel-3.10.0-1065.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-31 19:14:32 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2020:1016 0 None None None 2020-03-31 19:15:19 UTC

Description Rainer Beyel 2018-11-20 15:55:50 UTC
Description of problem:

With a faulty disk the following is shown in "/var/log/messages":
...
894 Nov 12 17:39:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 180s
895 Nov 12 17:39:26 myhostname kernel: sd 0:0:9:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
896 Nov 12 17:39:26 myhostname kernel: sd 0:0:9:0: [sdc] CDB: Read(16) 88 00 00 00 00 00 00 01 0e 00 00 00 00 80 00 00
897 Nov 12 17:39:26 myhostname kernel: blk_update_request: I/O error, dev sdc, sector 69120
898 Nov 12 17:40:01 myhostname systemd: Started Session 2150 of user root.
899 Nov 12 17:42:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 180s
900 Nov 12 17:42:46 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s
901 Nov 12 17:43:06 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s
902 Nov 12 17:43:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s
...
912 Nov 12 17:46:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 180s
913 Nov 12 17:46:46 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s
914 Nov 12 17:47:06 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s
915 Nov 12 17:47:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 60s
916 Nov 12 17:47:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s
...

"timing out command, waited XXXs" is not showing which command is "timing out".

Version-Release number of selected component (if applicable):
RHEL 7.6

How reproducible:
RHEL 7.6 with a faulty disk

Actual results:
The logs show errors in regards to a (possible faulty) disk. Some infos are generic ("timing out command, waited 20s") and the details are missing (which command timed out?).

Expected results:
With the "timing out command, waited XXs" info, also show which command timed out.

Comment 3 Ewan D. Milne 2018-11-26 21:47:44 UTC
So the customer would like to see the SCSI command (i.e. log message like:

896 Nov 12 17:39:26 myhostname kernel: sd 0:0:9:0: [sdc] CDB: Read(16) 88 00 00 00 00 00 00 01 0e 00 00 00 00 80 00 00

in the output for the "timing out command" message?  Is that it?

To help with upstream acceptance of such a change, can the customer identify
why this information would be useful?

Also, please provide some information about what type of storage device
the customer was using (the INQUIRY information from sg_inq would help).

Comment 5 Steffen Froemer 2018-12-11 11:11:43 UTC
(In reply to Ewan D. Milne from comment #3)
> So the customer would like to see the SCSI command (i.e. log message like:
> 
> 896 Nov 12 17:39:26 myhostname kernel: sd 0:0:9:0: [sdc] CDB: Read(16) 88 00
> 00 00 00 00 00 01 0e 00 00 00 00 80 00 00
> 
> in the output for the "timing out command" message?  Is that it?
> 
> To help with upstream acceptance of such a change, can the customer identify
> why this information would be useful?
> 
> Also, please provide some information about what type of storage device
> the customer was using (the INQUIRY information from sg_inq would help).

Following command is good example of what command timed out.

    894 Nov 12 17:39:26 host01 kernel: sd 0:0:9:0: timing out command, waited 180s
    895 Nov 12 17:39:26 host01 kernel: sd 0:0:9:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
    896 Nov 12 17:39:26 host01 kernel: sd 0:0:9:0: [sdc] CDB: Read(16) 88 00 00 00 00 00 00 01 0e 00 00 00 00 80 00 00
    897 Nov 12 17:39:26 host01 kernel: blk_update_request: I/O error, dev sdc, sector 69120

So line 894 to 897 provides information what times out.
But there are a lot of additional message entries, where it's not possible to identify, what command timed out.

for example.
    912 Nov 12 17:46:26 evesp07 kernel: sd 0:0:9:0: timing out command, waited 180s
    913 Nov 12 17:46:46 evesp07 kernel: sd 0:0:9:0: timing out command, waited 20s
    914 Nov 12 17:47:06 evesp07 kernel: sd 0:0:9:0: timing out command, waited 20s
    915 Nov 12 17:47:26 evesp07 kernel: sd 0:0:9:0: timing out command, waited 60s
    916 Nov 12 17:47:26 evesp07 kernel: sd 0:0:9:0: timing out command, waited 20s

here in line 912 to 916 it's unable to identify which command timed out. It seems there are 5 different commands sent, which all timed out. 
Here the request is, to show the command always in form like above (line 894-897 or similar), when there is a command timing out. This would help in case of debugging. 

Does this make sense?

Comment 9 guazhang@redhat.com 2019-07-12 07:31:17 UTC
Hello

Is there any test steps here if I don't have faulty disk? or just do sanity testing 

thanks
Guazhang

Comment 10 Maurizio Lombardi 2019-07-12 09:00:06 UTC
IMO the patch is so trivial that sanity testing is sufficient.

Comment 11 guazhang@redhat.com 2019-07-12 09:18:55 UTC
(In reply to Maurizio Lombardi from comment #10)
> IMO the patch is so trivial that sanity testing is sufficient.

Ok, thanks your responds

Comment 13 Jan Stancek 2019-08-05 17:55:59 UTC
Patch(es) committed on kernel-3.10.0-1065.el7

Comment 16 guazhang@redhat.com 2019-08-13 07:22:48 UTC
I did code review on fixed kernel and the regression testing, After check, all modified can be found in source 



https://beaker.engineering.redhat.com/jobs/3722823

this is from auto test bug bot

Comment 17 guazhang@redhat.com 2019-08-13 07:22:56 UTC
I did code review on fixed kernel and the regression testing, After check, all modified can be found in source 



https://beaker.engineering.redhat.com/jobs/3722824

this is from auto test bug bot

Comment 18 guazhang@redhat.com 2019-08-13 07:23:08 UTC
I did code review on fixed kernel and the regression testing, After check, all modified can be found in source 



https://beaker.engineering.redhat.com/jobs/3722828

this is from auto test bug bot

Comment 19 guazhang@redhat.com 2019-08-13 07:23:19 UTC
I did code review on fixed kernel and the regression testing, After check, all modified can be found in source 



https://beaker.engineering.redhat.com/jobs/3722830

this is from auto test bug bot

Comment 21 errata-xmlrpc 2020-03-31 19:14:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1016


Note You need to log in before you can comment on or make changes to this bug.