Bug 1651706
| Summary: | kernel: sd 0:0:9:0: timing out command, waited XXXs - which command timed out? | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Rainer Beyel <rbeyel> |
| Component: | kernel | Assignee: | Maurizio Lombardi <mlombard> |
| kernel sub component: | Storage Drivers | QA Contact: | guazhang <guazhang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | ||
| Priority: | unspecified | CC: | emilne, loberman, nyewale, rbeyel, revers, rhandlin, sfroemer |
| Version: | 7.6 | ||
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-3.10.0-1065.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-03-31 19:14:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1711360 | ||
So the customer would like to see the SCSI command (i.e. log message like: 896 Nov 12 17:39:26 myhostname kernel: sd 0:0:9:0: [sdc] CDB: Read(16) 88 00 00 00 00 00 00 01 0e 00 00 00 00 80 00 00 in the output for the "timing out command" message? Is that it? To help with upstream acceptance of such a change, can the customer identify why this information would be useful? Also, please provide some information about what type of storage device the customer was using (the INQUIRY information from sg_inq would help). (In reply to Ewan D. Milne from comment #3) > So the customer would like to see the SCSI command (i.e. log message like: > > 896 Nov 12 17:39:26 myhostname kernel: sd 0:0:9:0: [sdc] CDB: Read(16) 88 00 > 00 00 00 00 00 01 0e 00 00 00 00 80 00 00 > > in the output for the "timing out command" message? Is that it? > > To help with upstream acceptance of such a change, can the customer identify > why this information would be useful? > > Also, please provide some information about what type of storage device > the customer was using (the INQUIRY information from sg_inq would help). Following command is good example of what command timed out. 894 Nov 12 17:39:26 host01 kernel: sd 0:0:9:0: timing out command, waited 180s 895 Nov 12 17:39:26 host01 kernel: sd 0:0:9:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK 896 Nov 12 17:39:26 host01 kernel: sd 0:0:9:0: [sdc] CDB: Read(16) 88 00 00 00 00 00 00 01 0e 00 00 00 00 80 00 00 897 Nov 12 17:39:26 host01 kernel: blk_update_request: I/O error, dev sdc, sector 69120 So line 894 to 897 provides information what times out. But there are a lot of additional message entries, where it's not possible to identify, what command timed out. for example. 912 Nov 12 17:46:26 evesp07 kernel: sd 0:0:9:0: timing out command, waited 180s 913 Nov 12 17:46:46 evesp07 kernel: sd 0:0:9:0: timing out command, waited 20s 914 Nov 12 17:47:06 evesp07 kernel: sd 0:0:9:0: timing out command, waited 20s 915 Nov 12 17:47:26 evesp07 kernel: sd 0:0:9:0: timing out command, waited 60s 916 Nov 12 17:47:26 evesp07 kernel: sd 0:0:9:0: timing out command, waited 20s here in line 912 to 916 it's unable to identify which command timed out. It seems there are 5 different commands sent, which all timed out. Here the request is, to show the command always in form like above (line 894-897 or similar), when there is a command timing out. This would help in case of debugging. Does this make sense? Hello Is there any test steps here if I don't have faulty disk? or just do sanity testing thanks Guazhang IMO the patch is so trivial that sanity testing is sufficient. (In reply to Maurizio Lombardi from comment #10) > IMO the patch is so trivial that sanity testing is sufficient. Ok, thanks your responds Patch(es) committed on kernel-3.10.0-1065.el7 I did code review on fixed kernel and the regression testing, After check, all modified can be found in source https://beaker.engineering.redhat.com/jobs/3722823 this is from auto test bug bot I did code review on fixed kernel and the regression testing, After check, all modified can be found in source https://beaker.engineering.redhat.com/jobs/3722824 this is from auto test bug bot I did code review on fixed kernel and the regression testing, After check, all modified can be found in source https://beaker.engineering.redhat.com/jobs/3722828 this is from auto test bug bot I did code review on fixed kernel and the regression testing, After check, all modified can be found in source https://beaker.engineering.redhat.com/jobs/3722830 this is from auto test bug bot Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:1016 |
Description of problem: With a faulty disk the following is shown in "/var/log/messages": ... 894 Nov 12 17:39:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 180s 895 Nov 12 17:39:26 myhostname kernel: sd 0:0:9:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK 896 Nov 12 17:39:26 myhostname kernel: sd 0:0:9:0: [sdc] CDB: Read(16) 88 00 00 00 00 00 00 01 0e 00 00 00 00 80 00 00 897 Nov 12 17:39:26 myhostname kernel: blk_update_request: I/O error, dev sdc, sector 69120 898 Nov 12 17:40:01 myhostname systemd: Started Session 2150 of user root. 899 Nov 12 17:42:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 180s 900 Nov 12 17:42:46 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s 901 Nov 12 17:43:06 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s 902 Nov 12 17:43:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s ... 912 Nov 12 17:46:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 180s 913 Nov 12 17:46:46 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s 914 Nov 12 17:47:06 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s 915 Nov 12 17:47:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 60s 916 Nov 12 17:47:26 myhostname kernel: sd 0:0:9:0: timing out command, waited 20s ... "timing out command, waited XXXs" is not showing which command is "timing out". Version-Release number of selected component (if applicable): RHEL 7.6 How reproducible: RHEL 7.6 with a faulty disk Actual results: The logs show errors in regards to a (possible faulty) disk. Some infos are generic ("timing out command, waited 20s") and the details are missing (which command timed out?). Expected results: With the "timing out command, waited XXs" info, also show which command timed out.