Bug 2143884
| Summary: | non-stop polling of removable media storage | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | GV <rhel> | ||||||||
| Component: | kernel | Assignee: | Ewan D. Milne <emilne> | ||||||||
| kernel sub component: | Storage Drivers | QA Contact: | Storage QE <storage-qe> | ||||||||
| Status: | CLOSED MIGRATED | Docs Contact: | |||||||||
| Severity: | unspecified | ||||||||||
| Priority: | low | CC: | revers | ||||||||
| Version: | 8.7 | Keywords: | MigratedToJIRA, Triaged | ||||||||
| Target Milestone: | rc | Flags: | pm-rhel:
mirror+
|
||||||||
| Target Release: | --- | ||||||||||
| Hardware: | Unspecified | ||||||||||
| OS: | Unspecified | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2023-09-23 13:58:51 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Attachments: |
|
||||||||||
|
Description
GV
2022-11-18 08:43:40 UTC
There were changes made in kernel-4.18.0-396.el8 which included a log
message for a previously unreported condition, followed by a change to
reduce the amount of the resulting logging. The patch that introduced
the additional logging was:
commit 848ade90ba9c1e2bc2f7869fbe5d08bb32f3db09
Author: Christian Loehle <cloehle>
Date: Mon Aug 16 09:37:51 2021 +0000
scsi: sd: Do not exit sd_spinup_disk() quietly
The sd_spinup_disk() function logs what is happening. Unfortunately this
output stops if the media was marked as removed in the meantime. Add a
print for this case too.
Link: https://lore.kernel.org/r/CWXP265MB26803209FD08A64222EEEA02C4FD9@CWXP265MB2680.GBRP265.PROD.OUTLOOK.COM
Reviewed-by: Bart Van Assche <bvanassche>
Signed-off-by: Christian Loehle <cloehle>
Signed-off-by: Martin K. Petersen <martin.petersen>
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ac431b0477da..920df3a04a7b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2136,8 +2136,10 @@ sd_spinup_disk(struct scsi_disk *sdkp)
* doesn't have any media in it, don't bother
* with any more polling.
*/
- if (media_not_present(sdkp, &sshdr))
+ if (media_not_present(sdkp, &sshdr)) {
+ sd_printk(KERN_NOTICE, sdkp, "Media removed, stopped polling\n");
return;
+ }
if (the_result)
sense_valid = scsi_sense_valid(&sshdr);
but this was changed to suppress the message if the media was already
marked as not present:
Author: Heiner Kallweit <hkallweit1>
Date: Sat Sep 11 14:11:59 2021 +0200
scsi: sd: Make sd_spinup_disk() less noisy
sd_spinup_disk() is a little bit noisy after commit 848ade90ba9c ("scsi:
sd: Do not exit sd_spinup_disk() quietly"):
scsi 0:0:0:0: Direct-Access Multiple Card Reader 1.00 PQ: 0 ANSI: 0
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 0:0:0:0: [sda] Media removed, stopped polling
sd 0:0:0:0: [sda] Media removed, stopped polling
sd 0:0:0:0: [sda] Attached SCSI removable disk
sd 0:0:0:0: [sda] Media removed, stopped polling
There's not really a benefit in printing the same message multiple
times. Therefore print it only if media_present was previously set.
Link: https://lore.kernel.org/r/a2d0a249-6035-9697-626a-e14ec50ef6ee@gmail.com
Reviewed-by: Bart Van Assche <bvanassche>
Signed-off-by: Heiner Kallweit <hkallweit1>
Signed-off-by: Martin K. Petersen <martin.petersen>
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a8039beb5a02..523bf2fdc253 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2124,6 +2124,8 @@ sd_spinup_disk(struct scsi_disk *sdkp)
retries = 0;
do {
+ bool media_was_present = sdkp->media_present;
+
cmd[0] = TEST_UNIT_READY;
memset((void *) &cmd[1], 0, 9);
@@ -2138,7 +2140,8 @@ sd_spinup_disk(struct scsi_disk *sdkp)
* with any more polling.
*/
if (media_not_present(sdkp, &sshdr)) {
- sd_printk(KERN_NOTICE, sdkp, "Media removed, stopped polling\n");
+ if (media_was_present)
+ sd_printk(KERN_NOTICE, sdkp, "Media removed, stopped polling\n");
return;
}
Since you are seeing the messages recorr every few hundred seconds it would
appear as if your devices are being re-inserted in the intervening period,
unless there is something else marking them present again.
Could you please attach a log from your system, beginning from when the machine
was booted (e.g. dmesg, if it shows all the messages, or journalctl -b 0) to this
Bugzilla (using "Add an attachment" above in the BZ user interface so we can see
what kind of devices these are any what status changes are taking place?
Created attachment 1931054 [details]
dmesg
Created attachment 1931055 [details]
journalctl
Created attachment 1931056 [details]
dmesg
Hi! (In reply to Ewan D. Milne from comment #1) > Could you please attach a log from your system, beginning from when the > machine > was booted (e.g. dmesg, if it shows all the messages, or journalctl -b 0) to > this > Bugzilla (using "Add an attachment" above in the BZ user interface so we can > see > what kind of devices these are any what status changes are taking place? dmesg, journalctl and lsscsi output attached. Please, make this issue public. I _really_ do not understand why by default is to make-it private and why RH bugzilla keep the private mode even if I choose to be public when the issue was created. Thank you! Sincerely, Gabriel Something is making your devices get alternately marked as media not present vs. media present Can you install the sg3_utils package and run the commands: sg_readcap <device> sg_readcap --16 <device> where the <device> is each of sda, sdb, sdc, sdd One possibility is that you are getting a NOT READY response to a READ CAPACITY(16) command to your USB attached devices instead of some other sense code (your USB bridge may be doing that). The Linux kernel code will set the removable media as not present in this case, but the periodic event checking would set it back as being present if it responded to TEST UNIT READY. There are other ways this could happen but let's try this first. I don't seem to have permissions to unlock this BZ but I will look into whether someone else here can do that. Hi! Thank you for spending time on this issue. > Can you install the sg3_utils package and run the commands: # for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do echo "[[sg_readcap $dev]]"; sg_readcap $dev; echo; echo "[[sg_readcap --16 $dev]]"; sg_readcap --16 $dev; echo; done [[sg_readcap /dev/sda]] READ CAPACITY (10) failed: Device not ready sg_readcap failed: Device not ready [[sg_readcap --16 /dev/sda]] bad field in READ CAPACITY (16) cdb including unsupported service action sg_readcap failed: Illegal request [[sg_readcap /dev/sdb]] READ CAPACITY (10) failed: Device not ready sg_readcap failed: Device not ready [[sg_readcap --16 /dev/sdb]] bad field in READ CAPACITY (16) cdb including unsupported service action sg_readcap failed: Illegal request [[sg_readcap /dev/sdc]] READ CAPACITY (10) failed: Device not ready sg_readcap failed: Device not ready [[sg_readcap --16 /dev/sdc]] bad field in READ CAPACITY (16) cdb including unsupported service action sg_readcap failed: Illegal request [[sg_readcap /dev/sdd]] READ CAPACITY (10) failed: Device not ready sg_readcap failed: Device not ready [[sg_readcap --16 /dev/sdd]] bad field in READ CAPACITY (16) cdb including unsupported service action sg_readcap failed: Illegal request > I don't seem to have permissions to unlock this BZ but I will look into whether someone else here can do that. Thank you! Sincerely, Gabriel I am still looking at how we might be able to handle differently in the
upstream kernel the way your hardware seems to be working. I take it that
if you ran
sg_tur <device>
it would respond ready even though other commands do not. This seems to
be why the messages are being generated.
I think it is possible to adjust the polling interval, if this would help
in the time being. There is a kernel parameter in the block layer:
block.disk_events_dfl_poll_msecs
that defaults to 2000 (i.e. 2 seconds), perhaps if you increase it you
would see less logging. The tradeoff is that if you insert a device,
it would take longer to be found via the polling.
The individual devices also have a "poll_msecs" attribute that can be
set via sysfs. The default value is -1 which means to use the global
default above, but it can be set separately. For example, on my system:
[root@rhel-storage-10 ~]# find /sys -name "*poll_msecs*"
/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/events_poll_msecs
/sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/events_poll_msecs
/sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/block/sr0/events_poll_msecs
/sys/module/block/parameters/events_dfl_poll_msecs
[root@rhel-storage-10 ~]# for i in `find /sys -name "*poll_msecs*"`
> do
> echo $i
> cat $i
> done
/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/events_poll_msecs
-1
/sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/events_poll_msecs
-1
/sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/block/sr0/events_poll_msecs
-1
/sys/module/block/parameters/events_dfl_poll_msecs
2000
Meanwhile I will continue to look into this but it may take a while as
we are entering our end-of-year shutdown.
I think I have set the BZ to be unlocked as you asked.
Hi! (In reply to Ewan D. Milne from comment #8) > I take it that if you ran > > sg_tur <device> > > it would respond ready even though other commands do not. This seems to > be why the messages are being generated. # for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do echo "[[sg_turs $dev]]"; sg_turs $dev; done [[sg_turs /dev/sda]] test unit ready: transport: Host_status=0x07 [DID_ERROR] Driver_status=0x00 [DRIVER_OK] [[sg_turs /dev/sdb]] test unit ready: transport: Host_status=0x07 [DID_ERROR] Driver_status=0x00 [DRIVER_OK] [[sg_turs /dev/sdc]] test unit ready: transport: Host_status=0x07 [DID_ERROR] Driver_status=0x00 [DRIVER_OK] [[sg_turs /dev/sdd]] test unit ready: transport: Host_status=0x07 [DID_ERROR] Driver_status=0x00 [DRIVER_OK] > I think it is possible to adjust the polling interval, if this would help > in the time being. Yes, it would help a lot! Thank you. Probably the fastest solution is to disconnect the device from MB. But it's a 4 hour drive to get to the location where PC is located. And are others issues then I get there... So, it's kind of inconvenient... > Meanwhile I will continue to look into this but it may take a while as > we are entering our end-of-year shutdown. Take your time. Thank you. > I think I have set the BZ to be unlocked as you asked. Thank you. Sincerely, Gabriel /var/log/messages when running sg_turs: Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:0: [sda] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00 Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:1: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:1: [sdb] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00 Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:2: [sdc] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:2: [sdc] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00 Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:3: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:3: [sdd] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00 Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug. This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information. To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like: "Bugzilla Bug" = 1234567 In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information. |