Starting with kernel 4.18.0-425.3.1.el8.x86_64 I get a lot of the following messages that spam my logs: # dmesg | grep "Media removed, stopped polling" [ 2.382995] sd 0:0:0:0: [sda] Media removed, stopped polling [ 2.389493] sd 0:0:0:1: [sdb] Media removed, stopped polling [ 2.391864] sd 0:0:0:2: [sdc] Media removed, stopped polling [ 2.395737] sd 0:0:0:3: [sdd] Media removed, stopped polling [ 2.427609] sd 0:0:0:4: [sde] Media removed, stopped polling [ 54.732128] sd 0:0:0:4: [sde] Media removed, stopped polling [ 108.088275] sd 0:0:0:3: [sdd] Media removed, stopped polling [ 1333.664108] sd 0:0:0:0: [sda] Media removed, stopped polling [ 1758.637806] sd 0:0:0:0: [sda] Media removed, stopped polling [ 1758.649352] sd 0:0:0:4: [sde] Media removed, stopped polling [ 1918.772851] sd 0:0:0:0: [sda] Media removed, stopped polling [ 2239.034554] sd 0:0:0:1: [sdb] Media removed, stopped polling [ 2239.040563] sd 0:0:0:3: [sdd] Media removed, stopped polling [ 3091.095781] sd 0:0:0:0: [sda] Media removed, stopped polling [ 5433.569760] sd 0:0:0:0: [sda] Media removed, stopped polling [ 6870.670406] sd 0:0:0:2: [sdc] Media removed, stopped polling [ 8628.029277] sd 0:0:0:4: [sde] Media removed, stopped polling [ 9055.047826] sd 0:0:0:4: [sde] Media removed, stopped polling [ 9055.053926] sd 0:0:0:3: [sdd] Media removed, stopped polling [ 9907.045308] sd 0:0:0:2: [sdc] Media removed, stopped polling ... # dmesg | grep "Media removed, stopped polling" | wc -l 414 # uptime 10:38:18 up 2 days, 13:54, 1 user, load average: 0.06, 0.03, 0.00 It's really necessary to have to have a non-stop polling? Thank you!
There were changes made in kernel-4.18.0-396.el8 which included a log message for a previously unreported condition, followed by a change to reduce the amount of the resulting logging. The patch that introduced the additional logging was: commit 848ade90ba9c1e2bc2f7869fbe5d08bb32f3db09 Author: Christian Loehle <cloehle> Date: Mon Aug 16 09:37:51 2021 +0000 scsi: sd: Do not exit sd_spinup_disk() quietly The sd_spinup_disk() function logs what is happening. Unfortunately this output stops if the media was marked as removed in the meantime. Add a print for this case too. Link: https://lore.kernel.org/r/CWXP265MB26803209FD08A64222EEEA02C4FD9@CWXP265MB2680.GBRP265.PROD.OUTLOOK.COM Reviewed-by: Bart Van Assche <bvanassche> Signed-off-by: Christian Loehle <cloehle> Signed-off-by: Martin K. Petersen <martin.petersen> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index ac431b0477da..920df3a04a7b 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2136,8 +2136,10 @@ sd_spinup_disk(struct scsi_disk *sdkp) * doesn't have any media in it, don't bother * with any more polling. */ - if (media_not_present(sdkp, &sshdr)) + if (media_not_present(sdkp, &sshdr)) { + sd_printk(KERN_NOTICE, sdkp, "Media removed, stopped polling\n"); return; + } if (the_result) sense_valid = scsi_sense_valid(&sshdr); but this was changed to suppress the message if the media was already marked as not present: Author: Heiner Kallweit <hkallweit1> Date: Sat Sep 11 14:11:59 2021 +0200 scsi: sd: Make sd_spinup_disk() less noisy sd_spinup_disk() is a little bit noisy after commit 848ade90ba9c ("scsi: sd: Do not exit sd_spinup_disk() quietly"): scsi 0:0:0:0: Direct-Access Multiple Card Reader 1.00 PQ: 0 ANSI: 0 sd 0:0:0:0: Attached scsi generic sg0 type 0 sd 0:0:0:0: [sda] Media removed, stopped polling sd 0:0:0:0: [sda] Media removed, stopped polling sd 0:0:0:0: [sda] Attached SCSI removable disk sd 0:0:0:0: [sda] Media removed, stopped polling There's not really a benefit in printing the same message multiple times. Therefore print it only if media_present was previously set. Link: https://lore.kernel.org/r/a2d0a249-6035-9697-626a-e14ec50ef6ee@gmail.com Reviewed-by: Bart Van Assche <bvanassche> Signed-off-by: Heiner Kallweit <hkallweit1> Signed-off-by: Martin K. Petersen <martin.petersen> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index a8039beb5a02..523bf2fdc253 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2124,6 +2124,8 @@ sd_spinup_disk(struct scsi_disk *sdkp) retries = 0; do { + bool media_was_present = sdkp->media_present; + cmd[0] = TEST_UNIT_READY; memset((void *) &cmd[1], 0, 9); @@ -2138,7 +2140,8 @@ sd_spinup_disk(struct scsi_disk *sdkp) * with any more polling. */ if (media_not_present(sdkp, &sshdr)) { - sd_printk(KERN_NOTICE, sdkp, "Media removed, stopped polling\n"); + if (media_was_present) + sd_printk(KERN_NOTICE, sdkp, "Media removed, stopped polling\n"); return; } Since you are seeing the messages recorr every few hundred seconds it would appear as if your devices are being re-inserted in the intervening period, unless there is something else marking them present again. Could you please attach a log from your system, beginning from when the machine was booted (e.g. dmesg, if it shows all the messages, or journalctl -b 0) to this Bugzilla (using "Add an attachment" above in the BZ user interface so we can see what kind of devices these are any what status changes are taking place?
Created attachment 1931054 [details] dmesg
Created attachment 1931055 [details] journalctl
Created attachment 1931056 [details] dmesg
Hi! (In reply to Ewan D. Milne from comment #1) > Could you please attach a log from your system, beginning from when the > machine > was booted (e.g. dmesg, if it shows all the messages, or journalctl -b 0) to > this > Bugzilla (using "Add an attachment" above in the BZ user interface so we can > see > what kind of devices these are any what status changes are taking place? dmesg, journalctl and lsscsi output attached. Please, make this issue public. I _really_ do not understand why by default is to make-it private and why RH bugzilla keep the private mode even if I choose to be public when the issue was created. Thank you! Sincerely, Gabriel
Something is making your devices get alternately marked as media not present vs. media present Can you install the sg3_utils package and run the commands: sg_readcap <device> sg_readcap --16 <device> where the <device> is each of sda, sdb, sdc, sdd One possibility is that you are getting a NOT READY response to a READ CAPACITY(16) command to your USB attached devices instead of some other sense code (your USB bridge may be doing that). The Linux kernel code will set the removable media as not present in this case, but the periodic event checking would set it back as being present if it responded to TEST UNIT READY. There are other ways this could happen but let's try this first. I don't seem to have permissions to unlock this BZ but I will look into whether someone else here can do that.
Hi! Thank you for spending time on this issue. > Can you install the sg3_utils package and run the commands: # for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do echo "[[sg_readcap $dev]]"; sg_readcap $dev; echo; echo "[[sg_readcap --16 $dev]]"; sg_readcap --16 $dev; echo; done [[sg_readcap /dev/sda]] READ CAPACITY (10) failed: Device not ready sg_readcap failed: Device not ready [[sg_readcap --16 /dev/sda]] bad field in READ CAPACITY (16) cdb including unsupported service action sg_readcap failed: Illegal request [[sg_readcap /dev/sdb]] READ CAPACITY (10) failed: Device not ready sg_readcap failed: Device not ready [[sg_readcap --16 /dev/sdb]] bad field in READ CAPACITY (16) cdb including unsupported service action sg_readcap failed: Illegal request [[sg_readcap /dev/sdc]] READ CAPACITY (10) failed: Device not ready sg_readcap failed: Device not ready [[sg_readcap --16 /dev/sdc]] bad field in READ CAPACITY (16) cdb including unsupported service action sg_readcap failed: Illegal request [[sg_readcap /dev/sdd]] READ CAPACITY (10) failed: Device not ready sg_readcap failed: Device not ready [[sg_readcap --16 /dev/sdd]] bad field in READ CAPACITY (16) cdb including unsupported service action sg_readcap failed: Illegal request > I don't seem to have permissions to unlock this BZ but I will look into whether someone else here can do that. Thank you! Sincerely, Gabriel
I am still looking at how we might be able to handle differently in the upstream kernel the way your hardware seems to be working. I take it that if you ran sg_tur <device> it would respond ready even though other commands do not. This seems to be why the messages are being generated. I think it is possible to adjust the polling interval, if this would help in the time being. There is a kernel parameter in the block layer: block.disk_events_dfl_poll_msecs that defaults to 2000 (i.e. 2 seconds), perhaps if you increase it you would see less logging. The tradeoff is that if you insert a device, it would take longer to be found via the polling. The individual devices also have a "poll_msecs" attribute that can be set via sysfs. The default value is -1 which means to use the global default above, but it can be set separately. For example, on my system: [root@rhel-storage-10 ~]# find /sys -name "*poll_msecs*" /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/events_poll_msecs /sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/events_poll_msecs /sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/block/sr0/events_poll_msecs /sys/module/block/parameters/events_dfl_poll_msecs [root@rhel-storage-10 ~]# for i in `find /sys -name "*poll_msecs*"` > do > echo $i > cat $i > done /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/events_poll_msecs -1 /sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/events_poll_msecs -1 /sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/block/sr0/events_poll_msecs -1 /sys/module/block/parameters/events_dfl_poll_msecs 2000 Meanwhile I will continue to look into this but it may take a while as we are entering our end-of-year shutdown. I think I have set the BZ to be unlocked as you asked.
Hi! (In reply to Ewan D. Milne from comment #8) > I take it that if you ran > > sg_tur <device> > > it would respond ready even though other commands do not. This seems to > be why the messages are being generated. # for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do echo "[[sg_turs $dev]]"; sg_turs $dev; done [[sg_turs /dev/sda]] test unit ready: transport: Host_status=0x07 [DID_ERROR] Driver_status=0x00 [DRIVER_OK] [[sg_turs /dev/sdb]] test unit ready: transport: Host_status=0x07 [DID_ERROR] Driver_status=0x00 [DRIVER_OK] [[sg_turs /dev/sdc]] test unit ready: transport: Host_status=0x07 [DID_ERROR] Driver_status=0x00 [DRIVER_OK] [[sg_turs /dev/sdd]] test unit ready: transport: Host_status=0x07 [DID_ERROR] Driver_status=0x00 [DRIVER_OK] > I think it is possible to adjust the polling interval, if this would help > in the time being. Yes, it would help a lot! Thank you. Probably the fastest solution is to disconnect the device from MB. But it's a 4 hour drive to get to the location where PC is located. And are others issues then I get there... So, it's kind of inconvenient... > Meanwhile I will continue to look into this but it may take a while as > we are entering our end-of-year shutdown. Take your time. Thank you. > I think I have set the BZ to be unlocked as you asked. Thank you. Sincerely, Gabriel
/var/log/messages when running sg_turs: Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:0: [sda] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00 Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:1: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:1: [sdb] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00 Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:2: [sdc] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:2: [sdc] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00 Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:3: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:3: [sdd] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00