Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 2143884

Summary: non-stop polling of removable media storage
Product: Red Hat Enterprise Linux 8 Reporter: GV <rhel>
Component: kernelAssignee: Ewan D. Milne <emilne>
kernel sub component: Storage Drivers QA Contact: Storage QE <storage-qe>
Status: CLOSED MIGRATED Docs Contact:
Severity: unspecified    
Priority: low CC: revers
Version: 8.7Keywords: MigratedToJIRA, Triaged
Target Milestone: rcFlags: pm-rhel: mirror+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-09-23 13:58:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
dmesg
none
journalctl
none
dmesg none

Description GV 2022-11-18 08:43:40 UTC
Starting with kernel 4.18.0-425.3.1.el8.x86_64 I get a lot of the following messages that spam my logs:

# dmesg | grep "Media removed, stopped polling"
[    2.382995] sd 0:0:0:0: [sda] Media removed, stopped polling
[    2.389493] sd 0:0:0:1: [sdb] Media removed, stopped polling
[    2.391864] sd 0:0:0:2: [sdc] Media removed, stopped polling
[    2.395737] sd 0:0:0:3: [sdd] Media removed, stopped polling
[    2.427609] sd 0:0:0:4: [sde] Media removed, stopped polling
[   54.732128] sd 0:0:0:4: [sde] Media removed, stopped polling
[  108.088275] sd 0:0:0:3: [sdd] Media removed, stopped polling
[ 1333.664108] sd 0:0:0:0: [sda] Media removed, stopped polling
[ 1758.637806] sd 0:0:0:0: [sda] Media removed, stopped polling
[ 1758.649352] sd 0:0:0:4: [sde] Media removed, stopped polling
[ 1918.772851] sd 0:0:0:0: [sda] Media removed, stopped polling
[ 2239.034554] sd 0:0:0:1: [sdb] Media removed, stopped polling
[ 2239.040563] sd 0:0:0:3: [sdd] Media removed, stopped polling
[ 3091.095781] sd 0:0:0:0: [sda] Media removed, stopped polling
[ 5433.569760] sd 0:0:0:0: [sda] Media removed, stopped polling
[ 6870.670406] sd 0:0:0:2: [sdc] Media removed, stopped polling
[ 8628.029277] sd 0:0:0:4: [sde] Media removed, stopped polling
[ 9055.047826] sd 0:0:0:4: [sde] Media removed, stopped polling
[ 9055.053926] sd 0:0:0:3: [sdd] Media removed, stopped polling
[ 9907.045308] sd 0:0:0:2: [sdc] Media removed, stopped polling
...

# dmesg | grep "Media removed, stopped polling" | wc -l
414

# uptime
 10:38:18 up 2 days, 13:54,  1 user,  load average: 0.06, 0.03, 0.00

It's really necessary to have to have a non-stop polling?

Thank you!

Comment 1 Ewan D. Milne 2022-12-07 19:44:52 UTC
There were changes made in kernel-4.18.0-396.el8 which included a log
message for a previously unreported condition, followed by a change to
reduce the amount of the resulting logging.  The patch that introduced
the additional logging was:

commit 848ade90ba9c1e2bc2f7869fbe5d08bb32f3db09
Author: Christian Loehle <cloehle>
Date:   Mon Aug 16 09:37:51 2021 +0000

    scsi: sd: Do not exit sd_spinup_disk() quietly
    
    The sd_spinup_disk() function logs what is happening. Unfortunately this
    output stops if the media was marked as removed in the meantime. Add a
    print for this case too.
    
    Link: https://lore.kernel.org/r/CWXP265MB26803209FD08A64222EEEA02C4FD9@CWXP265MB2680.GBRP265.PROD.OUTLOOK.COM
    Reviewed-by: Bart Van Assche <bvanassche>
    Signed-off-by: Christian Loehle <cloehle>
    Signed-off-by: Martin K. Petersen <martin.petersen>

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ac431b0477da..920df3a04a7b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2136,8 +2136,10 @@ sd_spinup_disk(struct scsi_disk *sdkp)
                         * doesn't have any media in it, don't bother
                         * with any more polling.
                         */
-                       if (media_not_present(sdkp, &sshdr))
+                       if (media_not_present(sdkp, &sshdr)) {
+                               sd_printk(KERN_NOTICE, sdkp, "Media removed, stopped polling\n");
                                return;
+                       }
 
                        if (the_result)
                                sense_valid = scsi_sense_valid(&sshdr);

but this was changed to suppress the message if the media was already
marked as not present:

Author: Heiner Kallweit <hkallweit1>
Date:   Sat Sep 11 14:11:59 2021 +0200

    scsi: sd: Make sd_spinup_disk() less noisy
    
    sd_spinup_disk() is a little bit noisy after commit 848ade90ba9c ("scsi:
    sd: Do not exit sd_spinup_disk() quietly"):
    
    scsi 0:0:0:0: Direct-Access     Multiple Card  Reader     1.00 PQ: 0 ANSI: 0
    sd 0:0:0:0: Attached scsi generic sg0 type 0
    sd 0:0:0:0: [sda] Media removed, stopped polling
    sd 0:0:0:0: [sda] Media removed, stopped polling
    sd 0:0:0:0: [sda] Attached SCSI removable disk
    sd 0:0:0:0: [sda] Media removed, stopped polling
    
    There's not really a benefit in printing the same message multiple
    times. Therefore print it only if media_present was previously set.
    
    Link: https://lore.kernel.org/r/a2d0a249-6035-9697-626a-e14ec50ef6ee@gmail.com
    Reviewed-by: Bart Van Assche <bvanassche>
    Signed-off-by: Heiner Kallweit <hkallweit1>
    Signed-off-by: Martin K. Petersen <martin.petersen>

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a8039beb5a02..523bf2fdc253 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2124,6 +2124,8 @@ sd_spinup_disk(struct scsi_disk *sdkp)
                retries = 0;
 
                do {
+                       bool media_was_present = sdkp->media_present;
+
                        cmd[0] = TEST_UNIT_READY;
                        memset((void *) &cmd[1], 0, 9);
 
@@ -2138,7 +2140,8 @@ sd_spinup_disk(struct scsi_disk *sdkp)
                         * with any more polling.
                         */
                        if (media_not_present(sdkp, &sshdr)) {
-                               sd_printk(KERN_NOTICE, sdkp, "Media removed, stopped polling\n");
+                               if (media_was_present)
+                                       sd_printk(KERN_NOTICE, sdkp, "Media removed, stopped polling\n");
                                return;
                        }

Since you are seeing the messages recorr every few hundred seconds it would
appear as if your devices are being re-inserted in the intervening period,
unless there is something else marking them present again.

Could you please attach a log from your system, beginning from when the machine
was booted (e.g. dmesg, if it shows all the messages, or journalctl -b 0) to this
Bugzilla (using "Add an attachment" above in the BZ user interface so we can see
what kind of devices these are any what status changes are taking place?

Comment 2 GV 2022-12-08 13:24:51 UTC
Created attachment 1931054 [details]
dmesg

Comment 3 GV 2022-12-08 13:25:18 UTC
Created attachment 1931055 [details]
journalctl

Comment 4 GV 2022-12-08 13:25:33 UTC
Created attachment 1931056 [details]
dmesg

Comment 5 GV 2022-12-08 13:31:09 UTC
Hi!

(In reply to Ewan D. Milne from comment #1)
> Could you please attach a log from your system, beginning from when the
> machine
> was booted (e.g. dmesg, if it shows all the messages, or journalctl -b 0) to
> this
> Bugzilla (using "Add an attachment" above in the BZ user interface so we can
> see
> what kind of devices these are any what status changes are taking place?
dmesg, journalctl and lsscsi output attached.

Please, make this issue public. I _really_ do not understand why by default is to make-it private and why RH bugzilla keep the private mode even if I choose to be public when the issue was created.
Thank you!


Sincerely,
Gabriel

Comment 6 Ewan D. Milne 2022-12-08 21:30:53 UTC
Something is making your devices get alternately marked as
media not present vs. media present

Can you install the sg3_utils package and run the commands:

sg_readcap <device>
sg_readcap --16 <device>

where the <device> is each of sda, sdb, sdc, sdd

One possibility is that you are getting a NOT READY response to
a READ CAPACITY(16) command to your USB attached devices instead
of some other sense code (your USB bridge may be doing that).
The Linux kernel code will set the removable media as not present
in this case, but the periodic event checking would set it back
as being present if it responded to TEST UNIT READY.

There are other ways this could happen but let's try this first.

I don't seem to have permissions to unlock this BZ but I will look into
whether someone else here can do that.

Comment 7 GV 2022-12-09 15:21:39 UTC
Hi!

Thank you for spending time on this issue.

> Can you install the sg3_utils package and run the commands:
# for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do echo "[[sg_readcap $dev]]"; sg_readcap $dev; echo; echo "[[sg_readcap --16 $dev]]"; sg_readcap --16 $dev; echo; done
[[sg_readcap /dev/sda]]
READ CAPACITY (10) failed: Device not ready
sg_readcap failed: Device not ready

[[sg_readcap --16 /dev/sda]]
bad field in READ CAPACITY (16) cdb including unsupported service action
sg_readcap failed: Illegal request

[[sg_readcap /dev/sdb]]
READ CAPACITY (10) failed: Device not ready
sg_readcap failed: Device not ready

[[sg_readcap --16 /dev/sdb]]
bad field in READ CAPACITY (16) cdb including unsupported service action
sg_readcap failed: Illegal request

[[sg_readcap /dev/sdc]]
READ CAPACITY (10) failed: Device not ready
sg_readcap failed: Device not ready

[[sg_readcap --16 /dev/sdc]]
bad field in READ CAPACITY (16) cdb including unsupported service action
sg_readcap failed: Illegal request

[[sg_readcap /dev/sdd]]
READ CAPACITY (10) failed: Device not ready
sg_readcap failed: Device not ready

[[sg_readcap --16 /dev/sdd]]
bad field in READ CAPACITY (16) cdb including unsupported service action
sg_readcap failed: Illegal request


> I don't seem to have permissions to unlock this BZ but I will look into whether someone else here can do that.
Thank you!


Sincerely,
Gabriel

Comment 8 Ewan D. Milne 2022-12-13 16:00:41 UTC
I am still looking at how we might be able to handle differently in the
upstream kernel the way your hardware seems to be working.  I take it that
if you ran

  sg_tur <device>

it would respond ready even though other commands do not.  This seems to
be why the messages are being generated.

I think it is possible to adjust the polling interval, if this would help
in the time being.  There is a kernel parameter in the block layer:

block.disk_events_dfl_poll_msecs

that defaults to 2000 (i.e. 2 seconds), perhaps if you increase it you
would see less logging.  The tradeoff is that if you insert a device,
it would take longer to be found via the polling.

The individual devices also have a "poll_msecs" attribute that can be
set via sysfs.  The default value is -1 which means to use the global
default above, but it can be set separately.  For example, on my system:

[root@rhel-storage-10 ~]# find /sys -name "*poll_msecs*"
/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/events_poll_msecs
/sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/events_poll_msecs
/sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/block/sr0/events_poll_msecs
/sys/module/block/parameters/events_dfl_poll_msecs
[root@rhel-storage-10 ~]# for i in `find /sys -name "*poll_msecs*"`
> do
> echo $i
> cat $i
> done
/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/events_poll_msecs
-1
/sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/events_poll_msecs
-1
/sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/block/sr0/events_poll_msecs
-1
/sys/module/block/parameters/events_dfl_poll_msecs
2000



Meanwhile I will continue to look into this but it may take a while as
we are entering our end-of-year shutdown.

I think I have set the BZ to be unlocked as you asked.

Comment 9 GV 2022-12-14 08:00:07 UTC
Hi!

(In reply to Ewan D. Milne from comment #8)
> I take it that if you ran
> 
>   sg_tur <device>
> 
> it would respond ready even though other commands do not.  This seems to
> be why the messages are being generated.
# for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do echo "[[sg_turs $dev]]"; sg_turs $dev; done
[[sg_turs /dev/sda]]
test unit ready: transport: Host_status=0x07 [DID_ERROR]
Driver_status=0x00 [DRIVER_OK]

[[sg_turs /dev/sdb]]
test unit ready: transport: Host_status=0x07 [DID_ERROR]
Driver_status=0x00 [DRIVER_OK]

[[sg_turs /dev/sdc]]
test unit ready: transport: Host_status=0x07 [DID_ERROR]
Driver_status=0x00 [DRIVER_OK]

[[sg_turs /dev/sdd]]
test unit ready: transport: Host_status=0x07 [DID_ERROR]
Driver_status=0x00 [DRIVER_OK]

> I think it is possible to adjust the polling interval, if this would help
> in the time being.
Yes, it would help a lot! Thank you.

Probably the fastest solution is to disconnect the device from MB. But it's a 4 hour drive to get to the location where PC is located. And are others issues then I get there... So, it's kind of inconvenient...

> Meanwhile I will continue to look into this but it may take a while as
> we are entering our end-of-year shutdown.
Take your time. Thank you.

> I think I have set the BZ to be unlocked as you asked.
Thank you.


Sincerely,
Gabriel

Comment 10 GV 2022-12-14 08:14:33 UTC
/var/log/messages when running sg_turs:

Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:0: [sda] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00
Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:1: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:1: [sdb] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00
Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:2: [sdc] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:2: [sdc] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00
Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:3: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
Dec 14 09:52:50 xxxxx kernel: sd 0:0:0:3: [sdd] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00

Comment 11 RHEL Program Management 2023-09-23 13:55:55 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 12 RHEL Program Management 2023-09-23 13:58:51 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.