Bug 1976944 - Hotplugged Flash drive very slow to be recognized
Summary: Hotplugged Flash drive very slow to be recognized
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: udisks2
Version: 34
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Tomáš Bžatek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-06-28 15:48 UTC by Steven A. Falco
Modified: 2022-06-08 00:56 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-08 00:56:27 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
At 11:11:58 system is quiescent (820.71 KB, image/jpeg)
2021-06-28 15:48 UTC, Steven A. Falco
no flags Details
At 11:12:47 plugged in Flash drive; MBR is recognized and is assigned /dev/sdc (611.57 KB, image/jpeg)
2021-06-28 15:49 UTC, Steven A. Falco
no flags Details
At 11:13:35 partitions are now recognized as sdc1, sdc2, sdc3 (449.87 KB, image/jpeg)
2021-06-28 15:50 UTC, Steven A. Falco
no flags Details
At 11:14:09 device notifier pops up and offers to mount (626.07 KB, image/jpeg)
2021-06-28 15:50 UTC, Steven A. Falco
no flags Details
journal log of flash drive insertion (2.52 MB, text/plain)
2021-07-02 16:54 UTC, Steven A. Falco
no flags Details
Log of kernel and udev messages (346.83 KB, text/plain)
2021-07-05 19:46 UTC, Steven A. Falco
no flags Details
udevadm.log (338.00 KB, text/plain)
2021-07-07 18:15 UTC, Steven A. Falco
no flags Details
udisksctl.log (6.06 KB, text/plain)
2021-07-07 18:16 UTC, Steven A. Falco
no flags Details
journalctl.log (3.12 KB, text/plain)
2021-07-07 18:16 UTC, Steven A. Falco
no flags Details
messages.log (2.65 KB, text/plain)
2021-07-07 18:16 UTC, Steven A. Falco
no flags Details
udevadm-with-gui.log (304.98 KB, text/plain)
2021-07-07 19:37 UTC, Steven A. Falco
no flags Details
udisksctl-with-gui.log (6.09 KB, text/plain)
2021-07-07 19:38 UTC, Steven A. Falco
no flags Details
journal-with-gui.log (5.54 KB, text/plain)
2021-07-07 19:38 UTC, Steven A. Falco
no flags Details
messages-with-gui.log (5.49 KB, text/plain)
2021-07-07 19:38 UTC, Steven A. Falco
no flags Details
logs from latest experiment (51.48 KB, application/gzip)
2021-07-12 18:00 UTC, Steven A. Falco
no flags Details

Description Steven A. Falco 2021-06-28 15:48:51 UTC
Created attachment 1795474 [details]
At 11:11:58 system is quiescent

Description of problem: In the past, when I hotplugged a flash drive, the KDE "Disks and Devices" would pop up quickly and allow me to mount the drive.  Lately, this takes close to two minutes.  Therefore, this appears to be a regression.


Version-Release number of selected component (if applicable):


How reproducible: 90% - occasionally the "Disks and Devices" will pop up with the new device quickly, but it almost always takes over one minute to pop up.


Steps to Reproduce: Plug in a formatted flash drive

Actual results: "Disks and Devices" pops up after a minute or two


Expected results: "Disks and Devices" pops up within a few seconds


Additional info: I can use udisksctl to manually mount the flash stick prior to the "Disks and Devices" app recognizing the inserted Flash drive, so the kernel appears to be ready to proceed.

I'll attach some screen photos taken via cell phone.  I would have used a screen capture tool, but I couldn't get the "Disks and Devices" pop up to stay up while taking a screen shot.

There will be four photos uploaded:

11:11:58 - before_insert.jpg - system is quiescent

11:12:47 - mbr_recognized.jpg - plugged in Flash drive; MBR is recognized and is assigned /dev/sdc

11:13:35 - partitions_recognized.jpg - partitions are now recognized as sdc1, sdc2, sdc3

11:14:09 - ready_to_mount.jpg - device notifier pops up and offers to mount

Comment 1 Steven A. Falco 2021-06-28 15:49:43 UTC
Created attachment 1795475 [details]
At 11:12:47 plugged in Flash drive; MBR is recognized and is assigned /dev/sdc

Comment 2 Steven A. Falco 2021-06-28 15:50:25 UTC
Created attachment 1795476 [details]
At 11:13:35 partitions are now recognized as sdc1, sdc2, sdc3

Comment 3 Steven A. Falco 2021-06-28 15:50:58 UTC
Created attachment 1795477 [details]
At 11:14:09 device notifier pops up and offers to mount

Comment 4 Steven A. Falco 2021-06-28 20:07:21 UTC
I've done some more testing.  I can reproduce this problem on a laptop running F34 (KDE), and I can also reproduce it on my desktop (both Gnome and KDE).  In both cases the machines are very lightly loaded.

I did some time checks:

At 15:46:15.577089074 the device was recognized, but the partitions were not yet set up:

[junk@saf ~]$ ls -ltr --full-time /dev/sd*
brw-rw---- 1 root disk 8, 176 2021-06-28 15:46:15.577089074 -0400 /dev/sdl

Roughly a second later, the partition entries appeared:

[junk@saf ~]$ ls -ltr --full-time /dev/sd*
brw------- 1 root root 8, 179 2021-06-28 15:46:16.404073724 -0400 /dev/sdl3
brw------- 1 root root 8, 178 2021-06-28 15:46:16.404073724 -0400 /dev/sdl2
brw------- 1 root root 8, 177 2021-06-28 15:46:16.404073724 -0400 /dev/sdl1
brw-rw---- 1 root disk 8, 176 2021-06-28 15:46:16.562070792 -0400 /dev/sdl

Next, I waited for the Gnome "Files" file manager app to show the drive as "mountable", and I checked the time.

[junk@saf ~]$ date +%T.%N
15:47:21.044922800

In this example, it took a little over a minute for the file manager app to notice that the device was available for mounting.

Here is an excerpt from /var/log/messagesi.  The kernel took about 3 seconds to recognize the device:

Jun 28 15:46:13 saf kernel: usb 3-1.2: new high-speed USB device number 28 using xhci_hcd
Jun 28 15:46:13 saf kernel: usb 3-1.2: New USB device found, idVendor=0781, idProduct=5530, bcdDevice= 2.00
Jun 28 15:46:13 saf kernel: usb 3-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Jun 28 15:46:13 saf kernel: usb 3-1.2: Product: SanDisk Cruzer
Jun 28 15:46:13 saf kernel: usb 3-1.2: Manufacturer: SanDisk
Jun 28 15:46:13 saf kernel: usb 3-1.2: SerialNumber: 057293116F125545
Jun 28 15:46:13 saf kernel: usb-storage 3-1.2:1.0: USB Mass Storage device detected
Jun 28 15:46:13 saf kernel: scsi host12: usb-storage 3-1.2:1.0
Jun 28 15:46:13 saf mtp-probe[321938]: checking bus 3, device 28: "/sys/devices/pci0000:00/0000:00:14.0/usb3/3-1/3-1.2"
Jun 28 15:46:13 saf mtp-probe[321938]: bus: 3, device: 28 was not an MTP device
Jun 28 15:46:13 saf mtp-probe[321953]: checking bus 3, device 28: "/sys/devices/pci0000:00/0000:00:14.0/usb3/3-1/3-1.2"
Jun 28 15:46:13 saf mtp-probe[321953]: bus: 3, device: 28 was not an MTP device
Jun 28 15:46:14 saf kernel: scsi 12:0:0:0: Direct-Access     SanDisk  SanDisk Cruzer   8.02 PQ: 0 ANSI: 0 CCS
Jun 28 15:46:14 saf kernel: sd 12:0:0:0: Attached scsi generic sg13 type 0
Jun 28 15:46:14 saf kernel: sd 12:0:0:0: [sdl] Attached SCSI removable disk
Jun 28 15:46:16 saf kernel: sd 12:0:0:0: [sdl] 31780863 512-byte logical blocks: (16.3 GB/15.2 GiB)
Jun 28 15:46:16 saf kernel: sdl: detected capacity change from 0 to 31780863
Jun 28 15:46:16 saf kernel: sdl: sdl1 sdl2 sdl3
Jun 28 15:46:25 saf winbindd[5359]: [2021/06/28 15:46:25.447890,  0] ../../librpc/rpc/dcesrv_core.c:3007(dcesrv_call_dispatch_local)
Jun 28 15:46:25 saf winbindd[5359]:  dcesrv_call_dispatch_local: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jun 28 15:47:45 saf systemd[308624]: Starting Tracker metadata extractor...
Jun 28 15:47:45 saf systemd[308624]: Started Tracker metadata extractor.
Jun 28 15:47:51 saf systemd[308624]: vte-spawn-2b3b610d-b6d3-400e-9630-c9df2d2ca087.scope: Deactivated successfully.
Jun 28 15:47:51 saf systemd[308624]: gnome-terminal-server.service: Deactivated successfully.
Jun 28 15:47:51 saf systemd[308624]: gnome-terminal-server.service: Consumed 3.081s CPU time.
Jun 28 15:47:53 saf kernel: usb 3-1.2: USB disconnect, device number 28

Comment 5 Steven A. Falco 2021-07-02 16:54:11 UTC
Created attachment 1797235 [details]
journal log of flash drive insertion

Comment 6 Steven A. Falco 2021-07-02 17:00:19 UTC
I've attached a journal log.

At 11:16:34 kernel notices the flash drive has been inserted.

At 11:16:37 kernel notices the partition (/dev/sdm1), and the various kde components receive a notification from kf.solid.backends.udisks2 announcing /dev/sdm.  But there is no notification yet for /dev/sdm1.

At 11:17:51 the various kde components finally receive a notification from kf.solid.backends.udisks2 for the /dev/sdm1 partition.

Thus, there is a gap of over a minute between the /dev/sdm and /dev/sdm1 notifications.

During that time, there are close to 30,000 udevd messages reporting that one SEQNUM is blocked by another.  It seems like all this repetitive traffic might be slowing down the desired notification.

Comment 7 Petr Pisar 2021-07-05 08:47:05 UTC
For me it also takes 2 or 3 seconds to detect the partitions.
Your journal log confirms that kf.solid.backends.udisks2 received the partition list immediately.

But then there is "su" sessions 8 seconds later.
And 19 seconds later "Suppressed 483495 messages from systemd-udevd.service" followed with "systemd-udevd[170508]: sdm: /usr/lib/udev/rules.d/61-sddm-plasmawayland.rules:2 RUN '/usr/bin/rm -f /etc/sddm/hide-wayland-sessions'" and many systemd-udevd messages.

I guess something is wrong between udev and KDE Plasma toolkit.


I will reassign this bug report to udisks2 because hotplug component in Bugzilla is an historical unmonitored relict.

Comment 8 Steven A. Falco 2021-07-05 19:45:25 UTC
Thanks for redirecting the bug, Petr.

I'm not sure how you conclude that kf.solid.backends.udisks2 received the partition list immediately.  In https://bugzilla.redhat.com/attachment.cgi?id=1797235, we have the USB device detected at 11:16:34.  At 11:16:35 kf.solid.backends.udisks2 sees /dev/sdm, but it doesn't report the partitions at that time.  Or am I just misinterpreting the log?

The first mention of /dev/sdm1 is from the kernel at 11:16:37, just a few seconds after insertion, which is great.  However, the partitions are not reported by kf.solid.backends.udisks2 until 11:17:51, which is over a minute after the kernel mentions /dev/sdm1.

The "su" is just me logging in on a second tty to poke around.

I see lots of "/usr/bin/rm -f /etc/sddm/hide-wayland-sessions" messages.  The first one is at 11:16:34, and altogether there are 124 messages announcing that '/usr/bin/rm -f /etc/sddm/hide-wayland-sessions' succeeded.

I'll attach another file - plug.log - which shows the kernel producing thousands of "change" messages for .../block/sdm.  Udev responds to all of them, and I can't help wondering if all these duplicate "change" messages from the kernel are the root cause of the slow response.

Comment 9 Steven A. Falco 2021-07-05 19:46:29 UTC
Created attachment 1798320 [details]
Log of kernel and udev messages

Comment 10 Steven A. Falco 2021-07-05 19:49:51 UTC
Looking at https://bugzilla.redhat.com/attachment.cgi?id=1798320 it appears that the kernel is producing "change" messages every few milliseconds or less.  That does not seem right.

Comment 11 Petr Pisar 2021-07-07 08:23:05 UTC
Kernel detected the partitions here:

Jul 01 11:16:37  kernel: sd 12:0:0:0: [sdm] 31780863 512-byte logical blocks: (16.3 GB/15.2 GiB)
Jul 01 11:16:37  kernel: sdm: detected capacity change from 0 to 31780863
Jul 01 11:16:37  kernel:  sdm: sdm1

A subsequent line is:

Jul 01 11:16:37  org_kde_powerdevil[6447]: kf.solid.backends.udisks2: "/org/freedesktop/UDisks2/block_devices/sdm" has new interfaces: ("org.freedesktop.UDisks2.PartitionTable")

I think that means that udisk2 recognizes that the device has partitions.

You speak about:

Jul 01 11:17:51  org_kde_powerdevil[6447]: kf.solid.backends.udisks2: "/org/freedesktop/UDisks2/block_devices/sdm1" has new interfaces: ("org.freedesktop.UDisks2.Block", "org.freedesktop.UDisks2.Filesystem", "org.freedesktop.UDisks2.Partition")

which, I believe, means that udisks2 finished scanning a content of the partition. Yes, the delay of more than a minute between these two events is striking. And something is wrong here.

But I'm not expert in udisks2. I even do not use udisk2. A person assigned to this bug should be more knowledgable.

Comment 12 Tomáš Bžatek 2021-07-07 14:56:04 UTC
The `kf.solid.backends.udisks2` is a KDE layer on top of UDisks2, receiving async events over D-Bus. It might be interesting to see the output of `udisksctl monitor` and `udevadm monitor` and match the timestamps against the kernel log and the journal log.

There's really a lot going on during the first probing, nothing that shouldn't be finished in a couple of seconds however (unless there are I/O errors from the device which I don't see here).

I believe this is caused by a stuck udev task:

Jul 01 11:16:36  systemd-udevd[1009]: sdm: Device ready for processing (SEQNUM=22696, ACTION=change)
Jul 01 11:16:36  systemd-udevd[170508]: sdm: Processing device (SEQNUM=22696, ACTION=change)
Jul 01 11:16:36  systemd-udevd[170508]: sdm: Removing watch
...
Jul 01 11:17:04  systemd-journald[993]: Suppressed 483495 messages from systemd-udevd.service
Jul 01 11:17:04  systemd-udevd[170508]: sdm: /usr/lib/udev/rules.d/61-sddm-plasmawayland.rules:2 RUN '/usr/bin/rm -f /etc/sddm/hide-wayland-sessions'
Jul 01 11:17:04  systemd-udevd[170508]: sdm: /usr/lib/udev/rules.d/80-udisks.rules:88 Importing properties from results of 'udisks-part-id /dev/sdm'
Jul 01 11:17:04  systemd-udevd[170508]: sdm: Device processed (SEQNUM=23214, ACTION=change)

If PID 170508 was the actual udev worker process then something was going on for 28 seconds without any relevant message logged. Notice the ratelimiting kicking in, discarding 483495 messages. Given that the following "Device processed" message was for SEQNUM=23214 while we were expecting to see SEQNUM=22696, the rateliming threw away precious data for us.

Since the sequential number difference was 518 and in case each number corresponds to a single uevent (that's a really wild hypothesis however!), something is stuck in a loop. Might be a broken udev rule trigerring another uevent. The monitor dumps (see above) would reveal more.


Further tips for debugging: 
 1. log out from your desktop session, plug the USB stick in, collect logs and see if there were that many tasks queued and how long did it take.
 2. stop and mask the udisksd.service
 3. temporarily remove /usr/lib/udev/rules.d/80-udisks.rules

(might not be related to udisks2 at all, just some ideas...)

Comment 13 Steven A. Falco 2021-07-07 16:12:24 UTC
Thanks!  I'll collect the info you've requested, but I want to make sure I understand what you want.

You say to log out of the desktop session and collect logs.  For "udisksctl monitor" and "udevadm monitor" I think I would have to log in on a tty, say via ctl-alt-f3, and start the logging up.  Is that correct?  Also, do you want any options on the "udevadm monitor"?  I could give it -k and -u, for example.

To get the journal log I can use "journalctl -f", but I'm not sure what you want for the kernel log - do you mean the output of "dmesg" after the experiment is complete, or do you want something else?

Since I won't be logged in on the desktop session, I'm not sure how I'll know when things have settled down, so I'll just capture for two minutes and see what I get.

Comment 14 Steven A. Falco 2021-07-07 18:15:48 UTC
Created attachment 1799387 [details]
udevadm.log

Comment 15 Steven A. Falco 2021-07-07 18:16:08 UTC
Created attachment 1799388 [details]
udisksctl.log

Comment 16 Steven A. Falco 2021-07-07 18:16:28 UTC
Created attachment 1799389 [details]
journalctl.log

Comment 17 Steven A. Falco 2021-07-07 18:16:49 UTC
Created attachment 1799390 [details]
messages.log

Comment 18 Steven A. Falco 2021-07-07 18:19:05 UTC
I ran the experiment on my laptop, and attached four files: journalctl.log  messages.log  udevadm.log  udisksctl.log

I was logged out of the gui, and as root started up the four monitors in the background on a tty.

They still show the 1.5 minute delay.  I'll try some of the other suggestions next.

Comment 19 Steven A. Falco 2021-07-07 19:37:50 UTC
Created attachment 1799400 [details]
udevadm-with-gui.log

Comment 20 Steven A. Falco 2021-07-07 19:38:09 UTC
Created attachment 1799401 [details]
udisksctl-with-gui.log

Comment 21 Steven A. Falco 2021-07-07 19:38:32 UTC
Created attachment 1799402 [details]
journal-with-gui.log

Comment 22 Steven A. Falco 2021-07-07 19:38:52 UTC
Created attachment 1799405 [details]
messages-with-gui.log

Comment 23 Steven A. Falco 2021-07-07 19:43:59 UTC
I added another set of 4 logs, this time while logged in via the gui: journal-with-gui.log  messages-with-gui.log  udevadm-with-gui.log  udisksctl-with-gui.log

Interestingly, when all the logs are running, the KDE "Disks & Devices" never does see the drive, but the "dolphin" file manager does.  I guess that is a side-effect of the logging.

I considered disabling udisksd.service and/or removing its udev rule, but the laptop doesn't have the udisks package.  It just has udisks2.

Comment 24 Tomáš Bžatek 2021-07-12 15:44:55 UTC
Thanks for the logs! The `udevadm monitor` clearly shows the issue, too many "change" events queued while probing the partition.

KERNEL[540.231991] add      /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5:1.0/host4/target4:0:0/4:0:0:0/block/sdc/sdc1 (block)
UDEV  [631.949042] add      /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5:1.0/host4/target4:0:0/4:0:0:0/block/sdc/sdc1 (block)

Could you please run one more test? Either remove the udisks2 package (temporarily breaking the dependencies, e.g. `rpm -e udisks2 --nodeps`, installing the package back after the test via `dnf reinstall udisks2`) or stop and mask the systemd service (udisks2.service) and remove/rename the /usr/lib/udev/rules.d/80-udisks2.rules file. Then `udevadm control --reload` or reboot to get rid of the active udisks2 udev rules. Also verify that the `udisksd` process is not running (there's no `udisksd2`).

The point is to exclude UDisks2 from the processing chain completely. The running udisksd daemon triggers extra uevents at many places, e.g. on device hotplug.

You should not have the 'udisks' package installed, it's a UDisks gen. 1.

And repeat the test without the GUI please (as udisksd would not be running, KDE would complain). The point is to decrease the amount of the "change" uevents.



If that doesn't help, then it might be some other foreign udev rule causing the uevent multiplication. In that case just go through the /usr/lib/udev/rules.d and /etc/udev/rules.d (if any) rules and gradually disable/rename/remove the rules from highest numbers down to 61 (don't go below), reload udev via `udevadm control --reload` after any change and see if it helps. Note that this is a potentially dangerous operation that might render your system unbootable unless the files are restored back after the test. This would be outside of the UDisks2 scope however.

Comment 25 Steven A. Falco 2021-07-12 18:00:42 UTC
Created attachment 1800917 [details]
logs from latest experiment

Comment 26 Steven A. Falco 2021-07-12 18:22:33 UTC
I ran the experiment, and attached the logs as a tar file, since there were 24 files :-)  Please see all.logs.tar.gz

I included a file "legend" to describe what was changed in each set of logs.

The tests began with:

rpm -e udisks2 --nodeps
reboot

In each case the udisksctl.log file is empty, because there was no longer a udisksctl program on the system after removing the udisks2 package.

log.1 was prior to removing any udev rules.  The device was inserted at time 104, and settled down at time 124, which is better than before, but still not great.

log.2 was after removing all rules from /etc/udev/rules.d - I only had two rules there for a Brother printer.  I ran udevadm control --reload prior to inserting the flash drive.  The settling time was unchanged.

log.3 was after removing all 9x rules from /usr/lib/udev/rules.d and running udevadm control --reload.  Similarly, log.4 was after removing all 8x rules, log.5 removed 7x, and log.6 removed 61-69 rules.  If anything, the 6th experiment took a bit longer to settle.

As part of experiment 6, I tried one other thing - in addition to the flash drive used in all the earlier tests today, I also tried a different flash drive.  You can see it being inserted at timestamp 985.573074 and the partition was recognized at 986.830286, which is a delay only 1.25 seconds - that is excellent.  So, it seems that the settling time can somehow be related to the particular device, or perhaps the contents of the partition table.

Both flash drives are SanDisk Crusers, but could well be of different vintage.

What can cause the kernel to keep issuing "change" events?  Does the kernel entirely control that, or does user-space have some role in the kernel "change events being generated?

Comment 27 Ben Cotton 2022-05-12 16:36:55 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 28 Ben Cotton 2022-06-08 00:56:27 UTC
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.