Created attachment 1475016 [details] Journal output at time of hang of the system Description of the problem: We are using Fedora 27 on a system that contains an eMMC controller and eMMC flash device (it is actually an Intel N2807 CPU). We install a version of Fedora Workstation on the system and boot from the eMMC. A boot with dracut attempt to detect the /dev/mmcblk0 device and partitions to mount root file system. During the detection of the /dev/mmcblk0 device content, the system may hangs indefinitely stating ‘A start job is running for dev-mmcblk0p2.device’. The error message is visible on the display console output of the system. Version-Release number of selected component (if applicable): kernel-4.16.7-200.fc27.x86_64 dracut-046-8.git20180105.fc27.x86_64 How reproducible: The problem is visible in a sequence of tests over warm reboot loop. The system may hang once in a while, approximately each 30 to 80 cycles. The problem has been tested against two different hardware platforms and three different eMMC flash device. Steps to reproduce: 1- Install Fedora Workstation on eMMC flash device. 2- Create GNOME user startup script that execute a delay of 15 seconds and then request a reboot. 3- Reboot to start cycle. This sequence of tests has been validated to work properly on Fedora 23 Workstation on the same test systems and eMMC flash devices. Actual results: (time of failure) Attached snapshot show the complete picture. We were able to externally connect USB disk, mount the USB with mount /dev/sda /myusb, and redirect the journalctl output to a file. The interesting lines of journal content in the problematic situation Aug 09 17:03:45 gtr321a95 kernel: mmcblk0: mmc0:0001 eMMC 28.8 GiB Aug 09 17:03:45 gtr321a95 kernel: mmcblk0boot0: mmc0:0001 eMMC partition 1 4.00 MiB ... Aug 09 17:03:45 gtr321a95 kernel: mmcblk0boot1: mmc0:0001 eMMC partition 2 4.00 MiB Aug 09 17:03:45 gtr321a95 kernel: mmcblk0rpmb: mmc0:0001 eMMC partition 3 4.00 MiB, chardev (241:0) ... ... Aug 09 17:03:45 gtr321a95 kernel: mmcblk0: p1 p2 ... Aug 09 17:03:45 gtr321a95 systemd-udevd[269]: handling device node '/dev/mmcblk0rpmb', devnum=c241:0, mode=0600, uid=0, gid=0 ... Aug 09 17:03:45 gtr321a95 systemd-udevd[269]: preserve permissions /dev/mmcblk0rpmb, 020600, uid=0, gid=0 ... Aug 09 17:03:45 gtr321a95 systemd-udevd[269]: creating symlink '/dev/char/241:0' to '../mmcblk0rpmb' ... Aug 09 17:03:45 gtr321a95 systemd-udevd[269]: created empty file '/run/udev/data/c241:0' for '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/mmcblk0rpmb' ... Aug 09 17:03:45 gtr321a95 systemd-udevd[265]: probe /dev/mmcblk0 raid offset=0 ... Aug 09 17:03:45 gtr321a95 systemd-udevd[265]: handling device node '/dev/mmcblk0', devnum=b179:0, mode=0660, uid=0, gid=6 Aug 09 17:03:45 gtr321a95 systemd-udevd[265]: set permissions /dev/mmcblk0, 060660, uid=0, gid=6 Aug 09 17:03:45 gtr321a95 systemd-udevd[265]: creating symlink '/dev/block/179:0' to '../mmcblk0' Aug 09 17:03:45 gtr321a95 systemd-udevd[265]: creating link '/dev/disk/by-id/mmc-eMMC_0x150500d2' to '/dev/mmcblk0' Aug 09 17:03:45 gtr321a95 systemd-udevd[265]: creating symlink '/dev/disk/by-id/mmc-eMMC_0x150500d2' to '../../mmcblk0' Aug 09 17:03:45 gtr321a95 systemd-udevd[265]: created db file '/run/udev/data/b179:0' for '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0' Aug 09 17:03:45 gtr321a95 systemd-udevd[265]: adding watch on '/dev/mmcblk0' Aug 09 17:03:45 gtr321a95 systemd-udevd[265]: created db file '/run/udev/data/b179:0' for '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0' ... Aug 09 17:03:45 gtr321a95 systemd-udevd[259]: probe /dev/mmcblk0p1 raid offset=0 ... Aug 09 17:03:45 gtr321a95 systemd-udevd[269]: probe /dev/mmcblk0boot1 raid offset=0 ... Aug 09 17:03:45 gtr321a95 systemd-udevd[267]: probe /dev/mmcblk0boot0 raid offset=0 ... Aug 09 17:03:45 gtr321a95 systemd-udevd[265]: probe /dev/mmcblk0p2 raid offset=0 ... Aug 09 17:04:46 gtr321a95 systemd-udevd[232]: seq 1560 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p2' is taking a long time Aug 09 17:04:46 gtr321a95 systemd-udevd[232]: seq 1562 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot1' is taking a long time Aug 09 17:04:46 gtr321a95 systemd-udevd[232]: seq 1559 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p1' is taking a long time Aug 09 17:04:46 gtr321a95 systemd-udevd[232]: seq 1564 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot0' is taking a long time ... Aug 09 17:06:46 gtr321a95 systemd-udevd[232]: seq 1559 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p1' killed Aug 09 17:06:46 gtr321a95 systemd-udevd[232]: seq 1564 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot0' killed Aug 09 17:06:46 gtr321a95 systemd-udevd[232]: seq 1560 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p2' killed Aug 09 17:06:46 gtr321a95 systemd-udevd[232]: seq 1562 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot1' killed ... Additional information: Attached list of all OS components and version - componentversion.txt Attached capture of the journalctl at time when the system hangs – fullerrorjornal2.log Attached capture of the dmesg at time when the system hangs – dmesgerror2.log Attached capture of the dev at time when the system hangs – deverror2.log Attached capture of the lsmod at time when the system hangs – lsmoderror2.log We added the rd.debug rd.udev.debug rd.timeout=120 kernel parameters to assist in debug. We added ForwardToConsole=yes in journald.conf to assist in debug. The fullerrorjournal2.txt also contains the SysRq Backtrace and SysRq State output when hanged. Comparing the journal content between a situation of error and one that works let use believe that the ‘probe /dev/mmcblk0p2’ does not complete. As a reference, a bootokjournal.log file is attached to see the working condition We are working in order to let the mmc_core loadable module to print debug statements on transaction to see if the hang is produced by the mmc driver stack.
Created attachment 1475017 [details] Output of DMSEG at time of system hang
Created attachment 1475018 [details] List of /dev at time of hang of the system
Created attachment 1475019 [details] List of modules loaded at time of system hang
looks like a kernel bug to me
One common thing from fullerrorjournal2.txt Udev seq number 1560 1562 1559 1564 Udev process number 265 269 259 267 Process name system-udevd system-udevd system-udevd system-udevd Priority 120 120 120 120 Stack hang point blkdev_close blkdev_close blkdev_close blkdev_close Log file statement A B C D A-> match to - Aug 09 17:04:46 gtr321a95 systemd-udevd[232]: seq 1560 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p2' is taking a long time B-> match to - Aug 09 17:04:46 gtr321a95 systemd-udevd[232]: seq 1562 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot1' is taking a long time C-> match to - Aug 09 17:04:46 gtr321a95 systemd-udevd[232]: seq 1559 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p1' is taking a long time D-> match to - Aug 09 17:04:46 gtr321a95 systemd-udevd[232]: seq 1564 '/devices/pci0000:00/0000:00:17.0/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0boot0' is taking a long time I am attaching a new capture of the issue after having changed some dynamic debug statement in static printk in the mmc_core.c file. I have annotated the mmc_journalerror2.log with the MMC commands and all of the commands appear to be ok up to the moment that the system-udevd declares 4 operations that are taking a long time. Some 45 seconds after error declaration, we can see the probe that are progressing but it appears to be too late for the boot process to happen. The addition of the MMC printk statements change a little bit the behavior as originally, I could not see the probe happenning at all.
Created attachment 1477253 [details] Journal output at time of hang of the system - with MMC logs added
I have been able to validate a work around for this error situation. I have forced udevd to operate with only one child process (rd.udev.children_max=1). This has been tested to work correctly for a complete 4 days long run on 3 same systems that showed the issue. Before work around, each system showed the issue more or less 5 times a day.
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs. Fedora 27 has now been rebased to 4.18.10-100.fc27. Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 28 or Fedora 29, and are still experiencing this issue, please change the version to Fedora 28 or 29. If you experience different issues, please open a new bug report for those.
This message is a reminder that Fedora 27 is nearing its end of life. On 2018-Nov-30 Fedora will stop maintaining and issuing updates for Fedora 27. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '27'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 27 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.
Got it. In attempt to shutdown the regular request for info by Bugzilla. I added this comment.